As students become more sophisticated in their data work, they need to understand that not all datasets are created equal. Some are clean and well organized; others are messy and complex. Working with gradually increasing levels of complexity prepares students for real-world data analysis.
Students are using attendance data from their school. They notice that Fridays consistently show lower attendance than other days, and that no data was recorded for several scattered dates throughout the year. What should they consider about data quality?
Students should: (1) consider whether the lower Friday attendance reflects a real pattern or a data collection issue, (2) investigate why certain dates are missing (holidays? technical problems? weather closures?), (3) decide whether to exclude incomplete weeks from the analysis, and (4) consider how the missing data might bias their conclusions about attendance patterns.
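For a class that analyzes the data in Python, steps (2) and (3) could be sketched roughly as follows. This is a minimal sketch under stated assumptions: the attendance records, dates, and closure list are invented for illustration, and the function names `missing_dates` and `weekday_rates` are hypothetical helpers, not part of any school system.

```python
from collections import defaultdict
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

# Hypothetical records: date -> (students present, students enrolled).
# All numbers are invented for illustration only.
records = {
    date(2024, 3, 4): (470, 500),   # Monday
    date(2024, 3, 5): (475, 500),
    date(2024, 3, 6): (472, 500),
    date(2024, 3, 7): (468, 500),
    date(2024, 3, 8): (440, 500),   # Friday
    date(2024, 3, 11): (471, 500),
    # No record for Tuesday 2024-03-12
    date(2024, 3, 13): (474, 500),
    date(2024, 3, 14): (469, 500),
    date(2024, 3, 15): (445, 500),  # Friday
}

# Hypothetical: dates the school calendar says were holidays or closures.
known_closures = set()


def school_days(start, end):
    """Yield every Monday-to-Friday date from start to end, inclusive."""
    d = start
    while d <= end:
        if d.weekday() < 5:  # 0 = Monday ... 4 = Friday
            yield d
        d += timedelta(days=1)


def missing_dates(records, start, end, closures):
    """Weekdays with no record that a known closure does not explain."""
    return [d for d in school_days(start, end)
            if d not in records and d not in closures]


def weekday_rates(records, skip_weeks=frozenset()):
    """Mean attendance rate per weekday, optionally skipping whole ISO weeks."""
    rates = defaultdict(list)
    for d, (present, enrolled) in records.items():
        if tuple(d.isocalendar()[:2]) in skip_weeks:  # (ISO year, ISO week)
            continue
        rates[DAY_NAMES[d.weekday()]].append(present / enrolled)
    return {day: sum(vals) / len(vals) for day, vals in rates.items()}


gaps = missing_dates(records, date(2024, 3, 4), date(2024, 3, 15), known_closures)
incomplete_weeks = {tuple(d.isocalendar()[:2]) for d in gaps}
print("Unexplained gaps:", gaps)
print("All weeks:", weekday_rates(records))
print("Complete weeks only:", weekday_rates(records, incomplete_weeks))
```

Comparing the Friday average with and without the incomplete weeks is one way for students to check whether the gaps change their conclusion. If the Friday dip persists in both versions, it is less likely to be an artifact of the missing data.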
Think about your students’ current skill level. What level of data complexity would be appropriate for their next investigation? What support would they need to work with slightly more complex data than they’ve used before?