People make predictable mistakes when interpreting data. Teaching students to recognize and avoid these pitfalls helps them become more sophisticated consumers and creators of data-based arguments.
Students find that overall, boys in their school have higher average math test scores than girls. However, when they look at each grade level separately, girls have the higher average in every single grade. How is this possible?
This is an example of Simpson’s Paradox: a trend that appears in the combined data reverses when the data are split into subgroups. It can occur if the higher grades (where students generally score higher) include more boys, while the lower grades include more girls, so the boys’ overall average is weighted toward the higher-scoring grades. This teaches students that aggregate data can be misleading and that breaking data into subgroups can reveal important patterns.
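A small worked example can make the reversal concrete. The sketch below uses invented numbers for two grades (the counts and averages are assumptions chosen only to illustrate the effect, not data from any real school) and compares the per-grade averages with the overall weighted averages.

```python
# Hypothetical numbers, chosen only to illustrate Simpson's Paradox.
# (grade, group) -> (number of students, average score)
scores = {
    ("grade 6", "boys"): (20, 70.0),
    ("grade 6", "girls"): (80, 72.0),
    ("grade 8", "boys"): (80, 85.0),
    ("grade 8", "girls"): (20, 87.0),
}


def overall_mean(group):
    """Average for one group across all grades, weighted by student count."""
    rows = [(n, mean) for (_, g), (n, mean) in scores.items() if g == group]
    total_students = sum(n for n, _ in rows)
    total_points = sum(n * mean for n, mean in rows)
    return total_points / total_students


def girls_lead_in_every_grade():
    """True if girls have the higher average in each grade taken separately."""
    grades = {grade for grade, _ in scores}
    return all(
        scores[(grade, "girls")][1] > scores[(grade, "boys")][1]
        for grade in grades
    )


boys_overall = overall_mean("boys")
girls_overall = overall_mean("girls")

print("Girls lead in every grade:", girls_lead_in_every_grade())
print("Boys overall average:", boys_overall)
print("Girls overall average:", girls_overall)
```

In this made-up data set, girls score two points higher in both grades, yet the boys' overall average is higher because most of the boys are in the higher-scoring grade. Students can change the counts to see when the reversal disappears.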
A student wants to argue that homework should be eliminated. They find one study showing no relationship between homework and grades, but ignore five other studies showing positive relationships. What’s the problem with this approach?
This is cherry-picking—selecting only data that supports your preferred conclusion. Good data interpretation requires considering all available evidence and explaining why some studies might show different results. Students should learn to look for patterns across multiple sources of evidence rather than searching for single sources that confirm their beliefs.
Students conduct a survey in their suburban school and find that 80% of students have access to high-speed internet at home. A student concludes: “Most teenagers have good internet access.” What’s problematic about this generalization?
The sample comes from one geographic area and socioeconomic context. Teenagers in rural areas, different countries, or lower-income communities might have very different internet access rates. Students should learn to be specific about what populations their data represents and avoid broad generalizations that go beyond their evidence.
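The gap between a single-community sample and a broader population can be sketched with a few lines of arithmetic. The community types and counts below are hypothetical, invented for illustration; only the 80% suburban figure echoes the scenario above.

```python
# Hypothetical counts, for illustration only.
# community -> (teenagers surveyed, teenagers with high-speed internet at home)
communities = {
    "suburban": (500, 400),  # 80%, matching the school survey in the scenario
    "urban": (300, 210),     # assumed 70%
    "rural": (200, 100),     # assumed 50%
}


def access_rate(names):
    """Share of teenagers with access, pooled over the named communities."""
    total = sum(communities[name][0] for name in names)
    with_access = sum(communities[name][1] for name in names)
    return with_access / total


suburban_only = access_rate(["suburban"])
all_communities = access_rate(list(communities))

print(f"Suburban sample only: {suburban_only:.0%}")
print(f"All communities pooled: {all_communities:.0%}")
```

Under these assumed numbers the suburban sample overstates access for teenagers as a whole, which is why a claim should name the population that was actually sampled.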