
Section Working with Existing Datasets: Finding and Using Secondary Data

Not all student investigations require collecting new data. Sometimes the most interesting questions can be explored using datasets that already exist—from government websites, research organizations, or even school records (when appropriate and anonymized).

Exploration 14. Finding Student-Appropriate Datasets.

Elementary-Friendly Sources:
• Local weather data from weather.gov
• School lunch participation numbers (anonymized)
• Community event attendance
• Simple sports statistics
Secondary-Appropriate Sources:
• Census data about community demographics
• Environmental data (air quality, recycling rates)
• Education statistics (graduation rates, college enrollment)
• Economic indicators relevant to the local community
Key Criteria for Student Datasets:
• Small enough to work with manually (under 100 data points for elementary, under 500 for secondary)
• Relevant to students’ lives or interests
• From trustworthy sources
• Understandable variable names and units

Checkpoint 38.

When students use data collected by others, what’s the most important thing for them to consider first?
Hint.
Think about the questions from Module 1 about understanding the story behind data.
Solution.
Before analyzing any dataset, students should understand its context: Who collected it? When was it collected? How was it collected? Why was it collected? This helps them understand potential limitations, biases, or appropriate uses of the data. For example, data about school lunch preferences collected in 2015 might not reflect current preferences.

Checkpoint 39.

You find a dataset that would be perfect for your students’ investigation, but it has 50 variables and 1,000 rows. How could you make this manageable for classroom use?
Hint.
Consider what parts of the dataset are most relevant to your students’ questions.
Solution.
You could: (1) Choose 3 to 5 variables that directly relate to your students’ questions, (2) Select a random sample of 50 to 100 rows, or (3) Focus on data from a specific time period or geographic area that’s relevant to your students. These options can also be combined. The goal is to make the dataset manageable for student analysis while maintaining its integrity, so tell students which variables and rows were left out and why.
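For teachers who prepare data with a short script, the three strategies in this solution can be sketched as follows. This is a minimal illustration and not a prescribed workflow: the function name `shrink_dataset`, the column names `var1` to `var3`, and the made-up 1,000-row dataset are all hypothetical stand-ins for a real file. The sketch uses only Python's standard library and a fixed random seed so that the same sample can be regenerated later.

```python
import random


def shrink_dataset(rows, keep_columns, sample_size, seed=0, keep_row=None):
    """Return a smaller copy of `rows` (a list of dicts, one dict per row).

    Hypothetical helper for illustration only.
    """
    # Strategy (3): optionally keep only rows from a relevant period or area.
    if keep_row is not None:
        rows = [row for row in rows if keep_row(row)]

    # Strategy (2): draw a random sample without replacement.
    # A fixed seed makes the sample reproducible.
    rng = random.Random(seed)
    if len(rows) > sample_size:
        rows = rng.sample(rows, sample_size)

    # Strategy (1): keep only the variables tied to the students' question.
    return [{col: row[col] for col in keep_columns} for row in rows]


# Made-up stand-in for a downloaded dataset: 1,000 rows, 50 variables.
full = [{f"var{j}": i * j for j in range(50)} for i in range(1000)]

small = shrink_dataset(
    full,
    keep_columns=["var1", "var2", "var3"],
    sample_size=75,
    keep_row=lambda row: row["var1"] >= 500,  # assumed filter, e.g. recent years
)

print(len(small), "rows x", len(small[0]), "columns")  # 75 rows x 3 columns
```

The original list is left untouched, so the full dataset remains available if students later ask a question that needs a variable or row that was set aside.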

Checkpoint 40.

What ethical considerations apply when students use datasets collected by others?
Hint.
Think about privacy, consent, and appropriate use of information about real people.
Answer.
Respect privacy, give proper credit to data sources, and use data for appropriate educational purposes.
Solution.
Students should: (1) Use only datasets that are publicly available or appropriately anonymized, (2) Give credit to original data collectors, (3) Respect the intended use of the data, (4) Avoid making harmful generalizations about groups of people, and (5) Understand that even anonymized data represents real people who deserve respect.