Working with Existing Datasets: Finding and Using Secondary Data
Not all student investigations require collecting new data. Sometimes the most interesting questions can be explored using datasets that already exist—from government websites, research organizations, or even school records (when appropriate and anonymized).
Before analyzing any dataset, students should understand its context: Who collected it? When was it collected? How was it collected? Why was it collected? This helps them understand potential limitations, biases, or appropriate uses of the data. For example, data about school lunch preferences collected in 2015 might not reflect current preferences.
You find a dataset that would be perfect for your students’ investigation, but it has 50 variables and 1,000 rows. How could you make this manageable for classroom use?
You could: (1) choose 3 to 5 variables that directly relate to your students’ questions, (2) select a random sample of 50 to 100 rows, or (3) focus on data from a specific time period or geographic area that’s relevant to your students. These options can be combined. The goal is to preserve the dataset’s integrity while making it manageable for student analysis.
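For teachers who prepare data with a short script, the three options above can be sketched in Python using only the standard library. This is one possible approach, not a required method. The function name `simplify_dataset`, its parameters, and the made-up dataset (with columns such as `year`, `region`, and `var1`) are all hypothetical and stand in for whatever real dataset you are using.

```python
import random


def simplify_dataset(rows, keep_columns, sample_size, keep_row=None, seed=0):
    """Return a smaller copy of `rows` (a list of dicts, one dict per row).

    keep_row     -- optional function that returns True for rows to keep
                    (for example, one time period or one region)
    sample_size  -- how many rows to draw at random
    keep_columns -- the few variables students will actually analyze
    seed         -- fixed so that every student receives the same sample
    """
    # Option 3: focus on a relevant time period or geographic area.
    if keep_row is not None:
        rows = [row for row in rows if keep_row(row)]

    # Option 2: draw a random sample without replacement.
    rng = random.Random(seed)
    k = min(sample_size, len(rows))
    sampled = rng.sample(rows, k)

    # Option 1: keep only the chosen variables.
    return [{col: row[col] for col in keep_columns} for row in sampled]


# Made-up stand-in for a dataset with 1,000 rows and 50 variables.
full_dataset = [
    {
        "id": i,
        "year": 2015 + i % 5,
        "region": ["North", "South"][i % 2],
        **{f"var{j}": i * j for j in range(47)},
    }
    for i in range(1000)
]

classroom_data = simplify_dataset(
    full_dataset,
    keep_columns=["year", "region", "var1", "var2"],
    sample_size=75,
    keep_row=lambda row: row["region"] == "North",
)
```

Filtering happens before sampling so that the sample size refers to the rows students will actually see. Fixing the seed is a design choice: it lets the whole class work from the same rows, which makes it easier to compare results.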
When working with existing data, students should: (1) use only datasets that are publicly available or appropriately anonymized, (2) give credit to the original data collectors, (3) respect the intended use of the data, (4) avoid making harmful generalizations about groups of people, and (5) understand that even anonymized data represents real people who deserve respect.