Organizing and Cleaning Data: From Messy to Meaningful
Real-world data collection is messy. Students might record information inconsistently, miss data points, or make recording errors. Teaching students to organize and clean data helps them understand that data analysis requires careful preparation. The series of videos below, from Delavari, Shelton, Ireland, and Weiland (2025) through Statistical Literacy and Critical Education (SLiCE), covers the key components of data moves.
Students survey classmates about favorite colors and get these responses: “blue”, “Blue”, “light blue”, “navy”, “red”, “RED”, “green”. What problems do you notice?
This data needs cleaning because: (1) capitalization is inconsistent (“blue” vs “Blue” vs “RED”), and (2) it’s unclear whether “light blue” and “navy” should count as “blue” or be separate categories. Students need to make decisions about how to group responses consistently before they can analyze the data meaningfully.
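For teachers who want to show students what these cleaning decisions look like in practice, here is a minimal Python sketch. It assumes one possible grouping choice (counting "light blue" and "navy" as "blue"); the names `SHADE_TO_COLOR` and `clean_color` are made up for this illustration, and a class could reasonably decide to keep the shades separate instead.

```python
from collections import Counter

# The survey responses from the example above.
responses = ["blue", "Blue", "light blue", "navy", "red", "RED", "green"]

# Assumption: the class decided that shades of blue count as "blue".
# A different class might keep these as separate categories.
SHADE_TO_COLOR = {"light blue": "blue", "navy": "blue"}


def clean_color(response):
    """Fix inconsistent capitalization, then apply the grouping decision."""
    normalized = response.strip().lower()
    return SHADE_TO_COLOR.get(normalized, normalized)


cleaned = [clean_color(r) for r in responses]
counts = Counter(cleaned)
print(counts)
```

Writing the grouping rule down as an explicit table makes the decision visible, so students can discuss it and change it rather than applying it inconsistently by hand.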
Students are collecting data about pets and get responses like: “dog”, “puppy”, “golden retriever”, “cat”, “kitten”, “fish”. How should they group these responses?
When some responses are missing, the best approach depends on how much data is missing and why. If only a few students didn’t respond, you might exclude those responses or follow up to get complete information. If many students skipped the question, you need to consider whether the question was unclear or too personal. Don’t guess or make up missing data; doing so creates false information.
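One way to make the "how much is missing" question concrete is to count the missing responses before deciding what to do. The sketch below uses hypothetical survey data in which `None` marks a student who did not answer; the function name `summarize_missing` is an assumption made for this illustration.

```python
# Hypothetical pet survey; None marks a student who did not respond.
responses = ["dog", None, "cat", "fish", None, "dog"]


def summarize_missing(values):
    """Return (number missing, share missing) without filling anything in."""
    if not values:
        return 0, 0.0
    missing = sum(1 for v in values if v is None or not v.strip())
    return missing, missing / len(values)


missing_count, missing_share = summarize_missing(responses)

# Keep only real answers; missing values are excluded, never guessed.
answered = [v for v in responses if v is not None and v.strip()]

print(f"{missing_count} of {len(responses)} responses are missing")
print(answered)
```

Reporting the count alongside the cleaned list keeps the exclusion transparent: students can see how many classmates are not represented and judge whether following up is worthwhile.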