Section Unit 3 Summary

In this unit, we’ve explored essential data moves that transform raw data into meaningful insights:
  • Data Cleaning and Organization: We learned how to handle missing values, deal with outliers, and restructure data to prepare it for analysis.
  • Filtering and Subsetting: We explored techniques for creating focused subsets of data to answer specific questions and make meaningful comparisons.
  • Summarizing and Calculating: We examined summary statistics that capture key aspects of our data and learned to create derived variables that reveal new insights.
  • Grouping and Comparing: We developed skills for aggregating data by categories and making meaningful comparisons between groups.
We also explored important ethical considerations regarding selection bias and developed our statistical thinking about variation within and between groups.
By the end of this unit, you should have applied these data moves to both our Community Health dataset and your own chosen dataset. These skills provide the foundation for the more advanced visualization and communication techniques we’ll explore in the next unit.
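
As a quick recap, the four data moves above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (`neighborhood`, `age`, `weight_kg`, `height_m`), and the decision to drop rows with a missing value are all invented for the example, and it does not use the Community Health dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; names and values are invented for illustration only.
records = [
    {"neighborhood": "North", "age": 34, "weight_kg": 70.0, "height_m": 1.75},
    {"neighborhood": "North", "age": 51, "weight_kg": 82.0, "height_m": 1.80},
    {"neighborhood": "South", "age": 29, "weight_kg": None, "height_m": 1.65},
    {"neighborhood": "South", "age": 45, "weight_kg": 64.0, "height_m": 1.60},
    {"neighborhood": "South", "age": 16, "weight_kg": 58.0, "height_m": 1.70},
]

# 1. Cleaning: drop rows with a missing weight (one possible choice;
#    imputing a value is another, with different trade-offs).
cleaned = [dict(r) for r in records if r["weight_kg"] is not None]

# 2. Calculating: add a derived variable (body mass index).
with_bmi = [{**r, "bmi": r["weight_kg"] / r["height_m"] ** 2} for r in cleaned]

# 3. Filtering: keep only adults to answer a question about adults.
adults = [r for r in with_bmi if r["age"] >= 18]

# 4. Grouping and comparing: mean BMI per neighborhood.
bmi_by_group = defaultdict(list)
for r in adults:
    bmi_by_group[r["neighborhood"]].append(r["bmi"])

mean_bmi_by_group = {name: round(mean(values), 1)
                     for name, values in bmi_by_group.items()}
print(mean_bmi_by_group)
```

Note that after cleaning and filtering, the South group rests on a single record. Small groups like this are worth flagging before drawing comparisons, which connects back to the selection bias concerns discussed in this unit.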

Checkpoint 83. Unit 3 Reflection.

Take some time to reflect on what you’ve learned in this unit:
  • Which data move did you find most challenging to implement, and why?
  • What surprised you about your dataset when you applied these data moves?
  • How have these data moves helped you address your investigation questions?
  • What new questions have emerged as you’ve worked with your data?

Checkpoint 84. Unit 3 Review.

Which sequence of data moves best represents a typical data analysis workflow?
  • Grouping → Cleaning → Filtering → Creating visualizations
    Feedback: This sequence is problematic because grouping data before cleaning it could lead to incorrect aggregations based on errors or missing values.
  • Cleaning → Creating derived variables → Filtering → Grouping and comparing
    Feedback: Correct! This sequence represents a logical workflow: first clean the data to address quality issues, then create any needed derived variables, then filter to focus on relevant subsets, and finally group and compare to identify patterns.
  • Filtering → Cleaning → Summarizing → Creating derived variables
    Feedback: Filtering before cleaning could result in removing data that might be valuable once cleaned, potentially introducing bias.
  • Creating derived variables → Summarizing → Cleaning → Grouping
    Feedback: Creating derived variables before cleaning could propagate errors into new variables, and summarizing before cleaning could lead to misleading statistics.
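
To see why the order of moves matters, here is a small sketch comparing group means computed before and after cleaning. The data are hypothetical: the group labels, the measurements, and the use of `999` as a "not recorded" code are assumptions made for this example, and `group_means` is an illustrative helper, not part of any library.

```python
from statistics import mean

# Hypothetical (group, resting heart rate) pairs.
# Assumption: 999 is a data-entry code meaning "not recorded".
MISSING_CODE = 999
rows = [("A", 62), ("A", 70), ("A", MISSING_CODE), ("B", 75), ("B", 81)]


def group_means(pairs):
    """Return the mean value for each group label."""
    groups = {}
    for label, value in pairs:
        groups.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in groups.items()}


# Grouping first: the missing-value code is averaged in as if it were real.
means_before_cleaning = group_means(rows)

# Cleaning first: remove the coded rows, then group.
valid_rows = [(label, value) for label, value in rows if value != MISSING_CODE]
means_after_cleaning = group_means(valid_rows)

print(means_before_cleaning)
print(means_after_cleaning)
```

Group B is unaffected because it has no coded values, while group A's mean is badly inflated when grouping happens first. A comparison between the two groups made before cleaning would therefore reflect a data-entry convention rather than a real difference.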