Section Project Overview

The course project is a semester-long data science investigation where you will:
  • Select a dataset aligned with your interests
  • Formulate meaningful statistical questions
  • Clean, organize, and transform the data
  • Create insightful visualizations
  • Build an interactive dashboard in CODAP
  • Present your findings in a clear, compelling manner
This project accounts for a large portion of your total grade in this class, but don’t be overwhelmed! The project is designed to be completed incrementally throughout the semester, with specific milestones aligned to each unit of course content.

Subsection Learning Objectives

Through this project, you will demonstrate your ability to:
  • Apply data science concepts and techniques to real-world questions
  • Work effectively with data of varying quality and complexity
  • Create clear, accurate, and insightful data visualizations
  • Develop a coherent data narrative that communicates findings effectively
  • Critically evaluate data sources, limitations, and ethical considerations
  • Use CODAP to perform various data moves and create interactive visualizations

Subsection Dataset Selection Guidelines

Selecting an appropriate dataset is crucial for project success. Your dataset should:
Size and Complexity
Contain at least 100 records (rows) with at least 8 variables (columns). The dataset should be complex enough to support meaningful analysis but not so large that it becomes unmanageable in CODAP.
Variable Types
Include a mix of categorical and numerical variables to allow for diverse analysis and visualization techniques.
Quality and Completeness
Be reasonably complete. Some missing values or quality issues are acceptable (and even educational to work with).
Interest and Relevance
Relate to a topic you find personally interesting or relevant, which will help sustain your engagement throughout the semester.
Accessibility
Be publicly available or properly licensed for educational use. You should be able to share the dataset with the instructor and classmates.
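If you are comfortable with a little programming, you can pre-screen a CSV file against the size and variable-type guidelines before importing it into CODAP. This optional Python sketch is not one of the course tools. It assumes the data is a CSV file with a header row, treats blank cells as missing values, and uses a simple heuristic: a column counts as numerical only if every non-blank value parses as a number. The function name `profile_csv` is hypothetical. Judgments about quality, interest, and licensing still have to be made by you.

```python
import csv
import io

MIN_RECORDS = 100   # from the Size and Complexity guideline
MIN_VARIABLES = 8   # from the Size and Complexity guideline


def is_number(text):
    """Return True if text parses as a float (heuristic for a numerical value)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile_csv(file_obj):
    """Count records and variables and guess each column's type.

    A column is labeled 'numerical' when every non-blank value parses as a
    number; otherwise it is labeled 'categorical'. Blank cells are treated
    as missing and ignored when guessing the type.
    """
    reader = csv.DictReader(file_obj)
    columns = reader.fieldnames or []
    rows = list(reader)
    types = {}
    for col in columns:
        # Short rows yield None for missing fields, so fall back to "".
        values = [(row.get(col) or "").strip() for row in rows]
        present = [v for v in values if v != ""]
        if present and all(is_number(v) for v in present):
            types[col] = "numerical"
        else:
            types[col] = "categorical"
    kinds = set(types.values())
    return {
        "records": len(rows),
        "variables": len(columns),
        "types": types,
        "enough_records": len(rows) >= MIN_RECORDS,
        "enough_variables": len(columns) >= MIN_VARIABLES,
        "has_mix": kinds == {"numerical", "categorical"},
    }


if __name__ == "__main__":
    # Tiny in-memory example. For a real file, pass
    # open("your_file.csv", newline="") instead (hypothetical file name).
    sample = io.StringIO("name,age\nAna,34\nBo,\nCy,29\n")
    report = profile_csv(sample)
    print(report["records"], report["variables"], report["types"])
    # prints: 3 2 {'name': 'categorical', 'age': 'numerical'}
```

Because the type guess is purely syntactic, codes that look like numbers (ZIP codes, ID numbers) will be labeled numerical even though they behave as categories, so review the `types` result rather than trusting it blindly.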
Recommended data repositories include:

Checkpoint 112. Dataset Evaluation Checklist.

Use this checklist to evaluate potential datasets for your project. A suitable dataset should meet most of these criteria:
  • Contains at least 100 records (rows)
  • Includes at least 8 variables (columns)
  • Has a mix of categorical and numerical variables
  • Is on a topic that genuinely interests you
  • Has documentation about data collection methods and meanings of variables
  • Is reasonably clean but offers some opportunities for data cleaning practice
  • Can be easily imported into CODAP
  • Supports at least three meaningful statistical questions
  • Contains variables that might have interesting relationships
  • Is publicly available or properly licensed for educational use
Review the dataset you’re considering and count how many of these criteria it meets. A good dataset for your project should satisfy at least 7 or 8 of these 10 items.
Solution.
This checklist serves as a reference for dataset evaluation. There is no single correct answer, as the suitability of a dataset depends on your specific project needs. However, datasets meeting more criteria will generally provide better opportunities for meaningful analysis.
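The scoring rule in this checkpoint is simple counting: each of the ten items is either met or not, and a strong candidate meets at least 7 or 8 of them. As an optional illustration, here is a minimal Python sketch that tallies your own yes/no judgments. The threshold of 7 is an assumption taken from the lower end of that guideline, and the names `CHECKLIST`, `PASSING_SCORE`, and `evaluate` are hypothetical.

```python
# The ten items from the Dataset Evaluation Checklist.
CHECKLIST = [
    "Contains at least 100 records (rows)",
    "Includes at least 8 variables (columns)",
    "Has a mix of categorical and numerical variables",
    "Is on a topic that genuinely interests you",
    "Has documentation about data collection methods and meanings of variables",
    "Is reasonably clean but offers some opportunities for data cleaning practice",
    "Can be easily imported into CODAP",
    "Supports at least three meaningful statistical questions",
    "Contains variables that might have interesting relationships",
    "Is publicly available or properly licensed for educational use",
]

PASSING_SCORE = 7  # assumed: lower end of the "at least 7 or 8" guideline


def evaluate(answers):
    """Return (items met, total items, whether the count reaches PASSING_SCORE).

    `answers` maps checklist items to True (met) or False (not met);
    items missing from the mapping count as not met.
    """
    met = sum(1 for item in CHECKLIST if answers.get(item, False))
    return met, len(CHECKLIST), met >= PASSING_SCORE


if __name__ == "__main__":
    # Example: a dataset judged to meet the first seven items only.
    my_answers = {item: True for item in CHECKLIST[:7]}
    print(evaluate(my_answers))  # prints: (7, 10, True)
```

The count is only a summary of your judgments; a dataset that scores 7 but fails a critical item such as licensing may still be a poor choice.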