
Section The Data Investigation Framework

Subsection Overview of the Framework

Hollylynne Lee and her colleagues at North Carolina State University developed a comprehensive framework for data investigations in their paper Investigating Data Like a Scientist: Key Practices and Processes (https://iase-pub.org/ojs/SERJ/article/view/41/457). The framework helps structure the process of working with data and, as shown below, provides a roadmap through the complexity of real-world data analysis.
[Image: the NC State team's graphic of the data investigation process framework. Credit: HIRISE Project, NC State University, hirise.fi.ncsu.edu/projects-activities/data-investigations/]
Figure 31. The Data Investigation Cycle (based on Lee et al.)
The framework consists of four interconnected phases:
Ask Questions: Formulate statistical questions that can be answered with data. These questions should anticipate variability and focus on distributions rather than individual cases.
Consider Data: Evaluate available data sources, collection methods, and variables. Consider what additional data might be needed and assess the quality and appropriateness of the data for answering your questions.
Analyze Data: Apply appropriate techniques to organize, summarize, and visualize the data. This includes cleaning the data, creating meaningful representations, and identifying patterns.
Interpret Results: Make claims based on the evidence from your analysis, acknowledge limitations, and consider implications. This often leads to new questions, continuing the cycle.
While the framework is presented as a cycle, real investigations rarely follow that cycle perfectly. You might need to revisit earlier phases as you gain insights or encounter challenges. Iteration is essential in data analysis.

Checkpoint 32. Data Investigation Phases.

    During which phase of the data investigation framework would you typically clean the dataset by handling missing values?
  • Ask Questions
    Feedback: The Ask Questions phase focuses on formulating statistical questions, not on cleaning data.
  • Consider Data
    Feedback: While you might identify data quality issues during the Consider Data phase, actual cleaning typically happens during analysis.
  • Analyze Data
    Feedback: Correct! Data cleaning is part of the Analyze Data phase, where you prepare and transform the data for meaningful analysis.
  • Interpret Results
    Feedback: The Interpret Results phase focuses on drawing conclusions from the analysis, not on cleaning the data.
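To make "handling missing values" concrete, here is a minimal sketch in Python using only the standard library. The values and the function names (`count_missing`, `drop_missing`, `impute_median`) are invented for illustration and do not come from the project dataset. The sketch shows two common options, dropping incomplete values or filling them with the median, without suggesting either is the right choice for every investigation.

```python
from statistics import median

# Hypothetical asthma-rate column (percent); None marks a missing value.
# These numbers are made up purely for illustration.
asthma_rates = [6.0, None, 8.5, 7.0, None, 12.0]


def count_missing(values):
    """Count how many entries are missing (None)."""
    return sum(1 for v in values if v is None)


def drop_missing(values):
    """Return only the observed (non-missing) entries."""
    return [v for v in values if v is not None]


def impute_median(values):
    """Replace each missing entry with the median of the observed entries."""
    fill = median(drop_missing(values))
    return [fill if v is None else v for v in values]


print("Missing entries:", count_missing(asthma_rates))
print("After dropping: ", drop_missing(asthma_rates))
print("After imputing: ", impute_median(asthma_rates))
```

Dropping values shrinks the sample, while imputing keeps every row but understates variability. Whichever you choose, record the decision so it can be acknowledged as a limitation when interpreting results.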

Subsection The Framework in Practice

Let’s see how this framework might apply to our Community Health and Environment project:

Example 33. Community Health Investigation.

Ask Questions: How does air quality relate to asthma rates across different neighborhoods? Do these relationships differ based on income levels?
Consider Data: Our dataset includes air quality measurements, asthma prevalence rates, and median household income for various neighborhoods. We need to consider how air quality was measured, whether measurements were taken consistently, and whether there are confounding variables we should account for.
Analyze Data: We might create scatterplots of air quality versus asthma rates, calculate correlation coefficients, and group neighborhoods by income levels to compare patterns. We’d also need to handle any missing values or outliers.
Interpret Results: Based on our analysis, we might find that neighborhoods with poorer air quality tend to have higher asthma rates, and this relationship could be stronger in lower-income areas. We’d need to acknowledge limitations (correlation doesn’t imply causation) and consider implications for public health policy.
The framework helps ensure that we approach data investigations systematically, considering crucial aspects at each stage. It also highlights the iterative nature of data analysis—insights from later phases often prompt us to refine our questions or seek additional data.
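The Analyze Data step in the example above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the project's actual analysis. The records, the `pm25` air-quality measure, the two income labels, and the helper names (`complete_cases`, `pearson`, `correlation_by_group`) are all hypothetical. The sketch removes incomplete rows, then computes the Pearson correlation coefficient overall and within each income group.

```python
from math import sqrt

# Hypothetical, made-up records for illustration only (not real measurements).
# Each tuple: (neighborhood, pm25 reading, asthma rate in percent, income group)
records = [
    ("A", 8.0, 6.0, "higher"),
    ("B", 10.0, 7.5, "higher"),
    ("C", 12.0, 7.0, "higher"),
    ("D", 9.0, None, "higher"),   # missing asthma rate
    ("E", 11.0, 9.0, "lower"),
    ("F", 14.0, 12.5, "lower"),
    ("G", 17.0, 15.0, "lower"),
    ("H", None, 10.0, "lower"),   # missing air-quality reading
]


def complete_cases(rows):
    """Keep only rows where both measurements are present."""
    return [r for r in rows if r[1] is not None and r[2] is not None]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


def correlation_by_group(rows):
    """Compute the air quality vs. asthma correlation within each income group."""
    groups = {}
    for _, pm25, asthma, income in rows:
        groups.setdefault(income, []).append((pm25, asthma))
    return {
        income: pearson([p for p, _ in pts], [a for _, a in pts])
        for income, pts in groups.items()
    }


clean = complete_cases(records)
overall = pearson([r[1] for r in clean], [r[2] for r in clean])
by_group = correlation_by_group(clean)

print(f"Rows kept after removing missing values: {len(clean)} of {len(records)}")
print(f"Overall correlation: {overall:.2f}")
for income, r in by_group.items():
    print(f"Correlation in {income}-income neighborhoods: {r:.2f}")
```

Because the numbers are invented, the output says nothing about real neighborhoods. With only three complete rows per group, any coefficient would be far too unstable to support a claim, which is exactly the kind of limitation the Interpret Results phase asks us to acknowledge.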

Activity 10. Applying the Framework.

In this activity, you’ll apply the data investigation framework to your chosen project dataset.
(a) For your chosen dataset, brainstorm at least three potential statistical questions you could investigate.
(b) For one of your questions, outline what you would do in each phase of the data investigation framework.
(c) Identify at least one challenge you might face in each phase and how you might address it.

Checkpoint 34. Framework Activities.