Not all questions are statistical questions. A statistical question is one that can be answered by collecting data and for which we expect variability in that data.
Definition 35.
A statistical question is a question that anticipates variability in the data related to it and can be answered by analyzing data. It usually addresses patterns, trends, or relationships in a group or population rather than specific individuals.
Example 36. Statistical vs. Non-Statistical Questions.
Non-statistical questions typically have a single, deterministic answer:
How tall is Jamal? (Asks about a specific individual)
What is the capital of France? (Has a definitive answer)
Did it rain yesterday? (Yes/no factual question)
Statistical questions anticipate variability and focus on distributions:
How tall are 7th-grade students at Lincoln Middle School? (Expects variation in heights)
What is the relationship between a country’s GDP and its literacy rate? (Examines patterns across countries)
How does rainfall vary by month in Seattle? (Looks at distribution over time)
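The contrast can be sketched in a few lines of Python. This is only an illustration: the heights below are invented, and the names `jamal_height_cm`, `class_heights_cm`, and `summarize` are hypothetical. The point is that a question about one individual yields one value, while a question about a group yields a distribution that has to be summarized.

```python
from statistics import mean, stdev

# Hypothetical heights in centimetres, invented for illustration only.
jamal_height_cm = 152.0  # one individual: a single value, no variability
class_heights_cm = [148.0, 152.0, 155.5, 160.0, 149.5, 157.0, 163.5, 151.0]


def summarize(values):
    """Return a small summary of a group of measurements."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": round(mean(values), 1),
        "stdev": round(stdev(values), 1),  # sample standard deviation
    }


summary = summarize(class_heights_cm)
print("Jamal:", jamal_height_cm)
print("Class:", summary)
```

The non-zero spread in the class summary is the variability a statistical question anticipates; the single value for Jamal has none to describe.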
Checkpoint 37. Identifying Statistical Questions.
Which of the following is a statistical question?
What is the temperature right now?
This is not a statistical question because it asks for a single measurement at a specific point in time, not a distribution or pattern.
How many siblings does Maria have?
This is not a statistical question because it asks about a specific individual and has a single answer.
How does commute time vary among employees at a company?
Correct! This is a statistical question because it anticipates variability (different commute times) and requires analyzing data from a group.
Is 15 minutes a long time to wait for a bus?
This is not a statistical question because it asks for a subjective judgment rather than something that can be answered directly with data.
Good statistical questions are the foundation of effective data investigations. They guide the entire process, from data collection to analysis and interpretation.
Effective statistical questions have several key characteristics:
Clear and specific: Precisely defines what you want to know and what population you’re studying
Answerable with data: Can be investigated through data collection and analysis
Anticipates variability: Expects a distribution of values rather than a single answer
Meaningful: Addresses something worth investigating and has potential implications
Neutral: Doesn’t presuppose a particular answer or bias the investigation
Example 38. Refining Statistical Questions.
Consider how we might refine these initial questions to make them more effective:
Initial: Is air pollution bad for health?
Refined: What is the relationship between average annual air quality index (AQI) and asthma hospitalization rates across neighborhoods in our city over the past five years?
Initial: Do parks make neighborhoods healthier?
Refined: How does the percentage of green space in a neighborhood correlate with residents’ self-reported physical activity levels, controlling for median income?
Initial: Which neighborhood has the worst environmental health?
Refined: How do neighborhoods compare across multiple environmental health indicators (air quality, water quality, access to green space, and proximity to pollution sources), and what patterns emerge when considering demographic factors?
Notice how the refined questions are more specific about what is being measured, the population being studied, and the relationships being investigated. They also avoid loaded terms like "bad" or "worst" that might bias the investigation.
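One way to see why the first refined question is answerable with data is to sketch the analysis it implies. The example below is a minimal sketch under stated assumptions: the neighborhood values are invented, the variable names and the `pearson_r` helper are hypothetical, and Pearson's correlation coefficient is just one possible measure of the relationship.

```python
from math import sqrt
from statistics import mean

# Hypothetical neighborhood-level values, invented for illustration only.
avg_annual_aqi = [42, 55, 61, 48, 70, 66]
asthma_rate_per_10k = [11.0, 14.5, 17.0, 12.5, 21.0, 18.5]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


r = pearson_r(avg_annual_aqi, asthma_rate_per_10k)
print(f"Pearson r = {r:.2f}")
```

A coefficient like this describes how strongly the two measurements move together across neighborhoods. It does not show that one causes the other, and it does not account for other factors such as income.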
Activity 11. Refining Your Questions.
In this activity, you’ll work on developing effective statistical questions for your project.
(a)
Review the statistical questions you brainstormed in the previous activity. Select one that you think has the most potential.
(b)
Refine your selected question using the characteristics of effective statistical questions. Make it clearer, more specific, and more answerable with data.
(c)
Share your refined question with a classmate and provide feedback on each other’s questions.
Checkpoint 39. Improving Statistical Questions.
For each initial question below, select the most improved version that follows the principles of effective statistical questions.
Question 1: Initial question: Are students getting enough sleep?
Select the most improved version of this question:
a. Should students get more sleep?
b. What is the distribution of nightly sleep duration among high school students, and how does it compare to recommended amounts for adolescents?
c. Why don’t students get enough sleep?
d. Who sleeps the most in the sophomore class?
Question 2: Initial question: Does income affect health?
Select the most improved version of this question:
a. Is it fair that rich people are healthier?
b. What is John’s income and health status?
c. How do rates of chronic diseases vary across different income brackets in the United States, and has this relationship changed over the past decade?
d. Why do poor people have worse health outcomes?
Hint.
Think about what makes a good statistical question: it should be specific, measurable, and free from assumptions or value judgments.
Answer 1.
\(\text{b}\)
Answer 2.
\(\text{c}\)
Solution.
Question 1: The best improved version is: “What is the distribution of nightly sleep duration among high school students, and how does it compare to recommended amounts for adolescents?”
This question specifies what is being measured (sleep duration) and the population (high school students), and it includes a comparison to a standard.
Question 2: The best improved version is: “How do rates of chronic diseases vary across different income brackets in the United States, and has this relationship changed over the past decade?”
This question specifies the variables (chronic disease rates, income brackets) and the population (United States), and it adds a time dimension for additional insight.
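As a rough illustration of how the improved sleep question could be investigated, the sketch below summarizes a small set of survey responses and compares them with a threshold. Everything here is an assumption made for the example: the responses are invented, `share_below` is a hypothetical helper, and `RECOMMENDED_MIN_HOURS` is a placeholder to be replaced with whatever guideline your project cites.

```python
from statistics import median

# Hypothetical survey responses (hours of sleep per night), invented for illustration.
sleep_hours = [6.5, 7.0, 8.5, 5.5, 7.5, 9.0, 6.0, 8.0, 7.0, 6.5]

# Assumed lower bound of the recommended range; substitute the guideline you cite.
RECOMMENDED_MIN_HOURS = 8.0


def share_below(values, threshold):
    """Fraction of values strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)


below = share_below(sleep_hours, RECOMMENDED_MIN_HOURS)
print("median hours:", median(sleep_hours))
print(f"share below {RECOMMENDED_MIN_HOURS} h: {below:.0%}")
```

Reporting the median together with the share below the threshold describes the whole distribution, which is what the improved question asks for, rather than returning a yes or no.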