Section Ethics Spotlight: Representation in Data
As we plan our data investigations, it’s crucial to consider who is represented in our data and who might be missing or underrepresented.
Key ethical considerations regarding representation include:
Selection bias: Does our data systematically exclude certain groups?
Sampling fairness: Does our sample adequately represent diverse populations?
Historical exclusion: Are we working with data that reflects historical patterns of exclusion?
Appropriate categorization: Do our categories respect how people identify themselves?
Contextual interpretation: Are we considering social and historical context when interpreting group differences?
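One way to make the selection bias and sampling fairness questions concrete is to compare each group's share of the sample with its share of the population. The sketch below is only an illustration: the function name `representation_ratios`, the group labels, and all of the numbers are hypothetical, and it assumes a trustworthy population benchmark (such as census figures) is available. A ratio near 1 suggests proportional representation, while a ratio well below 1 flags a group that may be underrepresented.

```python
from collections import Counter


def representation_ratios(sample_groups, population_shares):
    """Return (sample share / population share) for each group.

    sample_groups: one group label per record in the sample.
    population_shares: dict mapping group label -> share of the population.
    """
    if not sample_groups:
        raise ValueError("sample_groups must not be empty")

    counts = Counter(sample_groups)
    total = len(sample_groups)

    ratios = {}
    for group, pop_share in population_shares.items():
        # Groups absent from the sample get a sample share of 0.
        sample_share = counts.get(group, 0) / total
        ratios[group] = sample_share / pop_share
    return ratios


# Hypothetical sample of 100 records and hypothetical population shares.
sample = ["A"] * 70 + ["B"] * 25 + ["C"] * 5
population = {"A": 0.50, "B": 0.30, "C": 0.20}

ratios = representation_ratios(sample, population)
for group, ratio in sorted(ratios.items()):
    print(f"Group {group}: representation ratio {ratio:.2f}")
```

In this made-up example, group C makes up 20% of the population but only 5% of the sample, so its ratio is far below 1. A check like this can only surface gaps for groups that the benchmark itself records, so it complements the questions above rather than replacing them.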
Example 48. Representation in Community Health Data.
In our Community Health dataset, we might need to consider:
Whether health surveys reached residents who don’t speak English
Whether environmental monitoring stations are distributed equitably across neighborhoods
Whether certain communities have historically been excluded from public health research
Whether the neighborhood boundaries used in our analysis reflect meaningful community divisions
How to interpret health disparities without reinforcing harmful stereotypes
Checkpoint 49. Data Representation Scenarios.
For each scenario, identify the primary ethical concern related to representation in data.
Activity 14. Evaluating Representation in Your Dataset.
In this activity, you’ll critically examine representation issues in your chosen dataset.
(a) Identify at least three ways in which your dataset might not fully represent the population you’re interested in studying.
(b) Consider how these representation issues might affect the conclusions you can draw from your analysis.
(c) Propose at least two strategies for acknowledging or addressing these representation issues in your investigation.
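As one possible starting point for part (c), the sketch below shows post-stratification weighting, a common technique in which each record is weighted by its group's population share divided by its group's sample share. This is a minimal illustration under stated assumptions, not a recommended method for any particular dataset: the function names, group labels, and values are all hypothetical, and reliable population shares are assumed to exist.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return a weight per group: population share / sample share.

    A group with no records in the sample gets None, because no weight
    can stand in for data that was never collected.
    """
    counts = Counter(sample_groups)
    total = len(sample_groups)

    weights = {}
    for group, pop_share in population_shares.items():
        n = counts.get(group, 0)
        weights[group] = None if n == 0 else pop_share / (n / total)
    return weights


def weighted_mean(values, groups, weights):
    """Weighted mean of values, using each record's group weight."""
    numerator = sum(weights[g] * v for v, g in zip(values, groups))
    denominator = sum(weights[g] for g in groups)
    return numerator / denominator


# Hypothetical sample: group A is overrepresented (8 of 10 records),
# although A and B each make up half of the population.
groups = ["A"] * 8 + ["B"] * 2
values = [1.0] * 8 + [0.0] * 2  # made-up outcome values
population = {"A": 0.5, "B": 0.5}

weights = poststratification_weights(groups, population)
unweighted = sum(values) / len(values)
weighted = weighted_mean(values, groups, weights)
print(f"Unweighted mean: {unweighted:.2f}")
print(f"Weighted mean:   {weighted:.2f}")
```

Weighting has limits worth acknowledging in your write-up: it cannot recover groups that are entirely missing from the data, and very small groups receive large weights, which makes estimates for them less stable.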