Section Ethics Spotlight: Ethical Data Collection
Before we conclude Unit 1, it’s important to consider the ethical dimensions of data science. The data we collect and how we collect it have real impacts on people’s lives.
Key ethical considerations in data collection include:
Consent: Were people informed about how their data would be used?
Privacy: Is sensitive information protected?
Representativeness: Does the data collection process exclude certain groups?
Transparency: Is the collection process clear and documented?
Minimization: Is only necessary data collected?
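Of the considerations above, representativeness is one that can be partly checked with numbers. The sketch below compares each group's share of a dataset with its share of a reference population. The function name `representation_gaps`, the group labels, and the reference shares are all hypothetical and chosen only for illustration; in practice the reference shares would come from a trusted source such as census figures.

```python
from collections import Counter


def representation_gaps(sample_groups, reference_shares):
    """Return (sample share - reference share) for each reference group.

    A negative value suggests the group is underrepresented in the sample.
    """
    counts = Counter(sample_groups)
    total = len(sample_groups)
    gaps = {}
    for group, ref_share in reference_shares.items():
        # Guard against an empty sample to avoid dividing by zero.
        sample_share = counts.get(group, 0) / total if total else 0.0
        gaps[group] = sample_share - ref_share
    return gaps


# Hypothetical data: group labels for 10 respondents and assumed
# population shares for the same three groups.
sample = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
reference = {"A": 0.4, "B": 0.3, "C": 0.3}

gaps = representation_gaps(sample, reference)
for group, gap in sorted(gaps.items()):
    print(f"Group {group}: gap of {gap:+.2f}")
```

A check like this only flags groups that were already named in the reference table. It cannot reveal groups that were never counted in the first place, so it supplements the questions above rather than replacing them.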
Example 28.
In our Community Health dataset, consider these ethical questions:
Are the neighborhoods defined in ways that might reinforce historical segregation?
Does the dataset include communities that are often underrepresented in research?
Is the aggregation level appropriate to protect individual privacy while still being useful?
Were community members involved in deciding what data to collect about their neighborhoods?
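The question about aggregation level can be made concrete with small-cell suppression, a common way to reduce the risk of identifying individuals in published counts. The following is a minimal sketch, not the method used to build the Community Health dataset: the function name `aggregate_with_suppression`, the neighborhood names, the record layout, and the threshold of 10 are assumptions made for illustration.

```python
def aggregate_with_suppression(records, key, min_count=10):
    """Count records per group and hide counts smaller than min_count.

    Suppressed groups are reported as None so readers can see that the
    group exists without learning its exact (small) count.
    """
    counts = {}
    for record in records:
        group = record[key]
        counts[group] = counts.get(group, 0) + 1
    return {
        group: (n if n >= min_count else None)
        for group, n in counts.items()
    }


# Hypothetical individual-level records with an assumed "neighborhood" field.
records = (
    [{"neighborhood": "Northside"}] * 12
    + [{"neighborhood": "Riverside"}] * 3
)

summary = aggregate_with_suppression(records, "neighborhood", min_count=10)
print(summary)
```

The threshold is a trade-off. A higher value protects privacy more strongly but hides more small communities, which can work against the goal of including underrepresented groups.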
Activity 9. Data Ethics Discussion.
In this activity, we’ll discuss ethical considerations for our datasets.
(a) For the Community Health dataset, identify at least two potential ethical concerns and explain how you might address them.
(b) For your chosen project dataset, consider:
What was the original purpose of this data collection?
Who collected it and how?
Who might be missing from or underrepresented in this dataset?
Are there privacy concerns with this data?