This talk was given on September 24, 2019 in the Stata Center at MIT as part of the HCI seminar series. Catherine D’Ignazio is a Center for Civic Media alum, and will be coming back to MIT in January 2020 as an assistant professor of Urban Science and Planning in the Department of Urban Studies and Planning. This talk was a presentation on her forthcoming book Data Feminism, coauthored with Georgia Tech’s Lauren Klein. It will be out in March 2020 from MIT Press.
This talk was transcribed by Mike Sugarman. Anna Woorim Chung helped organize the post. Any inaccuracies, mistranscriptions, glaring omissions, and misspellings are probably ours (especially Mike’s).
Data Feminism the book comes from a simple motivation. In the popular press and TED talks, data has been heralded as the “new oil,” but there’s been pushback on the idea that data and tech are all for the good. Some of the most significant pushback has been led by women, people of color, Indigenous people, and LGBTQ people in academic papers, journalism, etc.: there is nothing new about the impacts of this; it’s the same old oppression and social stratification.
References Safiya Noble, Virginia Eubanks, ProPublica on risk assessment algorithms (see more in photo of slide).
Wanted to ask what a feminist approach to data science looks like. Can we refuse to use data and computation that uphold structural oppression, in perspective or in practice? Is interested in building tools for and with communities: how can we do that in ways that are anti-oppressive from the get-go?
D’Ignazio comes from art, design, development. Identifies as a woman in tech and “hacker mama.” Part of the “Make the Breast Pump Not Suck” Hackathon. White, cis, hetero academic, and says Lauren shares those dominant identities with her.
D’Ignazio says Lauren Klein comes out of history and works a lot in digital humanities. Uses machine learning on documents about food prepared in Thomas Jefferson’s house. Self-identified “professional nerd.”
One of their principles is embracing pluralism, so they published a draft of the book online to get comments. Have been doing revisions for the past year. Who is it for? It’s an academic book, but accessible to students, newcomers (to feminism, data science), and practitioners.
What do we mean by “Feminism”?
Beyoncé quote: “Feminist: the person who believes in equal rights for men, women, non-binary people.”
Merriam-Webster: Feminism, the theory of the political, economic, and social equality of the sexes.
Feminism began by thinking of inequality with respect to sex and gender, but the past 40 years have brought many more dimensions of inequality into the conversation: race, sex, class, and so on. D’Ignazio’s and Klein’s idea comes out of black feminism in the US: intersectional feminism. Looking at women, gender inequality, power, who has it, who doesn’t.
“Intersectional feminism” coined by Kimberlé Crenshaw: inequality can only be understood through gender plus other issues (such as racism, inequality). Notes that the Combahee River Collective was originally from Boston.
Audience takes a break for a 2-minute peer share around the question, “Do you identify as a feminist?”
There’s been a lot of contamination of the word feminism, deliberately. Feminism has exercised a lot of exclusion, and the history of feminism is a history of exclusion. Intersectionality comes from saying white feminism isn’t serving women of color.
How do we bring feminism to data and data science? What feminism is good at doing is asking “who” questions, which are really important. Bringing up “who” means being a pesky person who’s asking people to have uncomfortable conversations. Data science…
- About whom
- By whom
- Serving whose interests
- With whose values
7 principles of data feminism:
- Examine power
- Challenge power
- Elevate emotion + embodiment
- Rethink binaries and hierarchies
- Embrace pluralism
- Consider context
- Make labor visible
Based in human-computer interaction, critical cartography and GIS, and science and technology studies.
First 2 points: analysis of power is central to the feminist project. By examining power and challenging power, data feminism evaluates how power operates and how power structures contribute to injustice. The belief in equality remains an unrealized project. The world is not working properly and we want to work towards justice.
“Elevate emotion and embodiment”: Relates to lived experience. Emotions as a valid way of knowing and responding to the world.
“Rethink binaries and hierarchies”: Challenge the gender binary. There are not just 2 genders; also challenge other classification systems that lead to oppression.
“Embrace pluralism”: Multiple perspectives, priority given to local and Indigenous perspectives, emotional ways of knowing.
“Consider context”: Data is never objective, but the product of unequal social relationships; context is essential if we’re going to conduct accurate and ethical analysis.
“Make labor visible”: All work is the result of many hands; value all those who contribute to the work.
D’Ignazio shows a redlining map of Detroit.
In the 1940s, the Federal Home Loan Bank collaborated with cities on assessing the risk of owning homes in certain parts of cities. All of Detroit’s black neighborhoods fall in the red area (similar in other cities where this happened). Classic example of a bad thing to do. A way that the most advanced tech (cartographic, surveying) of the time was deployed in the service of securing the wealth of the white owner class. Who is allocating funds to make maps really matters. In the book: a really great historical example of how the big data and technology of its time were deployed to really disastrous ends. The impact of disinvestment was so vast that we still see its effects in cities today.
Maps made by people in power look objective. Ruha Benjamin (talking 9/25 at Harvard and 9/26 at Cambridge Public Library) has the idea of “imagined objectivity,” which is high-tech, a view from above, authoritative: it claims to be objective. But actually (during the redlining process) white guy bureaucrats walked around neighborhoods making assessments very subjectively. Not scientific, but the result speaks authoritatively using the language of objectivity.
Question from participant: If it were objective, would it have been OK?
Response from D’Ignazio: Who’s in power? Who is doing the sorting, putting people in these different buckets? We need to step back and question ideas like a situation of scarcity where people can’t have homes. Why do white men in power get to decide these things, why do they get to make these maps? It’s flawed, but even more flawed by subjectivity.
What happens when the perspective has shifted? What if it’s the folks who are in the community themselves? Maps start to look different. Fast forward 30 years: this is a map from Detroit made by Gwendolyn Warren of where commuters ran over black children along Pointe-Downtown railroad tracks. Everyone knew it was happening, but gathering the data was a huge hurdle. Had to develop relationships with the city and police department to get info on what time, where, and who killed the child. Data was collected and published because of collaboration between black youth led by Warren and white academics. The idea of examining and challenging power raises the idea of perspective. Refer back to 4 values of data feminism.
D’Ignazio and Klein think data science and AI have a lot of promise. Getting insights from large, complex data sets is useful for seeing forces of oppression and can make power imbalances visible. Must be in the hands of the right people, not blind to privilege and power imbalances. Project by artist Mimi Onuoha: going out to look for data that doesn’t exist.
Onuoha goes to collect missing data sets like trans people killed in hate crimes, people closed out of public housing because of criminal records. Refer to ProPublica article on how we don’t know how many black women are dying as a result of childbirth and postpartum complications in the US. When it comes to women, gender non-conforming people, women of color, missing data sets are the norm rather than the exception. It’s a pattern where we fail to collect data on bodies other than the default.
Another project that looks to challenge power: femicide data collection in Mexico by María Salguero. An individual person collecting info on gender-based killings in Mexico. We don’t use this word in English; often we call it intimate partner violence or domestic abuse. There is a legal definition of femicide in Mexico, and it is the subject of emerging public anger in Latin America. The “Not one less” movement stems from anger over inaction at the state level. Salguero has a job, then spends 2 to 4 hours a day logging femicides, which she culls from media reports, on a Google map. Her 4-year-old data set has become the most authoritative open data source. Journalists, the Mexican Congress, and families of missing people come to her. Feminist data action steps in in place of state inaction, when the state has failed to ensure basic safety. One way of using data to challenge power.
Also a strategy amongst journalistic organizations. Higher-level projects by the Washington Post and The Guardian count police killings, since we don’t have comprehensive federal stats. Who steps into that vacuum is a pressing issue of public concern.
Silly idea that “data speaks for itself.” We should never let data speak for itself, ever. Tells of a project her data journalism students did. They set out to report on sexual assault on college campuses. Found “Clery Act” reports from the federal government. Found that Williams College has a very high rate of sexual assault while Boston University has an extremely low rate. The truth of the matter is likely closer to the opposite: Williams is likely doing much better and BU worse. They arrived at this through interviews that helped them understand how power imbalances shaped the numbers being reported.
Numbers are self-reported. Universities and colleges report these to the federal government, and the federal government has few resources to verify them. No university wants a high rate of sexual assault, so there is little incentive to report. Parents are the main clients and don’t want to see high rates of sexual assault. Secondly and really importantly, it’s really hard for survivors of sexual assault to come forward and report: they are shamed, blamed, retraumatized in the process. There are ways universities can incentivize or disincentivize survivors to come forward. Williams invested a lot of money to create a climate for survivors to come forward. BU was devoting very little, hence low numbers since no one would report.
Misaligned incentives in the collection environment mean that the numbers, if taken at face value, would lead to a false, sensationalistic story. Williams is getting closer to the actual incidence in the population. Considering context helps account for power differentials in environments. Not just the data set, but also the data setting.
Principle of elevating emotion and embodiment
Graphics of guns and gun violence. One from Periscopic and one from the Washington Post. The Periscopic one is animated and frames gun deaths as years stolen from people. When someone dies from being shot, the line turns white and shows how long that person would have lived if not killed by a gun. Emotional framing around loss: these years have been stolen from them. Widely discussed in the data visualization community because of the reigning idea that we should strive for visualization that’s “neutral.” Edward Tufte calls for “minimalism,” with dry title and design. This counteracts that. In the book we make a case for the idea that emotion matters and grabs our attention. Emotional framing is not just OK; it is the responsibility of people doing design to provide the most truthful framing around data. The issue is emotional. Rather than saying all forms of persuasion are bad, we’re saying that a visualization is quite persuasive when neutral too (refer to Washington Post graphic). Kelly Dobson uses the term “data visceralization.” What kinds of creative possibilities open up with data visceralization?
Project by group DataZetu in Tanzania. Ran community design competition for communication around gender-based violence. Adopted winning design, created entire fashion show with clothing that communicates statistics. Uses bodies, social context, joy of coming together around justice vision. How do we embrace fullness in data visualization?
Thinking about the idea of whose voices matter in the system/data design process. Who do you consult with? A feminist perspective says do a power analysis of the data setting: whoever is most marginalized or has the most potential to be harmed by the system should be the voices at the center, ideally leading the process. Shoutout to Design Justice group.
Comparison of the Anti-Eviction Mapping Project and the Eviction Lab at Princeton. The Anti-Eviction Mapping Project is a community mapping project based in San Francisco: housing activists working together with tenants’ rights orgs to document the eviction crisis in SF through maps and narrative data collection (stories of being evicted). Very messy, process-based work led primarily by women and people of color. Compare to the work of the Eviction Lab at Princeton, which is less concerned with process. They see their job as quantifying the national scope of the eviction crisis. Started out by trying to work with community orgs, but it’s messy, so instead opted to purchase data sets of lower quality that captured the national scope. The Anti-Eviction Mapping Project is actually about welcoming different voices into the process and building relational infrastructure on the ground.
Is more data always better when it comes at the expense of quality? Princeton has prioritized speed and quantity; the Anti-Eviction Mapping Project has valued quality over quantity. The process slows down when we value that plurality. We say there’s something super useful in slowness and process.
Data Feminism is:
- Data science that exposes and challenges intersecting oppressions
- Data science by and centering minoritized people
- Counter data science about injustices created by mainstream data science
- Always includes gender
Question 1: How does data fem perspective influence the kind of tools you build?
Answer 1: Excited to return to MIT to explore questions that relate to the tech. The book addresses all the context, but there are meaningful ways to build feminist-oriented tools or center certain people in the design process. As tool builders we have a lot of power to shape the voices of people we imagine using tools. We advocate for a co-design process. What does that mean for making a tool? Deep consultation and relationship building with the people you imagine to be future users. You’re going to make a better tool. You can develop a tool in a vacuum and see what happens, or embed yourself in context to imagine the people and context for using tools. Situatedness vs. generalization and abstraction. Abstraction is one of the core tensions with data feminism, which looks to work with user groups and co-design. I don’t know if the tension is resolvable.
Question 2a: Limits on centering data in discussion vs. other matters like design (making data object of investigation)?
Answer 2a: Allows us to use a helpful construct to expand out and look at issues that come before the data
Question 2b: What questions are being asked, what structures precede this, what comes after with people using datasets without awareness of context and intended uses?
Answer 2b: We’re trying to enlarge that conversation. One of the dissatisfactions I’ve had with convos around fairness/machine learning, ethics/data is they remain enclosed in technology. There are a good number of issues within the technology and experiments within them, but they won’t solve all the problems. Joy Buolamwini in the Media Lab has done a lot of work showing how lack of diversity in training sets leads to models performing poorly on women of color. One response: a diversity of faces dataset. Another: the Chinese government has made a pact with Zimbabwe to install surveillance cameras to capture data to build a data set of dark faces. A case where just pointing out lack of diversity in training sets doesn’t mean we can make a better world. Now we’re just surveilling people better. Not just talk about models, but everything that comes before that. It includes all of these things. How do projects get funded? Who’s leading? Who’s asking questions? Your ability to operate and make ethical changes is limited once you go further and further down the pipeline. We have a great facial recognition set with gov’t, how do you make that ethical? You can’t.