What Does Feminist Data Science Look Like? – Catherine D’Ignazio and Lauren Klein

Catherine D’Ignazio and Lauren Klein joined us at the Digital Democracies Institute to close our 2021 fall speaker series. They came to share some highlights from their co-authored open source book Data Feminism (MIT Press, 2020), and some adjacent ongoing projects. The premise of their book is that “in today’s world, data is power”. Its power is wielded unequally, where a “small and homogenous group of corporations and institutions design and deploy technology for their own profit at the expense of everyone else.”

Lauren explained that “feminism is also about power. It is the belief in equal rights for all genders, taking political action to realize that belief, and pursuing an intellectual heritage that helps us understand and challenge unequal power.” She also noted that their intersectional feminist project is not only about women or gender. It came from the idea put forward by women of colour, predominantly Black women, that addresses who has power and who does not. It recognizes that we cannot isolate certain forms of power and privilege, and their effects are compounded. Of course, she explained, these power imbalances are not individual markers of identity, but structural forces that produce effects on our individual experiences.

In their book, Catherine and Lauren use teachings from intersectional feminism and critical thought to develop seven principles to achieve a more equitable data science. They include:

  • Examine power
  • Challenge power
  • Rethink binaries and hierarchies
  • Elevate emotion and embodiment
  • Embrace pluralism
  • Consider context
  • Make labour visible

Each one corresponds to a chapter in the book. Within each they ask questions like: “What does this look like?”, “Who is doing this?”, and “How can this be done?” to operationalize feminism for data science, both for people who want to work with data, or refuse to work with data (as refusal is a powerful feminist position).

Catherine shared an ongoing project she is involved in and a case study from the book of data collection around feminicide, or gender related killings of both cis and trans women and girls, in Mexico. She highlighted María Salguero’s Google Map of femicides as the largest dataset of its kind in Mexico. She has used this database to work with families to locate loved ones, helped journalists pursue leads, and to testify in front of Mexico’s congress. Catherine noted that, “feminicide is legally defined as a crime, but the state does not systematically collect data about it– a source of anger among activists and the public. One of their demands is to understand the problem.” In response, María built a network towards gathering feminist counter data, which is “activist data collection that steps in when the state or institutions have systematically failed to ensure basic safety of its population.”

Catherine provided two caveats to this example. First, that data collection is not the only way to challenge power. Sometimes it is advisable, but other times it is not. “There are many situations where collecting further data would expose marginalised people to more visibility to institutions that want to target them. Data collection can be a powerful strategy in certain circumstances.” Secondly, feminicide is not only a problem in Latin America, and there are groups doing this in multiple places. For example, in the United States, Sovereign Bodies Institute for Missing and Murdered Indigenous Women, Girls and Two-Sprirt (MMIWG2) is also using counter data collection in a similar way. There are several more examples of counter data being used to draw attention to disporportaionate violence against women, queer communities, Indigenous peoples, and more.

In their work with counter data against feminicide, Catherine and her collaborators, Silvana Fumega and Helena Suárez Val, developed three goals:

  • Understand why and how activists are collecting data about gender-based violence and feminicide;
  • Use the information gathered about why and how to work with activists to co-design digital tools to support and sustain their projects. This work is often underpaid, or completely unpaid;
  • Build a community of practice for people working on feminicide and data to bring people together. For example, now they are working on a data standard for sharing data. This goal was set based on the needs of people that were interviewed, in particular.

Based on these goals, developing a participatory process for feminist data science with their team became crucial. “There has been little done on building a participatory framework in data science, in contrast with other disciplines, such as urban planning,” Catherine noted. They began with embracing pluralism in data science and AI by using Donna Haraway’s notion that “we are strongest when we pool our perspectives.” Of course everyone’s voices cannot be brought in, so whose are prioritized? “According to scholars like Sandra Harding and Shaowen Bardzell, feminist participatory design “centres and prioritizes people at the margins first, and makes decisions about design from the outside in. This is the perspective for the data feminism project”, Catherine explained. 

In the convergence of activist and civil society groups already doing this work, they asked, what tools can they build using this participatory framework to help sustain the movement? From their interviews and co-design work, they developed two digital tools. One is a data highlighter that highlights specific names, places and dates in news media to support people scanning the media for this information. The other is an email alert tool similar to Google Alerts that scans media articles about an area of interest, finds possibly relevant articles, runs through Machine Learning, and sends highest ranked articles to activists by email. Catherirne noted the lessons they learned from developing this tool. They needed to build separate classifiers based on different training sets for groups such as Black women killed in police violence, or for MMIWG2, for instance. “This is because there are different types of media bias as intersectional power permeates the media. Broad categories such as feminicide can obscure more specific types of violence”, she explained. Catherine also reiterated that the goal of such tools is not to automate activist labour. “The goal is to support and sustain existing labour, as witnessing is a central part of this work, as a form of memory justice”, she said.

In relation to activist labour, Lauren pointed to two of her projects in progress. In them she uses the feminist concept of invisbile labour, with the understanding that it is invisible because it is 1) hidden from view as it takes place in the home and 2) unpaid and thus unseen by the marketplace. On the spectrum of visible and invisible labour, she uses data to explore questions of labour and power in a series of projects that make use of data and data visualization. One focus is on the abolitionist movement of the 19th Century in the United States. She notes that, “like how Catherine reworks data collection and processing using feminist methods that honour physical and emotional labour, this project uses methods of data analysis that retroactively try to honour some of these forms of labours from the past that come to us, in this case through text.” She also contextualised this research within the academy today, saying, “abolition was one of the most important social movements in the US or in the world. Currently research on this is going through a bit of a renaissance. People are reexamining how the end of slavery was accomplished– through a multi-racial coalition of men and women who didn’t always agree on the means to accomplish this goal. Do you move to the centre? Is there room for moderation when it comes to human freedom? These questions strongly resonate today.”

Lauren introduced the computational model she and her co-authors Sandeep Soni and Jacob Eisenstein built to track changes of meanings of words, or word embeddings, to determine which newspapers were changing the meaning of concepts like freedom and justice during the abolition era as they went from the margins to the mainstream. Lauren told this story through the historical figure Mary Ann Shadd, a Black woman from the mid-Atlantic who moved to what is now Ontario. She was an editor of The Provincial Freeman, and wrote openly about how even among her cohort of women editors she had to work much harder than her white counterparts. Lauren stated that their “analysis affirmed what was already known– that Mary Ann Shadd innovated word meanings in greater proportions than other newspapers. She was doing something more radical, innovating at the level of discourse and influencing how people spoke.” Lastly, Lauren explained another newspaper, The Lily, a white women’s suffragette newspaper contained racist, anti-Black claims. She asked, “what does it mean when we have The Lily confirmed as a conceptual innovator? Discourse in movements shaped by a range of belief systems is not always in agreement or around a common cause. We must recognize this complexity, and not sweep parts of anti-Blackness under the rug.”

In closing, Catherine explained that principes of data feminism apply to every stage of the research process, from inception and funding to production, circulation and impact in the world. She said, “data feminism requires an expanded definition of data science that cannot be defined by the size data set or technical credentials by people doing work. This original framing has been used to exclude women and people of colour from the field, and contributions that are not purely technical but socio-technical.” She noted that there are many examples to be found in the book of what happens when we expand this definition to care about justice. Some of it does look like traditional data science, but other work takes the form of interactive sculpture or data driven murals done in collaboration with community organizations. “Currently, data is a root of many of our problems today, but it can also be part of the solution,” Catherine said. At the DDI, we look forward to integrating feminist data science methods into our work and continuing to follow Catherine and Lauren’s innovative projects.

The open source version of Data Feminism can be found at datafeminism.io