How to Identify Gender in Datasets at Large Scales, Ethically and Responsibly | MIT Center for Civic Media

A practical guide to methods and ethics of gender identification

For the past three years, I've been using methods to identify gender in large datasets for research, design, and data journalism, supported by the Knight Foundation and working with an amazing group of collaborators. In my Master's thesis, I used these techniques to support the inclusion of women in citizen journalism, the news, and collective action online. Last February, I was invited to give a talk about my work at the MIT Symposium on Gender and Technology, hosted by the MIT Program in Women's and Gender Studies. I have finally written up the first part of the talk, a practical guide to the methods and ethics of gender identification.

If you just want to get started analyzing gender in your dataset, I suggest the Open Gender Tracker libraries for Ruby (by Jeremy Merrill), Python (by Marcos Vanetta), and R (by Adam Hyland and Irene Ros). To find out why, read on.

Other posts in this loose series (not all by me) include:

Why Do Gender Metrics Matter?

This June, the feminist hackerspace Double Union launched one of my favorite diversity data websites (code on github). The site lists tech companies that have released information about the demographics of their employees, inviting viewers to thank them or pressure them based on what they have released. The site reminds us that:

"Open diversity data will make it easier for everyone to better understand the diversity landscape and work toward solutions."

Tech companies have faced substantial pressure this year to improve inclusion. Social justice advocates and professional groups have long advocated for diversity in institutions, whether it's groups like Catalyst arguing for women on boards, the American Society of Newspaper Editors working towards demographic parity in the news, or MIT's gender equity project working to foster inclusion in academia. In each case, metrics are a critical bedrock of change, revealing areas for improvement and tracking progress.

Online, where collective action isn't fully controlled by institutions, institutional policies for inclusion are less powerful. That's why Yonatan, Matt, and I created Tally for the mentorship organization Gender Avenger, which uses crowdsourced metrics on panel speakers to support conference organizers (see Ashe Dryden's tips for organizers). Emma Pierson used a gorgeous data analysis to show that women in competitive debate are under-scored, and that differences in experience don't fully explain the gender gap in scores. Emma has also studied ways that men dominate New York Times comments, analyzing almost a million comments to look at women's participation. In that analysis, Emma also found patterns where women were welcomed even in cases where they were the minority. Data on open source communities (pdf, and here) and on gender in Wikipedia (here, here, here) offer ongoing insight on evolving disparities and differences in online platforms.

Data can also support real-time systems for diverse participation. The Conversation Clock (pdf) by Karrie Karahalios (see also Visiphone) and Tony Bergstrom offers visual feedback to interrupters (often men) to remind people in a conversation to listen. The FollowBias project that Sarah Szalavitz and I created does something similar for social media, helping users monitor and adjust the diversity of who they pay attention to. I've also prototyped a "gender check" for text editors that allows writers to monitor the diversity of their content before they publish (something I've just learned was also tried by German computer scientists in 2004).

Techniques for Collecting Large-Scale Gender Data

Ask People their Gender and Sexuality On a Form

The simplest way to collect gender data is to ask people. Facebook, for example, asks people their sex, their gender expression (learn more about the distinction), and who they are interested in, even though only some of that information is available through their API for demographic targeting by third parties (custom gender pronouns are hidden from advertisers).

Although the best method involves asking people to self-identify and choose how to be represented in your data, this option is usually only available to companies, online platforms, or conference organizers who think in advance about diversity. If you're lucky enough to be able to collect gender information about the group you want to know more about, read the Williams Institute's Sept 2014 report on Best Practices for Asking Questions to Identify Transgender and Other Gender Minority Respondents on Population-Based Surveys. The report includes perspectives from over a dozen rights organizations, health professionals, and researchers. The Human Rights Campaign has also published a guide to collecting transgender-inclusive data (more about HRC's categories). If you're focused on interactive forms, CMU PhD student Chris Martens has pointed out Sarah Dopp's post "Designing a Better Drop-Down Menu for Gender."

If you don't make gender a required field, or if you issue an opt-in survey, you should expect your results to skew male: opt-in surveys tend to under-count women. Last year, research by Mako Hill and Aaron Shaw demonstrated that on Wikipedia, "the proportion of female US adult editors was 27.5% higher than the original study reported (22.7%, versus 17.8%), and that the total proportion of female editors was 26.8% higher (16.1%, versus 12.7%)." They have published source code to help web platforms weight their survey results based on readership demographics.
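The percentages quoted above can be checked with simple arithmetic, and the general idea of reweighting an opt-in survey can be sketched in a few lines. This is a simplified illustration, not Hill and Shaw's published method or code: the function names are mine, and the response rates in the usage note are invented for the example.

```python
def relative_increase(adjusted, original):
    """Return how much larger `adjusted` is than `original`, as a percentage."""
    return (adjusted - original) / original * 100


def reweighted_share(respondents, response_rates, group):
    """Estimate a group's share after weighting each respondent by the
    inverse of that group's assumed response rate.

    respondents:    dict mapping group -> number of survey respondents
    response_rates: dict mapping group -> assumed probability of responding
    """
    weighted = {g: n / response_rates[g] for g, n in respondents.items()}
    return weighted[group] / sum(weighted.values())


# Shares quoted from Hill and Shaw: adjusted estimate vs. originally reported.
us_adult_editors = relative_increase(22.7, 17.8)
all_editors = relative_increase(16.1, 12.7)
print(round(us_adult_editors, 1), round(all_editors, 1))
```

As a hypothetical usage example, if 100 women and 400 men answered a survey, and women were assumed to be half as likely as men to respond, `reweighted_share({"women": 100, "men": 400}, {"women": 0.5, "men": 1.0}, "women")` would estimate women at one third of the population rather than the 20% seen in the raw responses.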

Ask People to Guess Someone Else's Gender

Most often, we're analyzing datasets without gender data. News publishers, for example, do often keep information about the gender of their journalists (for the ASNE newsroom census), but they don't release gender information on individual journalists. For decades, advocacy groups have relied on people to guess the gender of journalists from their names. The Global Media Monitoring Project examines a sample of journalism in over a hundred countries, asking volunteers to identify the likely gender of contributors. By asking more than one person to look at each name, the GMMP uses inter-coder reliability measures to ensure higher quality results. Other groups that use this method include VIDA Women in Literary Arts and the Op Ed Project; the UK's Women in Journalism has used similar methods in the past as well.
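To make the idea of inter-coder reliability concrete, here is a minimal sketch of Cohen's kappa, one common agreement measure for two coders. I'm assuming it only for illustration; the text above doesn't say which measure the GMMP uses, and the labels in the usage note are made up.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same items in the same order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty list of items")
    n = len(coder_a)
    # Share of items on which the two coders gave the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, given how often each coder used each label.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical label for every item
    return (observed - expected) / (1 - expected)
```

For example, `cohens_kappa(["female", "male", "male", "unknown"], ["female", "male", "female", "unknown"])` compares two hypothetical volunteers; values near 1 indicate strong agreement, and values near 0 indicate agreement no better than chance.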

The quality of name-based guessing can be enhanced with photo-based guessing, a method I used in my 2012 work with the Guardian, and which Derek Ruths's team at McGill has systematized and evaluated extensively (pdf). With this method, Google image searches or Twitter profiles are shown to people, who determine whether that person presents male, female, or unknown. Here's what volunteers see in one Google Spreadsheets system I've developed: for each name+organization combination, coders click on a Google image search link and enter their judgment of the person's gender, if they see more than two images of the same person.

Sometimes people are in the room. The Tally app that I prototyped for Gender Avenger, which was turned into production quality software by Yonatan Kogan and Matt Stempeck, relies on participants at a conference to count the people on a panel and enter information into a mobile website.

Is any of this ethical? After all, we're asking people to make judgments about other people's identity based on their names or physical appearance. It's also difficult to account for queer identities with this method. Faced with this difficulty, activists and researchers tend to respond by acknowledging the limitations of these methods, avoiding claims about individuals, and fitting their work within broader efforts on inclusion and social justice.

Automatically Guess Gender from Names

Surveys and human coding will never be able to function in real time or at scale (the Global Media Monitoring Project took 5 years to analyze 16,000 media items). To do that, we turn to automated methods. The simplest approach is to use historical birth records to estimate the likely sex of a first name. My colleagues at Bocoup and I, who were funded by the Knight Foundation to create Open Gender Tracker, have written about these methods extensively. Here are some of the best places to learn more about this method:

When using one of these systems, it is critical to know as much as possible about the source of the names and the accuracy of a given dataset for a particular population. In our global names dataset, we've observed large differences between UK and US names. Some libraries, which don't document the source of their names, could be offering highly inaccurate results at high levels of confidence. Like the GendRE API, a commercial product, Open Gender Tracker allows you to specify the region of your study to achieve greater accuracy.
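The birth-record approach described above can be sketched roughly as follows. This is not the Open Gender Tracker API or its data: the counts, the function name, and the `threshold` and `min_births` parameters are all hypothetical, chosen only to show how a region-specific lookup might report "unknown" instead of guessing with false confidence.

```python
# Hypothetical toy counts of births by (region, first name): (female, male).
# Real tools draw on much larger, documented birth-record datasets.
TOY_BIRTH_COUNTS = {
    ("us", "jordan"): (300, 700),
    ("us", "emma"): (990, 10),
    ("uk", "emma"): (995, 5),
}


def infer_sex(name, region, counts=TOY_BIRTH_COUNTS, threshold=0.95, min_births=50):
    """Return (label, proportion); label is 'female', 'male', or 'unknown'."""
    key = (region.lower(), name.strip().lower())
    female, male = counts.get(key, (0, 0))
    total = female + male
    if total < min_births:
        return ("unknown", None)  # too little evidence for this name and region
    p_female = female / total
    if p_female >= threshold:
        return ("female", p_female)
    if 1 - p_female >= threshold:
        return ("male", 1 - p_female)
    # The name is used for both sexes too often to make a confident call.
    return ("unknown", max(p_female, 1 - p_female))
```

With these toy counts, `infer_sex("Emma", "us")` is labeled female, while `infer_sex("Jordan", "us")` and `infer_sex("Jordan", "uk")` both come back as unknown: the first because the name is mixed, the second because the region has no records for it.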

Other similar libraries include Lincoln Mullen's gender R package (whose dataset is less comprehensive than Open Gender Tracker's but well documented) and the Sex Machine Python package, whose data source is not well documented.

Combine Automated Methods with Human Judgment

For publishable research, I always encourage a combination of automated methods with human judgment. In this approach, we use Open Gender Tracker to offer inferred sex for as many people as possible. We then ask humans to guess gender from photos for a sample of all names, in order to estimate how accurate Open Gender Tracker is for that particular set of names. In some cases where individual-level accuracy is needed, we optimize the cost of human coding by asking volunteers or Turkers whether they disagree with the automated system's judgment, for a very large sample or potentially the whole dataset.
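One way to summarize the human-coded sample is to report the automated system's accuracy with an interval around it. The sketch below uses a Wilson score interval, which is my choice for illustration and not necessarily what any particular study used; the function name and the decision to skip names that human coders marked "unknown" are assumptions.

```python
import math


def accuracy_with_interval(automated, human, z=1.96):
    """Compare automated labels with human codes for the same sampled names.

    Returns (accuracy, lower, upper), where the bounds are a Wilson score
    interval (z=1.96 gives roughly 95% confidence). Names the human coders
    marked 'unknown' are skipped; an automated 'unknown' counts as a miss.
    """
    pairs = [(a, h) for a, h in zip(automated, human) if h != "unknown"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no human-coded items to compare against")
    p = sum(a == h for a, h in pairs) / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, centre - half, centre + half
```

A small sample gives a wide interval, which is a useful reminder of how many names need to be coded by hand before an accuracy claim about a particular dataset means much.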

Combining Automated Methods, Human Judgment, and Self-Representation

The most flexible and fair approach would support large-scale analysis while also inviting people to choose how their sex and gender presentation will be stored in the system. Together with the MediaCloud team and some advice from mySociety, we're adapting the PopIt system to publish information about the demographics of public figures. Where possible, an automated algorithm will offer its judgment. We will ask volunteers to offer their judgment. Finally, we will invite the person to check in to the system themselves, to correct their inferred gender or to adjust their privacy in our research.
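A minimal sketch of how these three sources might be kept side by side, with the person's own statement taking precedence. This is not the PopIt schema or our actual implementation: the record fields, the `opted_out` flag, and the `resolve` function are hypothetical names used only to illustrate the ordering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenderRecord:
    """Hypothetical record that stores each source of information separately."""
    name: str
    automated: Optional[str] = None      # inferred from the first name
    human_coded: Optional[str] = None    # volunteer judgment
    self_reported: Optional[str] = None  # entered by the person themselves
    opted_out: bool = False              # person asked to be left out of research


def resolve(record):
    """Return (value, source), preferring the person's own statement."""
    if record.opted_out:
        return (None, "withheld")
    for source in ("self_reported", "human_coded", "automated"):
        value = getattr(record, source)
        if value is not None:
            return (value, source)
    return (None, "unknown")
```

Keeping the sources separate, instead of overwriting one field, means the confidence attached to each value stays visible, and a self-reported value is not limited to the categories the automated step can produce.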

It's rare for online services to offer non-binary gender identities, something that may change with Facebook's recent update to gender identity and pronoun options. Our plan with PopIt is to offer multiple levels of confidence and privacy in datasets where information on queer identities may support ethically-designed research. With that data, we should be able to expand research on diversity and inclusion to extend well beyond gender binaries.

Inferring Gender from Content

In some cases, it's possible to infer gender from content. In the first kind of research, used by Joseph Reagle and Lauren Rhue in their study of Wikipedia biographies, gendered pronouns are used to detect articles about women and men. They used this method to compare Wikipedia's coverage of women to other encyclopedias, including Britannica. Sophie Diehl and I also used this in our project to link New York Times obituaries to Wikipedia articles, identifying the likely gender of obituary subjects with very high accuracy.
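The pronoun-counting idea can be sketched in a few lines. This is an illustration of the general technique only, not Reagle and Rhue's procedure or the code Sophie Diehl and I used; the pronoun lists, the function name, and the `min_pronouns` and `ratio` thresholds are assumptions.

```python
import re

FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}
MALE_PRONOUNS = {"he", "him", "his", "himself"}


def pronoun_gender(text, min_pronouns=3, ratio=2.0):
    """Guess whether a biography is about a woman or a man from its pronouns."""
    words = re.findall(r"[a-z]+", text.lower())
    female = sum(w in FEMALE_PRONOUNS for w in words)
    male = sum(w in MALE_PRONOUNS for w in words)
    if female + male < min_pronouns:
        return "unknown"  # too few gendered pronouns to say anything
    if female >= ratio * male:
        return "female"
    if male >= ratio * female:
        return "male"
    return "unknown"  # pronouns are mixed, e.g. an article about a couple
```

This works best on long texts about a single person, such as biographies and obituaries. It says nothing about people who use other pronouns, and short or multi-subject texts should fall through to "unknown".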

Occasionally, languages include features that identify the gender of the speaker or the object of a comment. McGill undergrad Morgane Ciot, along with Derek Ruths, did a fascinating study that successfully used this method to detect the gender of Twitter accounts (paper pdf here).

A third area of research attempts to identify male and female writers based on the style of their language. Research on novels and articles suggested that this might be possible. More recent work on Twitter has also suggested that the gender of Twitter account holders may be identifiable from their patterns of language and who they follow. The authors of that work also noticed that not everyone follows typical gender norms, showing that across different topics, detectable differences often "defy population-level gender patterns."

Although this work is interesting, I choose to avoid it because my research focuses on identifying people who defy norms: remarkable people who use their voices in public despite being under-represented. Twitter itself does use content analysis (plus names) to identify the gender of its users for analytics and targeted advertising. Many social media gender detection systems combine several factors (see also this report) in order to attain high accuracy levels. Glenn Fleishman recently wrote a summary of automated content-analysis gender inference for BoingBoing.

Inferring Gender from Behavior

Your behavior on social networks, including your friendship network, can also reveal things about gender. In one paper I find troubling, researchers have been able to identify gay men at MIT from their friendship network, even when they kept that information private. I personally avoid using these kinds of techniques in my research, on ethical grounds.

Inferring Ethnicity and Race

Ethnicity and race are much harder to infer from names or photos. Although techniques do exist to identify ethnicity from names (see this 2010 paper by Chang, Rosenn, Backstrom, and Marlow that incorporates names and relationships), the accuracy varies by ethnicity/race, and the census is thinking about redesigning their race and ethnicity categories to deal with problems. Although tempted by the opportunity to include measures of intersectionality in my research, I haven't yet gone down this rabbit hole in my own work. Photos are an emerging source for analysis of race and ethnicity. Although it's not a thoroughly developed area, photos from social networks have been used to train ethnicity facial-recognition detectors (here, here, and here).

Privacy and Ethics

Quantitative studies of underrepresented groups involve a huge imbalance of power between researchers and people who already have problems with power. When doing quantitative work on gender, whether binary or not, it's important to keep the following things in mind:

  • Always work in conversation with people from the community you're studying so they can question or encourage your work as needed. In my work on the media, I'm deeply grateful for all the social justice orgs, professional organizations, and journalists who have helped me think through these issues. If possible, I try to design together with the people who are affected by my work, and I'm very inspired by Jill Dimond's approach to Feminist HCI (pdf).
  • Consider the harms that could occur for the people you're working with. Actions that seem like a good idea can have unexpected consequences. For example, danah boyd's talk about the power of fear in Networked Publics helped me understand that transparency can often hurt the most vulnerable, a perspective that has helped me avoid mistakes in my design work.
  • Quantitative analyses that occur without the knowledge or consent of the people studied involve a very serious imbalance of power and voice, and should not be undertaken without careful thought and consultation. A good rule of thumb, from Sunil Abraham, is that the greater the power of a person, the greater the transparency that is acceptable. For my research on journalists and other public figures, I'm building on decades of feminism that has considered this form of transparency justifiable.
  • Support people's agency and privacy. Data about the gender of an individual can have serious consequences for people if shared in the wrong context, especially since obscuring someone's gender may be the best means to ensure fairness. In the FollowBias system, Sarah Szalavitz and I stuck with gender binaries for this reason -- if we had allowed people to note the non-binary genders of their friends, someone could have been outed against their will.

The ideological limitations of data activism

In my talk at the MIT Gender and Technology symposium, I wondered aloud whether the above algorithms were actually pulling me towards non-intersectional feminist activism that focuses primarily on white women who are public figures. I think it's a very real risk. Yet since then, I've seen projects like the Texas Tribune Gender Wage Gap interactive (article here), which used Open Gender Tracker to look at wage gaps across all state employees, not just the highly paid ones. At the moment, we're just learning how to use this data to support social change, so we're at no risk of over-emphasizing metrics. Yet even as we implement the above methods, it's important to retain a critical perspective and a focus on the change that matters.