Don’t Ask DALL-E to Draw Trans People

The distorted face of a person in rainbow colours. In the background there is a green and black circuit board.

One thing that I love about attending Queer in AI events (or queer community gatherings in general) is that I can assume that everyone around me is queer, too. I shift into a more comfortable, less guarded state. And conversely, it feels good that nobody assumes that I’m straight, either. Feeling seen that way is a rare luxury, because being visible as a queer person is normally a tightrope act for me: How much can I signal so that other queer people will see me, but not enough to alert the homophobes?

AI text-to-image systems step into the delicate space of queer representation like the proverbial bull into a china shop. Commentators on both social and traditional media (1) have pointed out that DALL-E and others are prone to generating stereotypical and insulting depictions of marginalised people, a problem that stems from biased training data and that is often addressed with half-hearted fixes, like warning labels or refusing to output content related to particular identities. In his paper “Stereotypes and Smut: The (Mis)representation of Non-cisgender Identities by Text-to-Image Models”, Eddie Ungless takes a deep dive into the intersection of AI image generation and non-cisgender identity and comes up with interesting results.

Eddie is a PhD student at the University of Edinburgh, working on bias and queerness in NLP, with past projects on sentiment analysis and large language models. “The norm is seen as neutral and is almost invisible. And when you step outside the norm, things start to go wrong,” he says. It doesn’t take much to step outside the norm in which text-to-image systems perform well: Eddie and his collaborators found that adding gender identity terms like “trans”, “nonbinary” or “queer” to an image generation prompt leads to images that are less human-looking, more stereotypical and more sexualised than images from prompts without these terms.

To supplement these findings, Eddie also conducted a survey among 35 non-cisgender people with varying background knowledge in AI, asking them about their opinions on the generated images and on possible harm mitigation strategies. Surprisingly, the responses to the heuristic mitigation strategies were very negative. “I wasn’t expecting people to feel so strongly about it,” says Eddie. “I tried to present the solutions in very neutral language. [...] But people were like, why on earth would you think this is a good idea?” The heuristic mitigation strategies included, for example, having models ignore non-cisgender identity terms entirely, ignore the terms but add an identity flag or symbol to the image, or display a message warning about the possibility of misrepresentation. None of these strategies were assessed positively by the survey respondents, who felt strongly that such omissions or warnings would make their identities taboo or invisible. “We’re used to seeing people coming up with solutions for us without any discussion with the community,” says Eddie. “The survey responses were very impassioned, and I hope that that came across in the paper. I don’t think any of the solutions are good anyway, but now we have evidence for that.”

Another way of improving the performance of a text-to-image model would be to add more diverse images of non-cisgender people to the training data. But survey respondents felt hesitant about this strategy, too, wondering about issues of data ownership, especially with regard to images of Indigenous people. “[AI-generated] images of two-spirit people were all just terrible,” says Eddie. “It was a mishmash of different Indigenous cultures in religious dress. Often it ended up looking very dehumanised. And one of our interviewees mentioned this concern that minority genders from around the world are going to end up being represented in this very exotified way, and only ever in religious dress and never as people going about their day. Even if we get more data, it might still end up being just more data of very particular situations, and not necessarily create a better representation.”

Misrepresentation is baked into text-to-image systems not only at the level of training sets. After all, machine learning systems are built to detect statistical patterns in large amounts of data. With transphobia running like a thread through all layers of society, it is not surprising that a model trained on societal artefacts like text and images should find and reproduce it. For the future of the field, Eddie hopes for approaches that go beyond more data and larger models. “We’re getting to the point where we can train a system on the entirety of the internet, and it’s still not going to be able to solve some of these fundamental issues of actually understanding things,” he says. “I think it would make sense to break problems down. Kind of how things were historically [done in NLP], where people were working more on individual solutions. I won’t pretend to know exactly how that should be done. In a similar way that I’m a prison abolitionist, I don’t necessarily know what the best alternative is, I just know that the alternative we’ve ended up with is bad. And I think it’s okay to say: what we’re doing now is bad, I don’t know what good looks like, but we need to start looking at alternatives. We need to be ready to launch into those. Because anything is better than what we have now.”


You can find more of Eddie’s research and writing here.


A picture of a white person wearing a blue and white patterned shirt

This post was written by Sabine Weber. Sabine is a queer person who just finished their PhD at the University of Edinburgh. They are interested in multilingual NLP, AI ethics, science communication and art. They organized Queer in AI socials and were one of the Social Chairs at NAACL 2021. You can find them on Twitter as @multilingual_s.
