The opening paragraph of your paper sucks, and here is why.

[Image: An analog scale with one cup lower than the other. The higher cup holds people on a queer pride march; the lower cup holds a bar chart with increasing bars and an upwards-pointing arrow.]

Since the beginning of time humans have tried to 

In recent years the field of machine learning has rapidly 

Over recent decades machine learning has risen

When I tried to write my first machine learning paper, I was clueless about the narrative conventions of the genre. I tormented my supervisor and my colleagues with many horrible opening paragraphs. I am not alone in that. Ph.D. students all around the world scramble to justify why their paper should be the one that reviewers pick, and then postdocs and professors delete those paragraphs and find better ways to say the same thing. But what happens if we zoom out and look at all that scrambling from a bird's-eye perspective? What can the emerging patterns tell us about the field of machine learning?

The paper “The Values Encoded in Machine Learning Research”, published in 2022 at ACM FAccT, does exactly this. It is the first paper to examine a corpus of highly cited machine learning papers and ask how researchers justify their contributions to the academic discourse. The authors go to great lengths to answer this question: they diversify the examined corpus, taking papers published in 2008/09 and 2018/19 to look past short-lived trends. They devise an annotation scheme and train six researchers from diverse fields (among them computer science, cognitive science, and the social sciences) to annotate the papers. The annotators then go through the papers sentence by sentence, marking each for the presence of any of 75 values, and reach high inter-annotator agreement.

This seems like an awful lot of effort for something that I never paid much attention to in my own papers. After all, I thought, it is the results that should count, not why I am researching a specific task. There are personal reasons why one might choose a research project on question answering over one on named entity recognition, but at the end of the day the “why” boils down to “because my supervisor had funding for it”. And the question of why my supervisor got money to solve a specific problem seemed far outside the scope of my considerations. After all, I had a Ph.D. to get.

To anyone familiar with machine learning papers, the dominant values identified by the authors of the FAccT paper come as no surprise: performance, generalization, efficiency, building on past work, and novelty. This rings true for the papers that I have written, too. When writing opening paragraphs, I tried to highlight that my model outdid other models on established NLP tasks. I did that because I assumed it was the best way to convince the reviewers and thereby push my paper through to the conference. It is the accepted way of saying: “My paper is relevant because others work on the same task. It is important because I do better than them.” The authors of the FAccT paper are aware of this self-perpetuating feedback mechanism. They write: “submitted papers are typically explicitly written to win the approval of the community, particularly the reviewers who will be drawn from that community. As such, these papers effectively reveal the values that authors believe are most valued by that community.”

If “The Values Encoded in Machine Learning Research” merely confirms what a cursory glance at the introductory paragraphs of machine learning papers would have shown anyway, was the study even worth conducting? There are two reasons why it is incredibly valuable:

First, the study shows not only what is there but also what is missing: ethical principles like beneficence and justice, as well as user rights like privacy, fairness, and collective influence, are under-represented or outright absent. This is a surprising finding, considering that these values address the biggest threats posed by ML systems. Why did I never mention them in my papers either? Because I thought a reference to existing NLP problems would be sufficient. This reveals a deliberate unwillingness of the ML field to question why certain tasks became established in the first place. I also never mentioned these values because societal benefits were not a guiding concern, neither in specific research projects nor in my Ph.D. as a whole. In retrospect, this opens up the interesting question of why we do machine learning in the first place. When I entered academia, I was convinced that the path to making the world a better place would lead through NLP. Aren’t technical advancements a good thing in and of themselves? This leads to the next important contribution of the FAccT paper.

Technical values like performance, generalization, and efficiency are not politically neutral. In their paper, the authors go through the six most frequently invoked values and illustrate how they are defined and applied in ways that support the centralization of power. These values also perpetuate the assumption that the social impacts of machine learning systems are outside the scope of machine learning researchers’ concerns. Papers highlighting seemingly neutral values like performance omit that performance only ever exists in the context of a particular metric on a particular data set. Data sets are generally accepted as readily available ‘ground truth’, which leaves several blind spots: What about those who lack the resources to obtain and process these (often very large) data sets? What about the welfare, consent, and awareness of the people who created the data? What about the possibility that ‘ground truth’ labels can be interpreted differently depending on social context? The authors criticize that performance (the improvement of a system as measured by a metric on a data set) is routinely equated with success, progress, and improvement, while these terms might be better suited to systems that are increasingly safe, consensual, and participatory.

Pointing out these blind spots is an important step towards amending them. But where does that leave the individual Ph.D. student trying to get a paper into a conference? Or the individual reviewer trying to get through their reviewing load before the deadline, or the postdoc applying for funding for the first time? Why does prioritizing user rights and ethical principles over technical advancements seem like a bad idea when it comes to convincing others of our work? After all, isn’t it the desire to make the world a better place that got us into research? The paper offers an interesting perspective on this, too. The authors note that a high proportion of the 100 most cited papers they examined originate from elite universities and tech companies. The share of papers with big-tech affiliations increased nearly fourfold between the 2008/09 and 2018/19 samples, reaching 79% in 2018/19.

So yes, my fellow Ph.D. students and I, all of us doing research inside and outside of academia: we want to make the world a better place. This is a complex objective with no clear reward function and a lot of messiness and unanswered questions. But the companies that foot the bill have a much simpler objective, one that aligns well with the centralization of power and the neglect of negative social impacts: they just want to make money.


This post was written by Sabine Weber. Sabine is a queer person who just finished their Ph.D. at the University of Edinburgh. They are interested in multilingual NLP, AI ethics, science communication, and art. They organized Queer in AI socials and were one of the Social Chairs at NAACL 2021. You can find them on Twitter as @multilingual_s.
