3 Things that AI Ethics Toolkits Get Wrong

[Image: Three metal pliers with googly eyes look at two rainbow-colored origami cranes]

We want to be good. We want to do ethical AI work. Ethics should be a priority at every step of the development process, from choosing the tasks an ML model may perform to the ongoing monitoring of user interactions. Research and development of models should be participatory, but in reality the responsibility falls foremost on those who build the models: data scientists, software developers, and machine learning engineers.

In recent years, many AI ethics toolkits have stepped onto the scene to address this need for accountability. Tools like Fairlearn, AIF360, Aequitas, Themis-ML, and others offer pre-implemented fairness metrics and mitigation algorithms. But do these solutions actually help practitioners in the real world? A team of researchers from Cambridge and Carnegie Mellon University looked at this question in their recent paper “Exploring How Machine Learning Practitioners (Try To) Use Fairness Toolkits”. In their experiment, they gave practitioners the task of developing a model to recommend which students should get extra tutoring, using a data set of Portuguese high school students. The task is fraught with possible pitfalls: How should the students’ gender be taken into account? Their address? Their parents’ education level?
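
To make “pre-implemented fairness metrics” concrete, here is a minimal sketch (ours, not from the paper) of what such an analysis could look like with Fairlearn on a toy stand-in for the tutoring task. The data and column names are made up for illustration.

```python
# Minimal sketch: measuring group fairness with Fairlearn on a toy
# stand-in for the tutoring-recommendation task. Columns and values
# are hypothetical, not the actual Portuguese student data set.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, demographic_parity_difference

df = pd.DataFrame({
    "sex":            ["F", "M", "F", "M", "F", "M", "F", "M"],
    "grade":          [8, 12, 9, 15, 7, 14, 10, 16],
    "absences":       [4, 1, 6, 0, 8, 2, 5, 1],
    "needs_tutoring": [1, 0, 1, 0, 1, 0, 1, 0],
})
X = df[["grade", "absences"]]
y = df["needs_tutoring"]
sensitive = df["sex"]  # the sensitive attribute is not a model input

model = LogisticRegression().fit(X, y)
y_pred = model.predict(X)

# Per-group accuracy: does the model serve both groups equally well?
frame = MetricFrame(metrics=accuracy_score, y_true=y, y_pred=y_pred,
                    sensitive_features=sensitive)
print(frame.by_group)

# Demographic parity difference: gap in selection rates between groups
print(demographic_parity_difference(y, y_pred, sensitive_features=sensitive))
```

Notice how much judgment the tool leaves to the user: which attribute counts as sensitive, which metric matters, and what gap is acceptable are all decisions the toolkit cannot make for you.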

The practitioners had the choice of two AI ethics toolkits (AIF360 and Fairlearn). Both toolkits were completely new to them, which allowed the researchers to study their engagement at every step: learning a toolkit’s functionalities, then using it to analyse the given data and develop the model. The researchers watched the developers work things out and instructed them to narrate their thought process as they went along.

In doing so, they identified three important needs that AI ethics toolkits currently fail to meet:

  1. AI ethics toolkits need to teach: While they are conceptualised as tools for assisted problem solving, practitioners treat toolkits as places to learn about unfamiliar AI ethics concepts. Toolkits could address this need by offering interactive materials that cover ML fairness concepts, procedures, and best practices. In particular, the step of exploratory data analysis and concepts like Datasheets and Model Cards are left uncovered by the toolkits. Toolkits also need to help practitioners avoid common pitfalls, not only by providing positive examples of use but also by showing anti-patterns that demonstrate how incorrect use of the toolkit can worsen rather than solve ethical problems.

  2. AI ethics toolkits need to blend in: In their work environments, practitioners operate under tight time constraints, with ethics being only a minor concern compared to model deployment. Because of this, it is especially important for toolkits to integrate seamlessly with established ML development pipelines; a lightweight example of what such integration could look like is sketched after this list. The researchers also observed that the practitioners often reformulated the problem to fit the provided code examples, leading to overall worse models. This could be avoided with documentation and tutorials that provide context-specific information on how to apply concepts.

  3. AI ethics toolkits need to bring others into the conversation: People who build machine learning models don’t work alone. They need to communicate concerns and outcomes to non-technical colleagues within their company, and they would like to call in the help of domain experts when it comes to unfamiliar ethical problems. Multiple practitioners stressed that if they had faced the assigned task in a work context, they would have called in the expertise of people familiar with the Portuguese education system and related cultural issues. AI ethics toolkits should help facilitate these conversations by being accessible to people without a background in machine learning, perhaps even going as far as serving as a platform where practitioners can reach out to lawyers, educators, and other domain experts.
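
As promised above, here is a minimal sketch (our illustration, not from the paper or either toolkit’s documentation) of how a fairness check could blend into an existing workflow: wrapped as a plain function, it can slot into an evaluation script or a test suite instead of living in a separate notebook. The function name and threshold are hypothetical.

```python
# Minimal sketch: a fairness check packaged so it fits an existing
# evaluation step or CI test. Name and threshold are our assumptions.
from fairlearn.metrics import demographic_parity_difference

def check_parity(y_true, y_pred, sensitive_features, threshold=0.1):
    """Fail fast if the selection-rate gap between groups exceeds threshold."""
    gap = demographic_parity_difference(
        y_true, y_pred, sensitive_features=sensitive_features)
    if gap > threshold:
        raise ValueError(
            f"Demographic parity difference {gap:.2f} exceeds {threshold}")
    return gap
```

A check like this runs alongside accuracy metrics on every retrain, so fairness becomes part of the routine pipeline rather than a separate chore under time pressure.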

In an ideal world, watching out for ethical pitfalls is a fixture of the development process. What would an ideal AI ethics toolkit look like? Ethics cannot be addressed outside of societal context, and a toolkit should invite that context in: through learning materials, integration with existing workflows, and the voices of domain experts and of those who will be most impacted by the finished product. AI ethics toolkits could be one more way to open the door and let the world in.


[Photo of the author, a white person wearing a blue and white patterned shirt]

This post was written by Sabine Weber. Sabine is a queer person who just finished their PhD at the University of Edinburgh. They are interested in multilingual NLP, AI ethics, science communication, and art. They organized Queer in AI socials and were one of the Social Chairs at NAACL 2021. You can find them on Twitter as @multilingual_s.
