Differentiating between good and bad AI bias
As lawmakers and regulators look at ways to make machine learning models fair, some tech vendors are creating tools that aim to help enterprises do the same.
AI bias is a challenging problem that lawmakers, regulators, vendors and enterprises are trying to address.
Economic, gender and racial bias built into or caused by AI algorithms is under intense scrutiny around the world as AI technology becomes commonplace in enterprises and daily life.
The push to regulate AI is picking up momentum. On Feb. 3, Democrats in Congress filed an updated version of the Algorithmic Accountability Act, a bill -- originally introduced in 2019 -- that would require audits of AI systems used in industries including finance, healthcare and housing.
A host of machine learning platforms claim to offer bias detection, among them the platform of automated machine learning vendor InRule Technology.
On Feb. 3, the vendor introduced new bias detection features for its machine learning platform, xAI Workbench. The features help organizations whose models make predictions that could be biased against people because of their gender, race or age.
The features examine the fairness of a machine learning model and aim to ensure that it treats people with similar characteristics the same way. For example, they aim to make sure that people who are unemployed are treated similarly regardless of their gender or race.
In this Q&A, Theresa Benson, director of product marketing at InRule Technology, discusses the xAI Workbench platform's bias detection technology, and whether bias can ever be truly eliminated from a model.
What are the problems with some AI bias detection tools?
Theresa Benson: Well, there are all different kinds of bias in machine learning, and our focus is really on diving into a granular understanding of the potential for harmful bias in a model.
Some platforms talk about having bias detection built into their models. And they do things like monitor the population of what they're querying to the model over time. ... If they haven't kept their training data and their models up to date, you start to introduce what's called sample bias.
A lot of organizations will say that's bias detection, and that promotes fairness. We're here to say that it does not.
Then a lot of people say 'OK, well I'll just remove the distinguishing characteristic or the protected characteristic from the data that I train my model on.' If you look at race or gender, age, any of those protected characteristics or even other characteristics that you want to make sure your model is not biased against, you simply pull those columns out of your data set, and then train your models. There is a term for that. It's called fairness through blindness or fairness through unawareness.
Here's the problem with that. Let's say, for example, we take gender out of a data set that we're using to train a model. The applicant's sex or gender was a column, and you remove it, but you leave in marital status, and you leave in whether they have dependents. In the United States, I think the statistic is something like more than 80% of single parents are female. What you did is remove [gender], and yet gender is still in your data. The amount of data available to organizations that want to use it for AI is massive, and it is crazily correlated in ways that we may not even be aware of.
It's also difficult, if you take that column out of your data, to go back later and try to prove that your model wasn't somehow unintentionally biased toward that protected characteristic. Because now you don't even have it as a metadata field to say, 'See, it isn't biased,' because you've removed it altogether.
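To make that proxy problem concrete, here is a minimal sketch in Python, using synthetic data and hypothetical column names (a gender label and a single-parent flag), of how a protected attribute that has been dropped from the training data can still be recovered from a correlated feature that remains:

```python
# Minimal sketch of why 'fairness through unawareness' falls short:
# synthetic data, hypothetical feature names. Dropping the protected
# column does not remove its signal if a correlated proxy remains.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Protected attribute we intend to drop (1 = female, 0 = male).
gender = rng.integers(0, 2, size=n)

# Proxy feature: single-parent status, skewed by gender (loosely
# mirroring the 'more than 80% of single parents are female' statistic).
single_parent = rng.random(n) < np.where(gender == 1, 0.25, 0.06)

# An unrelated feature, for contrast.
years_employed = rng.normal(8, 3, size=n)

# Train only on the 'blind' features -- gender has been removed.
X_blind = np.column_stack([single_parent, years_employed])
proxy_model = LogisticRegression().fit(X_blind, gender)

# Anything above the 50% base rate means gender is still encoded
# in the supposedly gender-free data.
print("Gender recoverable from 'blind' features:",
      round(proxy_model.score(X_blind, gender), 3))
```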
How is your AI bias detection product different?
Benson: The first thing that sets us apart is that, for every single prediction you might make with our platform, we give you every single factor that goes into the 'why' behind that prediction.
The other piece is our clustering engine. You can make a model with as many clusters as you want. What we've done with these clusters is dump a population of data in, and our platform has built a model. Then it's told us: 'Here are the 50 distinct groups within that population of data, organized by similarity.'
What our tool does is drill in and make sure that, with everything else being equal and without using the characteristic you're worried about in the data, you can still see whether the model is biased in a harmful way against the characteristic you care about.
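The following is a rough sketch of the general approach Benson describes, not InRule's actual implementation: cluster the population on the non-protected features, then compare the model's outcomes across the protected attribute within each cluster of otherwise-similar records. The variable names (X_features, protected, predictions) are hypothetical placeholders.

```python
# Sketch of a within-cluster fairness check: group similar records,
# then look for outcome gaps across a protected attribute inside each
# group. Not InRule's code; variable names are hypothetical.
import numpy as np
from sklearn.cluster import KMeans

def within_cluster_gaps(X_features, protected, predictions, n_clusters=50):
    """For each cluster of otherwise-similar records, return the gap in
    positive-prediction rates across protected groups. Large gaps flag
    potentially harmful bias for closer review."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    clusters = km.fit_predict(X_features)
    gaps = {}
    for c in np.unique(clusters):
        in_cluster = clusters == c
        groups = np.unique(protected[in_cluster])
        if len(groups) < 2:
            continue  # only one protected group present; nothing to compare
        rates = [predictions[in_cluster & (protected == g)].mean() for g in groups]
        gaps[int(c)] = float(max(rates) - min(rates))
    return gaps

# Hypothetical usage: X_features excludes the protected column entirely.
# gaps = within_cluster_gaps(X_features, gender, model.predict(X_features))
# flagged = {c: gap for c, gap in gaps.items() if gap > 0.10}
```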
Is your tool mitigating just bad bias?
Benson: [It's doing it] for any bias because ... a machine learning model is essentially biased.
Bias in and of itself may not be harmful. I think the disservice that sometimes happens is that people just use bias without a qualifier. Marketers start to try and cash in on 'hey, we have bias detection.' When we say we have bias detection, we have bias detection. It's up to our customers through rules, policies and practices, to determine for the decision they want to make, whether this bias is harmful or acceptable. A prevalence of breast cancer that is biased toward a population that is female is not harmful bias. That is practical bias.
Is there a way to remove all bias, whether good or bad?
Benson: You can mitigate for bias. You can take automated action or human action if you sense bias. But to remove it all, I don't think that's possible.
Bias in the mathematical sense is not inherently sexism or racism.
There's algorithmic bias, where a system, such as an algorithm, can perform slightly differently for two groups.
We have customers who use our machine learning model to categorize items for warehousing. Machine learning isn't just used for decisions about people. Imagine putting your entire inventory into our machine learning platform and having it tell you the optimum way to merchandise those products. There will be bias in the model based on size, shape and weight in terms of where each item might best be merchandised, and in that case, that's not harmful bias.
It's when you're looking at a population, a cluster, whose characteristics are otherwise entirely equal, and the protected characteristics such as disability, age, race or gender are the only differences in that population. Those characteristics should not matter for whatever classification you're making, and yet the accuracy is different because of them. That's when it's harmful.
I don't think you can get rid of bias altogether. Algorithmic bias is how [the algorithm] determines how to sort and organize things. It's more about making sure that the type of bias makes sense for what you're trying to do.
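As a simple illustration of the harmful case Benson describes, the sketch below (with hypothetical variable names) compares a model's accuracy across protected groups that should be interchangeable; a large spread between groups is the warning sign.

```python
# Sketch: per-group accuracy for a protected characteristic. A wide
# spread across groups that should be equivalent is the 'harmful bias'
# case described above. Variable names are hypothetical.
import numpy as np

def accuracy_by_group(y_true, y_pred, protected):
    """Return accuracy for each protected group."""
    return {g: float((y_pred[protected == g] == y_true[protected == g]).mean())
            for g in np.unique(protected)}

# Hypothetical usage:
# acc = accuracy_by_group(y_test, model.predict(X_test), race_test)
# spread = max(acc.values()) - min(acc.values())  # flag if large
```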
Editor's note: This interview has been edited for clarity and conciseness.