CTO on the need for AI ethics and diversity
A CTO talks about the importance of diverse data sets when creating AI models and how a lack of diversity can create bias in systems.
As businesses and government agencies increasingly rely on AI systems to automate and augment their workflows, it's imperative for the good of the people they serve that their systems remain ethical and accurately represent diverse populations.
Yet, that's not a simple task.
Charles Onstott is the CTO at SAIC, a technology integrator that does a lot of work with government agencies.
Onstott is responsible for SAIC's long-term technology strategy, ensuring that SAIC stays competitive and provides quality services and capabilities to its customers. At the same time, Onstott has to also make sure SAIC's technological work remains ethical.
In a Q&A with TechTarget, Onstott discusses the ethical ramifications of AI systems and the steps organizations can take to reduce bias in their AI models. Including diverse data in those models and hiring a diverse staff are good first steps.
What are some of the modern ethical challenges when it comes to AI?
Charles Onstott: SAIC is a technology integrator that does most of its work with the government, and that means we are applying advanced technologies to government missions that include the intelligence community, military operations and law enforcement. Obviously, there's a lot of conversation going on in the media and on social media around whether the government should be using technologies like AI in these areas.
There are a lot of reasons for that. Bias is a major topic of the day, certainly with the increased emphasis on the Black Lives Matter movement over the past year and the George Floyd incident driving home in people's minds how important it is that we don't create systems of injustice that effectively harm certain groups of people based on race, gender, sexual orientation and so on.
When we think about the use of artificial intelligence -- and especially machine learning, where, because of the way it works, we're basically building models that are optimized to reflect back what we feed into them consistently enough for us to gain insights and make decisions -- we have to really pay attention to the inputs going into that model. We have to pay attention to the whole system.
I think a lot of the conversation is focused too much on the algorithm. In fact, most of the time the algorithm is not really the issue. The issue has more to do with the total system -- how the data was gathered to begin with, the sources of that data, what technologies were used in capturing that data, and how that data got fed into the particular algorithm being used to build the model. The system then includes the processing of future decisions based on that model, and the human beings who take the outputs from that model and apply them in their day-to-day work.
For example, I know there's a major concern in the public's mind that law enforcement's use of AI might result in people being unfairly targeted. People have been arrested based on AI misidentifications because it's known that certain races tend to have higher false positive rates using, say, facial recognition technologies.
So, it starts to move the challenge beyond the realm of just meeting the contractual requirements of building a system. Now, we have to look at whether those requirements result in a system that perpetuates or creates some sort of social justice problem.
That becomes a large problem for the government because it needs to use these technologies to carry out its missions more effectively. But it also becomes a challenge for a company like SAIC that's responsible for integrating those technologies. Therefore, we really need to develop principles around how we approach designing, building and testing the systems that use these kinds of technologies.
What can an organization do to limit bias in its system?
Onstott: The big question today.
When we talk about the term bias, there's often a negative meaning associated with it: an AI system that ends up selecting a group of people based on unfair conditions.
A recent study of facial recognition systems, for example, found that Black people, women and young people had much higher false positive rates with the algorithms. Therefore, we have to really pay attention to which algorithms we're using and the data sources that go into them.
Another significant finding of that study is that Asians also had higher false positive rates. Now, this is based on looking at facial recognition systems in the United States. However, when researchers looked at the performance of facial recognition systems in China, they found that Asians did not have the same high false positive rates. That immediately suggests that the data really does matter. That can't be overstated.
When you look at the data you're feeding into the system, you have to think about the diversity of representation in the imagery that's going to go into that system, which may be disproportionate to the actual population.
For example, in the United States, we have a smaller population of Black people and Asians than white people. So, suppose we don't do something specific to counter the fact that the vast majority of images going into a facial recognition system are of white people. In that case, the system is just by definition going to be better at detecting differences in images of white people than other groups of people. So, organizations need to pay attention to the data sets going into their AI systems.
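As a rough illustration of the kind of data set check Onstott describes -- assuming, hypothetically, that image metadata is tagged with a demographic group and that reference population shares are available -- one could compare the training data's make-up against the population it is meant to represent. The column names, groups and numbers below are invented for illustration, not drawn from SAIC:

```python
# Minimal sketch: audit a training set's demographic make-up against
# hypothetical reference population shares to spot under-representation.
import pandas as pd

# Hypothetical metadata for a face-image training set.
train = pd.DataFrame({
    "group": ["white"] * 70 + ["black"] * 10 + ["asian"] * 5 + ["other"] * 15,
})

# Hypothetical reference shares (e.g., census-style proportions).
reference = {"white": 0.60, "black": 0.13, "asian": 0.06, "other": 0.21}

observed = train["group"].value_counts(normalize=True)
audit = pd.DataFrame({"observed": observed, "reference": pd.Series(reference)})
audit["ratio"] = audit["observed"] / audit["reference"]

# Ratios well below 1.0 flag groups that the training data under-represents.
print(audit.sort_values("ratio"))
```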
Organizations also need to look at the outputs of those systems and measure the rate of unexpected outputs and false positives. That can help test the effectiveness of these algorithms on, say, different demographics.
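Measuring outputs by group could look something like the sketch below, which computes a false positive rate per demographic group from hypothetical evaluation results; the labels, predictions and group tags are made up for illustration:

```python
# Minimal sketch: compare false positive rates across demographic groups
# using hypothetical evaluation data.
import pandas as pd

# 'label' is the ground truth (1 = true match), 'predicted' is the system's
# output, 'group' is a demographic attribute attached to each test case.
results = pd.DataFrame({
    "label":     [0, 0, 1, 0, 1, 0, 0, 1],
    "predicted": [1, 0, 1, 0, 1, 1, 0, 1],
    "group":     ["a", "a", "a", "b", "b", "b", "b", "b"],
})

def false_positive_rate(df: pd.DataFrame) -> float:
    # Share of true negatives that the system incorrectly flagged as matches.
    negatives = df[df["label"] == 0]
    if negatives.empty:
        return float("nan")
    return (negatives["predicted"] == 1).mean()

# Large gaps between groups are the kind of disparity Onstott says
# organizations should be testing for.
fpr_by_group = results.groupby("group").apply(false_positive_rate)
print(fpr_by_group)
```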
Another important dimension to this is the people who are involved in developing these systems and platforms. Just as an example, if you have a group of white people designing an entire system from beginning to end, it may never even occur to them that the data underrepresents parts of the population to begin with.
I think one way to counteract that is to ensure that the teams working on these kinds of projects are themselves diverse, so that you're bringing a lot of different perspectives to the table. That way, these kinds of issues would be surfaced more easily or earlier in that process. There's been a lot of research that shows that when you have diverse teams, you do get better innovation, and part of that is because you are bringing all those different perspectives to the table. So now the design of the system can take into consideration a lot of additional factors that might not occur to a group of people who all look the same.
How can organizations compensate for data, in say facial recognition systems, that is overly representative of a particular group of people?
Onstott: There are a few things that can be done. One is intentionally over-selecting for representation in the population of images that you're using. So, you would deliberately, say, decrease the proportion of images of white people and increase the proportion of images of people of other races.
So, suppose you had a data set representing, let's say, the actual demographic distribution of the population in the United States. In that case, you want to make sure that you're increasing the representation of people of color.
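One simple way to sketch that kind of deliberate rebalancing -- assuming a hypothetical image-metadata table with a demographic column -- is to oversample the smaller groups, sampling with replacement, until every group matches the largest one; the paths, groups and counts here are purely illustrative:

```python
# Minimal sketch: over-select under-represented groups so each group
# contributes equally to training.
import pandas as pd

# Hypothetical image-metadata table.
images = pd.DataFrame({
    "path":  [f"img_{i}.jpg" for i in range(100)],
    "group": ["white"] * 70 + ["black"] * 10 + ["asian"] * 5 + ["other"] * 15,
})

# Match every group's count to the largest group's count.
target = images["group"].value_counts().max()

balanced = (
    images.groupby("group", group_keys=False)
    .apply(lambda g: g.sample(n=target, replace=True, random_state=0))
    .reset_index(drop=True)
)

# Every group now appears the same number of times in the training set.
print(balanced["group"].value_counts())
```

Oversampling is only one option under these assumptions; collecting additional imagery of underrepresented groups serves the same goal Onstott describes.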
Editor's note: This interview has been edited for brevity and clarity.