Diverse talent pools and data sets can help solve bias in AI
Bringing historically underrepresented employees into critical parts of the design process while creating an AI model can reduce or eliminate bias in that model.
Bias in AI is a serious problem that can create unintended outputs in AI models, negatively affecting the enterprises that use them.
Technology vendors can help solve this problem by hiring diverse employees who lend their perspectives to the AI products the vendors create.
Diversity in, diversity out
"Inclusive inputs lead to inclusive outputs," said Annie Jean-Baptiste, head of product inclusion at Google.
Speaking at a panel on gender and racial bias in AI at the CES 2021 virtual tech show on Jan. 12, Jean-Baptiste noted the importance of including multiple perspectives, especially perspectives that have historically been underrepresented in the tech industry, in critical moments of product development to help lessen racial or gender-based bias in AI models.
When Google created Google Assistant, Jean-Baptiste said, the vendor put it through adversarial testing -- essentially testing designed to try to break the product -- to ensure it remained unbiased.
Part of that testing involved bringing in groups that have been traditionally underrepresented based on race, gender and sexual orientation to find and change negative responses that Google didn't want the Assistant to give, and to add positive cultural references to its responses.
Jean-Baptiste said this key step in the design process was a success, as it greatly reduced the number of biased or potentially alienating responses from Google Assistant.
Meanwhile, companies should prioritize hiring diverse candidates, said panelist Taniya Mishra, founder and CEO of SureStart, a company that helps organizations build diverse workforces with training and education.
She said she hears many people say that while diversity is important, they want the best candidate. That thinking, she noted, is wrong.
Instead, companies should say "Diversity is really important, and I want the best," Mishra said, emphasizing that the goals are of equal value.
"There is no problem between having a diverse set of candidates and getting the best," she said.
While a diverse talent pool is needed to create diversity in organizations, it's also critical to build models with large, diverse data sets, said Kimberly Sterling, senior director of health economics and outcomes research at ResMed, a health IoT vendor.
Technology vendors must use diverse data sets built on diverse populations to develop their models, she said.
This is particularly important in healthcare, as certain medications or products may work differently for different types of people. If a healthcare company builds a predictive model based on data taken largely from white men, the model may produce biased or incorrect predictions about how a drug, for example, will affect a woman or person of color.
"When you have data sets that aren't representative, then we end up coming up with really challenging situations," Sterling said.
She said companies must make sure they include underrepresented groups in all their data gathering and product testing.
Similarly, Mishra said she focused on voice technology in her studies at Oregon Health and Science University in the early 2000s. She recalled that, back then, most of the data sets she had to work with consisted of recordings of news anchors.
The voices in the data sets, she said, were primarily those of Caucasians with standard American accents and a polished way of speaking. The lack of diversity made it difficult for her to build voice models that understood different types of speakers, including those with accents, she said.
She noted that while voice data sets have improved since then, they still mostly lack data from children and the elderly, which leads many voice models to struggle to understand those demographics.
She explained that technologists need to focus on gathering data from underrepresented groups and building their AI models with diverse data sets.