Anirban Ghoshal
Senior Writer

IT leaders should measure and balance fairness in AI models, Forrester says

Jan 24, 2022
Artificial Intelligence

Eliminating bias in artificial intelligence is increasingly an issue as more enterprises use it in their services.

Bias in artificial intelligence development has been a growing concern as the use of AI increases across the world. But despite efforts to create AI standards, it is ultimately down to organizations and IT leaders to adopt best practices and ensure fairness throughout the AI life cycle to avoid serious regulatory, reputational, and revenue consequences, according to a new Forrester Research report.

While eliminating 100% of the bias in AI is impossible, CIOs must determine when and where AI should be used and what the ramifications of its use could be, said Forrester vice president Brandon Purcell.

Bias has become so inherent in AI models that companies are looking at bringing in a new C-level executive called the chief ethics officer tasked with navigating the ethical implications of AI, Purcell said. Salesforce, Airbnb, and Fidelity already have ethics officers and more are expected to follow suit, he told

Ensuring AI model fairness

CIOs can take several steps not only to measure but also to balance AI models’ fairness, he said, even though there is a lack of regulatory guidelines dictating the specifics of fairness.

The first step, Purcell said, is to make sure that the model itself is fair. He recommended using an accuracy-based fairness criterion that optimizes for equality, a representation-based fairness criterion that optimizes for equity, and an individual-based fairness criterion. Companies should bring together multiple fairness criteria to check the impact on the model’s predictions.

While the accuracy-based fairness criterion ensures that no group in the data set receives preferential treatment, the representation-based criterion, which optimizes for equity, ensures that the model offers equitable results across the groups in the data set.
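One simple way to check an accuracy-based criterion is to compare how often the model is right for each group. The sketch below is an illustration only, not something taken from the Forrester report: the function name `accuracy_by_group`, the group labels, and the data are all made up for the example.

```python
def accuracy_by_group(groups, y_true, y_pred):
    """Return the share of correct predictions for each group."""
    correct, totals = {}, {}
    for group, actual, predicted in zip(groups, y_true, y_pred):
        totals[group] = totals.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + int(actual == predicted)
    return {g: correct[g] / totals[g] for g in totals}


# Hypothetical labels and predictions for two groups, "a" and "b".
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]

acc = accuracy_by_group(groups, y_true, y_pred)
# A gap of 0 would mean the model is equally accurate for every group.
acc_gap = max(acc.values()) - min(acc.values())
print(acc, acc_gap)
```

A large gap suggests the model serves one group better than another, though accuracy alone says nothing about which kinds of errors each group experiences.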

“Demographic parity, for example, aims to ensure that equal proportions of different groups are selected by an algorithm. For example, a hiring algorithm optimized for demographic parity would hire a proportion of male to female candidates that is representative of the overall population (likely 50:50 in this case), regardless of potential differences in qualifications,” Purcell said.
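Demographic parity is commonly quantified by comparing selection rates, meaning the share of each group that receives a positive decision. The following is a minimal sketch under that assumption; the function names and the hiring data are hypothetical and do not come from the report.

```python
from collections import defaultdict


def selection_rates(groups, decisions):
    """Return the fraction of positive decisions (1s) for each group."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, decision in zip(groups, decisions):
        totals[group] += 1
        selected[group] += int(decision)
    return {g: selected[g] / totals[g] for g in totals}


def demographic_parity_gap(groups, decisions):
    """Largest difference in selection rate between any two groups."""
    rates = selection_rates(groups, decisions)
    return max(rates.values()) - min(rates.values())


# Hypothetical hiring decisions (1 = hired), invented for illustration.
groups = ["m", "m", "m", "m", "f", "f", "f", "f"]
decisions = [1, 1, 1, 0, 1, 0, 0, 0]

rates = selection_rates(groups, decisions)
gap = demographic_parity_gap(groups, decisions)
print(rates, gap)
```

A gap of 0 means every group is selected at the same rate. Note that this measure ignores qualifications entirely, which is the trade-off Purcell points to.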

One example of bias in AI was the Apple Card AI model that was allocating more credit to men, as was revealed in late 2019. The issue came to light when the model offered Apple cofounder Steve Wozniak a credit limit that was 10 times that of his wife even though they share the same assets.

Balancing fairness in AI

Balancing the fairness in AI across its life cycle is important to ensure that a model’s prediction is close to being free of bias.

To do so, companies should solicit feedback from stakeholders to define business requirements, seek more representative training data during data understanding, use more inclusive labels during data preparation, experiment with causal inference and adversarial AI in the modeling phase, and account for intersectionality in the evaluation phase, Purcell said. “Intersectionality” refers to how various elements of a person’s identity combine to compound the impacts of bias or privilege.

“Spurious correlations account for most harmful bias,” he said. “To overcome this problem, some companies are starting to apply causal inference techniques, which identify cause-and-effect relationships between variables and therefore eliminate discriminatory correlations.” Other companies are experimenting with adversarial learning, a machine-learning technique that optimizes for two cost functions that are adversarial.

For example, Purcell said, “In training its VisualAI platform for retail checkout, computer vision vendor Everseen used adversarial learning to both optimize for theft detection and discourage the model from making predictions based on sensitive attributes, such as race and gender. In evaluating the fairness of AI systems, focusing solely on one classification such as gender may obscure bias that is occurring at a more granular level for people who belong to two or more historically disenfranchised populations, such as non-white women.”

He gave the example of Joy Buolamwini and Timnit Gebru’s seminal paper on algorithmic bias in facial recognition that found that the error rate for Face++’s gender classification system was 0.7% for men and 21.3% for women across all races, and that the error rate jumped to 34.5% for dark-skinned women.
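The effect is easy to reproduce on toy data. The sketch below computes error rates first by a single attribute and then by a combination of attributes. The records are invented for illustration and are not the figures from the paper; `make_records` and `error_rates` are hypothetical helper names.

```python
from collections import Counter


def make_records(gender, skin, n, n_wrong):
    """Build n toy records for one subgroup, the first n_wrong misclassified."""
    wrong = "male" if gender == "female" else "female"
    return [
        {"gender": gender, "skin": skin, "actual": gender,
         "predicted": wrong if i < n_wrong else gender}
        for i in range(n)
    ]


def error_rates(records, keys):
    """Error rate per subgroup, where a subgroup is a tuple of the given keys."""
    totals, errors = Counter(), Counter()
    for record in records:
        subgroup = tuple(record[k] for k in keys)
        totals[subgroup] += 1
        errors[subgroup] += int(record["predicted"] != record["actual"])
    return {s: errors[s] / totals[s] for s in totals}


# Invented data: only one intersectional subgroup has any errors.
records = (
    make_records("male", "lighter", 4, 0)
    + make_records("male", "darker", 4, 0)
    + make_records("female", "lighter", 4, 0)
    + make_records("female", "darker", 4, 2)
)

by_gender = error_rates(records, ["gender"])
by_both = error_rates(records, ["gender", "skin"])
print(by_gender)
print(by_both)
```

Evaluated by gender alone, the error rate for women in this toy set is 25%. Broken down by gender and skin tone together, all of those errors fall on one subgroup, whose error rate is 50%.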

More ways to adjust fairness in AI

There are a couple of other methods that companies might employ to ensure fairness in AI: deploying different models for different groups in the deployment phase, and crowdsourcing with bias bounties (where users who detect biases get rewarded) in the monitoring phase.

“Sometimes it is impossible to acquire sufficient training data on underrepresented groups. No matter what, the model will be dominated by the tyranny of the majority. Other times, systemic bias is so entrenched in the data that no amount of data wizardry will root it out. In these cases, it may be necessary to separate groups into different data sets and create separate models for each group,” Purcell said.
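As a rough sketch of what separate models per group could look like in code, the example below splits the training rows by group, fits one model per group, and routes each prediction to the matching model. Everything here is an assumption made for illustration: `ThresholdModel` is a toy one-feature classifier, and the groups, scores, and labels are invented.

```python
from collections import defaultdict
from statistics import mean


class ThresholdModel:
    """Toy classifier: predicts 1 when a score exceeds a learned cutoff."""

    def fit(self, scores, labels):
        # Assumes both classes (0 and 1) appear in the training data.
        pos = [s for s, y in zip(scores, labels) if y == 1]
        neg = [s for s, y in zip(scores, labels) if y == 0]
        self.cutoff = (mean(pos) + mean(neg)) / 2
        return self

    def predict(self, score):
        return int(score > self.cutoff)


def fit_per_group(rows):
    """rows: iterable of (group, score, label). Returns {group: fitted model}."""
    by_group = defaultdict(lambda: ([], []))
    for group, score, label in rows:
        by_group[group][0].append(score)
        by_group[group][1].append(label)
    return {g: ThresholdModel().fit(s, y) for g, (s, y) in by_group.items()}


# Invented training rows: the two groups have different score distributions.
rows = [
    ("a", 2.0, 0), ("a", 4.0, 0), ("a", 6.0, 1), ("a", 8.0, 1),
    ("b", 1.0, 0), ("b", 2.0, 0), ("b", 3.0, 1), ("b", 4.0, 1),
]
models = fit_per_group(rows)

# The same score is routed to a different model depending on the group.
print(models["a"].predict(3.0), models["b"].predict(3.0))
```

In this toy data a single pooled cutoff would sit between the two groups' natural cutoffs and fit neither well, which is the situation the per-group approach is meant to address.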