Protecting digital customer journeys from AI biases
Today, hundreds of millions of people use tools like ChatGPT to brainstorm ideas, or Midjourney to create new visuals. Artificial intelligence (AI) tools have become part and parcel of our daily lives and are propelling the arrival of a new digital era. We now work more efficiently, can better meet professional or creative challenges, and accelerate innovation.
But AI now delivers far more value than support for our daily tasks. It is integral to powering critical services and keeping society running, whether that’s facilitating loan agreements or providing access to higher education, mobility platforms or medical care. Identity verification, fundamental to online access, was traditionally seen as a gateway to credit checks and opening a bank account, but thanks to AI it now supports services from healthcare to travel and eCommerce.
AI systems, however, can behave in a biased manner towards end-users. Uber Eats and Google recently discovered how the use of AI can threaten the legitimacy and reputation of online services. Humans, too, are vulnerable to biases. These can be systemic, as shown by bias in facial recognition: people tend to recognize members of their own ethnic group better than others (Own-Group Bias, or OGB), a phenomenon that is now well documented.
This is where the challenge lies. Online services have become the backbone of the economy, with eight in ten people saying they would be satisfied with fully digital services. With lower processing costs and shorter execution times, AI is a solution of choice for businesses handling an ever-increasing volume of customers. However, despite all the advantages this solution offers, it is important to be aware of its biases too. Companies have a responsibility to implement the right safeguards to protect against lasting damage to their reputation and the wider economy.
At the heart of a bias prevention strategy are four essential pillars - identifying and measuring bias, awareness of hidden variables and hasty conclusions, designing rigorous training methods, and adapting the solution to the use case.
Pillar 1: Knowing where to find and measure bias
The fight against bias begins with the establishment of robust processes for its measurement. AI biases are often subtle, hidden in vast mountains of data and observable only after disentangling several correlated variables.
It is therefore crucial for companies using AI to establish good practices: measuring with confidence intervals, using datasets of appropriate size and variety, and having qualified practitioners apply suitable statistical tools.
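As a concrete illustration of measurement by confidence interval, the sketch below computes a per-group rejection rate with a 95% Wilson score interval. The counts are made up for illustration and this is not Onfido's methodology; the point is that two groups whose intervals do not overlap are a much stronger signal of bias than a raw difference in rates.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score confidence interval for a proportion."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (center - margin, center + margin)

# Hypothetical per-group (rejections, total) counts
groups = {"group_a": (48, 1000), "group_b": (90, 1200)}
for name, (rejections, total) in groups.items():
    lo, hi = wilson_interval(rejections, total)
    print(f"{name}: rate={rejections / total:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

With small samples the intervals widen, which guards against declaring a "bias" that is really sampling noise.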
These companies must also strive to be as transparent as possible about these biases, for example by publishing public reports such as the "Bias Whitepaper" that Onfido published in 2022. These reports should be based on real production data and not on synthetic or test data.
Public benchmarking programs such as the NIST FRVT (Face Recognition Vendor Test) also produce bias analyses that companies can use both to communicate about bias and to reduce it in their systems.
Based on these observations, companies can understand where biases are most likely to occur in the customer journey and work to find a solution, often by training the algorithms with more complete datasets to produce fairer results. This step lays the foundation for rigorous bias treatment and increases the value of the algorithm and its user journey.
Pillar 2: Hidden variables and hasty conclusions
The bias of an AI system is often hidden across multiple correlated variables. Take the example of facial recognition between a biometric capture and an identity document ("face matching"), a key step in verifying a user's identity.
A first analysis shows that recognition performs less well for people with dark skin than for the average person. Under these conditions, it is tempting to conclude that the system penalizes people with dark skin by design.
However, by pushing the analysis further, we observe that the proportion of people with dark skin is higher in African countries than in the rest of the world. Moreover, these African countries use, on average, identity documents of lower quality than those observed in the rest of the world.
This decrease in document quality explains most of the relatively poor performance of facial recognition. Indeed, if we measure the performance of facial recognition for people with dark skin while restricting ourselves to European countries that use higher-quality documents, we find that the bias practically disappears.
In statistical language, we say that the variables "document quality" and "country of origin" are confounding with respect to the variable "skin color."
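The effect of a confounding variable can be made concrete with a small sketch. All of the counts below are synthetic and chosen purely to illustrate the mechanism, not drawn from any real system: the aggregate match rates suggest a skin-tone gap, but stratifying by document quality shows the gap is carried entirely by document quality.

```python
# Synthetic counts: (skin_tone, doc_quality, matched, total) -- all made up.
records = [
    ("dark",  "high",  485,  500),
    ("dark",  "low",  1275, 1500),
    ("light", "high", 1455, 1500),
    ("light", "low",   425,  500),
]

def match_rate(rows):
    """Pooled match rate over a list of (tone, quality, matched, total) rows."""
    matched = sum(m for _, _, m, _ in rows)
    total = sum(t for _, _, _, t in rows)
    return matched / total

# Aggregate rates differ (0.88 vs 0.94) because, in this synthetic mix,
# most "dark" samples come with low-quality documents...
for tone in ("dark", "light"):
    print(tone, round(match_rate([r for r in records if r[0] == tone]), 2))

# ...but within each document-quality stratum the gap vanishes
# (0.97 vs 0.97 for high quality, 0.85 vs 0.85 for low quality).
for quality in ("high", "low"):
    for tone in ("dark", "light"):
        rows = [r for r in records if r[0] == tone and r[1] == quality]
        print(tone, quality, round(match_rate(rows), 2))
```

This is exactly the stratified analysis described above: controlling for document quality before attributing the gap to skin tone.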
We provide this example not to argue that algorithms are unbiased (they are not), but to emphasize that bias measurement is complex and prone to hasty, incorrect conclusions.
It is therefore crucial to conduct a comprehensive bias analysis and to study all the hidden variables that may influence the bias.
Pillar 3: Building rigorous training methods
The training phase of an AI model offers the best opportunity to reduce its biases. It is difficult to compensate for bias afterward without resorting to ad hoc methods that are not robust.
The datasets used for learning are the main lever that allows us to influence learning. By correcting the imbalances in the datasets, we can significantly influence the behavior of the model.
Let's take an example. Some online services may be used more frequently by people of one gender. If we train a model on a uniform sample of the production data, the model will probably behave more robustly for the majority gender, to the detriment of the minority gender, for which the model will behave more erratically.
We can correct this bias by sampling the data of each gender equally. This will probably result in a relative reduction in performance for the majority gender, but to the benefit of the minority gender. For a critical service (such as an application acceptance service for higher education), this balancing of the data makes perfect sense and is easy to implement.
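The balancing described above can be sketched in a few lines. The data and group sizes below are hypothetical; the sketch simply undersamples the majority group so that each gender contributes the same number of training examples.

```python
import random

random.seed(0)  # reproducible sampling for the illustration

# Hypothetical production data: 80% of users are gender "a", 20% gender "b".
data = ([{"gender": "a", "id": i} for i in range(8000)]
        + [{"gender": "b", "id": i} for i in range(8000, 10000)])

def balanced_sample(rows, key, per_group):
    """Draw the same number of rows from each group (undersampling the majority)."""
    by_group = {}
    for row in rows:
        by_group.setdefault(row[key], []).append(row)
    sample = []
    for group_rows in by_group.values():
        sample.extend(random.sample(group_rows, min(per_group, len(group_rows))))
    return sample

train = balanced_sample(data, "gender", per_group=2000)
# Each gender now contributes 2,000 examples to the training set.
```

Undersampling is the simplest balancing strategy; oversampling the minority group or reweighting the loss are common alternatives when discarding majority data is too costly.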
Online identity verification is often associated with critical services. This verification, which often involves biometrics, requires the design of robust training methods that reduce biases as much as possible across the variables to which biometric systems are sensitive, namely age, gender, ethnicity, and country of origin.
Finally, collaboration with regulators, such as the Information Commissioner's Office (ICO), allows us to step back and think strategically about reducing biases in models. In 2019, Onfido worked with the ICO to reduce biases in its facial recognition software, which led Onfido to drastically reduce the performance gaps between age and geographic groups of its biometric system.
Pillar 4: Tailor solutions to use cases
There is no single measure of bias. In its glossary on model fairness, Google identifies at least three different definitions for fairness, each of which is valid in its own way but leads to very different model behaviors.
How, for example, should one choose between "forced" demographic parity and equal opportunity, which takes into account variables specific to each group?
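The tension between these two definitions is easy to see on toy numbers. Everything below is made up for illustration: demographic parity compares raw acceptance rates across groups, while equal opportunity compares acceptance rates among qualified applicants (the true-positive rate), and the two criteria can disagree on the same data.

```python
# Made-up per-group counts: "accepted"/"total" drive demographic parity;
# "tp"/"qualified" (the true-positive rate) drive equal opportunity.
stats = {
    "group_a": {"accepted": 500, "total": 1000, "tp": 450, "qualified": 500},
    "group_b": {"accepted": 300, "total": 1000, "tp": 270, "qualified": 300},
}

parity = {g: s["accepted"] / s["total"] for g, s in stats.items()}
opportunity = {g: s["tp"] / s["qualified"] for g, s in stats.items()}

# Acceptance rates differ (0.5 vs 0.3): demographic parity is violated.
# True-positive rates match (0.9 vs 0.9): equal opportunity is satisfied.
print(parity, opportunity)
```

Forcing demographic parity here would mean accepting unqualified applicants from one group or rejecting qualified ones from the other, which is why the choice of criterion must follow from the use case.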
There is no single answer to this question. Each use case calls for its own analysis of the application domain. In the case of identity verification, for example, Onfido uses the "normalized rejection rate": the system's rejection rate for each group, compared to the rate for the overall population. A rate greater than 1 corresponds to over-rejection of the group, while a rate less than 1 corresponds to under-rejection.
In an ideal world, this normalized rejection rate would be 1 for all groups. In practice, this is not the case for at least two reasons: first, because the datasets necessary to achieve this objective are not necessarily available; and second, because certain confounding variables are not within Onfido's control (this is the case, for example, with the quality of identity documents mentioned in the example above).
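A normalized rejection rate of this kind can be computed in a few lines. The counts below are illustrative only, not production data, and the sketch is a minimal reading of the definition given above: each group's rejection rate divided by the overall rate.

```python
# Hypothetical (rejected, total) counts per group -- not real production data.
counts = {"group_a": (30, 1000), "group_b": (80, 1000)}

# Overall rejection rate across the whole population: 110/2000 = 0.055.
overall_rate = (sum(r for r, _ in counts.values())
                / sum(t for _, t in counts.values()))

# Normalized rejection rate per group: > 1 means the group is over-rejected
# relative to the population, < 1 means under-rejected; the ideal is 1.0.
normalized = {g: (r / t) / overall_rate for g, (r, t) in counts.items()}
print(normalized)  # group_a below 1.0, group_b above 1.0
```

On these numbers, group_b is over-rejected (about 1.45) and group_a under-rejected (about 0.55), which would flag group_b for further stratified analysis before drawing conclusions.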
Don’t delay progress by chasing perfection
Bias cannot be completely eliminated. In this context, the important thing is to measure the bias, to continuously reduce this bias, and to communicate openly about the limitations of the system.
Research on bias is largely conducted in the open, and numerous publications are available on the subject. Large companies like Google and Meta actively contribute to this knowledge by publishing in-depth technical articles as well as accessible articles, training materials, and datasets dedicated to the analysis of bias. In 2023, Meta published the Conversational Dataset, a dataset dedicated to the analysis of bias in models.
Biases are unfortunately unavoidable; as AI developers continue to innovate and applications evolve, biases will always emerge. However, this should not discourage organizations from adopting these new technologies, as they hold great potential for improving their digital offerings.
If companies have taken the appropriate steps to mitigate the impact of biases, customers’ digital experiences will continue to improve. Customers will be able to access the right services, adapt to new technologies and get the support they need from the companies they want to interact with.
Olivier Koch is VP of Applied AI, Onfido.