At Edge Case Research, we believe that artificial intelligence (AI) has the potential to make everyone safer. For example, in the future, the constant vigilance and superhuman reflexes of AI in self-driving cars could dramatically reduce accident rates. But exceeding the performance of human drivers won’t be easy. It will depend on AI that is safe and reliable across a range of driving conditions.

Bias, or systematic error, is a serious challenge to safe AI. Bias can be found in AI-driven products today, such as facial recognition software that performs more accurately for white men than it does for women and people of color. Bias can creep into AI for many reasons. A poorly chosen camera could be ill-suited to darker skin tones. Perhaps training data was captured from the faces of company employees — who were mostly white and male.


What kinds of problems might you find? Here are some examples from analyzing the performance of an open-source pedestrian detector. (Source: Edge Case Research)

As AI is used for more safety-critical applications, it's easy to see how bias could pose risks. Self-driving cars use AI, not unlike facial recognition, for detecting pedestrians. We don't want cars to be more likely to get into accidents with people who have longer hair, darker skin, or shorter stature. Safety shouldn't be contingent on how you look.

The performance of our pedestrian detector can be described as a trade-off between misses and false alarms. The nature of this trade-off is expressed as a function built by measuring how well the detector handles thousands or millions of human-analyzed images. But this kind of analysis can miss important safety risks. Our detector could perform well enough across the overall population of pedestrians, but it could still miss people in wheelchairs far too frequently. This sort of bias would be clearly unacceptable.
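To make the idea concrete, here is a minimal sketch of what disaggregated evaluation looks like. The data and subgroup labels are hypothetical, and this is not Edge Case Research's actual tooling — just an illustration of why an acceptable overall miss rate can hide an unacceptable one for a particular subgroup.

```python
# Hypothetical per-subgroup evaluation: an aggregate miss rate can hide
# bias that only appears when results are broken out by subgroup.
from collections import defaultdict

# Each record: (subgroup label, pedestrian actually present?, detector fired?)
# Toy data for illustration only.
results = [
    ("adult",      True,  True),
    ("adult",      True,  True),
    ("adult",      False, False),
    ("wheelchair", True,  False),   # missed detection
    ("wheelchair", True,  True),
    ("child",      True,  False),   # missed detection
    ("child",      True,  True),
]

def miss_rate_by_group(records):
    """Fraction of true pedestrians the detector missed, per subgroup."""
    misses = defaultdict(int)
    totals = defaultdict(int)
    for group, present, detected in records:
        if present:
            totals[group] += 1
            if not detected:
                misses[group] += 1
    return {g: misses[g] / totals[g] for g in totals}

print(miss_rate_by_group(results))
```

In this toy example the overall miss rate (2 of 6 true pedestrians) looks tolerable, while the "wheelchair" and "child" subgroups each miss half the time — exactly the kind of gap that aggregate benchmarks paper over.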

Bias arises from machine learning itself

A detector trained on adults may miss children, who are typically shorter. Certain scenarios and lighting conditions may also be underrepresented in training data; for example, people in dark clothing photographed against dark backgrounds, or the presence of sun glare.

Bias in AI can be surprising because machine learning is so unlike human learning. We've seen instances where AI failed to detect construction workers wearing high-visibility vests. Although you might expect these vests to improve detection rates, consider the problem from a statistical point of view. Only a tiny fraction of pedestrians actually wear yellow vests, so a bright yellow color is not correlated with the presence of a person. The list of strange examples goes on. Neural networks can lose track of people standing near vertical edges — like poles. Did you think of that?

False alarm rate vs. miss rate. (Diagram: Edge Case Research)

This leads to a chicken-and-egg problem: you can measure bias in AI, but only if you know to look for each particular kind of bias. That's why we believe that standards for building safe autonomous products should not only require that manufacturers test for known biases, but also that they monitor carefully for new, unexpected biases — which may not manifest until product usage becomes widespread. That's also why Edge Case Research has built a testing and analytics platform, called Hologram, that helps developers speed up the identification of biases and edge cases so that safety problems can be solved much faster.

- Michael Wagner, co-founder and CEO of Edge Case Research