When AI Goes Wrong

Article by Ann R. Thryft

Garbage in, serious garbage out. AIs have to be more than just accurate; they have to be fair.

Artificial intelligence (AI) has been touted as the Holy Grail for what seems like innumerable applications that automate decision-making. Some of the more typical things AI can do better or faster than people include making movie recommendations on Netflix, detecting cancer, tuning e-commerce and retail websites for each visitor, and customizing in-car infotainment systems.

A few of the more unusual things these automated systems are doing include making better beer, converting thoughts into speech, and creating faster-than-you'd-ever-think-possible death metal.

Automated systems have also had some spectacular failures. The self-driving car, proposed as a shining example of what AI can do, failed when a self-driving Uber SUV killed a pedestrian last year. More and more AI systems are being used to make decisions about people: where they can live; what jobs they can have; whether they can be insured and how much their rates will be; what kind of mortgage loan they can get; and whether the Department of Homeland Security thinks, based on their face, that they might be a terrorist.

Most of the fairness (or not) of an AI's decision-making depends on the accuracy and completeness of the training data sets fed to the AI's learning algorithm. It also depends on the accuracy of the algorithm itself and on how decisions are made about what counts as "success." The training algorithm's optimization strategy can actually amplify bias if it's optimizing for the maximum overall accuracy of an entire population. Three renowned researchers in AI fairness discuss some of the details in a companion article, AI Researchers Answer The 5 Big Questions About Fairness.
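
To see that effect concretely, here is a minimal, hypothetical sketch: the data, the 90/10 group split, and the feature-label relationship are all invented for illustration, but they show how a model fit for maximum average accuracy can look fine overall while doing little better than chance on a small subgroup.

# Hypothetical illustration (Python): a classifier tuned for overall accuracy
# can look "accurate" while performing poorly on a minority subgroup.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 10_000
group = rng.choice([0, 1], size=n, p=[0.9, 0.1])    # 90% majority, 10% minority

# Invented data: the feature-label relationship differs by group, so a single
# model fit to everyone mostly learns the majority's pattern.
x = rng.normal(size=(n, 2))
y = np.where(group == 0,
             (x[:, 0] > 0).astype(int),     # majority: label driven by feature 0
             (x[:, 1] > 0).astype(int))     # minority: label driven by feature 1

model = LogisticRegression().fit(x, y)      # minimizes average loss over everyone
pred = model.predict(x)

print("overall accuracy :", accuracy_score(y, pred))                            # looks fine
print("majority accuracy:", accuracy_score(y[group == 0], pred[group == 0]))
print("minority accuracy:", accuracy_score(y[group == 1], pred[group == 1]))    # much worse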

It's important to understand how the data can be biased, since at first the idea of biased data can sound nonsensical. Sometimes, the data set is biased because it's incomplete: the data doesn't reflect the real world. For instance, self-driving car AIs may be trained on sensor data that omits children, people in wheelchairs, or construction workers wearing fluorescent vests, as Michael Wagner describes in a companion story for this report, Risk of AI Bias in Self-Driving.

Even if the data does reflect the real world, it can still be biased if that "real world" includes past societal inequalities. For instance, some subgroups of the population (identifiable by any number of categorizations, including race, gender, and geographic region) have been hired far less often than others for certain jobs because of societal biases; perhaps they've never been hired for that job at all. The mathematical bias of the algorithm will amplify that fact if, as is usual, it optimizes for the general population in the database of who has held that job, and it will effectively ignore those minority populations, thus automating the bias.

One example of this amplification of data set bias by the AI algorithm occurred a couple of years ago, in experiments done by researchers at the University of Virginia. They discovered that image data sets of people doing ordinary things like cooking, shopping, and playing sports were highly biased by gender. There were many images of women cooking and shopping but hardly any of women playing sports; the reverse was true for images of men. While this discrepancy may not be surprising, what happened when machine learning algorithms were trained on these data sets was something else. The trained models didn't just reflect those biases; they amplified them, to the extent that they often identified pictures of men cooking as pictures of women cooking. The University of Virginia paper contains a photo of a man cooking at a stove, clearly labeled as "woman."
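
A simplified way to quantify that kind of amplification is to compare how gender-skewed an activity is in the model's training labels versus in its predictions. The sketch below is illustrative only; the counts are invented and this is not the paper's exact metric.

# Hypothetical bias-amplification check (Python): compare the gender skew of
# an activity in the training labels with the skew in the model's predictions.
# All counts below are invented for illustration.

def gender_skew(counts):
    """Fraction of images for an activity whose labeled agent is 'woman'."""
    return counts["woman"] / (counts["woman"] + counts["man"])

train_counts = {"cooking": {"woman": 660, "man": 340}}   # data set already skewed
pred_counts  = {"cooking": {"woman": 840, "man": 160}}   # predictions even more skewed

train_bias = gender_skew(train_counts["cooking"])
pred_bias = gender_skew(pred_counts["cooking"])

print(f"training skew  : {train_bias:.2f}")                # 0.66
print(f"prediction skew: {pred_bias:.2f}")                 # 0.84
print(f"amplification  : {pred_bias - train_bias:+.2f}")   # positive means the model amplified the skew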

Algorithm Decisions and Bias in AI


A 1969 diagram shows how a simple algorithm makes decisions for evaluating forestry opportunities under three investment criteria. (Source: “A computer program for evaluating forestry opportunities under three investment criteria,” Chappelle, Daniel E.)

Without humans-in-the-loop, biases like these remain invisible and unaccounted for until someone discovers that facial recognition doesn't work correctly on more than a third of smartphones tested, or the Department of Housing and Urban Development sues Facebook over housing ad discrimination caused by its advertising platform.

Although it's not an example of AI bias, one scary example of potential AI-gone-wrong comes from researchers at the University of Bologna, Italy, who were studying pricing algorithms like those used by e-commerce sites such as Amazon. In their experiment, two reinforcement-learning-based pricing algorithms responded to each other's actions and colluded to set prices higher than they would have set them independently.
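
The paper's economic setup is far richer than anything that fits here, but the flavor of the experiment can be sketched with two tabular Q-learning "sellers" that repeatedly pick prices and learn from the resulting profits. Everything below (the price grid, the toy demand function, the learning parameters) is an invented stand-in for the researchers' model, not a reproduction of it.

# Toy sketch (Python) of two reinforcement-learning pricing agents reacting
# to each other. All parameters here are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
prices = np.array([1.0, 1.5, 2.0])           # allowed price levels
alpha, gamma, eps = 0.1, 0.9, 0.1            # learning rate, discount factor, exploration rate

def profits(p0, p1):
    # Toy demand: the cheaper seller captures the larger market share.
    weights = np.exp([-p0, -p1])
    share = weights / weights.sum()
    return p0 * share[0], p1 * share[1]

n = len(prices)
Q = [np.zeros((n, n, n)) for _ in range(2)]  # state = both sellers' previous prices
state = (0, 0)

for t in range(100_000):
    acts = []
    for i in range(2):
        if rng.random() < eps:                       # explore a random price
            acts.append(int(rng.integers(n)))
        else:                                        # exploit the current estimate
            acts.append(int(np.argmax(Q[i][state])))
    rewards = profits(prices[acts[0]], prices[acts[1]])
    new_state = (acts[0], acts[1])
    for i in range(2):                               # standard Q-learning update
        best_next = Q[i][new_state].max()
        Q[i][state][acts[i]] += alpha * (rewards[i] + gamma * best_next - Q[i][state][acts[i]])
    state = new_state

# Inspect the prices each agent now picks greedily; in the study's far richer
# setting, the agents learned to sustain prices above the competitive level.
print("learned prices:", [prices[int(np.argmax(Q[i][state]))] for i in range(2)])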

One of the most written-about fairness failures of the technology has been in facial recognition. Although this may not directly affect the AI applications that many engineers are developing, there's good reason for the attention paid to the set of problems surrounding the misidentification and misclassification of gender and race in faces. At minimum, these are clear, easy-to-understand examples of what happens when "foresight engineering" is not applied.

It's also possible that companies selling AI systems may begin to face liability problems for damages caused by their products. This is already a concern in the insurance industry (an early adopter of AI systems for automating repetitive processes and performing risk analysis). Last year, a study by insurance giant Allianz Global Corporate & Specialty found that "companies will also face new liability scenarios as responsibility for decision-making shifts from human to machine and manufacturer," and that a "new framework [is] needed to manage [the] rise in AI-generated damages," according to a prepared statement. Although AI agents may take over decisions previously made by humans, the agents, of course, can't be held liable for them. Potential liability will remain, however, and fall on manufacturers or programmers for any defects in the AI system that cause damage to users. Yes, you read that right: it says "programmers."

A High-profile Case: Facial Recognition and the Congressional Black Caucus

Some of those concerns are reflected in a study conducted by Forrester Consulting for KPMG International. Forrester found that 92 percent of C-level executives are worried about the effect of data and analytics, which includes the use of AI, on their business reputation, and only 35 percent have a high level of trust in their own organization's use of these technologies.

Results of the Forrester Consulting study for KPMG International on executive trust in data and analytics. (Source: KPMG International)

One high-profile facial recognition case occurred last year. An ACLU investigation testing Amazon's face recognition tool, Rekognition, discovered that the tool incorrectly identified 28 members of Congress. In the test, Rekognition “decided” that the Congress members were completely different people who had been arrested for committing crimes. The Congress members, a mix of Democrats and Republicans, had been incorrectly matched with a mug shot database. A disproportionate number of the false matches, nearly 40 percent, were people of color, even though people of color make up only 20 percent of Congress; the mismatched group included six members of the Congressional Black Caucus.

"To conduct our test, we used the exact same facial recognition system that Amazon offers to the public, which anyone could use to scan for matches between images of faces," wrote Jacob Snow, technology & civil liberties attorney for the ACLU of Northern California, in a blog about the experiment. "Using Rekognition, we built a face database and search tool using 25,000 publicly available arrest photos. Then, we searched that database against public photos of every current member of the House and Senate. We used the default match settings that Amazon sets for Rekognition."

The ACLU is concerned that if law enforcement uses Rekognition, police officers could get false matches indicating, for example, that an individual has a previous concealed-weapon arrest — which could unfairly bias the officer before there's even an encounter. "The results demonstrate why Congress should join the ACLU in calling for a moratorium on law enforcement use of face surveillance," Snow wrote. After this test was revealed, there was a huge outcry: over 400 members of academia, nearly 70 civil rights groups, more than 150,000 citizens, and Amazon employees and shareholders demanded that Amazon stop selling face surveillance technology to police departments.


The ACLU’s experiment with Amazon’s Rekognition face recognition tool found that it incorrectly identified these 28 members of Congress, deciding that they were completely different people who had been arrested for committing crimes. The tool matched these members of Congress (a mix of Democrats and Republicans) with a mug shot database. Nearly 40 percent of the false matches were people of color, even though they make up only 20 percent of Congress. (Source: ACLU)

More recently, more than 50 AI researchers from academia and industry have signed an open letter asking Amazon to stop selling its face recognition technology to law enforcement.

"One of the trickiest parts about algorithmic bias is that engineers don't have to be actively racist or sexist to create it," Megan Garcia pointed out in her article, How to Keep Your AI from Turning into a Racist Monster. "In an era when we increasingly trust technology to be more neutral than we are, [algorithmic bias creates] a dangerous situation. As the tech industry begins to create AI, it risks inserting racism and other prejudices into code that will make decisions for years to come. And since deep learning means that code, not humans, will write code, there's an even greater need to root out algorithmic bias."

These examples indicate that bias (especially societal bias about race, gender, age, or geography) can be unintentional and can in turn produce unintentional discrimination when automated inside a black box. None of this gives much confidence in the readiness of the technology, by itself, to make automated decisions that are fair, or even accurate. All of it at least gives pause to consider what kinds of aids machine learning needs (Better data sets? More finely tuned algorithms? Auditing processes?) to ensure that the biases of our past aren't embedded in decisions made now and in the future.
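
As one concrete example of what an auditing step could look like, the sketch below compares an automated decision's approval rates across two groups. The decisions and group labels are invented, and the four-fifths (80 percent) cutoff is borrowed from the EEOC's adverse-impact rule of thumb purely as an illustrative threshold.

# Minimal audit sketch (Python): compare approval rates across groups.
# The decisions, group labels, and 80% cutoff are illustrative only.
import numpy as np

decisions = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0])   # 1 = approved
groups    = np.array(["A", "A", "A", "A", "A", "A",
                      "B", "B", "B", "B", "B", "B"])

rates = {g: decisions[groups == g].mean() for g in np.unique(groups)}
print("approval rates:", rates)

ratio = min(rates.values()) / max(rates.values())
print(f"disparate-impact ratio: {ratio:.2f}")
if ratio < 0.8:
    print("flag for review: selection rates differ by more than the four-fifths rule allows")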

 

13 Horror Stories of AI Gone Bad

AI systems are still often not what we expect them to be. Sometimes they’re horribly biased. Sometimes they’re just plain wrong and don’t do what we think we told them to do. Here are 13 horror stories for your contemplation.


1. Apple face-recognition blamed by N.Y. teen for false arrest — Teenager sues Apple for $1 billion, claiming facial recognition used in its stores led to his false arrest because it misidentified him as a real thief. Apple says it doesn’t use facial recognition in its stores. Whom do you believe? (April 22, 2019)

 
2. Researchers trick Tesla Autopilot into steering into oncoming traffic — No, they didn’t hack the system. They used little stickers to make a fake “lane” that fooled the AI into changing lanes. This isn’t AI bias, just AI stupidity. (April 1, 2019)
 
3. HUD files charges against Facebook over housing ad discrimination — HUD says Facebook violated the Fair Housing Act by letting landlords and home sellers decide who can see their ads, based on race, religion, sex, disability, and other factors. (March 28, 2019)
 
4. AI mistakenly “sees” cancer in medical scans after tiny image tweaks — This isn’t AI bias, either, just more AI stupidity in misclassifying data. (March 21, 2019)
 
5. Why Facebook search suggests “photos of female friends in bikinis” — A “sexist” search bug says more about us than Facebook. Apparently no one searching for photos of women on Facebook is actually looking for friends who are female. Even if they were, the photos that turn up would show the women in bikinis, because that’s what most people — apparently mostly men — are looking for. (February 22, 2019)
 
6. An AI that writes convincing prose risks mass-producing fake news — “Risks” sounds like an overly conservative estimate, since we already know that bots are writing fake news stories. This story does show that they’re getting a lot better at it. (February 14, 2019)
 
7. Police across the US are training crime-predicting AIs on falsified data — Cops are using predictive policing algorithms more frequently. But they’re often using very dirty data. (February 13, 2019)
 
8. Fairness under unawareness: assessing disparity when “protected class” is unobserved — Algorithms are guessing the race of mortgage loan applicants — which they can often get wrong — to decide whether they can buy a house. Even the Consumer Financial Protection Bureau is using bad models to audit lenders. (January 29, 2019)
 
9. All automated hiring software is prone to bias by default — Most of the algorithms used for hiring are biased. (December 13, 2018)
 
10. Amazon’s face recognition falsely matched 28 members of Congress with mugshots — Rekognition incorrectly identified 28 members of Congress, deciding that they were completely different people who had been arrested for committing a crime. Nearly 40 percent of the false matches were people of color, even though they make up only 20 percent of Congress. (July 26, 2018)
 
11. Self-driving Uber car kills pedestrian in Arizona, where robots roam — Not, not, not ready for prime time. (March 19, 2018)
 
12. Google’s speech recognition has a gender bias — YouTube’s auto captions consistently performed better on male voices than female voices. Unfortunately, not exactly a new thing in voice recognition. (July 12, 2016)
 
13. Tay tweets: Microsoft shuts down AI chatbot turned pro-Hitler racist troll after just 24 hours — What’s sad is that this was supposed to be a demo of how great the tech is. Nope. (March 24, 2016)

 
