Wednesday, December 6, 2017

Another thing that probably doesn't exist: IAT validity

For some time now, we have all come to know that we harbor unconscious racist attitudes, which can be measured by the Implicit Association Test (IAT). But it turns out that there is little reason to believe that this test is a valid measure of anything, and a lot of reason to disbelieve it. Olivia Goldhill delves into this in Quartz:
There are various psychological tests purporting to measure implicit bias; the IAT is by far the most widely used. When social psychologists Banaji (now at Harvard University) and Anthony Greenwald of the University of Washington first made the test public almost 20 years ago, the accompanying press release described it as revealing “the roots of” unconscious prejudice in 90-95% of people. It has been promoted as such in the years since then, most vigorously by “Project Implicit,” a nonprofit based at Harvard University and founded by the creators of the test, along with University of Virginia social psychologist Brian Nosek. Project Implicit’s stated aim is to “educate the public about hidden biases”; some 17 million implicit bias tests had been taken online by October 2015, courtesy of the nonprofit.
There are more than a dozen versions of the IAT, each designed to evaluate unconscious social attitudes towards a particular characteristic, such as weight, age, gender, sexual orientation, or race. They work by measuring how quick you are to associate certain words with certain groups.
The test that has received the most attention, both within and outside psychology, is the black-white race IAT. It asks you to sort various items: Good words (e.g. appealing, excellent, joyful), bad words (e.g. poison, horrible), African-American faces, and European-American faces. In one stage (the order of these stages varies with each test), words flash by onscreen, and you have to identify them as “good” or “bad” as quickly as possible, by pressing “i” on the keyboard for good words and “e” for bad words. In another stage, faces appear, one at a time, and you have to identify them as African American or European American by pressing “i” or “e,” respectively.
The slower you are and the more mistakes you make when asked to categorize African-American faces and good words using the same key, the higher your level of anti-black implicit bias—according to the test.
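To make the scoring idea in that last paragraph concrete, here is a rough sketch of how a latency-based score could be computed. It is a simplification for illustration, not the scoring procedure Project Implicit actually uses: the function name, the response times, and the choice to scale by the standard deviation of all trials are assumptions, and it ignores the error counts that the article says also feed into the score.

```python
from statistics import mean, stdev

def latency_score(white_good_ms, black_good_ms):
    """Hypothetical simplified score: how much slower the test-taker is when
    African-American faces share a key with good words, expressed in units
    of the standard deviation of all response times."""
    spread = stdev(list(white_good_ms) + list(black_good_ms))
    return (mean(black_good_ms) - mean(white_good_ms)) / spread

# Made-up response times in milliseconds for one test-taker.
white_good = [610, 580, 640, 600, 570]   # European-American faces + good words on one key
black_good = [720, 760, 690, 800, 730]   # African-American faces + good words on one key

score = latency_score(white_good, black_good)
print(f"simplified score: {score:.2f}")  # positive means slower on the second pairing
```

Under this sketch a positive number means the test-taker was slower on the African-American-plus-good pairing, which the test reads as anti-black implicit bias. Whether that reading is justified is exactly what is in question.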
OK, so far, so good. My guess is that most of us have taken a version of this online. But many institutions are using this test to structure anti-implicit-bias training.
HR departments quickly picked up the theory, and implicit-bias workshops are now relied on by companies hoping to create more egalitarian workplaces. Google, Facebook, and other Silicon Valley giants proudly crow about their implicit-bias trainings. The results are underwhelming, at best. Facebook has made just incremental improvements in diversity; Google insists it’s trying but can’t show real results; and Pinterest found that unconscious bias training simply didn’t make a difference. Implicit bias workshops certainly didn’t influence the behavior of then-Google employee James Damore, who complained about the training days and wrote a scientifically ill-informed rant arguing that his female colleagues were biologically less capable of working at the company.
Silicon Valley companies aren’t the only ones working on their “implicit bias” problem. Police forces, The New York Times, countless private companies, US public school districts, and universities such as Harvard have also turned to implicit-bias training to address institutional inequality.
The problems with this? Two, actually. First, there is no evidence that these tests measure anything; second, the workshops and training don't seem to be effective in reducing bias. Regarding the latter:
The latest scientific research suggests there’s a very good reason why these well-meaning workshops have been so utterly ineffectual. A 2017 meta-analysis that looked at 494 previous studies (currently under peer review and not yet published in a journal) from several researchers, including Nosek, found that reducing implicit bias did not affect behavior. “Our findings suggest that changes in measured implicit bias are possible, but those changes do not necessarily translate into changes in explicit bias or behavior,” wrote the psychologists.
What about the tests themselves?
In recent years, a series of studies have led to significant concerns about the IAT’s reliability and validity. These findings, raising basic scientific questions about what the test actually does, can explain why trainings based on the IAT have failed to change discriminatory behavior.
First, reliability: In psychology, a test has strong “test-retest reliability” when a user can retake it and get a roughly similar score. Perfect reliability is scored as a 1, and defined as when a group of people repeatedly take the same test and their scores are always ranked in the exact same order. It’s a tough ask. A psychological test is considered strong if it has a test-retest reliability of at least 0.7, and preferably over 0.8.
Current studies have found the race IAT to have a test-retest reliability score of 0.44, while the IAT overall is around 0.5; even the high end of that range is considered “unacceptable” in psychology. It means users get wildly different scores whenever they retake the test.
Part (though not all) of these variations can be attributed to the “practice effect”: it’s easy to improve your score once you know how the test works. Psychologists typically counter the influence of “practice effects” by giving participants trial sessions before monitoring their scores, but this doesn’t help the IAT. Scores often continue to fluctuate after multiple sessions, and such a persistent practice effect is a serious concern. “For other aspects of psychology if you have a test that’s not replicated at 0.7, 0.8, you just don’t use it,” says Machery.
The second major concern is the IAT’s “validity,” a measure of how effective a test is at gauging what it aims to test. Validity is firmly established by showing that test results can predict related behaviors, and the creators of the IAT have long insisted their test can predict discriminatory behavior. This point is absolutely crucial: after all, if a test claiming to expose unconscious prejudice does not correlate with evidence of prejudice, there’s little reason to take it seriously.
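For readers who want to see what the test-retest reliability figures quoted above amount to, here is a minimal sketch. The article defines perfect reliability as everyone keeping the same rank order across sittings, so the sketch uses a rank correlation; the studies it cites may well use a different coefficient. The scores and function names below are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def ranks(values):
    """Ranks from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def rank_correlation(first, second):
    """Rank correlation: Pearson correlation of the two rank lists."""
    return pearson(ranks(first), ranks(second))

# Made-up scores for five people who each took a test twice.
first_sitting = [0.10, 0.35, 0.50, 0.62, 0.80]
retest_same_order = [0.20, 0.30, 0.55, 0.70, 0.90]   # scores shift, ranking does not
retest_reshuffled = [0.60, 0.10, 0.85, 0.30, 0.40]   # ranking changes a lot

stable = rank_correlation(first_sitting, retest_same_order)
unstable = rank_correlation(first_sitting, retest_reshuffled)
print(stable, unstable)
```

In the first retest everyone's score moves but the ordering is preserved, which is the "perfect reliability" case the article describes. In the second, the ordering is scrambled and the coefficient falls far below the 0.7 threshold.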
Bottom line: while we might all be infected with unconscious racist bias, the evidence does not support the claim that the IAT measures it, or that workshops structured around the IAT help mitigate it.
