A University of Virginia psychologist decided in 2011 to find out whether such suspect science was a widespread problem. He and his team recruited more than 250 researchers, identified 100 studies that had each been published in one of three leading journals in 2008, and rigorously redid the experiments in close collaboration with the original authors.
The results are now in: More than 60 of the studies did not hold up. They include findings that were circulated at the time — that a strong skepticism of free will increases the likelihood of cheating; that physical distances could subconsciously influence people’s sense of personal closeness; that attached women are more attracted to single men when highly fertile than when less so.
The new analysis, called the Reproducibility Project and posted Thursday by Science, found no evidence of fraud or that any original study was definitively false. Rather, it concluded that the evidence for most published findings was not nearly as strong as originally claimed.
“Less than half — even lower than I thought,” said Dr. John Ioannidis, a director of Stanford University’s Meta-Research Innovation Center, who once estimated that about half of published results across medicine were inflated or wrong. Dr. Ioannidis said the problem was hardly confined to psychology and could be worse in other fields, including cell biology, economics, neuroscience, clinical medicine, and animal research...
Cue the replication skeptics.
Among the studies that did not hold up was one on free will. It found that participants who read a passage arguing that their behavior is predetermined were more likely than those who had not read the passage to cheat on a subsequent test. Another was on the effect of physical distance on emotional distance. Volunteers asked to plot two points that were far apart on graph paper later reported weaker emotional attachment to family members, compared with subjects who had graphed points close together. A third was on mate preference. Attached women were more likely to rate the attractiveness of single men highly when they were highly fertile, compared with when they were less so. In the reproduced studies, researchers found weaker effects for all three experiments.
And psychologists make fun of philosophers because we just engage in thought experiments. Now it is more arguable than ever that the latter yield results at least as reliable, for less money and without the need for an IRB.
“There’s no doubt replication is important, but it’s often just an attack, a vigilante exercise,” said Norbert Schwarz, a professor of psychology at the University of Southern California. Dr. Schwarz, who was not involved in any of the 100 studies that were re-examined, said that the replication studies themselves were virtually never vetted for errors in design or analysis.
Dr. Nosek’s team addressed this complaint in part by requiring the researchers attempting to replicate the findings to collaborate closely with the original authors, asking for guidance on design, methodology and materials. Most of the replications also included more subjects than the original studies, giving them more statistical power.
The numbers told a mixed story. Strictly on the basis of significance — a statistical measure of how likely it is that a result did not occur by chance — 35 of the studies held up, and 62 did not. (Three were excluded because their significance was not clear.) The overall “effect size,” a measure of the strength of a finding, dropped by about half across all of the studies. Yet very few of the redone studies contradicted the original ones; their results were simply weaker.
“We think of these findings as two data points, not in terms of true or false,” Dr. Nosek said.
The research team also measured whether the prestige of the original research group, rated by measures of expertise and academic affiliation, had any effect on the likelihood that its work stood up. It did not. The only factor that did was the strength of the original effect — that is, the most robust findings tended to remain easily detectable, if not necessarily as strong.