In 2011 Daryl Bem proved that precognition exists! People can know something before it happens. Specifically, he ran an experiment that went like this: students were shown a computer screen displaying two curtains. They were told that a picture would appear behind one of the curtains, and they had to predict which one. The computer then used a random number generator to choose which curtain the picture would appear behind. If the subjects had no ability to know the future, you would expect a 50-50 success rate. But Bem's experiment showed that subjects predicted successfully at a rate significantly above chance.
It would have been easy to dismiss this as an improperly run experiment, except that:
- Professor Bem was a highly regarded scientist, having initiated a major field of study within psychology
- He had taught at Carnegie Mellon, Stanford, Harvard, and Cornell Universities
- His study was peer-reviewed, and no problems were found with its methodology or analysis
and so “Feeling the Future” was published in one of the most prestigious psychology journals.
Consternation! Pretty much no one believed in precognition, but how could "good", peer-reviewed, published science be totally wrong? Several scientists tried repeating Bem's study and found no precognition. And others started looking at whether some well-accepted studies were replicable and found that they, too, didn't hold up. Some estimates suggested that as many as 50% of published results are not repeatable. And so began the so-called Replication Crisis in psychology.
Sadly, some of the studies that have not held up include some great ideas. You know about them if you've read wonderful books like Nobel laureate Daniel Kahneman's Thinking, Fast and Slow or Dan Ariely's Predictably Irrational. Examples include:
Social Priming: A famous study found that subjects who had been exposed to a scrambled list of words that included ones relating to old age walked more slowly when leaving the building afterwards.
Ego depletion: This 1996 experiment has been cited over 3,000 times. Student volunteers were placed in a room with freshly baked, fragrant chocolate chip cookies. Some groups were permitted to eat them; other groups were not. After a while they were given an impossible puzzle to solve. Those whose willpower had been taxed by refraining from eating gave up after an average of 8 minutes. Those who had been allowed to eat the cookies lasted an average of 19 minutes.
Facial Feedback Hypothesis: A highly cited 1988 experiment had subjects hold a pen in their mouths while looking at cartoons. Those instructed to hold the pen in a way that forced them to smile found the cartoons significantly funnier than the control group did.
There are many reasons that invalid results can be published and subsequently not refuted. Some are:
- Significance level: Psychology typically accepts a result as significant at the 95% confidence level (p < 0.05). At that threshold, even when no real effect exists, about one study in every 20 will produce a spuriously "significant" result by pure chance. By contrast, the benchmark in particle physics is five sigma: about one spurious result in 3.5 million!
- Journals overwhelmingly publish positive results, so there's a lot of pressure on researchers to find significance.
- Journals are also very reluctant to publish papers showing that studies were not successfully replicated. And if the replication works, it's old news! So there's little incentive to try to replicate a study: it won't get you a publishable paper whether the replication succeeds or fails.
- "P-hacking". You collect a lot of data and then search it for a significant result. If you slice and dice your data enough, you are quite likely to come up with a statistically significant result, purely by chance. So if you don't get significance with your entire group of subjects, try splitting them into subgroups of males and females; or try young and old; or try gay and straight; or first-born vs other birth sequence, etc. In Bem's study, he found that precognition worked with erotic images and male subjects. If you test 20 subgroups independently at the 5% level, the chance of at least one spuriously significant result is 1 - 0.95^20, or about 64%.
- "Data Peeking". This is a way of selectively collecting the data you would like. You run a study on, say, 20 subjects and look at the results. If they don't look good, decide that the procedure isn't quite right. Throw away that "bad" data, make a small change, and try again. Keep doing this until you get a really good result on 20 subjects. Then declare those 20 to be the start of the real study and keep going. It has been suggested that Bem may have done this. It's clearly dishonest if you know that the procedural changes you are making are immaterial. But many researchers genuinely believe their minor procedural tweaks matter (perhaps just convenient wishful thinking).
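The p-hacking arithmetic is easy to check for yourself. Here is a toy simulation (my own sketch, not anyone's actual analysis): under the null hypothesis of no real effect, a p-value is uniformly distributed between 0 and 1, so each subgroup test comes up "significant" with probability 0.05 by chance alone.

```python
import random

random.seed(42)

ALPHA = 0.05        # psychology's usual significance threshold
SUBGROUPS = 20      # e.g. male/female, young/old, first-born/later...
TRIALS = 100_000    # simulated studies, each with NO real effect

# Under the null hypothesis, each subgroup's p-value is uniform on
# [0, 1], so a test "succeeds" spuriously with probability ALPHA.
# Count how many simulated studies find at least one such "hit".
false_positives = sum(
    any(random.random() < ALPHA for _ in range(SUBGROUPS))
    for _ in range(TRIALS)
)

rate = false_positives / TRIALS
print(f"Chance of at least one 'significant' subgroup: {rate:.1%}")
print(f"Analytic value: {1 - (1 - ALPHA) ** SUBGROUPS:.1%}")  # 64.2%
```

The simulation converges on the analytic answer: slicing one null dataset into 20 subgroups gives you roughly a two-in-three chance of "discovering" something.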
Lots of psychologists are trying hard to fix the many problems in the discipline.
- There are many groups working to replicate and either support or debunk existing theories.
- There’s a lot of pressure to “preregister” studies. This means you specify in advance what your hypothesis is and what data you’ll test and analyze. It leaves less wiggle room for data hacking after the fact.
- There's pressure on journals to accept Registered Reports: the study design is reviewed and approved up front, and the journal commits in advance to publishing the results, regardless of the outcome.
In the meantime, I plan to keep smiling and eating chocolate chip cookies.