They call it p-hacking. Imagine one day inspiration strikes and you set out to prove that sushi can improve academic performance. You assemble the lucky volunteers and month after month make sure the rolls are delivered to their doorsteps. Come winter, all giddy with anticipation, you inquire about the performance of your subjects during the finals. Alas, there doesn’t seem to be any correlation between rolls consumed and grades earned. Sweating, you begin to realize the months of your time and the grant money have been wasted, as no journal cares about a negative result.
Only wait! Maybe there’s hope. You suddenly notice that if you restrict your data to just freshmen, a correlation emerges. Moreover, if you narrow it down to Caucasian freshmen from states that start with a consonant, the cognition-enhancing power of sushi becomes undeniable! Thank goodness, you think, and sit down to start working on what in a year or two is going to be a peer-reviewed journal article that contains a blatant falsehood.
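For readers who want to see why this trick works, here is a rough simulation of the sushi study. It is only a sketch under stated assumptions: the attributes used for slicing (`year`, `state`, `dorm`), the sample size and the score distributions are all invented for illustration, and the p-value uses the Fisher z approximation rather than an exact test. Rolls and grades are generated independently, so any "significant" correlation is a false positive by construction.

```python
import math
import random
from itertools import product
from statistics import NormalDist

# Hypothetical attributes used to slice the sample after the fact.
YEARS = ["freshman", "sophomore", "junior", "senior"]
STATE_INITIALS = ["consonant", "vowel"]
DORMS = ["north", "west", "south"]


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r using the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def run_study(rng, n_students=300, alpha=0.05, min_size=8):
    """Simulate one study in which sushi has NO effect on grades.

    Returns (full_sample_significant, any_subgroup_significant).
    """
    students = [
        {
            "rolls": rng.gauss(20, 5),
            "grade": rng.gauss(3.0, 0.5),  # drawn independently of rolls
            "year": rng.choice(YEARS),
            "state": rng.choice(STATE_INITIALS),
            "dorm": rng.choice(DORMS),
        }
        for _ in range(n_students)
    ]

    def significant(group):
        if len(group) < min_size:  # too few students to test
            return False
        r = pearson_r([s["rolls"] for s in group],
                      [s["grade"] for s in group])
        return approx_p_value(r, len(group)) < alpha

    full = significant(students)
    any_hit = False
    # None means "do not filter on this attribute", so the first
    # combination tried is the full sample itself.
    for year, state, dorm in product([None] + YEARS,
                                     [None] + STATE_INITIALS,
                                     [None] + DORMS):
        group = [s for s in students
                 if year in (None, s["year"])
                 and state in (None, s["state"])
                 and dorm in (None, s["dorm"])]
        if significant(group):
            any_hit = True
            break
    return full, any_hit


def false_positive_rates(n_studies=200, seed=0):
    """Share of null studies that look significant, honestly vs. after slicing."""
    rng = random.Random(seed)
    results = [run_study(rng) for _ in range(n_studies)]
    plain = sum(full for full, _ in results) / n_studies
    hacked = sum(hit for _, hit in results) / n_studies
    return plain, hacked


if __name__ == "__main__":
    plain, hacked = false_positive_rates()
    print(f"significant on the full sample:     {plain:.0%}")
    print(f"significant in at least one subset: {hacked:.0%}")
```

Testing only the full sample should flag roughly one null study in twenty, which is what a 0.05 threshold promises. Searching all 60 overlapping slices of the same data should turn up a "finding" in most studies, even though there is nothing to find.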
P-hacking is the prime suspect in the replication crisis: the recent discovery that many prominent experiments in the social sciences do not hold up when replication is attempted. Studies that have resulted in Nobel Prizes are called into question. The careers of celebrity scientists lie in ruins. Some professors are unsure whether half of their psychology curriculum is even worth teaching.
It is against the backdrop of this maelstrom of doubt that the story of Prof. Brian Wansink, the John S. Dyson Professor of Marketing, takes place. In 2016, he published a blog post describing in detail how, after a failed experiment, he encouraged his graduate student to keep scavenging the collected data for a result: to commit the sin of p-hacking. The post outraged some readers, so much so that an investigation into Wansink’s body of work began.
At first glance, this is good news: when evidence emerged that Wansink’s work might not be statistically sound, the scientific community took it upon itself to investigate the offender. Yet, after watching this story unfold over the course of the last year, I no longer believe this to be the underlying narrative. What started as a healthy response to the problem of p-hacking became a distraction from it, as the quest against Brian Wansink grew to a scale that cannot be explained by a rational intent to better the system.
First of all, you have to understand the level of scrutiny we are talking about here. People went back as far as 15 years, scrupulously scanning hundreds of Wansink’s papers and writing up reports listing every inconsistency they found. They wrote letters to journals announcing their findings, and to Cornell urging the administration to launch an investigation. With time, instead of dying down, this movement only strengthened, going as far as publishing and analyzing Wansink’s emails. Just a few days ago, the authors of a cookbook announced themselves victims of Wansink’s work.
This is in contrast to all of Wansink’s colleagues whose work remained untouched by any scrutiny during this period. A great illustration here is the case of Prof. Daryl Bem, psychology, who has published research on psychic powers of premonition (I kid you not) and yet wasn’t subjected to an even remotely similar level of critique.
What is problematic here is not the unfair treatment of Brian Wansink. It’s that because of this concentration on a single researcher, the overall narrative gets warped. The story isn’t that Brian Wansink is a horribly unethical and ruthless scientist; it’s that the social sciences are in trouble. P-hacking is virtually undetectable, which makes hard evidence difficult to provide, and yet there are enough signs that I am confident the practice is widespread and destructive.
What are these signs? First of all, remember that we now have empirical evidence showing suspiciously high unreliability of results in both psychology and food science. Second, Brian Wansink is a prominent figure in his field with about 25,000 citations spanning 250-plus published works. This means that the practices he adopted, good or bad, are highly rewarded by the publication system and academia. Third, it is surprising how direct and careless Wansink was when he confessed to p-hacking his data in the original blog post. Even when faced with critique in the comment section, he couldn’t quite grasp why p-hacking is an issue. I find it hard to believe that he could sustain such a level of naive ignorance if he were working in a community of responsible and rigorous scientists.
Another fear I have is that the signal this purge sends out may be detrimental to an already-flawed system. One of the things that separated Wansink from many of his colleagues was having a blog in which he openly discussed his research process, and his subsequent willingness to cooperate with the inquiry and learn from it. Thus the message from this public bashing, especially as it becomes progressively more severe, might not be that bad science gets punished but rather that being open about your research does. And when you couple that signal with the one you receive from academia (publish or perish), your choice is going to be to p-hack your data and keep your mouth shut.
A couple of weeks ago The Sun published an editorial urging the Cornell administration to launch an investigation into Wansink. For the reasons I tried to explain above, I believe this step would be counterproductive. Such an investigation would only further concentrate public attention on the Wansink scandal instead of the systemic issues that have caused it, and it could further scare researchers away from engaging in open discussions of their work.
A previous version of this column incorrectly used the phrases “salami slicing” and “p-hacking” interchangeably to describe instances of manipulating data sets for a certain statistical outcome. “Salami slicing” was not a correct descriptor, and mentions of “salami slicing” have been replaced with “p-hacking.”
Artur Gorokh is a graduate student studying applied mathematics at Cornell University. He can be reached at [email protected]. Radically Moderate appears alternate Tuesdays this semester.