A growing number of studies provide evidence that editors (and referees) of academic journals tend to publish only findings that show a significant effect or a surprising result. This bias can lead to a misrepresentation of public policies’ real effects in the published literature.

There is increasing evidence in the social sciences that publication bias, p-hacking, and lack of reproducibility are real concerns. Publication bias occurs when papers with certain characteristics (statistically significant or surprising results) are more likely to be published, whereas p-hacking occurs when a researcher consciously manipulates data or analyses in a way that produces a desired p-value. A p-value is the probability of observing a result at least as extreme as the one obtained if there were no real underlying effect; a low p-value thus means the result would be unlikely to arise by chance alone. An example of p-hacking is a situation in which a researcher selects data or statistical analyses until his or her nonsignificant results become significant.
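To make the definition concrete, here is a minimal simulation of what a two-sided p-value measures: the share of hypothetical no-effect worlds that produce a gap at least as large as the one observed. All numbers are made up for illustration and come from no study.

```python
# What a p-value is, by simulation. All numbers here are made up for
# illustration; nothing comes from the studies discussed in the text.
import random

random.seed(0)

n = 40                    # observations per group (hypothetical)
observed_gap = 0.5        # hypothetical observed treatment-control gap

def gap_under_null():
    """Difference in group means when both groups are pure N(0, 1) noise."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    return sum(a) / n - sum(b) / n

draws = 20_000
# Two-sided p-value: how often pure chance produces a gap at least this big.
p_value = sum(abs(gap_under_null()) >= observed_gap for _ in range(draws)) / draws
print(f"simulated p-value for a gap of {observed_gap}: {p_value:.3f}")
```

Here the simulated p-value comes out near 0.025, so a gap of 0.5 would arise by chance alone only about one time in forty.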

The issues of publication bias and p-hacking make published research less credible to policymakers and citizens. If policymakers and citizens only see a subset of research (that is, findings showing a significant effect or surprising result), then it is unclear how much faith they should have in said research. In other words, if studies finding a significant effect of a given policy are the only ones getting published, then this would lead to a misrepresentation of the policy’s real effect in the published literature. Another related issue is that p-hacking may make research less reproducible.

For example, imagine a researcher is interested in the effect of changes in the federal or state minimum wage on unemployment. The investigator has a large data set and may estimate many different statistical models. There is thus a large set of specifications available to the researcher, who then chooses to present only a subset of the results he or she finds. The researcher would be p-hacking if he or she selected different covariates (or sub-samples) with the purpose of moving a test statistic across a statistical threshold. P-hacking, in this example, would lead the researcher to conclude that the minimum wage has an effect on unemployment, when in fact there might be no real underlying effect.
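The specification search described above can be sketched in a few lines. In this toy simulation (all data are fake, and the true effect is zero by construction), a researcher who tries enough sub-samples will cross the 5 percent threshold many times:

```python
# A toy specification search (all data fake): "minimum wage" and
# "unemployment" are independent noise, so the true effect is exactly
# zero, yet many sub-samples still yield "significant" correlations.
import math
import random

random.seed(3)

def corr_p_value(x, y):
    """Two-sided p-value for a Pearson correlation (Fisher z approximation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - 3)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# One fake observation per state: (average minimum wage, unemployment rate).
states = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]

# "Specification search": test 500 random 10-state sub-samples.
p_values = []
for _ in range(500):
    sample = random.sample(states, 10)
    p_values.append(corr_p_value([s[0] for s in sample], [s[1] for s in sample]))

hits = sum(p < 0.05 for p in p_values)
print(f"{hits} of 500 sub-samples give p < 0.05 despite a true effect of zero")
print(f"smallest p-value found: {min(p_values):.4f}")
```

Reporting only the "significant" sub-sample, and staying silent about the other several hundred attempts, is exactly the behavior described above.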

During my PhD, I started working on a paper titled “Star Wars: The Empirics Strike Back.” In this project, my co-authors and I were interested in documenting the extent of p-hacking and publication bias in economics. We collected all the p-values published in three of the most prestigious journals in economics and showed a strong empirical regularity: the distribution of p-values has a two-humped camel shape, with a first hump for high p-values, a valley of missing p-values between 25 percent and 10 percent, and a second hump for p-values slightly below 5 percent.
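The mechanism behind that shape can be illustrated with a stylized simulation (the numbers below are invented, not the study’s data): start from honestly reported p-values and let some researchers nudge near-misses to just below 0.05. The 0.10–0.25 range empties out and a bump appears just under the threshold:

```python
# A stylized illustration (not the study's data): start from honestly
# reported p-values, then let some researchers "rescue" near-misses by
# p-hacking them to just below 0.05, and compare the two distributions.
import random

random.seed(2)

honest = [random.random() for _ in range(10_000)]  # null effects: uniform p

hacked = []
for p in honest:
    # Assume (illustratively) that 40% of near-misses in [0.05, 0.25)
    # get p-hacked into the [0.01, 0.05) band.
    if 0.05 <= p < 0.25 and random.random() < 0.4:
        p = random.uniform(0.01, 0.05)
    hacked.append(p)

def share(ps, lo, hi):
    """Fraction of p-values falling in [lo, hi)."""
    return sum(lo <= p < hi for p in ps) / len(ps)

for lo, hi in [(0.00, 0.05), (0.05, 0.10), (0.10, 0.25), (0.25, 1.00)]:
    print(f"p in [{lo:.2f}, {hi:.2f}): honest {share(honest, lo, hi):5.1%}"
          f"  vs. p-hacked {share(hacked, lo, hi):5.1%}")
```

The printed shares show the valley and the second hump emerging purely from selective pressure on near-misses, with no change in the underlying effects.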

This anomaly provides suggestive evidence that some p-values were p-hacked.

Source: Figure from Brodeur et al., 2016. This figure displays histograms of test statistics from the American Economic Review, the Journal of Political Economy and the Quarterly Journal of Economics for the years 2005-2011.

Researchers p-hack for many reasons. One explanation is that they are subject to increasing pressure to publish their results in top journals and that they believe that significant results are easier to publish.

To improve our understanding of research transparency, we go beyond documenting publication bias and attempt to shed light on the sub-literatures that suffer the most from these biases. We relate the extent of p-hacking to authors’ and articles’ characteristics and find that it correlates with incentives to get published: p-hacking is less pronounced among older and tenured professors than among younger researchers.

The extent of p-hacking also correlates with how important the empirical result is for a paper’s publication prospects. In theoretical papers, the empirical analysis is less crucial, and, indeed, the extent of p-hacking is much lower. Moreover, the two-humped camel shape is less visible in articles using data from randomized controlled trials than in articles using non-experimental methods. This suggests that experimental methods could be less subject to p-hacking, possibly because their results are more difficult to manipulate.

We also study the effectiveness of recent innovations in research transparency, such as journals’ data and code availability policies. These policies require authors of published articles to provide their replication data and code, making it easier for other researchers to replicate the main results. As of 2019, slightly more than half of the top economics journals required data and code from authors.

Our analysis of the different sub-samples does not show conclusive evidence that data or code availability on journals’ websites mitigates p-hacking. But one obvious advantage of having data and code is the possibility of reproducing the results of published articles. While this might not have an impact on researchers’ behavior, it increases the likelihood of detecting mistakes.

Another solution, recently developed and used in the field of health economics, is to send out an editorial statement to researchers in the field. In February 2015, the editors of the Journal of Health Economics released a statement that aimed to reduce the extent of p-hacking and to encourage authors to submit studies that do not find a significant effect of a given policy or program.

In a new study, my co-author and I test the effectiveness of this editorial statement. We compare health economics journals with similar journals that did not implement such a statement. We find that this simple, low-cost, transparent practice had large effects, decreasing the extent of publication bias in the field of health economics. The estimates suggest that the statement decreased the proportion of tests rejecting the null hypothesis by 18 percentage points.
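The comparison amounts to a difference-in-differences: the change in the share of significant tests in health economics journals after the statement, minus the same change in comparable journals. A sketch of the arithmetic, with purely hypothetical shares chosen only so the answer lands near minus 18 points (they are not the paper’s numbers):

```python
# Difference-in-differences arithmetic with purely hypothetical shares of
# significant tests; the numbers are invented for illustration and are not
# taken from the study.
share_significant = {
    ("health econ", "before"): 0.60,
    ("health econ", "after"):  0.42,
    ("comparison",  "before"): 0.58,
    ("comparison",  "after"):  0.58,
}

change_treated = (share_significant[("health econ", "after")]
                  - share_significant[("health econ", "before")])
change_control = (share_significant[("comparison", "after")]
                  - share_significant[("comparison", "before")])

# The DiD estimate nets out trends common to both groups of journals.
did = change_treated - change_control
print(f"difference-in-differences estimate: {did:+.2f}")
```

Subtracting the control journals’ change is what lets the comparison attribute the drop to the statement rather than to a field-wide trend.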

These findings have interesting implications for editors and for the academic community. They suggest that incentives may be aligned to promote more transparent research and that editors may reduce the extent of publication bias quite easily.

Abel Brodeur is an Associate Professor in the Department of Economics at the University of Ottawa. He received his Ph.D. in Economics at the Paris School of Economics.

The ProMarket blog is dedicated to discussing how competition tends to be subverted by special interests. The posts represent the opinions of their writers, not necessarily those of the University of Chicago, the Booth School of Business, or its faculty. For more information, please visit ProMarket Blog Policy.