How can we ensure that academic journals do not become an unintended instrument in the PR efforts of powerful firms? Luigi Zingales offers some solutions.
“Our data science team reviewed this report and found it to be flawed.” So said an Uber spokesperson upon the release of a Stigler Center working paper, which showed that when Uber and Lyft entered a city, traffic fatalities increased by 3 percent. Overall, that equates to almost 1,000 more traffic-related deaths per year.
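A quick back-of-the-envelope check of that magnitude (the baseline fatality counts here are my own approximation, not figures from the study): annual US traffic deaths ran between roughly 32,000 and 38,000 over the period in which ride-hailing rolled out, so a 3 percent increase works out to about a thousand additional deaths a year:

\[
0.03 \times 32{,}000 \approx 960, \qquad 0.03 \times 38{,}000 \approx 1{,}140 .
\]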
You can see why Uber would want to dismiss this study. Indeed, the spokesperson went on to cite its “problematic methodology” and “unjustified conclusions” but offered no analysis to support those assertions.
This is but one example of a much bigger issue, one rarely discussed publicly, that has very real consequences in our data-driven world. Companies control their data, and they tend to share it only with researchers who will use it in ways the corporations bless. As a result, important questions go unanswered, or worse, the apparent answers are biased or incomplete.
Is there a solution?
The Uber/Lyft imbroglio first came to my attention a year ago, when I reviewed the Stigler Center working paper. The study, by John Barrios and co-authors, was only correlational: without access to Uber or Lyft’s data, Barrios et al. could not determine whether the increase in fatalities was simply the result of more traffic or of lower-quality ride-hailing drivers. Furthermore, they could not (and did not) claim that the observed correlation was necessarily the result of a causal link between ride-hailing and accidents. To prove causality, one would need an improbable experiment in which cities are randomly assigned to a treatment group (where ride-hailing enters) and a control group (where it does not). Still, the authors did their best to rule out alternative explanations. In sum, the study raised an important question: what are the externalities produced by ride-hailing?
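To make the identification problem concrete, here is a stylized version of the staggered-entry comparison that studies of this kind typically rely on when randomization is impossible. It is my own illustration of the general approach, not necessarily the exact specification used by Barrios et al.:

\[
\text{Fatalities}_{ct} \;=\; \alpha_c + \gamma_t + \beta\,\text{RideHailing}_{ct} + \varepsilon_{ct},
\]

where RideHailing equals one once ride-hailing has entered city c by period t, the city fixed effects absorb permanent differences across cities, and the time fixed effects absorb nationwide trends. The coefficient of interest compares the change in fatalities in cities after entry with the contemporaneous change in cities ride-hailing has not yet reached. It carries a causal interpretation only if the timing of entry is unrelated to other local determinants of fatalities, which is exactly what random assignment would guarantee by construction and what observational work can only argue for.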
Although Uber called the methodology “flawed,” it was not. It was the best analysis possible with the data available. For this reason, in my review I invited Uber and Lyft to share their data with the authors of the controversial study, to give them an opportunity to test their hypotheses. With the proprietary data, the authors could verify whether the increase in fatalities was correlated with an increase in the number of miles driven by ride-hailing services. They could even test whether ride-hailing cars were involved in more accidents.
A year went by with no response. Instead, Uber and Lyft commissioned Fehr & Peers, a transportation consulting firm, to analyze the traffic generated by their services. The Fehr & Peers report found that Uber and Lyft account for between 2 and 13 percent of all vehicle miles traveled (VMT) in six major metropolitan areas. While this study does not directly answer the questions raised by Barrios et al., it certainly makes their results more credible. For example, the report finds that Uber and Lyft cars drive empty half of the time. Thus, when people take an Uber instead of driving their own cars, traffic increases. Can this surge in traffic alone explain the increase in fatal accidents?
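To see why the deadheading figure implies more traffic, consider a stylized substitution (my own illustration, not a calculation from the report). If roughly half of a ride-hailing car’s miles are driven without a passenger, then replacing a trip of m miles in one’s own car with a ride-hail trip generates about twice as many vehicle miles:

\[
\underbrace{m}_{\text{miles with a passenger}} \;+\; \underbrace{m}_{\text{empty miles}} \;\approx\; 2m
\qquad\text{versus}\qquad
m \ \text{miles driving oneself.}
\]

Total traffic rises even though the number of trips does not. Whether those extra miles alone can account for the rise in fatal accidents is a separate question.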
Unfortunately, the Fehr & Peers report does not address this question. Since consultants answer the questions their clients pay for, it is clear that neither Uber nor Lyft wanted to pay Fehr & Peers to answer the one raised by Barrios and his co-authors. Yet the academics would have provided that answer for free. So why didn’t Uber or Lyft want to share their data?
It’s not because these companies don’t partner with academics. The prestigious Journal of Political Economy (an academic journal that counts among its editors Uber’s former chief economist, now Lyft’s chief economic adviser) just accepted a paper that two academic economists co-authored with past and current Uber employees, using Uber’s proprietary data. Why does Uber sometimes grant access to its data and at other times not? It would be tempting to answer by looking at the content of the paper: it documents how Uber’s flexible schedule more than doubles the surplus enjoyed by its drivers, a result that greatly enhances Uber’s public image and its lobbying efforts to stave off labor regulation.
Yet, this conclusion would be unfair. The paper is very well executed and the results are very interesting and credible. Furthermore, as the authors disclose in a footnote, “Uber has the right to review the paper ‘solely to confirm that confidential information is being represented in a non-misleading fashion’ but not to dispute or influence the findings or conclusions of the paper.” So, is it just a coincidence that studies to which Uber grants data tend to enhance its public image?
No. Anybody with some economic training can appreciate that a study of the welfare benefits of a flexible work schedule can hardly produce a paper that damages Uber’s image. In the best case, it yields a big positive result (like more than doubling workers’ surplus), gets published in a top journal (such as the Journal of Political Economy), and helps Uber’s case tremendously, both in the court of public opinion and, possibly, in a court of law. In the worst case, the welfare benefits turn out to be small and the paper uninteresting: it is not published in any major journal, it is ignored, and Uber loses nothing. Hence the company’s eagerness to share the data. By contrast, analyzing the traffic fatalities in which Uber’s drivers are involved can only lead to problems for Uber, hence its resistance to that particular line of inquiry.
Some people might see no problem here. After all, under current US law, Uber owns its data; why shouldn’t it use them as it sees fit? This question transcends Uber and Lyft: it applies to all firms, but in particular, it applies to digital platforms. To study the welfare effects of these new platforms, we need data. Yet, these data are granted to independent researchers only when the company that owns them expects to benefit from the answer. As a result, the empirical evidence regarding the welfare effects of these digital platforms is severely biased. Ironically, it is severely biased even when all the scholars involved exhibit the maximum level of professional integrity, as is the case here.
In sum, in studying digital platforms, academic research inadvertently becomes part of the platforms’ lobbying effort, a phenomenon I have labeled academic capture for its similarity to the better-known regulatory capture. As the digital economy becomes more pervasive, so does this problem. Since no single scholar can fix it by acting alone, what can be done collectively to address it?
One solution, proposed by the Stigler Center’s Committee on Digital Platforms, is to create a Digital Authority with the power to force data sharing. This Digital Authority would conduct studies itself and ensure that independent researchers have access to the data without being restricted in the questions they ask.
For those who do not like the regulatory approach, there is also a private litigation solution: an inversion of the burden of proof, certainly in the court of public opinion, but possibly even in a court of law. Take the case discussed above: since Uber and Lyft have the data to disprove the correlational analysis of Barrios et al. and have failed to do so for a year, one has to conclude that the correlational analysis is correct and that ride-hailing increases fatal accidents. This logic is straight from Sherlock Holmes in The Adventure of Silver Blaze: if the dog did not bark, it must be that it had no reason to bark, i.e., it knew the thief. If Uber and Lyft did not produce any proof to the contrary when they had both the motive and the opportunity to do so, it must be because they have no such proof.
This inversion of the burden of proof would protect the privacy of firms’ data while creating incentives for an unbiased disclosure of the relevant facts. It would also deter companies from spreading false information. Had Uber and Lyft been public companies at the time the statements above were made, the inversion of the burden of proof would have greatly facilitated a lawsuit for misrepresentation. If one thinks this inversion places too heavy a burden on most companies, it is easy to carve out an exemption for companies that voluntarily share their data with independent researchers, without any restriction on the topics investigated.
Given the political power of digital platforms, I am not holding my breath that any of these changes will take place soon. Yet, the responsibility is not only in the hands of our elected representatives. Academia plays an important role in the lobbying process, a role we often fail to acknowledge. For this reason, the Stigler Center is co-sponsoring a conference on “academic lobbying” at Columbia University this December. We should discuss what we can do to change the current state of affairs: If we are not part of the solution, we become part of the problem. But what can we do?
Academic journals, for one, could stop accepting papers produced with proprietary data. After all, if the hallmark of scientific research is reproducibility, and access to these data is restricted to a few individuals hand-picked by the companies themselves, how can we ensure reproducibility? And if this measure seems too harsh because it would cost us important knowledge, how can we at least ensure that academic journals do not become an unintended instrument in the PR efforts of powerful firms?
While this problem is most severe for digital platforms, it is not limited to them alone, nor is it limited to private firms. The Federal Reserve does even worse than Uber: it screens working papers for content before releasing them. In other fields, the problem is even more severe than in economics. The editor of a medical journal recently rejected a paper—despite favorable peer review—because the paper “went beyond what our marketing department was willing to accommodate.” In agriculture, litigation has revealed internal emails that show Monsanto was ghostwriting “independent” research on the safety of its product, which was eventually published (without proper disclosure) in major academic journals.
The pervasiveness of the problem should not make us complacent. In fact, it should worry us more. The recent Edelman Trust Barometer showed that academic experts still enjoy a high level of trust (63 percent, versus only 36 percent for journalists and 35 percent for government officials) because they are considered less ideologically biased than politicians and are not pressured by their employers to shape the narrative, as all too often happens with journalists. But how long can this reputation survive? Criticism is mounting, and if we do not clean up the process, trust in academics risks falling to the levels of trust in journalists and politicians. That would not just be a loss for us; it would be a huge loss for society at large.
To work, democracy needs some shared trust regarding the reliability of information in the public domain. Traditional media used to play this role. They have since lost it because—to survive economically—many of them have had to cater to a niche of dedicated readers, thus increasing their ideological bias. This might guarantee economic sustainability, but not credibility. If academia loses credibility as well, who will play that crucial role?