Correlation, Causation, and the Weight of Evidence

SUMMARY: One can only speculate on the reasons why some might still wish to cling to the self-selection bias hypothesis in the face of all the evidence to date. It seems almost a matter of common sense that making articles more accessible to users also makes them more usable and citable -- especially in a world where most researchers are familiar with the frustration of arriving at a link to an article that they would like to read (but their institution does not subscribe), so they are asked to drop it into the shopping cart and pay $30 at the check-out counter. The straightforward causal relationship is the default hypothesis, based on both plausibility and the cumulative weight of the evidence. Hence the burden of providing counter-evidence to refute it is now on the advocates of the alternative.

Jennifer Howard ("Is there an Open-Access Advantage?," Chronicle of Higher Education, October 19 2010) seems to have missed the point of our article. It is undisputed that study after study has found that Open Access (OA) is correlated with higher probability of citation. The question our study addressed was whether making an article OA causes the higher probability of citation, or the higher probability causes the article to be made OA.

The latter is the "author self-selection bias" hypothesis, according to which the only reason OA articles are cited more is that authors do not make all articles OA: only the better ones, the ones that are also more likely to be cited.


The Davis et al. study tested this by making articles -- 247 articles, from 11 biology journals -- OA at random, instead of letting the authors choose self-selectively whether or not to do it, and found no increased citation for the OA articles one year after publication.

But almost no one finds that OA articles are cited more a year after publication. The OA citation advantage only becomes statistically detectable after citations have accumulated for 2-3 years.

Even more important, Davis et al. did not test the obvious and essential control condition in their randomized OA experiment: They did not test whether there was a statistically detectable OA advantage for self-selected OA in the same journals and time-window. You cannot show that an effect is an artifact of self-selection unless you show that with self-selection the effect is there, whereas with randomization it is not. All Davis et al. showed was that there is no detectable OA advantage at all in their one-year sample (247 articles from 11 biology journals); randomness and self-selection have nothing to do with it.
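The logic of the missing control condition can be spelled out as a small decision table. What follows is only a minimal sketch in Python: the names `Reading` and `interpret` are illustrative assumptions, not anything taken from Davis et al. or from our article, and the fourth pattern (an advantage under randomization but none under self-selection) is not discussed in this posting, so it is flagged as such rather than interpreted.

```python
from enum import Enum


class Reading(Enum):
    """Hypothetical labels for what a two-arm comparison can show."""
    SELF_SELECTION_ARTIFACT = "advantage only when authors self-select"
    CAUSAL_OA_EFFECT = "advantage with or without self-selection"
    NO_DETECTABLE_ADVANTAGE = "no advantage in either arm; silent on self-selection"
    NOT_DISCUSSED = "pattern not covered by the argument in the text"


def interpret(self_selected_advantage: bool, randomized_advantage: bool) -> Reading:
    """Map the two arms' outcomes to the reading the argument allows.

    Both flags mean "a statistically detectable OA citation advantage was
    found in that arm, in the same journals and time window".
    """
    if self_selected_advantage and not randomized_advantage:
        # The only pattern that supports the self-selection hypothesis.
        return Reading.SELF_SELECTION_ARTIFACT
    if self_selected_advantage and randomized_advantage:
        # Advantage survives the removal of author choice.
        return Reading.CAUSAL_OA_EFFECT
    if not self_selected_advantage and not randomized_advantage:
        # No effect anywhere in the sample: nothing to attribute to anything.
        return Reading.NO_DETECTABLE_ADVANTAGE
    return Reading.NOT_DISCUSSED


if __name__ == "__main__":
    # A study that reports only the randomized arm leaves the first flag unknown.
    for self_sel in (True, False):
        for randomized in (True, False):
            print(self_sel, randomized, interpret(self_sel, randomized).name)
```

A study that measures only the randomized arm fixes the second argument and leaves the first unknown, so it cannot tell the first row of the table from the third.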

Davis et al. released their results prematurely. We are waiting to hear what Davis finds after 2-3 years, when he completes his doctoral dissertation. But if all he reports is that he has found no OA advantage at all in that sample of 11 biology journals and in that interval, rather than an OA advantage for the self-selected subset and no OA advantage for the randomized subset, then, again, all we will have is a failure to replicate the positive effect that has now been reported by many other investigators, in field after field, often with far larger samples than Davis et al.'s.

Meanwhile, our study was similar to Davis et al.'s, except that it used a much bigger sample, across many fields, and a much longer time window -- and, most important, we did have a self-selective matched-control subset, which did show the usual OA advantage. Instead of comparing self-selective OA with randomized OA, however, we compared it with mandated OA -- which amounts to much the same thing, because the point of the self-selection hypothesis is that the author picks and chooses what to make OA, whereas if the OA is mandatory (required), the author is not picking and choosing, just as the author is not picking and choosing when the OA is imposed randomly.

And our finding is that the mandated OA advantage is just as big as the self-selective OA advantage.

As we discussed in our article, if someone really clings to the self-selection hypothesis, there are some remaining points of uncertainty in our study that self-selectionists can still hope will eventually bear them out: Compliance with the mandates was not 100%, but 60-70%. So the self-selection hypothesis has a chance of being resurrected if one argues that now it is no longer a case of positive selection for the stronger articles, but a refusal to comply with the mandate for the weaker ones. One would have expected, however, that if this were true, the OA advantage would at least be weaker for mandated OA than for unmandated OA, since the percentage of total output that is self-archived under a mandate is almost three times the 5-25% that is self-archived self-selectively. Yet the OA advantage is undiminished with 60-70% mandate compliance in 2002-2006. We have since extended the window by three more years, to 2009; the compliance rate rises by another 10%, but the mandated OA advantage remains undiminished. Self-selectionists don't have to cede till the percentage is 100%, but their hypothesis gets more and more far-fetched...

The other way of saving the self-selection hypothesis despite our findings is to argue that there was a "self-selection" bias in terms of which institutions do and do not mandate OA: Maybe it's the better ones that self-select to do so. There may be a plausible case to be made that one of our four mandated institutions -- CERN -- is an elite institution. (It is also physics-only.) But, as we reported, we re-did our analysis removing CERN, and we got the same outcome. Even if the objection of eliteness is extended to Southampton ECS, removing that second institution did not change the outcome either. We leave it to the reader to decide whether it is plausible to count our remaining two mandating institutions -- University of Minho in Portugal and Queensland University of Technology in Australia -- as elite institutions, compared to other universities. It is a historical fact, however, that these four institutions were the first in the world to elect to mandate OA.

One can only speculate on the reasons why some might still wish to cling to the self-selection bias hypothesis in the face of all the evidence to date. It seems almost a matter of common sense that making articles more accessible to users also makes them more usable and citable -- especially in a world where most researchers are familiar with the frustration of arriving at a link to an article that they would like to read (but their institution does not subscribe), so they are asked to drop it into the shopping cart and pay $30 at the check-out counter. The straightforward causal relationship is the default hypothesis, based on both plausibility and the cumulative weight of the evidence. Hence the burden of providing counter-evidence to refute it is now on the advocates of the alternative.

Davis, PM, Lewenstein, BV, Simon, DH, Booth, JG, & Connolly, MJL (2008) Open access publishing, article downloads, and citations: randomise... British Medical Journal 337: a568

Gargouri, Y., Hajjem, C., Lariviere, V., Gingras, Y., Brody, T., Carr, L. and Harnad, S. (2010) Self-Selected or Mandated, Open Access Increases Citation Impact fo.... PLoS ONE 5(10): e13636

Harnad, S. (2008) Davis et al's 1-year Study of Self-Selection Bias: No Self-Archivin..., No OA Effect, No Conclusion. Open Access Archivangelism July 31 2008

Please feel free to re-use to promote OA. All Open Access Week postings by Stevan Harnad are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 Canada License. Most are based on works at openaccess.eprints.org.
