Some questions to those that oppose open data

The recent announcement by PLoS that authors must post the data behind a paper has sparked a discussion about data, authorship, and ownership. Those opposed seem to focus on two issues: the fear of being “scooped” and the extra effort of organizing a dataset. Some also claim ownership of the data, even when they did not fund the research.

I have a few questions for those in the opposed camp:

    • Do you consider someone who uses your data to pursue an idea in which you have no expertise, experience, knowledge, or interest, but which pushes the field forward, a “thief” (as some already have)?
    • What about the problem of imperialistic science? In tropical ecology, many researchers are from the US or Europe, but they collect data in the tropics and take it back to their labs, leaving local researchers, students, managers, and citizens out of the loop. Requiring that the data be published would help, at least in part, to solve this problem.
    • Is there a proposed alternative to open data in public archives?
      • If we keep the status quo and have to contact the authors just to see the data, what do you propose when they do not answer inquiries or simply refuse?
      • I had a case where a paper drew conclusions from some patterns but did not consider an important factor. I could not get anything useful from the figures because the authors had aggregated the data, and I never got an answer to my request because I had criticized the authors in the past. What can be done if I think a paper missed an important factor, or may simply be wrong, but the authors refuse even to let me see the data? Would your ego allow you to share data with those who have criticized you in the past? We all know some people whose egos would not.
    • Wouldn’t you benefit if someone found an error in your paper before you kept building on erroneous conclusions from the same dataset? (Leaving egos aside, if possible.)
    • It is standard in some fields to publish the data. What makes your field/dataset different?
    • Curators of natural history museums do not become authors of a paper just because its data came from their museum’s collection. Why should your data be any different?
    • Imagine the insights we could gain if we could reanalyze data from deceased researchers using new methods, or add to their datasets. How do we make this happen? Should we expect their families, who may not know the details of the dataset or may never have worked in science, to curate a dataset they know nothing about? Would a student or colleague do all the work needed for no reward?
    • Yes, any system will have freeloaders, but are they a nuisance or a huge problem that outweighs the benefits?

    I can understand some of the objections, but I think we can all benefit from open data. So, where do we go from here?

This entry was posted in Open Science, Science.
  • Laura (@MicroWavesSci)

    Hi Luis.

    Thanks for your post. You make some good points about the benefits of data sharing/archiving.

    As for “where do we go from here?”, I think it would be ideal to collect data on data sharing. It would be valuable to know how many times a raw dataset is downloaded, how many publications result from it after its release (by the data collector versus others), the level of interaction between the data collector and those using the dataset, and how often use of a raw dataset translates into meaningful academic credit (coauthorship, acknowledgment, citation). It would also be valuable to collect data on how much funding agencies support data re-analysis proposals compared with data collection proposals.

    At the moment, much of the discussion centers on anecdotal observations and generalizations made from individual fields or institutional types. Collecting hard data on data sharing/archiving could help address the questions and concerns of those who see open data primarily as an obstacle. It could also illustrate unintended negative impacts from data sharing policies and inform changes to those policies.

    Thanks again for your post,

  • Luis J. Villanueva

    This is a good point, and one reason researchers should prefer repositories such as Dryad, FigShare, or their institution’s data archive, provided they issue DOIs. Data with a DOI is much easier to track than data simply posted on a lab’s website.