Category Archives: Open Science

Will 2012 be the year of open science publishing?

The Research Works Act (HR 3699) seems to have launched a wave of criticism that may finally end the publishers' unfair profiting from science. The bill seeks to prohibit the federal government, which funds a lot of research, from requiring publishers to make scientific papers available for free. Right now the NIH has a policy that says all research it funds must be freely available one year after publication.

The problem is that publishers set a price, usually a ridiculous one, for access to research that has been paid for by the government or private foundations. The questions that everyone, scientists and taxpayers alike, needs to ask are:

  • What are they bringing to the table?
  • Why do we have to put up with them?
  • Why are they making a lot of profit from our work while our libraries spend millions and keep cutting subscriptions?

The economics of the current model are not justified. One analysis found that the profits of just the two largest publishers, Elsevier and Springer, would be enough to publish all the papers in the world under the PLoS ONE model and at its costs. HR 3699 has generated a lot of discussion because it will benefit only the publishers, not the scientists and definitely not the taxpayers.

Please join the discussion and contact your Congress representative.

Publish your computer code: it is good enough

I just found this column in Nature discussing the need for scientists to publish the code they used. I’m still amazed that this is not getting more attention. If a properly written Methods section is mandatory in a paper, why not the code that produced the results?

The column lists several excuses for not publishing code, and explains why most of the time they are not valid:

  • It is not common practice.
  • People will pick holes and demand support and bug fixes.
  • The code is valuable intellectual property that belongs to my institution.
  • It is too much work to polish the code.

Even if the Methods section is enough to reproduce your results, why condemn someone else to write and debug the code all over again? The code may not be pretty, may require some obscure software, or may be more convoluted than it has to be, but it is incredibly valuable and a time-saver for other researchers. This kind of code is also very useful for students, because it lets them learn how research is done in their area.

Of course, this requires some planning and careful file management. Hopefully universities and societies will start promoting code publishing to make this practice more common.

Barnes, Nick. 2010. Publish your computer code: it is good enough. Nature 467: 753. doi:10.1038/467753a

Posted the data for the Acevedo and Villanueva 2006 paper

I have just posted online the data used for the paper: Acevedo, M. A. and L. J. Villanueva-Rivera. 2006. Using automated digital recording systems as effective tools for the monitoring of birds and amphibians. Wildlife Society Bulletin 34:211-214.

This data has been released by the authors under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License and can be used for educational and research purposes. I would appreciate it if you let me know when and how you use this data.

Any commercial use is prohibited without the written authorization of the authors.

A Zen of Open Data

From the blog seeing data, Chris McDowall publishes a Zen of Open Data based on the Zen of Python (a programming language):

The Zen of Open Data

Open is better than closed.
Transparent is better than opaque.
Simple is better than complex.
Accessible is better than inaccessible.
Sharing is better than hoarding.
Linked is more useful than isolated.
Fine grained is preferable to aggregated.
(Although there are legitimate privacy and security limitations.)
Optimise for machine readability — they can translate for humans.
Barriers prevent worthwhile things from happening.
“Flawed, but out there” is a million times better than “perfect, but unattainable”.
Opening data up to thousands of eyes makes the data better.
Iterate in response to demand.
There is no one true feed for all eternity — people need to maintain this stuff.

A web-based sound archive management and visualization system

About two years ago we started a project in which we were collecting several hundred recordings each week at different sites. Browsing this archive quickly became a problem, because to make sense of a sound file we need to listen to it and look at its spectrogram. Back then I started to work on a web-based system to manage and browse the archive. That system is now free and open-source software, available in version 1.0: Pumilio.

The system is written mostly in PHP, with some Python scripts and JavaScript. PHP handles the main interaction and communication with the MySQL database. Python is used to analyze the structure of the sound files and generate spectrogram and waveform images. JavaScript, in particular the jQuery framework, provides some checks, notices and interactivity. The system has two sound players: one is based on Prototype and SoundManager 2 and was made by Freesound.org, and the other is the JW Player.
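
To give an idea of what the Python side of such a system involves, here is a minimal sketch using only the standard library. It is not Pumilio's actual code: the function names (describe_wav, waveform_envelope, make_test_tone) are made up for this example, and it assumes 16-bit mono PCM WAV files. It reads the basic structure of a file and reduces the samples to one (min, max) pair per image column, which is the data a waveform image would be drawn from.

```python
import io
import math
import struct
import wave


def make_test_tone(rate=8000, seconds=1.0, freq_hz=440.0):
    """Build a 16-bit mono WAV in memory so the sketch runs without a real file."""
    n = int(rate * seconds)
    samples = [int(10000 * math.sin(2 * math.pi * freq_hz * i / rate)) for i in range(n)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % n, *samples))
    buf.seek(0)
    return buf


def describe_wav(source):
    """Return basic structural facts about a WAV file (path or file object)."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "frames": frames,
            "duration_s": frames / rate,
        }


def waveform_envelope(source, columns):
    """Reduce a 16-bit mono WAV to (min, max) pairs, one per image column."""
    with wave.open(source, "rb") as wav:
        # Assumption of this sketch: other formats are rejected, not converted.
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono files")
        raw = wav.readframes(wav.getnframes())
    n = len(raw) // 2
    samples = struct.unpack("<%dh" % n, raw[: n * 2])
    step = max(1, n // columns)  # samples summarized by each column
    return [
        (min(samples[i:i + step]), max(samples[i:i + step]))
        for i in range(0, step * columns, step)
        if i < n
    ]


if __name__ == "__main__":
    print(describe_wav(make_test_tone()))
    print(waveform_envelope(make_test_tone(), columns=5))
```

A spectrogram image would need one more step, a short-time Fourier transform over the same samples, which is usually done with a numerical library rather than by hand.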

A bit further into the project, we needed a way to select regions in the spectrograms. I designed a way to do it from a web browser with a bit of JavaScript code, using the Jcrop plugin, in addition to the PHP code.
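
The core of a region selection is a coordinate conversion: the browser reports a box in pixels, and the server needs it as a time and frequency range. The sketch below shows that arithmetic in Python. It is an illustration only, not the code used in Pumilio, which does this part in PHP. The function name and parameters are hypothetical, and it assumes the image covers the whole recording, the frequency axis is linear from 0 Hz to max_freq_hz, and pixel y = 0 is the top of the image.

```python
def selection_to_time_freq(box, image_size, duration_s, max_freq_hz):
    """Convert a pixel box (x1, y1, x2, y2) on a spectrogram image to
    (t_start, t_end, f_low, f_high).

    Assumptions of this sketch: linear frequency axis, the image spans
    the full recording, and y grows downward from the top of the image.
    """
    width, height = image_size
    if width <= 0 or height <= 0:
        raise ValueError("image size must be positive")

    x1, y1, x2, y2 = box
    # Accept boxes dragged in any direction and clamp them to the image.
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)

    # Horizontal position maps directly to time.
    t_start = left / width * duration_s
    t_end = right / width * duration_s

    # Vertical position is flipped: the top row is the highest frequency.
    f_high = (1 - top / height) * max_freq_hz
    f_low = (1 - bottom / height) * max_freq_hz
    return t_start, t_end, f_low, f_high


if __name__ == "__main__":
    # A box on an 800x200 image of a 60 s recording sampled at 44.1 kHz.
    print(selection_to_time_freq((100, 50, 300, 150), (800, 200), 60.0, 22050.0))
```

Sorting and clamping the coordinates first means a selection dragged from right to left, or slightly outside the image, still gives a valid range.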

Future enhancements include options for ultrasonic and infrasonic recordings, more tools, and improved archive and metadata management. If you are interested in testing it, there is a demo available, or you can download the current version from SourceForge.