
New paper on our software to manage sound archives

The paper describing our software Pumilio has just been published in the Bulletin of the Ecological Society of America. Pumilio is a web-based sound archive and analysis tool.

Pumilio was created out of necessity. Our lab was collecting a lot of sound data and there was no system that could help us manage that amount of data. In addition, we used at least two operating systems (Windows and Linux), and some collaborators even used Mac. On top of that, some of us used Chrome, while others used Firefox. We started by just putting files in folders on a network share, but after a few hundred files there was no way of keeping track. Plus, we were wasting time whenever we had to open a file in Audacity or Raven to see its spectrogram.

One of the first versions of this system was a simple database that displayed rows of spectrograms with a Flash mp3 player at the bottom of each, similar to the “gallery” view of the current version of Pumilio. The problem was generating all those spectrograms. Using R was easy, but it took too long to write the png files. The function specgram() in Python crashed with our files (15 minutes). After a while, I stumbled upon a Python script written by the people of Freesound.org. It was very fast, so I took it and built it into the system.
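For readers who have not computed a spectrogram themselves, here is a minimal sketch of the idea in plain Python. It is not the Freesound.org script and not what Pumilio runs; the function names, the window size, and the synthetic test tone are all assumptions made for illustration. It uses a naive Fourier transform, which is far too slow for real recordings (real tools use an FFT), but it shows what each column of a spectrogram image contains.

```python
import cmath
import math


def spectrogram(samples, window_size=64, hop=32):
    """Return a list of magnitude spectra, one per overlapping window.

    Illustrative only: a naive DFT is used instead of an FFT.
    """
    # A Hann window reduces leakage between neighbouring frequency bins.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window_size - 1))
            for n in range(window_size)]
    frames = []
    for start in range(0, len(samples) - window_size + 1, hop):
        chunk = [samples[start + n] * hann[n] for n in range(window_size)]
        # Keep only the non-negative frequencies (bins 0 .. window_size/2).
        spectrum = []
        for k in range(window_size // 2 + 1):
            total = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                for n in range(window_size)
            )
            spectrum.append(abs(total))
        frames.append(spectrum)
    return frames


def peak_frequency(spectrum, sample_rate, window_size):
    """Frequency (Hz) of the strongest bin in one spectrum."""
    k = max(range(len(spectrum)), key=spectrum.__getitem__)
    return k * sample_rate / window_size


# Hypothetical input: a synthetic 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(512)]
frames = spectrogram(tone)
```

Each entry of `frames` is one vertical slice of the spectrogram; plotting the slices side by side, with magnitude mapped to color, gives the familiar image.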

Afterwards it was all step by step. A JavaScript plugin built to crop images over the web became a selection tool for zooming in on a sound and filtering it.

The main idea is to make it easy to navigate a sound archive using any modern computer. This means using cross-browser tools so that any modern browser works. Blueprint provides consistent CSS, and jQuery takes care of most of the JavaScript and some of the styling.

Screenshots of Pumilio:

Main Menu

Browsing the archive

All the data of a sound file

The software is available for free under an open source license from the project website.

Villanueva-Rivera, Luis J. and Bryan C. Pijanowski. 2012. Pumilio: A Web-Based Management System for Ecological Recordings. Bulletin of the Ecological Society of America 93:71–81. doi:10.1890/0012-9623-93.1.71

Publish your computer code: it is good enough

I just found this column in Nature discussing the need for scientists to publish the code they used. I’m still amazed that this is not getting more attention. If a properly written Methods section is mandatory in a paper, why not the code that produced the results?

Among the excuses for not publishing the code (and the reasons why they are not valid most of the time) that the column identifies are:

  • It is not common practice.
  • People will pick holes and demand support and bug fixes.
  • The code is valuable intellectual property that belongs to my institution.
  • It is too much work to polish the code.

Even when the Methods section is enough to reproduce your results, why condemn someone else to go through the whole process of writing and debugging code to make it work? The code may not be pretty, may require some obscure software, or may be more convoluted than it has to be, but it is incredibly valuable and a time-saver for researchers. This kind of code is also very useful for students, because it lets them learn how research is done in their area.

Of course, this requires some planning and careful file management. Hopefully universities and societies will start promoting code publishing to make this practice more common.

Barnes, Nick. 2010. Publish your computer code: it is good enough. Nature 467: 753. doi:10.1038/467753a

“Data, data everywhere”

I just found a special report from The Economist on data, “Data, data everywhere.” The report deals, in several articles, with the new trend of massive amounts of data available today. The articles mostly cover the business implications, but also scientific data. For example, the Large Hadron Collider:

[G]enerate 40 terabytes every second—orders of magnitude more than can be stored or analysed. So scientists collect what they can and let the rest dissipate into the ether.

Another quote that got my attention was:

Only 5% of the information that is created is “structured”, meaning it comes in a standard format of words or numbers that can be read by computers.

This means that very little of the data available can be easily imported into other computer systems for analysis. It will become very important to make data available in a form that other computers can use; otherwise most of the time and cost will go into re-formatting data. It would be like transferring data from paper to a computer all over again. We should make raw data available, but also in structured form. A PDF is great for humans, but it sucks when trying to extract data from it. At least something like a comma-separated file should help this process a lot.
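To show how little it takes, here is a small sketch using Python's standard csv module. The records, column names, and helper functions are hypothetical, made up only for illustration; the point is that data written this way can be read back by another program without any manual re-formatting.

```python
import csv
import io

# Hypothetical field records; the sites, dates, and counts are made up.
records = [
    {"site": "A", "date": "2010-03-01", "species_count": 12},
    {"site": "B", "date": "2010-03-01", "species_count": 7},
    {"site": "A", "date": "2010-03-02", "species_count": 15},
]
FIELDS = ["site", "date", "species_count"]


def to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = to_csv(records, FIELDS)
loaded = from_csv(csv_text)

# Another program can now work with the data directly.
total_species = sum(int(row["species_count"]) for row in loaded)
```

Note that CSV carries no type information, so numbers come back as strings and have to be converted, as in the last line. Even so, that is much less work than pulling a table out of a PDF.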

Another evident consequence is that scientists, and most notably the next generation, will need to know how to work with large amounts of data. Programming and databases will have to become part of a scientist’s education, so you had better start sooner rather than later.

My technology tools

Talking with a labmate the other day, I suggested some tools I’m using. I thought a list of the technological tools I use every day could help someone else, too. Some may help you, some may not be for you, but the main lesson here is this: technology is supposed to help us, not hinder us. If you feel trapped by a particular gadget or piece of software, look for alternatives.

  • Operating system (Ubuntu Linux) – I won’t go into the Mac vs PC debate here. I’m using Ubuntu Linux 95% of the time. The other 5% is for games and software that cannot be run anywhere else, like ArcGIS. Each release of Ubuntu is easier to use. In particular, version 9.10 includes a “Software Center” that lets you easily search for programs by name or description. You can download a “live CD,” which is a CD-ROM image you can burn to a CD-R and then boot from directly; you will be using Ubuntu without making any change to your computer. Worth a try.
  • Data in the cloud (Dropbox) – One of the most impressive and simple applications for storing files in the cloud is Dropbox. It automatically keeps files synchronized and keeps a history of file changes. I am using their free account (2GB) to store files that I am working on or might need for reference on current projects. Note: I still have some referrals for an extra 250MB for free; just use this link.
  • Note keeping (Tiddlywiki) – A few months ago I began a search for a note-taking application to replace a notebook. The reason was simple: convenience. I tried using MediaWiki, but it was too much overhead for just me and it required a live Internet connection. Now I’m using Tiddlywiki, which works like a local wiki that is completely contained in a single file. The file is loaded locally, so no network connection is needed. It uses a combination of CSS and JavaScript to support tags, links, and searching your notes. I keep the file in the Dropbox folder.
  • Task tracking (MonkeyGTD) – A system based on TiddlyWiki that lets you manage tasks based on the Getting Things Done program. You can find more info here.
  • Office applications (OpenOffice) – Most users do not need all the extras that MS Office has. OpenOffice opens and edits MS Office formats, and the best thing is that it is free and open source. Some features may not be present, but for most students it will be enough.
  • Data analysis and statistics (R) – One of the most important open-source scientific tools is R, which is based on the S language (also the basis of S-Plus). It was even featured in a New York Times article. Check the Comprehensive R Archive Network for packages, tutorials, and many publications.
  • Password management (KeePassX) – Too many passwords, not enough memory to keep them. KeePassX is a password manager that lets you keep many passwords and other information encrypted with a single password. I like it because it also has a password generator with many options.
  • Document encryption (Truecrypt) – To protect the documents on my laptop in case it is stolen, I keep them all inside an encrypted partition using Truecrypt. It is free and open-source encryption software that uses only strong encryption methods. It is a must-have to reduce the risk of identity theft.
  • Update: Literature management (Aigaion) – In this digital age, PDFs are a convenience, but it is still complicated to keep them organized. I’ve been using Aigaion for a while. It runs on PHP/MySQL and can import literature in several formats as well as hold the PDF files for each paper.

I hope this list helps someone else.