New paper on the acoustic niche hypothesis and coquí frogs

Yesterday my new paper got published in PeerJ:

Villanueva-Rivera LJ. (2014) Eleutherodactylus frogs show frequency but no temporal partitioning: implications for the acoustic niche hypothesis. PeerJ 2:e496

Although a good part of it was a chapter of my MS thesis, I recently added a new analysis. To summarize, I tested the community of Eleutherodactylus frogs from Puerto Rico, locally known as coquíes, for temporal and acoustic frequency partitioning. The species exhibited frequency partitioning with a few exceptions, for example the more variable “quí” note of E. coqui and E. portoricensis. Most species called at the same time, with a peak during the first few hours of the night.

I make the argument that this is a good community in which to further study the acoustic niche hypothesis in anurans. Many studies of the hypothesis in anurans have been in communities with mixed groups, different families, and different reproductive strategies. Since all but one of the mountain anurans in Puerto Rico are Eleuths, this closely related group can be a good target, since most observations will be a direct result of evolutionary history and habitat partitioning.

The paper is open access, published in PeerJ, and the data and tables used in the paper are archived at Figshare: 10.6084/m9.figshare.806302.

Open science, open data.


Posted in Bioacoustics, Data, Eleutherodactylus, Open Science, Papers, Puerto Rico, Science, Sensors, Tropical Ecology | 1 Comment

Small victories

My previous post on Wildlife Acoustics attracted the attention of the owner of the company, Ian Agranat. Although Agranat didn’t agree with my critique, at least it seems my post was enough to make some changes in WA:

  • They published the command-line code for the wac2wav converter
  • They published the definition of the metadata the SM3 adds to audio files

This is a move in the right direction.

Installing in Ubuntu

To install the wac2wav utility in Ubuntu, I have posted a script at Github. The only requirement is gcc, which you can get by installing the build-essential package.


Posted in Software | Leave a comment

Review of the SM2 and SM3 by Wildlife Acoustics

(Updated on 15 Jun to add the problem of uneven recording length)

I’ve been working with automated digital audio recorders for more than a decade. Back in 2003, I started recording in the tropical forests of Puerto Rico with a hacked MP3 player by Creative Labs. Back then, I had four units and each one was bulky and heavy due to its two lead-acid batteries. I learned a lot about what works and what doesn’t work in the short and long term in the field.

SM2 (left) and the new SM3

Probably the most common automated recorders today are the units by Wildlife Acoustics (WA). The current models, SM2 and SM3, store the audio on SD cards and offer many useful features. However, there are a bunch of things that I find annoying, and hopefully this post can start a conversation that will push WA to fix these issues.


First of all, are these units made for the field? In my experience, I can say they are, but only for some temperate forests. The SM2 is housed in the same plastic box as the SM1, but with removable microphones. I have always had problems with these boxes, as they do not make your life easier in the field and do not tolerate harsh habitats like rainforests.

  • To open the cover, you have to open four screws – Any equipment that requires the use of tools in the field is a problem. What if the screwdriver in my multitool is not the right size? The screwdriver in a full size Leatherman works, but not the one in the Micra. Furthermore, working with tools in very cold temperatures is cumbersome.
  • The seal in the cover is very thin – This makes it easier for it to get damaged or dirty. Plus, they don’t sell a replacement.
  • The box has barely any attachment points – For a sensor that needs to be installed to something, the box of the SM2 has just 4 holes (behind the screws for the lid) to attach it to anything. I’ve had to drill holes and slots for anchors and straps in an effort to attach the units to a tree in the field.
  • There are no ways to secure the recorder – The boxes have no way to attach a lock or to pass a cable lock to protect the (expensive) unit from theft. Compare with almost all camera traps, which have ways to add locks or cable locks.
  • There are no field-replaceable parts – The SM1 (right) became useless when the microphones were damaged. With the SM2, they introduced microphones that could be changed in the field, but any repair beyond that is impossible without a soldering gun.
  • Some SM2 had useless battery cases – It seems that some batches of the SM2 had a battery case that was not meant to be used the way it is in the SM2. The batteries would pop out without warning because the case was of very low quality. We lost many weeks of data because the batteries popped out shortly after deployment. We didn’t get a replacement case and had to buy one, which makes no sense since these parts cost just a couple of dollars.
  • The SM2+ has dip switches to set the gain of the microphones – Why would you add dip switches to equipment to be used by biologists in the field? First, these switches are known only to engineers and are more complex than necessary. Second, their size makes it very hard to determine the position of each switch and to change them in the field.

Long-term investment

These units will spend day and night in very harsh conditions; therefore, there should be a way to give them a tune-up from time to time. This is made impossible, or at least very hard, by several factors:

  • The company does not sell individual parts – What if the housing is destroyed by a falling branch but the recorder inside is intact?
  • The SM2 is not built for repairs – Some parts, like line filters and humidity-absorbing packs, are glued to the box. In addition, most connections are soldered and, due to their small size, are very hard to repair.
  • The SM3 seems even worse for hacking and repairs. The back cover uses Torx screws instead of standard Phillips (right). Again, most parts are soldered.

Some of the changes I made

To be able to collect data in the tropical rainforest of La Selva, in the Caribbean lowlands of Costa Rica, I changed the housing of the SM2. I removed the motherboard and battery case from the green box and placed them in Pelican 1120 cases (right). These cases solved two problems:

  • Better seal – the 1120 has a larger O-ring that is easier to clean and provides better protection. I also sealed the pressure-equalization valve to reduce the amount of humid air getting into the box.
  • Lock attachments – the 1120 has two holes for padlocks and I was able to use the ribs and handle to attach the units to trees using cable locks (right). This simplified mounting the units, in particular in forests with trees of very diverse diameters, and protected the units from theft.

I also came up with an easier way to attach windscreens:

  • Velcro to attach the windscreens – Instead of the annoying metal rings, I placed a strip of Velcro tape (the hook, or rough, side) around the neck of the microphone (see pic below). This held the windscreens without any problems, with the added advantage that they were easy to replace in the field.


WA tries to make some useful tools available, but the way these work is more cumbersome than necessary. My main problem is with the lack of support for the command line or scripts. Researchers usually need to work with many sensors or do repetitive tasks. I can not understand why WA doesn’t write their software in a way that lets us script it.

For example, the WAC to WAV converter is all graphical. If I pick up 8 recorders in a single day, I have to repeat the same process 8 times to convert them to WAV. In comparison, I have a single script, launched with a single command from the command line, that:

  • compresses the files to FLAC
  • adds the files to the database
  • builds a tar file
  • stores the tar files in the archival system
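
The four steps in the list above could be sketched in Python roughly as follows. This is a minimal sketch, not the actual script: the function names, the `recordings` table layout, and the use of a plain directory as the "archival system" are assumptions made for illustration. It also assumes the recordings are already WAV files and that the `flac` command-line encoder is installed.

```python
import shutil
import sqlite3
import subprocess
import tarfile
from contextlib import closing
from pathlib import Path


def compress_to_flac(wav_path: Path) -> Path:
    """Compress one WAV file with the `flac` command-line encoder."""
    flac_path = wav_path.with_suffix(".flac")
    # -f overwrites an existing output file, -o names the output file.
    subprocess.run(["flac", "-f", "-o", str(flac_path), str(wav_path)], check=True)
    return flac_path


def register_files(db_path: Path, recorder: str, files: list) -> None:
    """Add one row per file to a (hypothetical) `recordings` table."""
    with closing(sqlite3.connect(str(db_path))) as con:
        with con:  # commits if no exception is raised
            con.execute(
                "CREATE TABLE IF NOT EXISTS recordings "
                "(recorder TEXT, filename TEXT, bytes INTEGER)"
            )
            con.executemany(
                "INSERT INTO recordings VALUES (?, ?, ?)",
                [(recorder, f.name, f.stat().st_size) for f in files],
            )


def build_tar(files: list, tar_path: Path) -> Path:
    """Bundle the files into a single uncompressed tar file."""
    with tarfile.open(tar_path, "w") as tar:
        for f in files:
            tar.add(f, arcname=f.name)
    return tar_path


def store_in_archive(tar_path: Path, archive_dir: Path) -> Path:
    """Copy the tar file to the archive (assumed here to be a directory)."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(tar_path, archive_dir))


def process_recorder(recorder_dir: Path, db_path: Path, archive_dir: Path,
                     compress=compress_to_flac) -> Path:
    """Run all four steps for the files collected from one recorder."""
    wavs = sorted(recorder_dir.glob("*.wav"))
    flacs = [compress(w) for w in wavs]
    register_files(db_path, recorder_dir.name, flacs)
    tar_path = build_tar(flacs, recorder_dir / f"{recorder_dir.name}.tar")
    return store_in_archive(tar_path, archive_dir)
```

With something like this, eight recorders become a loop over eight directories calling `process_recorder`, instead of eight trips through a graphical converter.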

Some time ago I asked in the forums for WA to publish a Linux command-line tool. The response was that a tool existed for in-house use. Why would they deny us access to this?

As another example, it seems that the SM3 stores some metadata in the sound file itself. However, they don’t have any software that will extract this information. I would have to figure out the format and write some software to extract this metadata, making it very hard to obtain and rather useless.
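
As a starting point for that kind of detective work, here is a rough sketch that lists any non-standard chunks in a WAV file. It assumes the metadata is stored as an extra RIFF chunk next to the usual `fmt ` and `data` chunks, which is only a guess on my part; the function name and the set of "standard" chunk names are choices made for this example, and decoding the contents of a chunk would still require knowing its format.

```python
import struct
from typing import BinaryIO, Dict

# Chunk IDs found in ordinary WAV files; anything else is reported as "extra".
STANDARD_CHUNKS = {b"fmt ", b"data", b"fact", b"LIST"}


def find_extra_chunks(stream: BinaryIO) -> Dict[bytes, bytes]:
    """Return {chunk_id: raw_payload} for every non-standard RIFF chunk."""
    header = stream.read(12)
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    extras = {}
    while True:
        chunk_header = stream.read(8)
        if len(chunk_header) < 8:
            break  # end of file
        chunk_id, size = struct.unpack("<4sI", chunk_header)
        padding = size % 2  # RIFF chunks are padded to an even length
        if chunk_id in STANDARD_CHUNKS:
            # Skip over the payload (the audio data can be very large).
            stream.seek(size + padding, 1)
        else:
            extras[chunk_id] = stream.read(size)
            stream.seek(padding, 1)
    return extras
```

Calling `find_extra_chunks(open(path, "rb"))` on a recording would at least show whether there is an extra chunk and how big it is.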

The SM3 is worse

WA seems to think the SM3 is better built for the field, but I think that is completely wrong.

  • Metal case instead of plastic – The custom die-cast aluminum enclosure may protect the parts better, but at what cost? The SM2 weighs 0.7 kg but the SM3 went crazy with 2.5 kg – and this weight is without batteries! Why would you build it heavier when someone will carry it in a backpack while hiking in the field? Imagine having to hike several kilometers to install these units. How many, plus batteries, water, food, tools, and extras, can you carry?
  • More ports, more parts to keep clean – The SM3 has separate openings for the batteries (2 separate covers) and cards. This means there are three openings that we need to keep clean between deployments. It is good that they are trying to protect the electronics, but not at the cost of more complexity in the field! We are fighting time, heat or cold, mosquitoes, mud, rain, wind, critters, and more while working in the field – we don’t need more complications. Making things worse, the SD cover uses four small thumbscrews that won’t even let you use a screwdriver.
  • Microphones are permanent! – This is baffling. The main problem with the first generation SM1 was that the microphones were permanently glued to the case. If both microphones got damaged, which is very likely in the field, then you ended up with a very expensive door stopper. Why did they remove the plug-in microphones from this version?
  • Still using SD cards – SD cards are slow, cumbersome, and fragile. Why not replace them with a USB port for a flash drive or an external SSD drive? Back when the SM1 came out, an SD card made sense in terms of cost and capacity, not anymore. Today, a 32GB USB flash drive costs around $20 and a 128 GB SSD external USB drive is around $150.
  • Flashy – The one good thing the green boxes had was that they were easy to hide. Now, the SM3 comes in a big gray box with a digital display on the front telling any passerby “come and touch me!”. Take a look at the boxes on utility poles: none have any indicators or displays on the outside. At the most, they have a light to indicate they are powered. Anyone who finds a box with a display and buttons will touch it; it is human nature.


Other than the lack of command-line access to their software and how hard it is to repair the units, the most troubling thing is the cost of these units. Since they came out, the price has stayed the same, around $700. All electronic equipment drops in price every year, yet the SM2 remains at the same price. This makes it very hard for researchers with limited funds to work.

In addition, the few parts they sell are way overpriced. I know companies make a good chunk of money on accessories (hint: a $4 HDMI cable is as good as a $90 one), but the microphones are a particular problem. When the SM1 units started dying because some animal or the elements destroyed them, I started to look around for options. I found a mic capsule with the same properties as the original (they even looked the same) for less than $2 each. The only difference was that the response was limited to 16kHz, while the SM1 was listed as having a flat response up to 20kHz. Since we were working at frequencies below 10kHz, I went ahead and replaced the microphones with the ones I found. WA sells each SM2 microphone for $70. The weatherproof plug costs about $5; if we add the other parts and labor, it may cost the company $15-20. This means that the most fragile and exposed part, and therefore the one in need of frequent replacement, is sold at roughly 3.5 to 4.7 times its estimated cost.

The windscreens for the SMX-II mics are another good example. WA sells the windscreens for $15 a pair, while you can get one from B&H for $4, so WA charges almost twice the retail price. As I said above, adding a strip of Velcro (right) makes it easier to replace in the field than trying to make the metal ring fit (which requires tools).
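
For anyone who wants to check the arithmetic, here is a small sketch of the two ways to express these comparisons: the price as a multiple of the cost, and the markup over the cost. The figures are the ones quoted in this post; the $15-20 cost of a microphone is my own estimate, and I am assuming the $4 B&H price is for a single windscreen.

```python
def price_ratio(price: float, cost: float) -> float:
    """How many times the cost the buyer pays."""
    return price / cost


def markup_percent(price: float, cost: float) -> float:
    """Markup over cost, as a percentage of the cost."""
    return (price - cost) / cost * 100


# SM2 microphone: sold for $70, estimated cost between $15 and $20.
mic_ratios = (price_ratio(70, 20), price_ratio(70, 15))
mic_markups = (markup_percent(70, 20), markup_percent(70, 15))

# Windscreens: $15 a pair ($7.50 each) versus $4 each at retail.
windscreen_ratio = price_ratio(15 / 2, 4)
windscreen_markup = markup_percent(15 / 2, 4)

print(f"Microphone: {mic_ratios[0]:.1f}x to {mic_ratios[1]:.1f}x the cost "
      f"({mic_markups[0]:.0f}% to {mic_markups[1]:.0f}% markup)")
print(f"Windscreen: {windscreen_ratio:.2f}x retail "
      f"({windscreen_markup:.1f}% markup)")
```

Which of the two numbers you quote depends on whether you mean "percent of cost" or "percent over cost", so it helps to say which one is meant.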

Recording length

One last point that I forgot in my original post: the SD cards seem to be causing the recorder to have very long delays when waking up. As a result, the unit records for less time than what was programmed. It seems that every time the recorder wakes up, it reads the contents of the cards. The problem is that SD cards are slow and when they are getting full, this reading takes way too long.

For example, I did a quick plot of 237 deployments of SM1 recorders at Tippecanoe, where each file was programmed for 15 minutes. Each deployment lasted between one and one-and-a-half weeks (when the cards became full). The plot shows the duration of each file according to its order within the deployment.

As more and more files were recorded, the files got shorter. I didn’t investigate the source of the uptick after 190 files recorded; maybe a firmware update increased the wake-up length. However, the problem was still there.
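
If you want to check your own deployments for this pattern, a minimal sketch along these lines would give the duration of each file by its order in the deployment. It assumes the files are WAV, that all the files of one deployment sit in one directory, and that the file names sort in the order they were recorded; the function names are made up for this example.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path) -> float:
    """Duration of a WAV file, from its frame count and sampling rate."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def durations_by_order(deployment_dir) -> list:
    """Return [(file order, duration in seconds), ...] for one deployment."""
    files = sorted(Path(deployment_dir).glob("*.wav"))
    return [(i, wav_duration_seconds(f)) for i, f in enumerate(files, start=1)]
```

Plotting the second value against the first, for all deployments, should show whether the files get shorter as the card fills up.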

As another example, I used the duration of 75,291 files recorded at La Selva using SM2 recorders. Each file was programmed to be 5 minutes long:

This problem can be avoided by moving away from SD cards or by setting the wake-up period dynamically.

To work around this problem, users should record for longer than they need. Then, the files can be cut to the desired length.
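
Cutting the files afterwards is easy to script. This is a minimal sketch using Python's standard `wave` module; it assumes the files are uncompressed WAV and small enough to read into memory, and `trim_wav` is just a name I picked for the example.

```python
import wave


def trim_wav(src, dst, seconds: float) -> float:
    """Copy at most `seconds` of audio from src to dst; return the new duration."""
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        # Never ask for more frames than the file has.
        n_keep = min(reader.getnframes(), int(seconds * reader.getframerate()))
        frames = reader.readframes(n_keep)

    with wave.open(str(dst), "wb") as writer:
        writer.setparams(params)
        # The header is corrected to the real frame count when the file closes.
        writer.writeframes(frames)

    return n_keep / params.framerate
```

For example, `trim_wav("long.wav", "trimmed.wav", 300)` would keep the first 5 minutes of a file that was programmed to run a bit longer. A file that is already shorter than the target is copied as it is.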


Posted in Bioacoustics, Science, Sensors, Software, Soundscape | 6 Comments

The chytrid fungus is not a threat to the Puerto Rican coquí (Eleutherodactylus coqui)

The fungus Batrachochytrium dendrobatidis (Bd) has been blamed for many declines in amphibian populations. One of these cases was the Eleutherodactylus frogs from Puerto Rico. However, a new paper shows one of the major problems with the current research on the fungus: lethality is assumed even when the data shows otherwise.

The paper by Langhammer et al. (A Fungal Pathogen of Amphibians, Batrachochytrium dendrobatidis, Attenuates in Pathogenicity with In Vitro Passages. PLoS ONE. 10.1371/journal.pone.0077630) worked on the question of the pathogenicity of Bd once it has been frozen in the lab. This is a worthwhile question since it is easier to work with lab samples, which could have lost some pathogenicity in the lab, than with fresh field samples.

The study found that for E. coqui:

[…] there was no significant difference in mortality between the three treatment groups, including the controls (Log-rank test, p=0.37). Only 1 frog died during the experiment, from the JEL427-P39 group. The frog may have succumbed to chytridiomycosis given its relatively high pathogen load (12,816 zoospore genomic equivalents), but all other Bd-exposed frogs cleared infection within 80 days. (emphasis mine)

A single frog died. This is in no way evidence that would support saying that the cause may have been the infection. Maybe the animal was too old; after all, they kept them for a year in the lab before the tests. This is my major annoyance: no matter what the data shows, they insist that Bd can kill an E. coqui frog. What if it can kill one? At the most, this would call for a large-scale study to determine the probability of lethality in the species. But from this data we can conclude that Bd, both fresh and lab-stored, is not lethal to E. coqui.

The data showed that E. coqui cleared the infection from both strains used in less than two months, while the other species, Atelopus zeteki, did not:

Figure 2. Prevalence of Bd infection in (a) Eleutherodactylus coqui and (b) Atelopus zeteki exposed to JEL427-P9 or JEL427-P39. doi:10.1371/journal.pone.0077630.g002


As a species that does die from Bd, A. zeteki individuals started dying less than a month and a half after infection:

Figure 4. Survival pattern for Atelopus zeteki frogs exposed to JEL427-P9 (n=30), JEL427-P39 (n=30), or a sham solution (n=10, control). doi:10.1371/journal.pone.0077630.g004


The authors justify talking about lethality in E. coqui because they found in another paper two animals in the field that apparently died from the disease. Two animals from a species with thousands of individuals per hectare. This hardly seems like something to worry about. As of today, Bd is still listed as a threat in the Red List entry of the species. Several years ago I tried to make the case to change this, but was met with a wall of credulity to the claim and incredulity to what the data actually showed. I eventually gave up.

Perhaps it is time to start testing the assumption that Bd is lethal and to consider that, in some species, it can be treated as just another disease. Then, we can start focusing on what determines this difference between species and discard it as a threat in the species that resist it.

Posted in Declining Amphibian Populations, Eleutherodactylus, Science | Leave a comment

Some questions to those that oppose open data

The recent announcement by PLoS on the requirement to post the data behind a paper has pushed a discussion on the issue of data, authorship and ownership. The opposed-to camp seems to focus on two issues: fear of being “scooped” and the additional effort of organizing a dataset. Some also focus on being owners of the data, even when they did not fund the research.

I have a few questions to those in the opposed-to camp:

    • Do you see someone that uses your data to work on an idea in which you have no expertise, experience, knowledge, or interest, but that pushes the field forward, as a “thief” (as some have done already)?
    • What about the problem of imperialistic science? In tropical ecology many researchers are from the US or Europe, but they collect data in the tropics and take it back to their labs, leaving the local researchers, students, managers, and citizens out of the loop. Forcing the publication of the data will help a bit to solve this problem.
    • Is there a proposed alternative to open data in public archives?
      • If we keep the status quo and have to contact the author to even see the data, what do you propose when the authors do not answer inquiries or just plain refuse?
      • I had a case where a paper reached some conclusions based on some patterns, but there was a factor that they did not consider. I could not get anything useful from the figures because the authors had aggregated the data. I never got an answer because I have criticized the authors in the past. What can be done if I think a paper missed an important factor, or may just be wrong, but the authors refuse to even let me see the data? Would your ego allow you to share data with those that have criticized you in the past? We all know some that would not.
    • Wouldn’t you benefit if someone finds an error in your paper before you keep working on that same dataset based on erroneous conclusions? (leaving egos aside, if possible)
    • It is standard in some fields to publish the data. What makes your field/dataset different?
    • Curators of natural history museums are not authors when a paper is based on their museum’s collection just because the data came from it. Why should your data be any different?
    • Imagine the insights we could gain if we could analyze the data from deceased researchers using new methods or add it to other datasets. How do we make this happen? Should we expect their family, who may not be privy to the details of the dataset or even have worked in science, to do the work of curating a dataset they know nothing about? Would a student or colleague do all the work needed for no reward?
    • Yes, any system will have freeloaders, but are they a nuisance or a huge problem that outweighs the benefits?

    I can understand some of the objections, but I think we can all benefit from open data. In summary, where do we go from here?

Posted in Open Science, Science | 2 Comments

Experimenting with preprints

The online journal PeerJ has introduced a new feature to biology that has great potential: a preprint archive. This type of archive seems to be very popular in other fields, in particular arXiv for physics and math. Preprint archives serve two purposes: they open peer review to a wide range of researchers and they establish precedent.

I have posted the last manuscript from my MS thesis as a preprint:

Eleutherodactylus frogs show frequency but no temporal partitioning: implications for the acoustic niche hypothesis

Feel free to provide feedback about this manuscript. I am interested in how making this process open differs from the typical closed process with journals. After a while I will submit it for formal peer-review and publication.

Posted in Bioacoustics, Papers, Puerto Rico, Science, Tropical Ecology | Leave a comment

Some comments on researchers that do not want to share data

Last week, PLoS published an updated data policy requiring that the data behind each published paper be publicly available. The specific wording was:

authors must make all data publicly available, without restriction, immediately upon publication of the article

Apparently the only change is that now it is required that the publication states where the data is available from, while before it was suggested. The post got a strong response and they have updated it to touch on some of the questions received. However, this is a great opportunity to ask ourselves why there is such strong resistance to sharing data.

Via Twitter, I stumbled upon this post in Neuropolarbear’s blog that listed several objections to the new policy. It is a good starting point for the discussion on the objections to data sharing that researchers usually have:

1. The policy implies major benefit of data sharing is new discoveries. Authorship on articles resulting is a frequently debated topic. Does PLoS have a policy on whether scoopers need to at least offer middle-position authorship to the people who collected the data?

I guess this will be a very debated issue, but in my opinion there should be no expectation of authorship. You are not a co-author if someone uses your paper in a review, why should you have this privilege with a bare dataset? Being an author entails both ownership of ideas and responsibility for what is published. I would not want either for someone else’s work, unless they want my opinion or expertise. People can already use your data in meta-analyses. Using the raw data is not different, just makes it easier to build a more robust meta-analysis.

2. Does PLoS propose any protections for authors who are worried someone will scoop them on reanalysis of their own data? How about a special vault where the data is posted publicly in one year?

Why would you re-analyze the data? PLoS is only requiring the data that relates to the conclusions presented in the paper, not the whole dataset collected as part of a project. Therefore, you have said most of what you wanted to say in the paper regarding that data; what reason would there be to re-analyze it? Furthermore, what is the chance that someone else will have the same ideas, hypotheses, analyses and conclusions you will from that dataset? If it is an obvious idea, then you should publish it in the same paper or simultaneously in another paper.

The problem of publishing it one year later is that it defeats the purpose. The people with most interest in the data will read the paper as soon as it is published but will have to wait for a year for unclear reasons.

3. PLoS argues that data sharing makes life easier for authors. I (along with Drugmonkey) think this is wrong; if that proves to be the case, and it becomes clear curation is a large burden, will PLoS rethink their policy?

Why would curation of your dataset be a burden? The dataset had to be organized and stored in some way for the authors to be able to analyze it, therefore most of the work has already been done. I work with terabytes of sound recordings and databases with millions of rows and the most burdensome task I have found is to write a metadata file, which is just good practice anyways. The dataset can be messy and noisy, everyone understands that this can happen and it is part of the process.
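
To give an idea of how little work a basic metadata file can be, here is a rough sketch that writes one next to the data. The layout (a JSON file with a description, the meaning of each column, and a list of files with sizes and checksums) is only one possibility that I am assuming for the example; it is not a standard, and the names `write_metadata` and `METADATA.json` are made up.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, block_size: int = 1 << 20) -> str:
    """Checksum of a file, read in blocks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def write_metadata(data_dir, description: str, columns: dict,
                   out_name: str = "METADATA.json") -> Path:
    """Write a simple metadata file describing every file in data_dir."""
    data_dir = Path(data_dir)
    files = [
        {"name": p.name, "bytes": p.stat().st_size, "sha256": sha256_of(p)}
        for p in sorted(data_dir.iterdir())
        if p.is_file() and p.name != out_name  # skip an older metadata file
    ]
    metadata = {"description": description, "columns": columns, "files": files}
    out_path = data_dir / out_name
    out_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return out_path
```

The description and the meaning of each column still have to be written by a person, but that is the part the authors already know best.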

5. PLoS’ response to researchers worried about being scooped on follow-ups offers no succor. Does PLoS recommend that researchers who are nonetheless still concerned simply submit to another journal?

If all someone else needs to “scoop” your idea is some data, you will get scooped sooner or later. Publish it to claim your idea.

9. Should I recuse myself from reviewing a paper in which I cannot evaluate the raw data? I currently review lots of MRI papers but raw MRI data may as well be ancient Etruscan as far as I am concerned. From this policy, it would be scientific malpractice for me to even pretend to review an MRI paper under these guidelines.

I found this statement puzzling. This tweet also seems to touch on this idea:

Michael Waskom (‏@michaelwaskom): The most literal reading of the @PLOS guidelines means I'll be sharing k-space data in custom-format spiral .pfiles, so have fun with that


In any field in which you are an expert, you should be able to manage and analyze the data generated by other researchers. Maybe the paper uses a new method or type of data, but then you can use the paper to verify if the methods are clear enough for a competent researcher to be able to carry out the same analysis. The data should be stored in standard or widely-used formats unless there is a good reason to use some exotic format. If not, then you are basing your paper on unsubstantiated analyses or data and it should be suspect.


Erin C. McKiernan posted some ideas that this could reduce diversity in a journal like PLoS because small labs and labs in countries with little research funding might avoid it. While there can be a worry about being scooped, and McKiernan makes it clear there appears to be no data on this problem, we must consider something else. The internet has allowed small companies and ideas to explode by providing a more level playing field for small and huge companies. Large labs and senior researchers are already established; it is harder for the more junior researchers to get noticed. We use the web to try to promote ourselves. Why not use the data as another way to attract attention to the work we do? In my experience, the approaches for new interactions and the discovery of data and papers have been numerous. I don’t believe scoops would outnumber the positive impacts of sharing the data.


Another blogger that has posted objections is Orac:

One issue that was brought up that probably isn’t a huge consideration is that some datasets are too large to share easily. […] other types of data lack such public databases.

I found this interesting because it points to a problem that should not be solved by PLoS, but by scientific societies. If the data you use requires a particular infrastructure, then it is the responsibility of the researchers in the field to build it. I am facing a similar problem with gigabytes and terabytes of audio data. So far, I’ve been able to use FigShare and DataDryad, but for the bulk of my dissertation data I will probably have to host it on my own server. But this illustrates that it is time for researchers in the field to admit there is a problem and find ways to solve it. “Because it is difficult” can not be an excuse not to share data.


Several people were linking to Drugmonkey’s arguments. It is unfortunate that they are mixed with a hatred against humanities and people that want to use other people’s data. Tweets have gone even more to the hateful side by calling researchers that want to analyze someone else’s data “leeches.” Drugmonkey posted several objections to the new policy:

The first problem with this new policy is that it suggests that everyone should radically change the way they do science, at great cost of personnel time, to address the legitimate sins of the few.

While fraud-prevention is a good reason to force publishing raw data, this is not the only reason. I’ve read plenty of papers that would have been a lot more useful if I had had the chance to repeat the analysis and learn from it. In addition, young researchers and students usually have no funds to collect lots of data and can put their ideas out there for the benefit of everyone if they can use available data.

This Data Access policy requires much additional data curation which will take time. We all handle data in the way that has proved most effective for us in our operations.

As with the post by Neuropolarbear, this is puzzling. We all understand the problems of collecting data and that it might be in a messy format. A good metadata file will take care of all these problems. Either we use standard formats for the field (wav for bioacoustics; vector and raster formats for landscape ecology; csv files can store database tables; etc.), or we had to create a particular format. Either way, documenting the way the data was collected and organized will help the authors in the future.

Maybe the proprietary software we use differs and the smoothest way to manipulate data is different. We use different statistical and graphing programs. Software versions change.

A metadata file will take care of most of these problems. Anyways, these are objections to using someone else’s data, not about sharing it.

Some people’s datasets are so large as to challenge the capability of regular-old, desktop computer and storage hardware.

This is equally puzzling. If I want to re-analyze your data or use it in some other way, I should be competent enough to understand the system requirements for it. This is, again, an argument about the use of the data, not sharing, and it presumes that other researchers will not be able to figure out these kinds of issues on their own.

This diversity in data handling results, inevitably, in attempts for data orthodoxy. So we burn a lot of time and effort fighting over that.

The paper must have used a standard method to analyze the data. Some datasets will require more work than others. There is no requirement for a specific way to format the data, just that it is made available. The only exception would be particular fields that have agreed on particular formats or ways to store or collect data (e.g. GenBank).

Drugmonkey’s post suffers from an additional problem: anti-humanist arguments that have nothing to do with data sharing. In particular:

The second incident has to do with accusations of self-plagiarism based on the sorts of default Methods statements or Introduction and/or Discussion points that get repeated. Look there are only so many ways to say “and thus we prove a new facet of how the PhysioWhimple nucleus controls Bunny Hopping”. […] This is why concepts of what is “plagiarism” in science cannot be aligned with concepts of plagiarism in a bit of humanities text.

This has nothing to do with science or humanities, but with the law. A published work is owned by someone and can not be used without attribution. It is a pain to have to find a new way to explain the same methods, but this can be fixed by proposing standard methods for the field. No one explains what t-tests, anovas, or multiple regressions do. If a method is used so much, it should have a name and a standard reference that can be cited for details.

Are the standards for dropping subjects the same in every possible experiments. (answer: no) Who annotates the files so that any idiot humanities-major on the editorial staff of PLoS can understand that it is complete?

No editorial staff has ever asked me to explain a figure, table, or appendix. That is the work of the experts in the field: the editors and the reviewers. Why would this be different for data?


I have worked with researchers who do not want to share their data because, when they have done it in the past, it was “misused.” This is not a valid reason either. Your paper might be misunderstood and therefore cited in the wrong context or to substantiate an idea it cannot support. Why should data be any different? The authors are responsible for using a correct dataset, not you. If someone publishes a paper using my data to say something it cannot support, I can contact the editor (and have done so) to request space to publish a response or to request that the paper be retracted. Science must be open and not subject to the same culture as industry, where money is the only thing that matters.

Another perspective we should consider is that publishing data will help train future scientists. We have all faced many problems when embarking on a new area just because we could not analyze a dataset and were unaware of the limitations or possibilities. If we share more data, students and researchers dipping their toes in a new area will have a better chance at success since they won’t have to wait months or years to collect data to see if their ideas pan out.

Perhaps this jealousy over data is not about data, but about a larger problem: how research is rewarded. It is our responsibility to make sure the work of a researcher is not reduced to an index, but we all have to be in the same boat for this to work. Research cannot be quantified the same way as the output of a factory worker because the nature of the work is completely different. Let’s avoid treating research as a means to increase your citation index.


In closing, a new perspective paper in PLoS Biology discusses some of these issues and will be helpful in further discussions about these problems:

Troubleshooting Public Data Archiving: Suggestions to Increase Participation (DOI: 10.1371/journal.pbio.1001779)

In particular, they say:

In our experience, however, individuals are most concerned about the loss of priority access following PDA, which could generate competition with others when conducting subsequent analyses.

Why is everyone so afraid of being scooped? Yes, it might happen, but I think the fear is exaggerated. We should work on ways to promote and provide incentives for data sharing, to show that the benefits are far larger than any damage, imagined or real, that scooping may cause.

Probably the biggest hurdle in data sharing is culture and lack of incentive. Young researchers can try to convince their senior co-authors of the benefits of sharing data by using someone else’s data to strengthen a point in the paper. Storing the data in public archives that provide DOI numbers allows it to be cited, which provides an incentive if these can be tracked as easily as paper citations.


Posted in Data, Open Science, Science | 15 Comments

Soundscapes featured in Science

The journal Science covered some aspects of soundscapes in the News Focus section of the 21 February 2014 issue.

 Eavesdropping on Ecosystems – Kelly Servick

Advances in cheap, tough automated recorders and powerful sound-analysis software are inspiring scientists to launch increasingly ambitious efforts that use sound to document and analyze ecosystems. A growing community of self-described soundscape ecologists are capturing thousands of hours of sound—from birdsong and insect choruses to rushing water, thunderclaps, and even the drone of cars and airplanes. Converting complex soundscapes into relatively simple numerical indices of biodiversity is proving difficult, and researchers are struggling to turn huge collections of digital recordings into something they can use. But if they’re successful, they’ll have a powerful and noninvasive way to describe ecosystems and measure how they’re changing. (read the full article – paywall)


In addition, the Science podcast also touched on the subject. The podcast is available for free:

Science Podcast – Kelly Servick discusses what can be learned about ecosystems by listening to soundscapes with David Malakoff.


Posted in Bioacoustics, Science, Soundscape | Leave a comment

Updated version of the soundecology package for R

A few weeks ago I published a new R package, soundecology. After spending the three days before submission fixing bugs, testing and trying to make sure the code fits the specifications, you realize you missed something. In this case I missed passing the text of the package through a spell checker! Well, the text of version 1.1 has been checked with the spell checker of LibreOffice. If you have version 1.0, please update as soon as possible.

Another problem in v 1.0 was that one of the functions, ndsi(), would take ages to analyze large (>5 minute) files. The problem was that a function it uses, pwelch(), uses a for loop, which is very slow in R. So, I edited the function to feed the data to pwelch() in a better way, using apply(), greatly decreasing the time it takes to run.

I added a new function to the package: measure_signals(). This function opens the spectrogram of a sound file and lets you select bounding boxes for signals. You call the function specifying, among other things, a file to write the results to and a threshold to determine what a signal is. Then, the function takes the bounding boxes the user selects by clicking on the four corners, calculates the area of each signal using the threshold from the maximum value, and saves this to the results file. I hope to write a tutorial with plenty of examples in the next couple of weeks.
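To make the idea of "area using the threshold from the maximum value" more concrete, here is a minimal Python sketch of one plausible reading of that step. It is not the package's R code: the function name `signal_area`, the box layout, and the toy spectrogram values are all assumptions for illustration. It counts the cells inside a bounding box whose level is within a given number of dB of the loudest cell in that box.

```python
def signal_area(spectrogram, box, threshold_db):
    """Count cells inside `box` that are within `threshold_db` of the box maximum.

    `spectrogram` is a list of rows of dB values (hypothetical layout).
    `box` is (row_start, row_end, col_start, col_end), end-exclusive.
    """
    row_start, row_end, col_start, col_end = box
    # Collect every value that falls inside the bounding box.
    cells = [
        value
        for row in spectrogram[row_start:row_end]
        for value in row[col_start:col_end]
    ]
    if not cells:
        return 0
    # The cutoff is relative to the loudest cell in the box.
    cutoff = max(cells) - threshold_db
    return sum(1 for value in cells if value >= cutoff)


# Toy 4x4 spectrogram in dB; the "signal" sits in the middle.
spectrogram = [
    [-80, -78, -79, -81],
    [-77, -30, -35, -80],
    [-79, -33, -60, -82],
    [-81, -80, -79, -80],
]
area = signal_area(spectrogram, box=(1, 3, 1, 3), threshold_db=10)
print(area)
```

The count is in spectrogram cells; converting it to time and frequency units would depend on the resolution of the spectrogram.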

As usual, please feel free to suggest improvements, report bugs, or complain about problems you are having with the package. You can submit these via email or at the website of the package on Github. If you find this package useful for a publication, please cite it and send me a copy of the paper.

Posted in Bioacoustics, R, Science, Software, Soundscape | Leave a comment

New R package of soundscape ecology indices

After months of development, my package of soundscape indices is available at CRAN! The long process included translating methods sections from papers to code, Matlab scripts to R, as well as learning how to code all the metadata that R packages need.

The package is called soundecology and it includes the following indices:

  • Acoustic Complexity Index (Pieretti et al. 2011)
  • Bioacoustic Index (Boelman et al. 2007)
  • Normalized Difference Soundscape Index (REAL and Kasten et al. 2011)
  • Acoustic Diversity Index (Villanueva-Rivera et al. 2011)
  • Acoustic Evenness Index (using the Gini index) (Villanueva-Rivera et al. 2011)
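As a rough illustration of the Gini index mentioned in the last item, here is a minimal Python sketch (not the package's R code). It assumes the index is applied to a list of non-negative per-frequency-band values; the function name `gini` and the example band values are made up for illustration.

```python
def gini(values):
    """Gini index of non-negative values: 0 is perfectly even, larger is less even."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total == 0:
        return 0.0
    # Weight each value by its rank (1..n) in ascending order.
    weighted = sum(rank * value for rank, value in enumerate(ordered, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


# Hypothetical proportions of signal in four frequency bands.
even_bands = [0.25, 0.25, 0.25, 0.25]
uneven_bands = [0.0, 0.0, 0.0, 1.0]

print(gini(even_bands))
print(gini(uneven_bands))
```

With this uncorrected form of the index, activity concentrated in a single band out of n gives (n - 1)/n rather than exactly 1.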

It is available for Windows, Linux and OS X, although it could take a couple of days for all the mirrors to get a copy. Please check the main page of the package or the page at CRAN for more details, including an introduction vignette that shows how to use it.

As an interesting fact about R, Professor Brian Ripley reported that CRAN got to 5,000 packages hosted last week.

Posted in Bioacoustics, Science, Software, Soundscape | Leave a comment