Sound Viewer Tool – a fast Python script to get waveform and spectrogram from a sound

This script uses the numpy and audiolab modules to generate waveform and spectrogram png images from a wav file. The original script was from the Freesound Blog. The wav file needs to be 16 bit, mono and have a sampling rate of 44.1 kHz. The script takes about 2-3 seconds for a 60 seconds file (on a Core2 Duo 2GHz, 2GB RAM machine).

Update: The script is now a project on SourceForge: http://sourceforge.net/projects/soundviewer/

Save the script as svt.py (Sound Viewer Tool) and call it as:

svt.py [options] filename.wav
  1. The options the script takes are:
  2. <ul>
  3.  <li>-a = output waveform image (default input filename + _w.png)</li>
  4.  <li>-s = output spectrogram image (default input filename + _s.png)</li>
  5.  <li>-w = image width in pixels (default 500)</li>
  6.  <li>-h = image height in pixels (default 170)</li>
  7.  <li>-f = fft size, power of 2 (default 2048)</li>
  8.  <li>-m = maximum frequency to draw, in Hz (default 22050)</li>
  9. </ul>
  10. <pre lang="PYTHON">#!/usr/bin/env python
  11.  
  12. # svt.py -- converts wave files to wave file and spectrogram images
  13. # Script based on wav2png by Freesound
  14. # Copyright (C) 2008 MUSIC TECHNOLOGY GROUP (MTG)
  15. #                    UNIVERSITAT POMPEU FABRA
  16. #
  17. # This program is free software: you can redistribute it and/or modify
  18. # it under the terms of the GNU Affero General Public License as
  19. # published by the Free Software Foundation, either version 3 of the
  20. # License, or (at your option) any later version.
  21. #
  22. # This program is distributed in the hope that it will be useful,
  23. # but WITHOUT ANY WARRANTY; without even the implied warranty of
  24. # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  25. # GNU Affero General Public License for more details.
  26. #
  27. # You should have received a copy of the GNU Affero General Public License
  28. # along with this program.  If not, see .
  29. #
  30. # Authors:
  31. #   Bram de Jong
  32. #
  33. # From http://freesound.iua.upf.edu/blog/?p=10
  34. #
  35.  
  36. import optparse, math, sys
  37. import scikits.audiolab as audiolab
  38. import ImageFilter, ImageChops, Image, ImageDraw, ImageColor
  39. import numpy
  40.  
  41. class TestAudioFile(object):
  42.     """A class that mimics audiolab.sndfile but generates noise instead of reading
  43.    a wave file. Additionally it can be told to have a "broken" header and thus crashing
  44.    in the middle of the file. Also useful for testing ultra-short files of 20 samples."""
  45.     def __init__(self, num_frames, has_broken_header=False):
  46.         self.seekpoint = 0
  47.         self.num_frames = num_frames
  48.         self.has_broken_header = has_broken_header
  49.  
  50.     def seek(self, seekpoint):
  51.         self.seekpoint = seekpoint
  52.  
  53.     def get_nframes(self):
  54.         return self.num_frames
  55.  
  56.     def get_samplerate(self):
  57.         return 44100
  58.  
  59.     def get_channels(self):
  60.         return 1
  61.  
  62.     def read_frames(self, frames_to_read):
  63.         if self.has_broken_header and self.seekpoint + frames_to_read &gt; self.num_frames / 2:
  64.             raise IOError()
  65.  
  66.         num_frames_left = self.num_frames - self.seekpoint
  67.         will_read = num_frames_left if num_frames_left &lt; frames_to_read else frames_to_read
  68.         self.seekpoint += will_read
  69.         return numpy.random.random(will_read)*2 - 1
  70.  
  71. class AudioProcessor(object):
  72.     def __init__(self, audio_file, fft_size, window_function=numpy.ones):
  73.         self.fft_size = fft_size
  74.         self.window = window_function(self.fft_size)
  75.         self.audio_file = audio_file
  76.         self.frames = audio_file.get_nframes()
  77.         self.samplerate = audio_file.get_samplerate()
  78.         self.channels = audio_file.get_channels()
  79.         self.spectrum_range = None
  80.         self.lower = 100
  81.         self.higher = 22050
  82.         self.lower_log = math.log10(self.lower)
  83.         self.higher_log = math.log10(self.higher)
  84.         self.clip = lambda val, low, high: min(high, max(low, val))
  85.  
  86.     def read(self, start, size, resize_if_less=False):
  87.         """ read size samples starting at start, if resize_if_less is True and less than size
  88.        samples are read, resize the array to size and fill with zeros """
  89.  
  90.         # number of zeros to add to start and end of the buffer
  91.         add_to_start = 0
  92.         add_to_end = 0
  93.  
  94.         if start &lt; 0:
  95.             # the first FFT window starts centered around zero
  96.             if size + start &lt;= 0:                 return numpy.zeros(size) if resize_if_less else numpy.array([])             else:                 self.audio_file.seek(0)                   add_to_start = -start # remember: start is negative!                 to_read = size + start                   if to_read &gt; self.frames:
  97.                     add_to_end = to_read - self.frames
  98.                     to_read = self.frames
  99.         else:
  100.             self.audio_file.seek(start)
  101.  
  102.             to_read = size
  103.             if start + to_read &gt;= self.frames:
  104.                 to_read = self.frames - start
  105.                 add_to_end = size - to_read
  106.  
  107.         try:
  108.             samples = self.audio_file.read_frames(to_read)
  109.         except IOError:
  110.             # this can happen for wave files with broken headers...
  111.             return numpy.zeros(size) if resize_if_less else numpy.zeros(2)
  112.  
  113.         # convert to mono by selecting left channel only
  114.         if self.channels &gt; 1:
  115.             samples = samples[:,0]
  116.  
  117.         if resize_if_less and (add_to_start &gt; 0 or add_to_end &gt; 0):
  118.             if add_to_start &gt; 0:
  119.                 samples = numpy.concatenate((numpy.zeros(add_to_start), samples), axis=1)
  120.  
  121.             if add_to_end &gt; 0:
  122.                 samples = numpy.resize(samples, size)
  123.                 samples[size - add_to_end:] = 0
  124.  
  125.         return samples
  126.  
  127.     def spectral_centroid(self, seek_point, spec_range=120.0):
  128.         """ starting at seek_point read fft_size samples, and calculate the spectral centroid """
  129.  
  130.         samples = self.read(seek_point - self.fft_size/2, self.fft_size, True)
  131.  
  132.         samples *= self.window
  133.         fft = numpy.fft.fft(samples)
  134.         spectrum = numpy.abs(fft[:fft.shape[0] / 2 + 1]) / float(self.fft_size) # normalized abs(FFT) between 0 and 1
  135.         length = numpy.float64(spectrum.shape[0])
  136.  
  137.         # scale the db spectrum from [- spec_range db ... 0 db] &gt; [0..1]
  138.         db_spectrum = ((20*(numpy.log10(spectrum + 1e-30))).clip(-spec_range, 0.0) + spec_range)/spec_range
  139.  
  140.         energy = spectrum.sum()
  141.         spectral_centroid = 0
  142.  
  143.         if energy &gt; 1e-20:
  144.             # calculate the spectral centroid
  145.  
  146.             if self.spectrum_range == None:
  147.                 self.spectrum_range = numpy.arange(length)
  148.  
  149.             spectral_centroid = (spectrum * self.spectrum_range).sum() / (energy * (length - 1)) * self.samplerate * 0.5
  150.  
  151.             # clip &gt; log10 &gt; scale between 0 and 1
  152.             spectral_centroid = (math.log10(self.clip(spectral_centroid, self.lower, self.higher)) - self.lower_log) / (self.higher_log - self.lower_log)
  153.  
  154.         return (spectral_centroid, db_spectrum)
  155.  
  156.     def peaks(self, start_seek, end_seek):
  157.         """ read all samples between start_seek and end_seek, then find the minimum and maximum peak
  158.        in that range. Returns that pair in the order they were found. So if min was found first,
  159.        it returns (min, max) else the other way around. """
  160.  
  161.         # larger blocksizes are faster but take more mem...
  162.         # Aha, Watson, a clue, a tradeof!
  163.         block_size = 4096
  164.  
  165.         max_index = -1
  166.         max_value = -1
  167.         min_index = -1
  168.         min_value = 1
  169.  
  170.         if end_seek &gt; self.frames:
  171.             end_seek = self.frames
  172.  
  173.         if block_size &gt; end_seek - start_seek:
  174.             block_size = end_seek - start_seek
  175.  
  176.         if block_size &lt;= 1:             samples = self.read(start_seek, 1)             return samples[0], samples[0]         elif block_size == 2:             samples = self.read(start_seek, True)             return samples[0], samples[1]           for i in range(start_seek, end_seek, block_size):             samples = self.read(i, block_size)               local_max_index = numpy.argmax(samples)             local_max_value = samples[local_max_index]               if local_max_value &gt; max_value:
  177.                 max_value = local_max_value
  178.                 max_index = local_max_index
  179.  
  180.             local_min_index = numpy.argmin(samples)
  181.             local_min_value = samples[local_min_index]
  182.  
  183.             if local_min_value &lt; min_value:
  184.                 min_value = local_min_value
  185.                 min_index = local_min_index
  186.  
  187.         return (min_value, max_value) if min_index &lt; max_index else (max_value, min_value)     def interpolate_colors(colors, flat=False, num_colors=256):     """ given a list of colors, create a larger list of colors interpolating     the first one. If flatten is True a list of numers will be returned. If     False, a list of (r,g,b) tuples. num_colors is the number of colors wanted     in the final list """       palette = []       for i in range(num_colors):         index = (i * (len(colors) - 1))/(num_colors - 1.0)         index_int = int(index)         alpha = index - float(index_int)           if alpha &gt; 0:
  188.             r = (1.0 - alpha) * colors[index_int][0] + alpha * colors[index_int + 1][0]
  189.             g = (1.0 - alpha) * colors[index_int][1] + alpha * colors[index_int + 1][1]
  190.             b = (1.0 - alpha) * colors[index_int][2] + alpha * colors[index_int + 1][2]
  191.         else:
  192.             r = (1.0 - alpha) * colors[index_int][0]
  193.             g = (1.0 - alpha) * colors[index_int][1]
  194.             b = (1.0 - alpha) * colors[index_int][2]
  195.  
  196.         if flat:
  197.             palette.extend((int(r), int(g), int(b)))
  198.         else:
  199.             palette.append((int(r), int(g), int(b)))
  200.  
  201.     return palette
  202.  
  203. class WaveformImage(object):
  204.     def __init__(self, image_width, image_height):
  205.         self.image = Image.new("RGB", (image_width, image_height))
  206.  
  207.         self.image_width = image_width
  208.         self.image_height = image_height
  209.  
  210.         self.draw = ImageDraw.Draw(self.image)
  211.         self.previous_x, self.previous_y = None, None
  212.  
  213.         colors = [
  214.                     (50,0,200),
  215.                     (0,220,80),
  216.                     (255,224,0),
  217.                  ]
  218.  
  219.         # this line gets the old "screaming" colors back...
  220.         # colors = [self.color_from_value(value/29.0) for value in range(0,30)]
  221.  
  222.         self.color_lookup = interpolate_colors(colors)
  223.         self.pix = self.image.load()
  224.  
  225.     def color_from_value(self, value):
  226.         """ given a value between 0 and 1, return an (r,g,b) tuple """
  227.  
  228.         return ImageColor.getrgb("hsl(%d,%d%%,%d%%)" % (int( (1.0 - value) * 360 ), 80, 50))
  229.  
  230.     def draw_peaks(self, x, peaks, spectral_centroid):
  231.         """ draw 2 peaks at x using the spectral_centroid for color """
  232.  
  233.         y1 = self.image_height * 0.5 - peaks[0] * (self.image_height - 4) * 0.5
  234.         y2 = self.image_height * 0.5 - peaks[1] * (self.image_height - 4) * 0.5
  235.  
  236.         line_color = self.color_lookup[int(spectral_centroid*255.0)]
  237.  
  238.         if self.previous_y != None:
  239.             self.draw.line([self.previous_x, self.previous_y, x, y1, x, y2], line_color)
  240.         else:
  241.             self.draw.line([x, y1, x, y2], line_color)
  242.  
  243.         self.previous_x, self.previous_y = x, y2
  244.  
  245.         self.draw_anti_aliased_pixels(x, y1, y2, line_color)
  246.  
  247.     def draw_anti_aliased_pixels(self, x, y1, y2, color):
  248.         """ vertical anti-aliasing at y1 and y2 """
  249.  
  250.         y_max = max(y1, y2)
  251.         y_max_int = int(y_max)
  252.         alpha = y_max - y_max_int
  253.  
  254.         if alpha &gt; 0.0 and alpha &lt; 1.0 and y_max_int + 1 &lt; self.image_height:             current_pix = self.pix[x, y_max_int + 1]               r = int((1-alpha)*current_pix[0] + alpha*color[0])             g = int((1-alpha)*current_pix[1] + alpha*color[1])             b = int((1-alpha)*current_pix[2] + alpha*color[2])               self.pix[x, y_max_int + 1] = (r,g,b)           y_min = min(y1, y2)         y_min_int = int(y_min)         alpha = 1.0 - (y_min - y_min_int)           if alpha &gt; 0.0 and alpha &lt; 1.0 and y_min_int - 1 &gt;= 0:
  255.             current_pix = self.pix[x, y_min_int - 1]
  256.  
  257.             r = int((1-alpha)*current_pix[0] + alpha*color[0])
  258.             g = int((1-alpha)*current_pix[1] + alpha*color[1])
  259.             b = int((1-alpha)*current_pix[2] + alpha*color[2])
  260.  
  261.             self.pix[x, y_min_int - 1] = (r,g,b)
  262.  
  263.     def save(self, filename):
  264.         # draw a zero "zero" line
  265.         a = 25
  266.         for x in range(self.image_width):
  267.             self.pix[x, self.image_height/2] = tuple(map(lambda p: p+a, self.pix[x, self.image_height/2]))
  268.  
  269.         self.image.save(filename)
  270.  
  271. class SpectrogramImage(object):
  272.     def __init__(self, image_width, image_height, fft_size, f_max):
  273.         self.image = Image.new("P", (image_height, image_width))
  274.  
  275.         self.image_width = image_width
  276.         self.image_height = image_height
  277.         self.fft_size = fft_size
  278.         self.f_max = f_max
  279.  
  280.         colors = [
  281.                     (0, 0, 0),
  282.                     (58/4,68/4,65/4),
  283.                     (80/2,100/2,153/2),
  284.                     (90,180,100),
  285.                     (224,224,44),
  286.                     (255,60,30),
  287.                     (255,255,255)
  288.                  ]
  289.  
  290.         self.image.putpalette(interpolate_colors(colors, True))
  291.  
  292.         # generate the lookup which translates y-coordinate to fft-bin
  293.         self.y_to_bin = []
  294.         f_min = 10.0
  295. #        f_max = 22050.0
  296.         y_min = math.log10(f_min)
  297.         y_max = math.log10(f_max)
  298.         for y in range(self.image_height):
  299. #            freq = math.pow(10.0, y_min + y / (image_height - 1.0) *(y_max - y_min))
  300.             freq = f_min + y / (image_height - 1.0) *(f_max - f_min)
  301.             bin = freq / 22050.0 * (self.fft_size/2 + 1)
  302.             bin = freq / 22050.0 * (self.fft_size/2 + 1)
  303.  
  304.             if bin &lt; self.fft_size/2:                 alpha = bin - int(bin)                   self.y_to_bin.append((int(bin), alpha * 255))           # this is a bit strange, but using image.load()[x,y] = ... is         # a lot slower than using image.putadata and then rotating the image         # so we store all the pixels in an array and then create the image when saving         self.pixels = []       def draw_spectrum(self, x, spectrum):         for (index, alpha) in self.y_to_bin:             self.pixels.append( int( ((255.0-alpha) * spectrum[index] + alpha * spectrum[index + 1] )) )           for y in range(len(self.y_to_bin), self.image_height):             self.pixels.append(0)       def save(self, filename):         self.image.putdata(self.pixels)         self.image.transpose(Image.ROTATE_90).save(filename)     def create_png(input_filename, output_filename_w, output_filename_s, image_width, image_height, fft_size, f_max):     print "processing file %s:\n\t" % input_file,       audio_file = audiolab.sndfile(input_filename, 'read')       samples_per_pixel = audio_file.get_nframes() / float(image_width)     processor = AudioProcessor(audio_file, fft_size, numpy.hanning)       waveform = WaveformImage(image_width, image_height)     spectrogram = SpectrogramImage(image_width, image_height, fft_size, f_max)       for x in range(image_width):           if x % (image_width/10) == 0:             sys.stdout.write('.')             sys.stdout.flush()           seek_point = int(x * samples_per_pixel)         next_seek_point = int((x + 1) * samples_per_pixel)           (spectral_centroid, db_spectrum) = processor.spectral_centroid(seek_point)         peaks = processor.peaks(seek_point, next_seek_point)           waveform.draw_peaks(x, peaks, spectral_centroid)         spectrogram.draw_spectrum(x, db_spectrum)       waveform.save(output_filename_w)     spectrogram.save(output_filename_s)       print " done"     if __name__ == '__main__':     parser = optparse.OptionParser("usage: %prog [options] input-filename", conflict_handler="resolve")     parser.add_option("-a", "--waveout", action="store", dest="output_filename_w", type="string", help="output waveform image (default input filename + _w.png)")     parser.add_option("-s", "--specout", action="store", dest="output_filename_s", type="string", help="output spectrogram image (default input filename + _s.png)")     parser.add_option("-w", "--width", action="store", dest="image_width", type="int", help="image width in pixels (default %default)")     parser.add_option("-h", "--height", action="store", dest="image_height", type="int", help="image height in pixels (default %default)")     parser.add_option("-f", "--fft", action="store", dest="fft_size", type="int", help="fft size, power of 2 for increased performance (default %default)")     parser.add_option("-m", "--fmax", action="store", dest="f_max", type="int", help="Maximum freq to draw, in Hz (default %default)")     parser.add_option("-p", "--profile", action="store_true", dest="profile", help="run profiler and output profiling information")       parser.set_defaults(output_filename_w=None, output_filename_s=None, image_width=500, image_height=170, fft_size=2048, f_max=22050)       (options, args) = parser.parse_args()       if len(args) == 0:         parser.print_help()         parser.error("not enough arguments")       if len(args) &gt; 1 and (options.output_filename_w != None or options.output_filename_s != None):
  305.         parser.error("when processing multiple files you can't define the output filename!")
  306.  
  307.     # process all files so the user can use wildcards like *.wav
  308.     for input_file in args:
  309.  
  310.         output_file_w = options.output_filename_w or input_file + "_w.png"
  311.         output_file_s = options.output_filename_s or input_file + "_s.png"
  312.  
  313.         args = (input_file, output_file_w, output_file_s, options.image_width, options.image_height, options.fft_size, options.f_max)
  314.  
  315.         if not options.profile:
  316.             create_png(*args)
  317.         else:
  318.             import hotshot
  319.             from hotshot import stats
  320.             prof = hotshot.Profile("stats")
  321.             prof.runcall(create_png, *args)
  322.             prof.close()
  323.  
  324.             print "\n---------- profiling information ----------\n"
  325.             s = stats.load("stats")
  326.             s.strip_dirs()
  327.             s.sort_stats("time")
  328.             s.print_stats(30)
  • Carlo DiCelico

    This is a beautiful bit of code. I have two questions, though. Say I wanted to allow the user to input a color or colors for both the spectrograph and waveform, what would be the best way to modify the script to accomplish that? Also, could this be altered to output an SVG instead of a PNG or JPG?

    Well done and thanks for your awesome work.

  • ljvillanueva

    To change the colors, just change the RGB values in the lists for each corresponding image. For the waveform is in the WaveformImage class and for the spectrogram in the SpectrogramImage class. This script uses PIL, so it can only export an image in a format that is supported by it, so only rasters. I guess you would need to add a library that can export to SVG or convert the image.