aubio
Playing with Shazam fingerprints
It's already 6 years back that Avery Wang published his paper about his Shazam fingerprinting algorithm: An Industrial-Strength Audio Search Algorithm", in Proc. 2003 ISMIR International Symposium on Music Information Retrieval, Baltimore, MD, Oct. 2003.
A few weeks ago, I took some time to read the paper and implement the algorithm. It took less than 100 lines of python, and the results are quite impressive. Unfortunately, after asking Avery Wang, I did not get the permission from the patent holders to publish these 100 lines of code. Sadness! Still, many thanks to Avery for his help with getting in touch with them.
Although I can't publish this implementation, I prepared a few plots to show were it got me. Below is an example of the landmarks extracted from a 20 seconds sound sample.
And some examples of search results, with similar plots as the ones in Avery's paper: scatter plot of matching landmarks (top) and histogram of time differences (bottom). The first image shows two signal that do not match, the second one shows a match.
On my laptop, it takes a little less than 8 seconds per minute of music to extract the fingerprints from an MP3 file. Not too bad for a python script, and quite promising for the newly rewritten python aubio module. The search algorithm I wrote works in O(N), with N the total number of fingerprints in the database. There is much space for improvements. For instance storing the fingerprints in a dictionnary instead of a flat list would probably speed it up to O(log(N)).
To test the robustness of my simple implementation, I tried adding noise, echo, and mixing with a high noise level, but the algorithm remains quite robust to this type of signal deterioration. Mixing two music signal does become problematic though.
While writing this code, I didn't know about Dan Ellis' implementation for Matlab. I also found more articles since, How Shazam Works and Shazam: not magic after all, as well as in Slate Magazine. In the paper Fingerprinting to Identify Repeated Sound Events in Long-Duration Personal Audio Recordings, in Proc. ICASSP-07 Hawaiʻi, pp.I-233-236 (2007), James Ogle and Dan Ellis use the algorithm to look for structural similarity within a recording (see also the poster). A very nice idea. Looking at the other patents submitted by Avery Wang, he also thought about such applications.
Looking for two similar songs was one of the thing I tried to test my implementation. The above plot, on the left, represents the number of matching fingerprints against the position in the query and was obtained using two versions of the same song: the famous disco version of What a Diff'rence a Day Makes, by Esther Phillips, published in 1975 on the excellent Kudu/CTI label. The first version, the second track on the eponymous album, is 4:31 minutes long. The second version, track number 9, is 3:11 minutes long.
By now you should wonder: why two large peaks and not just one? The color map, on the right, shows the 2D histogram of the fingerprints matching exactly in the song. The difference is now clear: in the short version, the second minute (a saxophone solo) has been cut out of the long version. Other patterns seem to appear outside of the diagonal, but not clearly enough to be interpreted ‒ the search algorithm would need to be modified to allow fuzzy matching, in a similar way as in Ogle and Ellis paper.
Avery's article mentioned how he could pick up lip syncing. There are many more applications to be considered: comparing two live performances, extracting automatically the playlist from a movie or a DJ performance, analysing the structure of a song, scrobbling vinyl records...
Unfortunately, it seems like I will have to scrobble my vinyl records alone for now, and wait patiently for these patents to expire to do that openly.
Wed, 11 Nov 2009, 23:39. trackback - view/add comments
Responses to this post
@ken: As I mentioned in the post, I told Avery I would not distribute the code. The algorithm is fairly simple though, and I'd be happy to answer your questions if you miss a few steps.
Posted by piem at Thu Dec 3 16:08:03 2009
Is it the patent exactly? http://www.boliven.com/patent/US7516074 We will be a little older :)
Posted by Marko at Thu Dec 17 11:37:51 2009
very bad this patent thing. And what about applying the idea to different fields (i.e. other time series, not music?) and/or slightly modifying the algorithm? how far does the patent protection go?
Posted by Roberto at Thu Jan 7 11:56:39 2010
I have never seen a toy implementation of a patented algorithm being actively refused distribution before.
To be honest I wouldn't have even thought to bother to ask Avery and just released it. Jeez I have knocked up plenty of toy implementations of algorithms over the years and never check to see if they are patented.
It is quite a joke.
Posted by Ikkath at Mon Jun 14 04:42:25 2010
Would you share your python code with me? nacho4d@gmail.com I am interested in a sound project. Thanks
Posted by Ignacio at Mon Jul 12 02:48:13 2010