If you are looking for a .NET solution, check out the SoundFingerprinting library.
It's open source and built on top of the "Content Fingerprinting Using Wavelets" research paper.
The algorithm is different from Shazam's, but the general idea is similar: extract the most prominent coefficients from the spectrum, then use them to build fingerprints for later retrieval.
A description of the algorithm can be found here.
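
To make the general idea concrete, here is a minimal Python sketch of the "keep only the most prominent coefficients" step. It is not the SoundFingerprinting API and not the paper's full pipeline: in the paper the coefficients are Haar wavelet coefficients computed over spectrogram images, and lookup is done with min-hash and locality-sensitive hashing, whereas this sketch takes an arbitrary list of coefficients and uses a brute-force comparison. The names `top_k_signature`, `similarity` and `best_match`, and the sample values, are made up for illustration.

```python
from typing import Dict, Sequence, Tuple

Signature = Tuple[int, ...]


def top_k_signature(coeffs: Sequence[float], k: int) -> Signature:
    """Keep only the sign of the k largest-magnitude coefficients; zero the rest."""
    # Rank indices by descending magnitude (ties go to the lower index).
    ranked = sorted(range(len(coeffs)), key=lambda i: (-abs(coeffs[i]), i))
    keep = set(ranked[:k])
    signature = []
    for i, value in enumerate(coeffs):
        if i in keep and value != 0:
            signature.append(1 if value > 0 else -1)
        else:
            signature.append(0)
    return tuple(signature)


def similarity(a: Signature, b: Signature) -> float:
    """Fraction of positions at which two equal-length signatures agree."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_match(query: Signature, index: Dict[str, Signature]) -> str:
    """Brute-force lookup: return the stored name with the most similar signature."""
    return max(index, key=lambda name: similarity(query, index[name]))


# Hypothetical coefficient vectors: a reference clip, a noisy copy of it,
# and an unrelated clip.
clean = [0.1, -5.0, 0.3, 4.2, -0.2, 0.05, 3.1, -0.4]
noisy = [0.15, -4.8, 0.25, 4.0, -0.3, 0.0, 3.3, -0.35]
other = [3.0, 0.1, -4.0, 0.2, 0.1, 5.0, -0.3, 0.2]

index = {
    "track_a": top_k_signature(clean, k=3),
    "track_b": top_k_signature(other, k=3),
}

query = top_k_signature(noisy, k=3)
print(best_match(query, index))  # track_a
```

Discarding everything except the signs of the strongest coefficients is what makes the fingerprint tolerant to small amounts of noise: the noisy copy above produces exactly the same signature as the clean one. A real system would replace the linear scan in `best_match` with a hash-based index so that lookup does not grow with the size of the catalogue.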