First, you will have to change your domain of comparison. Analyzing raw samples from the uncompressed files will get you nowhere, because two recordings of the same material can differ sample by sample (different offset, volume, or encoding) and still sound alike. Your distance measure should instead be based on one or more features that you extract from the audio samples. Wikipedia lists the following features as commonly used for acoustic fingerprinting:
Perceptual characteristics often exploited by audio fingerprints include average zero crossing rate, estimated tempo, average spectrum, spectral flatness, prominent tones across a set of bands, and bandwidth.
I don't have a programmatic solution for you, but there is an interesting attempt at reverse engineering the YouTube Audio ID system. That system is used for copyright infringement detection, which is a similar problem.
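That said, a few of the features in the quoted list are simple enough to sketch with NumPy. The following is only a rough illustration, not a fingerprinting system. The function names (`zero_crossing_rate`, `spectral_flatness`, `band_energies`, `features`, `distance`), the choice of 8 equal-width bands, and the plain Euclidean distance are all my own assumptions, and it assumes each clip has already been decoded to a mono array of float samples.

```python
import numpy as np


def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(x)
    return np.mean(signs[1:] != signs[:-1])


def spectral_flatness(x, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum.

    Close to 0 for a pure tone, noticeably higher for noise-like signals.
    eps avoids log(0) on empty bins.
    """
    power = np.abs(np.fft.rfft(x)) ** 2 + eps
    return np.exp(np.mean(np.log(power))) / np.mean(power)


def band_energies(x, n_bands=8):
    """Coarse 'average spectrum': mean power in n_bands equal-width
    frequency bands, normalized so the result does not depend on volume."""
    power = np.abs(np.fft.rfft(x)) ** 2
    bands = np.array([b.mean() for b in np.array_split(power, n_bands)])
    return bands / (bands.sum() + 1e-12)


def features(x, n_bands=8):
    """Concatenate the individual features into one vector (assumed layout)."""
    x = np.asarray(x, dtype=np.float64)
    return np.concatenate(
        ([zero_crossing_rate(x), spectral_flatness(x)], band_energies(x, n_bands))
    )


def distance(a, b):
    """Euclidean distance between feature vectors (an arbitrary choice)."""
    return float(np.linalg.norm(features(a) - features(b)))


if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440 * t)
    noise = np.random.default_rng(0).standard_normal(sr)
    print("tone vs quieter tone:", distance(tone, 0.5 * tone))
    print("tone vs noise:       ", distance(tone, noise))
```

In practice you would probably compute such features over short overlapping frames rather than the whole file, and weight or normalize them before comparing, since the raw values live on different scales.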