Access MP3 music data using Python

て烟熏妆下的殇ゞ, submitted on 2019-12-05 10:20:13

Try using id3-py or mutagen to strip out all the tags (both ID3v1 and ID3v2; both can be present in the same file), then compute the MD5 of the result.

Assuming iTunes didn't manipulate the file beyond the tags, the two hashes should be identical. Transcoding would obviously invalidate this approach, because it changes the audio bytes themselves.
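A minimal sketch of the strip-then-hash idea, using only the standard library instead of id3-py or mutagen. It assumes the ID3v2 tag sits at the very start of the file and the ID3v1 tag occupies the last 128 bytes; other tag formats (APEv2, Lyrics3) are not handled. The names strip_id3 and audio_md5 are made up for this example.

```python
import hashlib


def strip_id3(data: bytes) -> bytes:
    """Return data without a leading ID3v2 tag and a trailing ID3v1 tag."""
    start, end = 0, len(data)

    # ID3v2: 10-byte header "ID3" + version (2) + flags (1) + syncsafe size (4).
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)  # 7 useful bits per byte
        start = 10 + size
        if data[5] & 0x10:  # footer flag (ID3v2.4) adds another 10 bytes
            start += 10

    # ID3v1: fixed 128-byte block at the end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return data[start:end]


def audio_md5(path: str) -> str:
    """MD5 hex digest of the file contents with ID3 tags removed."""
    with open(path, "rb") as f:
        return hashlib.md5(strip_id3(f.read())).hexdigest()
```

Two copies of the same MP3 that differ only in their ID3 tags should then give the same audio_md5 value.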

Use a fingerprinting algorithm. You might know about MusicBrainz; they have listed some fingerprint algorithms here. They now use AcoustID, which is probably what you should use too (it's good and it's free). The Chromaprint library can generate such a fingerprint.

I wrote a Python module, ffmpeg, which does the decoding via FFmpeg and provides a simple function to calculate the AcoustID fingerprint (using Chromaprint). Here is a small demo for that (it even queries MusicBrainz for the song).

It should be easy to build some tool using that to find all duplicates.
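As a rough sketch of such a tool: group the files by whatever key identifies the audio (a hash of the tag-stripped data, or a fingerprint) and report every group with more than one member. The function name group_duplicates and the key_func parameter are made up for this example; key_func stands in for whichever hashing or fingerprinting function you actually use, and it only catches exact key matches.

```python
from collections import defaultdict


def group_duplicates(paths, key_func):
    """Group paths whose key_func(path) values are equal.

    key_func is any callable returning a hashable key for a file,
    for example a digest string or a fingerprint string.
    Returns only the groups that contain more than one path.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[key_func(path)].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

For example, group_duplicates(list_of_mp3_paths, my_fingerprint) would return a list of lists, each holding the paths that share one key (my_fingerprint being your own function).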

The fingerprint will be exactly the same if the audio data is exactly the same, and similar if the audio data is similar. See the AcoustID homepage for further information on how to calculate the similarity if you don't just want to check for equality.
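To illustrate what "similar" can mean here, the sketch below assumes the raw fingerprint is available as a sequence of 32-bit integers and scores two fingerprints by the fraction of matching bits. This is a simplification and not necessarily what AcoustID does: it compares position by position without trying to align the two recordings, and the function name fingerprint_similarity is made up for this example.

```python
def fingerprint_similarity(fp_a, fp_b):
    """Fraction of equal bits between two raw fingerprints (0.0 to 1.0).

    Assumes each fingerprint is a sequence of 32-bit integers.
    Only the overlapping prefix is compared; no offset search is done.
    """
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 0.0
    differing = sum(
        bin((a ^ b) & 0xFFFFFFFF).count("1")  # differing bits in one item
        for a, b in zip(fp_a, fp_b)
    )
    return 1.0 - differing / (32 * n)
```

A threshold on this score (to be chosen by experiment) could then decide whether two files count as the same recording.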

Paul Sasik

That's actually pretty advanced, fuzzy logic-type stuff you're asking about.

This isn't an answer but take a look at the discussion in this article: Detect duplicate MP3 files with different bitrates and/or different ID3 tags? (It might qualify as a dupe actually... It's even Python-specific.)
