Mp3 parsing in Python

Submitted by 我与影子孤独终老i on 2021-02-07 13:30:19

Question


This is something I have been trying to do for a while, and it is more of an open-ended question. If anyone has any knowledge that can help shed some light on this, it would be very much appreciated.

I want to decode the audio stream in an MP3 file and use it to drive animation, all in Python. As I understand it, the audio data in an MP3 is stored in frames of 32 frequency subbands (or frequency bins), which is ideal for me: if I could take an MP3 and extract an amplitude for each subband in each frame, that would be perfect for what I want to do.
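To make the frame structure a little more concrete, here is a minimal sketch of decoding the 4-byte header that starts each frame. It assumes MPEG-1 Layer III only, and the function name `parse_frame_header` and the dictionary keys are illustrative, not taken from pymp3 or any other library. Note that the header only describes the frame (bitrate, sample rate, length); the per-subband amplitudes sit in the compressed payload and need a full decoder to recover.

```python
import struct

# Lookup tables for MPEG-1 Layer III; None marks "free" or reserved values.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
SAMPLES_PER_FRAME = 1152  # fixed for MPEG-1 Layer III


def parse_frame_header(header_bytes):
    """Decode a 4-byte MPEG-1 Layer III frame header.

    Returns a dict of header fields, or None if the bytes are not a
    header this sketch understands.
    """
    if len(header_bytes) < 4:
        return None
    (word,) = struct.unpack(">I", header_bytes[:4])

    if (word >> 21) & 0x7FF != 0x7FF:  # 11-bit sync word, all ones
        return None
    version = (word >> 19) & 0x3       # 3 means MPEG-1
    layer = (word >> 17) & 0x3         # 1 means Layer III
    if version != 3 or layer != 1:
        return None

    bitrate_kbps = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate_hz = SAMPLE_RATES_HZ[(word >> 10) & 0x3]
    if bitrate_kbps is None or sample_rate_hz is None:
        return None
    padding = (word >> 9) & 0x1

    return {
        "bitrate_kbps": bitrate_kbps,
        "sample_rate_hz": sample_rate_hz,
        "padding": padding,
        "channel_mode": (word >> 6) & 0x3,  # 0 stereo ... 3 mono
        # Frame length in bytes, including the header itself.
        "frame_length_bytes": 144 * bitrate_kbps * 1000 // sample_rate_hz + padding,
        # Useful for animation timing: seconds of audio per frame.
        "frame_duration_s": SAMPLES_PER_FRAME / sample_rate_hz,
    }
```

For example, the header bytes `FF FB 90 00` describe a 128 kbps, 44.1 kHz frame of 417 bytes, which covers roughly 26 ms of audio.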

I found a solution here https://bitbucket.org/portalfire/pymp3 where all the processing seems to be done in Python. It's quite slow, but if I could use it to extract what I want, that would be good enough. I'm struggling to understand what's going on in that code, though. I also had a solution where I converted to WAV and then used an FFT to extract frequencies from the WAV. The result was very noisy, and it seems like a roundabout way to do it, since the data I want is stored directly in the MP3 and converting back to a sound wave seems unnecessary. It was actually faster than the first approach, though. Here's what I ended up with:

http://www.youtube.com/watch?v=f_0FORxlK4A
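The WAV-plus-FFT approach described above could look roughly like the following sketch. It is not the code behind the video; it assumes the PCM samples for one animation frame are already available as a list of floats (for instance read with the standard `wave` module), and the names `fft` and `band_amplitudes` are made up for illustration. A Hann window is applied before the transform, which may help with some of the noisiness caused by spectral leakage. In practice `numpy.fft.rfft` would be far faster than this pure-Python FFT.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 FFT. len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def band_amplitudes(window, num_bands=32):
    """Average FFT magnitude in num_bands equal-width frequency bands.

    Band 0 starts at 0 Hz and the last band ends at half the sample rate.
    """
    n = len(window)
    if n < 2 * num_bands or n & (n - 1):
        raise ValueError("window length must be a power of two >= 2 * num_bands")

    # Taper the window edges with a Hann window to limit spectral leakage.
    tapered = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(window)
    ]

    # Only the first half of the spectrum is meaningful for real input.
    magnitudes = [abs(c) for c in fft(tapered)[: n // 2]]

    per_band = len(magnitudes) // num_bands
    return [
        sum(magnitudes[b * per_band:(b + 1) * per_band]) / per_band
        for b in range(num_bands)
    ]
```

Calling `band_amplitudes` once per window of, say, 1024 samples gives 32 values per window that can be mapped onto animation parameters.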

If anyone has any advice, experience to share, or ideas for libraries I should look at, I'd really like to hear them.

Thanks!

Henry


Answer 1:


Take a look at:

http://lightshowpi.org/

Dig through the source code and see how they did it.

They also use an FFT on the decoded wave output, but in real time, and it is not as slow as you might think: it works fine on a Raspberry Pi.

They might switch to a cosine transform instead, as it is faster, and that is what you would be working with if you inspected MP3 frames directly, since MP3 is encoded with a (modified discrete) cosine transform.

So you will first have to know which bin corresponds to which real-world frequencies.
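As a rough illustration of that mapping, the sketch below converts a subband or FFT bin index to frequencies in Hz. It assumes idealized, equal-width, non-overlapping bands; a real MP3 filterbank has overlapping subbands, and Layer III further splits each of the 32 subbands into 18 frequency lines (576 in total for long blocks). The function names are hypothetical.

```python
def subband_range_hz(subband, sample_rate_hz, num_subbands=32):
    """Return the (low, high) edge in Hz of an idealized equal-width subband.

    The usable spectrum runs from 0 Hz to sample_rate_hz / 2 (Nyquist).
    """
    width = sample_rate_hz / 2 / num_subbands
    return subband * width, (subband + 1) * width


def fft_bin_center_hz(bin_index, fft_size, sample_rate_hz):
    """Return the centre frequency in Hz of one bin of an fft_size-point FFT."""
    return bin_index * sample_rate_hz / fft_size


if __name__ == "__main__":
    # Print the frequency range covered by each of the 32 subbands at 44.1 kHz.
    for band in range(32):
        low, high = subband_range_hz(band, 44100)
        print(f"subband {band:2d}: {low:9.1f} Hz to {high:9.1f} Hz")
```

At 44.1 kHz each of the 32 subbands is about 689 Hz wide, so most musically interesting content (bass, vocals) lands in the first handful of bands. For animation it may be worth grouping or weighting bands rather than treating all 32 equally.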

On pypi.python.org there are now AV and ffmpeg bindings that let you decode frame by frame, but I don't know whether you can extract frequencies from the objects representing frames, or whether you would first have to convert to raw audio as well.

If I were you, I would use the pure Python MP3 code you found to extract just what I need, optimizing it in the process and using Cython if needed.

But that approach limits you to MP3 only. Lightshow Pi works with almost all compressed formats.



Source: https://stackoverflow.com/questions/18495020/mp3-parsing-in-python
