How to extract the raw data from an MP3 file using Python?

怎甘沉沦 submitted on 2019-11-30 05:27:49

If I understand your question, you can try using pydub (a library I wrote) to get the audio data like so:

from pydub import AudioSegment

sound = AudioSegment.from_mp3("test.mp3")

# sound.raw_data is a bytestring of decoded PCM samples
# (the public property backed by the sound._data attribute)
raw_data = sound.raw_data
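
The bytestring holds interleaved PCM samples, so it usually needs unpacking before it is useful. Here is a minimal sketch of that step, assuming 16-bit signed little-endian samples (the common case for WAV-style PCM, but worth checking against `sound.sample_width`). The helper name `split_channels` is made up for this example and is not part of pydub.

```python
import struct

def split_channels(raw_data: bytes, channels: int) -> list:
    """Unpack interleaved 16-bit little-endian PCM into one list of ints per channel.

    Assumption: every sample is 2 bytes wide (sample width of 2).
    """
    count = len(raw_data) // 2  # number of 16-bit samples in the buffer
    samples = struct.unpack("<%dh" % count, raw_data[:count * 2])
    # Samples are interleaved: L, R, L, R, ... for stereo
    return [list(samples[ch::channels]) for ch in range(channels)]

# Hand-built demo buffer with two stereo frames: (1, -1) and (2, -2)
demo = struct.pack("<4h", 1, -1, 2, -2)
left, right = split_channels(demo, channels=2)
```

With a real file you would call something like `split_channels(sound.raw_data, sound.channels)` after confirming that `sound.sample_width == 2`. pydub also offers `sound.get_array_of_samples()`, which does the unpacking for you but leaves the channels interleaved.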

There are a few similar questions floating around Stack Overflow, and they cover two distinct use cases:

  1. The user wants to convert .mp3 files to PCM files such as .wav files.

  2. The user wants to access the raw data in the .mp3 file (that is, to work with the compressed data itself rather than treat the file as compressed PCM to be decoded). Here the use case is one of understanding how compression schemes like MP3 and AAC work.

This answer is aimed at the second of these, though I do not have working code to share or point to.

Compression schemes such as MP3 generally work in the frequency domain. As a simplified example, you could take a .wav file 1024 samples at a time, transform each block of 1024 samples using an FFT, and store that. Roughly speaking, the lossy compression then throws away information from the frequency domain so as to allow for smaller encodings.
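
To make the simplified example concrete, here is a rough pure-Python sketch of splitting samples into blocks and transforming each one. It uses a naive O(n^2) DFT rather than a real FFT, and a 64-sample block instead of 1024 so that it runs quickly. The function names and the synthetic sine-wave input are assumptions for illustration. This is not how an actual MP3 encoder is structured.

```python
import cmath
import math

def dft(block):
    """Naive O(n^2) discrete Fourier transform of a list of real samples."""
    n = len(block)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(block))
        for k in range(n)
    ]

def blocks(samples, size):
    """Yield consecutive full blocks of `size` samples (a trailing partial block is dropped)."""
    for start in range(0, len(samples) - size + 1, size):
        yield samples[start:start + size]

# Hypothetical test signal: a sine wave completing 5 cycles per 64-sample block
BLOCK = 64
signal = [math.sin(2 * math.pi * 5 * t / BLOCK) for t in range(2 * BLOCK)]

# One spectrum (list of complex coefficients) per block
spectra = [dft(b) for b in blocks(signal, BLOCK)]

# Index of the strongest coefficient in the lower half of the first spectrum
peak_bin = max(range(BLOCK // 2), key=lambda k: abs(spectra[0][k]))
```

For real audio you would read the samples from a .wav file and, for speed, swap the naive `dft` for something like `numpy.fft.rfft`.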

A pure Python implementation is highly impractical if all you want to do is convert from .mp3 to .wav. But if you want to explore how .mp3 and related schemes work, having something you can easily tinker with can actually be useful, even if the code runs 1000 times slower than what ffmpeg uses. That is especially true if it is written so that a reader of the source code can see how .mp3 compression works. For example, see http://bugra.github.io/work/notes/2014-07-12/discre-fourier-cosine-transform-dft-dct-image-compression/ for an IPython notebook that walks through how frequency domain transforms are used in image compression schemes like JPEG. Something like that for MP3 and similar audio compression schemes would be useful for people learning about compression.

An .mp3 file is basically a sequence of MP3 frames, each of which has a header and a data component. The first task then is to write a Python class (or classes) to represent these, and read them from an .mp3 file. First read the file in binary mode, that is, f = open(filename, "rb") and then data = f.read(). On a modern machine, given that a typical 5-minute song in .mp3 is about 5MB, you may as well read the whole thing in one go.
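
As a starting point for such a class, here is a hedged sketch that decodes the 4-byte frame header and walks from frame to frame. It only handles MPEG-1 Layer III headers with a fixed (non-free) bitrate, and the names `FrameHeader`, `parse_header` and `iter_frames` are invented for this example. The scan is naive: it does not skip ID3 tags and can be fooled by bytes that merely look like a sync word.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple

# Lookup tables for MPEG-1 Layer III only; None marks "free" or invalid entries
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]

@dataclass
class FrameHeader:
    bitrate_kbps: int
    sample_rate_hz: int
    padding: int
    channel_mode: str

    @property
    def frame_length(self) -> int:
        """Length in bytes of the whole frame, header included."""
        return 144 * self.bitrate_kbps * 1000 // self.sample_rate_hz + self.padding

def parse_header(data: bytes, offset: int = 0) -> Optional[FrameHeader]:
    """Decode a header at `offset`, or return None if it is not one we support."""
    if offset + 4 > len(data):
        return None
    b0, b1, b2, b3 = data[offset:offset + 4]
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:
        return None  # the 11-bit sync word is missing
    version = (b1 >> 3) & 0x3
    layer = (b1 >> 1) & 0x3
    if version != 0b11 or layer != 0b01:
        return None  # not MPEG-1 Layer III
    bitrate = BITRATES_KBPS[b2 >> 4]
    sample_rate = SAMPLE_RATES_HZ[(b2 >> 2) & 0x3]
    if bitrate is None or sample_rate is None:
        return None
    return FrameHeader(bitrate, sample_rate, (b2 >> 1) & 0x1, CHANNEL_MODES[b3 >> 6])

def iter_frames(data: bytes) -> Iterator[Tuple[int, FrameHeader]]:
    """Yield (offset, header) pairs, moving one byte at a time when no header matches."""
    offset = 0
    while offset + 4 <= len(data):
        header = parse_header(data, offset)
        if header is None:
            offset += 1
            continue
        yield offset, header
        offset += header.frame_length

# Synthetic demo: two 128 kbps, 44.1 kHz frames whose payload is all zero bytes
demo_header = bytes([0xFF, 0xFB, 0x90, 0x00])
demo = (demo_header + bytes(413)) * 2
frames = list(iter_frames(demo))
```

On a real file you would pass in the `data` read above. Each frame's compressed payload is then the slice `data[offset + 4 : offset + header.frame_length]`, or starts two bytes later when the header signals a CRC, which this sketch does not check.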

It may also be worth writing a simpler (and far less efficient) coding scheme along these lines to explore how it works, gradually adding the tricks that schemes like MP3 and AAC use as you go. For example, split a PCM input file into 1024-sample blocks, transform each block with an FFT or DCT, transform it back again, and check that you recover your original data. Then explore how you can throw data away from the frequency-transformed version, and see what effect that has when it is transformed back to PCM data. The end result will be very poor at first, but by seeing the problems, and seeing what e.g. MP3 and AAC do about them, you can learn why these compression schemes do things the way they do.
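
The experiment described above might look roughly like this in pure Python. It is a toy under stated assumptions: a naive DCT-II with its matching inverse, a 32-sample block for speed, a synthetic two-tone input, and "keep only the largest coefficients" as a crude stand-in for the quantisation that real codecs perform. All function names are made up for the example.

```python
import math

def dct(block):
    """Naive unnormalised DCT-II of a list of real samples."""
    n = len(block)
    return [
        sum(x * math.cos(math.pi / n * (t + 0.5) * k) for t, x in enumerate(block))
        for k in range(n)
    ]

def idct(coeffs):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(coeffs)
    return [
        (coeffs[0] / 2
         + sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5) * k) for k in range(1, n)))
        * 2 / n
        for t in range(n)
    ]

def keep_largest(coeffs, count):
    """Zero every coefficient except the `count` largest in magnitude."""
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    keep = set(order[:count])
    return [c if k in keep else 0.0 for k, c in enumerate(coeffs)]

def rms_error(a, b):
    """Root-mean-square difference between two equal-length sample lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Hypothetical input block: two sine tones mixed together
N = 32
block = [
    math.sin(2 * math.pi * 3 * t / N) + 0.3 * math.sin(2 * math.pi * 7 * t / N)
    for t in range(N)
]

coeffs = dct(block)
err_lossless = rms_error(block, idct(coeffs))               # full round trip
err_keep8 = rms_error(block, idct(keep_largest(coeffs, 8)))  # 24 coefficients discarded
err_keep4 = rms_error(block, idct(keep_largest(coeffs, 4)))  # 28 coefficients discarded
```

Comparing the three error values shows the trade-off: the full round trip reproduces the block up to floating-point noise, and the error grows as more coefficients are thrown away.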

In short, if your use case is a 'getting stuff done' one, you probably don't want to use Python. If, on the other hand, your use case is a 'learning how stuff gets done' one, that is different. (As a rough rule of thumb, what you could do with optimised assembly on a Pentium 100 from the 90s, you can do at roughly the same speed using Python on a modern Core i5: there is a factor of 100 or so in raw performance, and a similar slowdown from using Python.)

Have you tried opening the file in read binary mode?

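# Open in binary mode so the bytes are returned unmodified
with open("test.mp3", "rb") as f:
    first16bytes = f.read(16)
    # ... keep reading and parsing from here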
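
Those first bytes are often not audio at all: many .mp3 files begin with an ID3v2 metadata tag. As a small sketch of what you can do with them, the function below checks for such a tag and works out how many bytes to skip before the audio frames start. The name `id3v2_tag_size` is made up for this example, and the demo bytes are hand-built rather than taken from a real file.

```python
def id3v2_tag_size(header: bytes) -> int:
    """Return the total size in bytes of a leading ID3v2 tag, or 0 if none is found.

    `header` should hold at least the first 10 bytes of the file.
    """
    if len(header) < 10 or header[:3] != b"ID3":
        return 0
    flags = header[5]
    size_bytes = header[6:10]
    if any(b & 0x80 for b in size_bytes):
        return 0  # not a valid "synchsafe" size: the top bit of each byte must be clear
    size = 0
    for b in size_bytes:
        size = (size << 7) | b  # 7 useful bits per byte
    footer = 10 if flags & 0x10 else 0  # optional 10-byte footer (ID3v2.4)
    return 10 + size + footer

# Hand-built example: ID3v2.3 header announcing a 257-byte tag body
demo = b"ID3\x03\x00\x00" + bytes([0, 0, 2, 1]) + bytes(6)
audio_start = id3v2_tag_size(demo)
```

With the snippet above you would call `id3v2_tag_size(first16bytes)` and then seek to that offset before looking for the first frame.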