How to get Bass, Mid, Treble data from FFT

Question

I'm new to audio processing and I'm wondering how to extract bass, mid and treble from an FFT output. I'm currently using this answer to get the data: https://stackoverflow.com/a/20414331/2714577 which uses NAudio.

However, I'm using an FFT length of 1024 (I need the speed). I'm trying to get these three sections in a format such as 0-255 for colour purposes.

I currently have this:

    double[] data = new double[512];

    void FftCalculated(object sender, FftEventArgs e)
    {

        for (int j = 0; j < e.Result.Length / 2; j++)
        {
            double magnitude = Math.Sqrt(e.Result[j].X * e.Result[j].X + e.Result[j].Y * e.Result[j].Y);
            double dbValue = 20 * Math.Log10(magnitude);

            data[j] = dbValue;
        }

        double d = 0;

        for (int i = 20; i < 89; i++)
        {

            d += data[i];
        }

        double m = 0;

        for (int i = 150; i < 255; i++)
        {

            m += data[i];
        }

        double t = 0;

        for (int i = 300; i < 512; i++)
        {

            t += data[i];
        }

        Debug.Message(""+d+" |||| "+m+" |||| "+t);
    }

Which returns:

Is this right? How do I get this data to something more usable?

Answer 1:

The coefficients you get out of a Fourier transform can be positive or negative. What you're interested in is the magnitude (i.e. the amount of each frequency), so you will want to take the absolute value in your summation.

Also, I would recommend normalizing - at the end of your summation do this:

    // data.Sum needs "using System.Linq;" at the top of the file
    double total = data.Sum(x => Math.Abs(x));
    d /= total;
    m /= total;
    t /= total;

This way, your numbers will be confined to the range [0, 1] and you will get the same information out even if the sound is quieter (unless you don't want that). In practice the values will be somewhat smaller than that, because each of your summations covers only part of the spectrum. So you may want to scale them by the largest of the three:

    // Math.Max only takes two arguments, so nest the calls
    double largest = Math.Max(d, Math.Max(m, t));
    d /= largest;
    m /= largest;
    t /= largest;

Now each value lies between 0 and 1, and the largest of them is exactly 1. You can then multiply by 255 and truncate (or round) to get a colour component.

The downside of the last step is that if the values are all zero (because the inputs were all zero) you will divide by zero. Oops! At this point you need to decide exactly what you want. If you don't do this last scaling, then a sound which is entirely treble (according to your breakdown above) will give (0, 0, 1) for (d, m, t), a sound which is an even mixture of the three will give (0.3333, 0.3333, 0.3333), and a sound which is completely quiet should give (0, 0, 0). If that's not what you want, then you need to define exactly what you want before I can help you any further.
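
One way to handle the divide-by-zero case is sketched below. It is only an illustration, not part of NAudio: the `BandScaling` class and `ToColour` method are hypothetical names, and it assumes `d`, `m` and `t` are band sums of linear magnitudes (absolute values are taken in case they are not). Silence, NaN or infinite input returns (0, 0, 0) instead of dividing.

```csharp
using System;

static class BandScaling
{
    // Scales three band levels so that the largest one maps to 255.
    // Returns (0, 0, 0) when there is nothing meaningful to scale by.
    public static (byte Bass, byte Mid, byte Treble) ToColour(double d, double m, double t)
    {
        d = Math.Abs(d);
        m = Math.Abs(m);
        t = Math.Abs(t);

        double largest = Math.Max(d, Math.Max(m, t));

        // Guard: all-zero input (silence) or non-finite values.
        if (largest <= 0.0 || double.IsNaN(largest) || double.IsInfinity(largest))
            return (0, 0, 0);

        return (ToByte(d / largest), ToByte(m / largest), ToByte(t / largest));
    }

    // Maps a value in [0, 1] to a byte in [0, 255], clamping out-of-range input.
    private static byte ToByte(double unit)
    {
        double clamped = Math.Max(0.0, Math.Min(1.0, unit));
        return (byte)Math.Round(clamped * 255.0);
    }
}
```

Calling `BandScaling.ToColour(d, m, t)` at the end of `FftCalculated` would then give three values that can go straight into a colour. Note that this always makes the loudest band fully bright, so overall loudness is lost; skip the scaling if you want quiet passages to look dim.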

Answer 2:

Your dbValue is already a very good number: it measures the level in decibels relative to a magnitude of 1.0, which corresponds to 0.0 dB.

You should average, rather than sum, the individual dB values across the frequencies in each band.

Then map a dB range of about -80 dB to 0 dB onto your colour range.
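
A minimal sketch of those two steps, assuming the `data` array of per-bin dB values from the question. The `DbBands` class, its method names and the exact -80 dB floor are illustrative choices, not anything NAudio provides. Bins below the floor are clamped before averaging, because a silent bin gives `Math.Log10(0)`, which is negative infinity and would otherwise drag the whole average with it.

```csharp
using System;

static class DbBands
{
    // Assumed display floor; anything quieter is treated as this level.
    public const double FloorDb = -80.0;

    // Average of the dB values in bins [start, end), each clamped to the floor.
    public static double AverageDb(double[] db, int start, int end)
    {
        if (end <= start) return FloorDb;

        double sum = 0.0;
        for (int i = start; i < end; i++)
        {
            double v = db[i];
            if (double.IsNaN(v) || v < FloorDb) v = FloorDb;
            sum += v;
        }
        return sum / (end - start);
    }

    // Linearly maps [FloorDb, 0] dB onto [0, 255], clamping values outside it.
    public static byte DbToByte(double db)
    {
        double unit = (db - FloorDb) / (0.0 - FloorDb);
        unit = Math.Max(0.0, Math.Min(1.0, unit));
        return (byte)Math.Round(unit * 255.0);
    }
}
```

With the bin ranges from the question, the bass value would be `DbBands.DbToByte(DbBands.AverageDb(data, 20, 89))`, and likewise for the other two bands.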

Also note: speech and music tend to have, on average, a spectrum similar to pink noise, whose power falls by about 3 dB per octave. This means that low frequencies tend to have higher dB values than high frequencies. You should compensate for this effect (probably before averaging the frequencies) to get a "better" display.
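
One possible way to do that compensation is to add a tilt of 10·log10(f / f_ref) dB to each bin, which is about +3 dB per octave and exactly cancels an ideal pink (1/f power) spectrum. This is a sketch under assumptions: the `PinkTilt` class is a hypothetical helper, the reference frequency is arbitrary (it only shifts all bins by a constant), and bin i is taken to sit at i × sampleRate / fftLength Hz.

```csharp
using System;

static class PinkTilt
{
    // Returns a copy of db with 10*log10(f / referenceHz) added to each bin,
    // where f is the centre frequency of that bin.
    public static double[] Compensate(double[] db, double sampleRate, int fftLength, double referenceHz)
    {
        var result = new double[db.Length];
        for (int i = 0; i < db.Length; i++)
        {
            if (i == 0)
            {
                // Bin 0 is DC (0 Hz); the tilt is undefined there, so leave it unchanged.
                result[i] = db[i];
                continue;
            }

            double freq = i * sampleRate / fftLength;
            result[i] = db[i] + 10.0 * Math.Log10(freq / referenceHz);
        }
        return result;
    }
}
```

For the setup in the question this might be called as `PinkTilt.Compensate(data, 44100, 1024, 1000)` before averaging the bands, assuming a 44.1 kHz sample rate. Real recordings are only roughly pink, so treat the slope as a starting point to tune by eye.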

Source: https://stackoverflow.com/questions/28183479/how-to-get-bass-mid-treble-data-from-fft

Tags

fft

naudio