For a project I need to be able to generate a spectrogram from a .WAV file. I\'ve read the following should be done:
The code you produced, was almost correct. So, you didn't left me much to correct:
void Spectrogram::process(){
int transform_size = 1024;
int half = transform_size/2;
int step_size = transform_size/2;
double in[transform_size];
double processed[half];
fftw_complex *out;
fftw_plan p;
out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * transform_size);
for (int x=0; x < wavFile->getSamples()/step_size; x++) {
// Fill the transformation array with a sample frame and apply the window function.
// Normalization is performed later
// (One error was here: you didn't set the last value of the array in)
for (int j = 0, int i = x * step_size; i < x * step_size + transform_size; i++, j++)
in[j] = wavFile->getSample(i) * windowHanning(j, transform_size);
p = fftw_plan_dft_r2c_1d(transform_size, in, out, FFTW_ESTIMATE);
fftw_execute(p); /* repeat as needed */
for (int i=0; i < half; i++) {
// (Here were some flaws concerning the access of the complex values)
out[i][0] *= (2./transform_size); // real values
out[i][1] *= (2./transform_size); // complex values
processed[i] = out[i][0]*out[i][0] + out[i][1]*out[i][1]; // power spectrum
processed[i] = 10./log(10.) * log(processed[i] + 1e-6); // dB
// The resulting spectral values in 'processed' are in dB and related to a maximum
// value of about 96dB. Normalization to a value range between 0 and 1 can be done
// in several ways. I would suggest to set values below 0dB to 0dB and divide by 96dB:
// Transform all dB values to a range between 0 and 1:
if (processed[i] <= 0) {
processed[i] = 0;
} else {
processed[i] /= 96.; // Reduce the divisor if you prefer darker peaks
if (processed[i] > 1)
processed[i] = 1;
}
In->setPixel(x,(half-1)-i,processed[i]*255);
}
// This should be called each time fftw_plan_dft_r2c_1d()
// was called to avoid a memory leak:
fftw_destroy_plan(p);
}
fftw_free(out);
}
The two corrected bugs were most probably responsible for the slight variation of successive transformation results. The Hanning window is very vell suited to minimize the "noise" so a different window would not have solved the problem (actually @Alex I already pointed to the 2nd bug in his point 2. But in his point 3. he added a -Inf-bug as log(0) is not defined which can happen if your wave file containts a stretch of exact 0-values. To avoid this the constant 1e-6 is good enough).
Not asked, but there are some optimizations:
put p = fftw_plan_dft_r2c_1d(transform_size, in, out, FFTW_ESTIMATE); outside the main loop,
precalculate the window function outside the main loop,
abandon the array processed and just use a temporary variable to hold one spectral line at a time,
the two multiplications of out[i][0] and out[i][1] can be abandoned in favour of one multiplication with a constant in the following line. I left this (and other things) for you to improve
Thanks to @Maxime Coorevits additionally a memory leak could be avoided: "Each time you call fftw_plan_dft_rc2_1d() memory are allocated by FFTW3. In your code, you only call fftw_destroy_plan() outside the outer loop. But in fact, you need to call this each time you request a plan."