I have been looking through this fantastic article: http://blogs.zynaptiq.com/bernsee/pitch-shifting-using-the-ft/
While being fantastic, it is extremely hard and he
Finally I have figured this out; really I had to derive it from scratch. I knew there would be some simple way to derive it, my (usual) mistake was to attempt to follow other people's logic rather than use my own common sense.
This puzzle takes two keys to unlock it.
The first key is to understand how over-sampling introduces a rotation on bin phase.
The second key comes from Graph 3.3 & 3.4 here: http://www.dspdimension.com/admin/pitch-shifting-using-the-ft/
...
for (int k = 0; k <= fftFrameSize/2; k++)
{
// compute magnitude and phase
bins[k].mag = 2.*sqrt(fftBins[k].real*fftBins[k].real + fftBins[k].imag*fftBins[k].imag);
bins[k].phase = atan2(fftBins[k].imag, fftBins[k].real);
// Compute phase difference Δϕ fo bin[k]
double deltaPhase;
{
double measuredPhaseDiff = bins[k].phase - gLastPhase[k];
gLastPhase[k] = bins[k].phase;
// Subtract expected phase difference <-- FIRST KEY
// Think of a single wave in a 1024 float frame, with osamp = 4
// if the first sample catches it at phase = 0, the next will
// catch it at pi/2 ie 1/4 * 2pi
double binPhaseExpectedDiscrepancy = M_TWOPI * (double)k / (double)osamp;
deltaPhase = measuredPhaseDiff - binPhaseExpectedDiscrepancy;
// Wrap delta phase into [-Pi, Pi) interval
deltaPhase -= M_TWOPI * floor(deltaPhase / M_TWOPI + .5);
}
// say sampleRate = 40K samps/sec, fftFrameSize = 1024 samps in FFT giving bin[0] thru bin[512]
// then bin[1] holds one whole wave in the frame, ie 44 waves in 1s ie 44Hz ie sampleRate / fftFrameSize
double bin0Freq = (double)sampleRate / (double)fftFrameSize;
bins[k].idealFreq = (double)k * bin0Freq;
// Consider Δϕ for bin[k] between hops.
// write as 2π / m.
// so after m hops, Δϕ = 2π, ie 1 extra cycle has occurred <-- SECOND KEY
double m = M_TWOPI / deltaPhase;
// so, m hops should have bin[k].idealFreq * t_mHops cycles. plus this extra 1.
//
// bin[k].idealFreq * t_mHops + 1 cycles in t_mHops seconds
// => bins[k].actualFreq = bin[k].idealFreq + 1 / t_mHops
double tFrame = fftFrameSize / sampleRate;
double tHop = tFrame / osamp;
double t_mHops = m * tHop;
bins[k].freq = bins[k].idealFreq + 1. / t_mHops;
}