I am interested in determining the musical key of an audio sample. How would (or could) an algorithm go about trying to approximate the key of a musical audio sample?
<
You can use the Fourier Transform to calculate the frequency spectrum from an audio sample. From this output, you can use the frequency values for particular notes to turn this into a list of notes heard during the sample. Choosing the strongest notes heard per sample over a series of samples should give you a decent map of the different notes used, which you can compare to the different musical scales to get a list of the possible scales that contain that combination of notes.
To help decide which particular scale is being used, make a note (no pun intended) of the most frequently heard notes. In Western music, the root of the scale is typically the most common note heard, followed by the fifth, and then the fourth. You can also look for patterns such as common chords, arpeggios, or progressions.
Sample size will probably be important here. Ideally, each sample will be a single note (so that you don't get two chords in one sample). If you filter out and concentrate on the low frequencies, you may be able to use the volume spikes ("clicks") normally associated with percussion instruments in order to determine the song's tempo and "lock" your algorithm to the beat of the music. Start with samples that are a half-beat in length and adjust from there. Be prepared to throw out some samples that don't have a lot of useful data (such as a sample taken in the middle of a slide).