I am interested in determining the musical key of an audio sample. How would (or could) an algorithm go about trying to approximate the key of a musical audio sample?
First you need a pitch detection algorithm (e.g. autocorrelation).
You can then use your pitch detection algorithm to extract the pitch over a number of short time windows. After that you would need to see which musical key the sampled pitches fit best with.
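To make those two steps concrete, here is a minimal sketch in plain Python. It assumes a monophonic signal (one note at a time), fixed non-overlapping windows, and only the 12 major keys as candidates. The function names, the 60 Hz lower bound and the 1024-sample window are my own illustrative choices, not part of any particular library.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets from the tonic


def detect_pitch(window, sample_rate, fmin=60.0):
    """Estimate the fundamental (Hz) of one window via autocorrelation.

    Returns None when no periodicity is found (e.g. silence).
    """
    n = len(window)
    max_lag = min(int(sample_rate / fmin), n - 1)

    def acf(lag):
        return sum(window[i] * window[i + lag] for i in range(n - lag))

    # Skip the lobe around lag 0, where any signal correlates with itself.
    lag = 1
    while lag <= max_lag and acf(lag) > 0:
        lag += 1

    # The strongest remaining peak is taken as the period of the note.
    best_lag, best_score = None, 0.0
    for k in range(lag, max_lag + 1):
        score = acf(k)
        if score > best_score:
            best_lag, best_score = k, score
    return sample_rate / best_lag if best_lag else None


def pitch_class(freq):
    """Map a frequency to a pitch class 0..11 (0 = C), assuming A4 = 440 Hz."""
    return round(69 + 12 * math.log2(freq / 440.0)) % 12


def pitch_class_counts(samples, sample_rate, window_size=1024):
    """Count how often each pitch class is detected across the windows."""
    counts = [0] * 12
    for start in range(0, len(samples) - window_size + 1, window_size):
        freq = detect_pitch(samples[start:start + window_size], sample_rate)
        if freq is not None:
            counts[pitch_class(freq)] += 1
    return counts


def best_major_key(counts):
    """Return the major key whose scale covers the most detected pitches."""
    def in_scale(tonic):
        return sum(counts[(tonic + step) % 12] for step in MAJOR_SCALE)
    return NOTE_NAMES[max(range(12), key=in_scale)]
```

You would call `pitch_class_counts(samples, sample_rate)` on your decoded audio and pass the result to `best_major_key`. Note that plain scale membership cannot tell a major key from its relative minor, and autocorrelation on polyphonic audio is much less reliable than on a single melodic line.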
As far as I can tell from this article, each key has its own commonly occurring frequencies (notes), so the algorithm likely analyzes the audio sample to detect which notes and chords appear most often. After all, multiple keys can share the same configuration of sharps and flats (C major and A minor, for example), with the difference being the note that the key starts on and thus the chords that such keys tend to use. So it seems that how often the significant notes and chords appear would be the only real way to figure that sort of thing out. I don't really think you can get a layman's explanation of the actual mathematical formulas without leaving out a lot of information.
Do note that this is coming from somebody who has absolutely no experience in this area, with his first exposure being the article linked in this answer.
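For what it's worth, one well-known way to turn "how often the significant notes appear" into a number is to correlate a 12-bin note histogram against a key profile for every possible tonic. The sketch below uses the Krumhansl-Kessler profile values (the basis of the Krumhansl-Schmuckler key-finding method). That is a technique I am swapping in for illustration, not necessarily what the article does, and the function names are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Krumhansl-Kessler key profiles: perceived stability of each scale degree,
# index 0 being the tonic.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def estimate_key(counts):
    """Return (tonic name, mode) for a 12-bin pitch-class histogram.

    counts[0] is how often C occurred, counts[1] C#, and so on.
    """
    best = None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
            # Shift the profile so that its tonic lines up with this candidate.
            shifted = [profile[(pc - tonic) % 12] for pc in range(12)]
            score = pearson(counts, shifted)
            if best is None or score > best[0]:
                best = (score, NOTE_NAMES[tonic], mode)
    return best[1], best[2]
```

Because the profiles weight the tonic and its chord tones most heavily, two keys with the same sharps and flats get different scores depending on which notes dominate, which is the distinction described above. Real implementations usually weight the histogram by note duration or spectral energy rather than raw counts.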