This is by no means an easy problem, so I'll only try to give you an overview.
What you could do is something like the following:
- Compute the average (root-mean-square) loudness of the signal over blocks of, say, 5 milliseconds, which gives you one loudness value per block. (Having never done this before, I don't know what a good block size would be.)
- Take the Fourier transform of the "blocked" signal (the sequence of per-block loudness values), using the FFT algorithm.
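- Find the component in the transformed signal that has the largest magnitude, ignoring the zero-frequency (DC) component, which only reflects the average loudness. Its frequency in Hz, multiplied by 60, is the estimated BPM.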
A Fourier transform is basically a way of computing the strength of each frequency present in a signal. If you do that over the "blocked" signal, the frequency of the beat will hopefully be the strongest one. For example, a 120 BPM beat should show up as a peak near 2 Hz.
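The steps above can be sketched in Python with NumPy roughly as follows. This is only a minimal sketch, not a tested beat detector: the function name `estimate_bpm`, the 5 ms default block size, and the 40 to 200 BPM search range are all my own assumptions, and it expects a mono signal as a 1-D array of samples.

```python
import numpy as np

def estimate_bpm(samples, sample_rate, block_ms=5.0, min_bpm=40.0, max_bpm=200.0):
    """Rough BPM estimate from the loudness envelope (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)

    # Step 1: RMS loudness per block of roughly block_ms milliseconds.
    block_len = max(1, int(sample_rate * block_ms / 1000.0))
    n_blocks = len(samples) // block_len
    if n_blocks < 2:
        raise ValueError("signal is too short for the chosen block size")
    blocks = samples[:n_blocks * block_len].reshape(n_blocks, block_len)
    envelope = np.sqrt(np.mean(blocks ** 2, axis=1))

    # Remove the mean so the zero-frequency (DC) component does not dominate.
    envelope = envelope - envelope.mean()

    # Step 2: FFT of the "blocked" signal. It has one value per block,
    # so its sample rate is sample_rate / block_len.
    magnitudes = np.abs(np.fft.rfft(envelope))
    envelope_rate = sample_rate / block_len
    freqs_hz = np.fft.rfftfreq(n_blocks, d=1.0 / envelope_rate)
    bpm = freqs_hz * 60.0

    # Step 3: strongest component, restricted to an assumed tempo range.
    in_range = (bpm >= min_bpm) & (bpm <= max_bpm)
    if not in_range.any():
        raise ValueError("signal is too short to resolve the tempo range")
    peak = np.argmax(np.where(in_range, magnitudes, -1.0))
    return float(bpm[peak])
```

You would call it as `estimate_bpm(samples, sample_rate)`. Note that the frequency resolution is one over the signal duration, so a 20-second clip resolves tempo only in steps of 3 BPM; longer clips give finer estimates. Restricting the search range also keeps harmonics of the beat (for instance twice the tempo) from being picked in obvious cases, though real music can still fool it.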
You may need to apply a filter first (a low-pass filter, for example) to focus on specific frequencies, like the bass, that usually contain the most information about the BPM.
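As an illustration of such a pre-filter, here is a crude low-pass sketch that simply zeroes every frequency above a cutoff in the raw audio before any loudness is computed. The function name `lowpass_fft` and the 200 Hz cutoff are assumptions of mine, and a hard cutoff like this causes ringing, so a properly designed filter (a Butterworth filter, say) would probably be a better choice in practice.

```python
import numpy as np

def lowpass_fft(samples, sample_rate, cutoff_hz=200.0):
    """Crude low-pass: drop all frequency content above cutoff_hz (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)

    # Transform the raw audio into the frequency domain.
    spectrum = np.fft.rfft(samples)
    freqs_hz = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)

    # Zero every bin above the assumed cutoff, keeping only the low end.
    spectrum[freqs_hz > cutoff_hz] = 0.0

    # Transform back to a time-domain signal of the original length.
    return np.fft.irfft(spectrum, n=len(samples))
```

The filtered samples can then be fed into the block-loudness step in place of the original signal.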