I did some timing tests and also read some articles like this one (last comment), and it looks like in a Release build, float and double values take the same amount of processing time.
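Here's a minimal sketch of the kind of timing test I mean (a toy illustration, not the exact code from the article): the same dependent multiply-add chain, once with float and once with double. Since scalar float and double math go through the same FP units on a modern CPU, both loops tend to take about the same time in an optimized build.

```cpp
#include <chrono>
#include <cstdio>

template <typename T>
double time_ops(long long iters) {
    T x = static_cast<T>(1.0);
    auto start = std::chrono::steady_clock::now();
    // Dependent chain: each iteration needs the previous result,
    // so we measure raw add/mul latency, not memory bandwidth.
    for (long long i = 0; i < iters; ++i)
        x = x * static_cast<T>(1.0000001) + static_cast<T>(0.0000001);
    auto end = std::chrono::steady_clock::now();
    // Print the result so the compiler can't optimize the loop away.
    std::printf("result = %f\n", static_cast<double>(x));
    return std::chrono::duration<double>(end - start).count();
}

int main() {
    const long long iters = 200000000;
    std::printf("float:  %.3f s\n", time_ops<float>(iters));
    std::printf("double: %.3f s\n", time_ops<double>(iters));
}
```

Note that with SIMD vectorization the picture changes (twice as many float lanes fit in a register), which is one reason float can still win in array-heavy code.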
I had a small project where I used CUDA, and I can remember that float was faster than double there, too. For one thing, the traffic between host and device is lower with float (the host being the CPU and the "normal" RAM, the device being the GPU and its corresponding RAM), since each value is half the size. But even when the data resides on the device the whole time, double is slower. I think I read somewhere that this has changed recently, or is supposed to change with the next generation, but I'm not sure.
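Something like this rough CUDA sketch shows both effects (names and sizes are my own reconstruction, not the original project): the double version copies twice as many bytes each way, and on most consumer GPUs the kernel itself also runs slower because there are far fewer FP64 units than FP32 units.

```cpp
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Same arithmetic-heavy kernel, instantiated for float and for double.
template <typename T>
__global__ void mad_kernel(T* data, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    T x = data[i];
    for (int k = 0; k < iters; ++k)
        x = x * static_cast<T>(1.0000001) + static_cast<T>(0.0000001);
    data[i] = x;
}

template <typename T>
void run(const char* label, int n) {
    std::vector<T> host(n, static_cast<T>(1));
    T* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(T));

    auto start = std::chrono::steady_clock::now();
    // Host -> Device copy: twice the bytes for double.
    cudaMemcpy(dev, host.data(), n * sizeof(T), cudaMemcpyHostToDevice);
    mad_kernel<<<(n + 255) / 256, 256>>>(dev, n, 1000);
    // Device -> Host copy is synchronous, so it also waits for the kernel.
    cudaMemcpy(host.data(), dev, n * sizeof(T), cudaMemcpyDeviceToHost);
    auto end = std::chrono::steady_clock::now();

    std::printf("%s: %.3f s (copied %zu MB each way)\n", label,
                std::chrono::duration<double>(end - start).count(),
                n * sizeof(T) / (1024 * 1024));
    cudaFree(dev);
}

int main() {
    const int n = 1 << 24;  // ~16M elements
    run<float>("float ", n);
    run<double>("double", n);
}
```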
So it seems that those GPUs simply can't handle double precision natively, or only at much lower throughput, which would also explain why GLfloat is usually used rather than GLdouble.
(As I said, this is only as far as I can remember; I just stumbled upon this while searching for float vs. double on a CPU.)