Edit: The code here still has some bugs in it, and it could do better in the performance department, but instead of trying to fix this, for t
Just a note about this (I wanted to make a comment but apparently new users aren't allowed to comment): Using reinterpret_cast on references produces incorrect code with gcc 4.1 -O3. This seems to be fixed in 4.4 because there it works. Changing the reinterpret_casts to pointers, while slightly uglier, works in both cases.