Edit: The code here still has some bugs in it, and it could do better in the performance department, but instead of trying to fix this, for t
I strongly doubt that you get the correct values from fetch_and_add etc., as float addition is different from int addition: adding the raw bit patterns of two floats as integers does not give the bit pattern of their sum.
Here's what I get for these additions:
1 + 1 = 1.70141e+038
100 + 1 = -1.46937e-037
100 + 0.01 = 1.56743e+038
23 + 42 = -1.31655e-036
So yes, it's thread-safe, but not what you expect.
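To see where these numbers come from, here is a minimal C++ sketch. It assumes 32-bit IEEE 754 floats and assumes that fetch_and_add simply adds the floats' bit patterns as integers; that is my reading of the results above, not something I have checked in the library. `add_as_int_bits` is a hypothetical helper name.

```cpp
#include <cstdint>
#include <cstring>

// Assumes float is a 32-bit IEEE 754 type, the same size as std::uint32_t.
static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");

// Hypothetical helper: mimics an integer fetch_and_add applied to the raw
// bits of two floats. This is NOT floating-point addition.
float add_as_int_bits(float a, float b) {
    std::uint32_t ia, ib;
    std::memcpy(&ia, &a, sizeof ia);  // copy the bit pattern, no conversion
    std::memcpy(&ib, &b, sizeof ib);
    std::uint32_t sum = ia + ib;      // plain integer addition (wraps on overflow)
    float result;
    std::memcpy(&result, &sum, sizeof result);  // reinterpret the sum as a float
    return result;
}
```

For example, 1.0f has the bit pattern 0x3F800000; adding it to itself as an integer gives 0x7F000000, which is 2^127 ≈ 1.70141e+38, the first result above. Likewise 100.0f + 1.0f gives 0x82480000, whose sign bit is set, hence the negative result.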
The lock-free algorithms (operator+ etc.) should work as far as atomicity is concerned (I haven't checked the algorithm itself, though).
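For reference, a common way to get an atomic float addition that actually does floating-point math is a compare-and-swap loop. The sketch below uses C++11 `std::atomic<float>` rather than the class from the question, and `atomic_add` / `add_from_threads` are hypothetical names; I'm not claiming this is what the original operator+ does. (In C++20, `std::atomic<float>` also has `fetch_add` directly.)

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Atomically adds delta to target with a compare-and-swap loop and returns
// the value target held before the addition (like fetch_add would).
float atomic_add(std::atomic<float>& target, float delta) {
    float expected = target.load();
    // On failure, compare_exchange_weak stores the current value into
    // `expected`, so the sum is recomputed from fresh data on each retry.
    while (!target.compare_exchange_weak(expected, expected + delta)) {
    }
    return expected;
}

// Hypothetical usage helper: num_threads threads each add `delta`
// `iterations` times to a shared total that starts at 0.
float add_from_threads(unsigned num_threads, int iterations, float delta) {
    std::atomic<float> total{0.0f};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&total, iterations, delta] {
            for (int i = 0; i < iterations; ++i) {
                atomic_add(total, delta);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return total.load();
}
```

With whole-number deltas the result is exact (floats represent integers up to 2^24 exactly), so four threads adding 1.0f a thousand times each give 4000.0f. Under heavy contention the retry loop can spin, which is one reason the per-thread approach below may perform better.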
Another solution: since it's all additions and subtractions, you might be able to give each thread its own instance and then add up the per-thread results once the threads are done.
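A minimal sketch of that idea, assuming the work can be split into independent chunks of values to sum; `sum_with_per_thread_accumulators` is a hypothetical name. Each thread accumulates into a private local float and writes it out once, so no atomics are needed. Note that float addition is not associative, so the result may differ in the last bits from a strictly sequential sum.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Sums `values` by giving each thread its own private accumulator and adding
// the per-thread results together after all threads have finished.
float sum_with_per_thread_accumulators(const std::vector<float>& values,
                                       unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<float> partial(num_threads, 0.0f);  // one result slot per thread
    std::vector<std::thread> workers;
    const std::size_t chunk = (values.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&values, &partial, chunk, t] {
            std::size_t begin = t * chunk;
            std::size_t end = std::min(begin + chunk, values.size());
            float local = 0.0f;  // thread-private accumulator, no sharing
            for (std::size_t i = begin; i < end; ++i) local += values[i];
            partial[t] = local;  // each thread writes only its own slot, once
        });
    }
    for (auto& w : workers) w.join();
    float total = 0.0f;
    for (float p : partial) total += p;  // combine after join: no race
    return total;
}
```

Writing the local sum once at the end, instead of updating `partial[t]` inside the loop, also avoids threads repeatedly touching neighbouring slots in the same cache line.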