I'm not doing numerical computing but I work with data mining (think clustering and classification), and our workloads are probably similar: all the data is static and you have it at the beginning of the program. I have briefly investigated Intel's TBB and found them overkill for my needs. After starting with raw pthread-based code, I switched to OPENMP and got the right mix between readability and performance.