I use TBB in one project. I found it easier to use than raw threads.
You express your work as tasks that can run in parallel. A task is just a call to your parallelized subroutine, and load balancing across the cores is done automatically. That is why I consider it a higher-level parallelization library. I got a 2.5x speedup on a 4-core Intel processor without much work.
There are examples, questions get answered on the forums, it is actively maintained, and it is free.