OpenMP.
It handles threads for you so you only worry about which parts of your C++ application you want to run in parallel.
For example:
#pragma omp parallel for
for (int i = 0; i < SIZE; i++)
{
    // do something with an element
}
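The above code splits the iterations of the for loop across as many threads as you've told the OpenMP runtime to use. So if SIZE is 100 and you have a quad-core box running four threads, each thread handles roughly 25 iterations (that's what the usual static schedule does; the exact split is up to the runtime, and threads aren't necessarily pinned to specific cores).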
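To make that a bit more concrete, here is a minimal sketch of the same pattern applied to real data. The function names `square_all` and `sum_all` are made up for illustration; only the `#pragma omp` directives are actual OpenMP. The second function adds a `reduction` clause, which is what you typically need when every iteration updates the same variable.

```cpp
#include <vector>

// Hypothetical helper: squares every element in place.
// Each iteration touches only its own element, so the loop can be split
// across threads without any locking.
void square_all(std::vector<double>& data)
{
    const int n = static_cast<int>(data.size());
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
    {
        data[i] = data[i] * data[i];
    }
}

// Hypothetical helper: sums all elements.
// reduction(+:total) gives each thread a private copy of total and
// adds the copies together once the loop finishes.
double sum_all(const std::vector<double>& data)
{
    const int n = static_cast<int>(data.size());
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++)
    {
        total += data[i];
    }
    return total;
}
```

With GCC or Clang you would build this with -fopenmp. Without that flag the pragmas are simply ignored and the loops run serially, which is handy for checking that the parallel version gives the same answer.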
There are a few other parallel extensions for various languages, but the ones I'm most interested in are the ones that run on your graphics card. That's real parallel processing :) (examples: GPU++ and libSh)