I have a program that currently takes way too long to sum up large std::vectors of ~100 million elements using std::accumulate.
You can use Boost Asio as a thread pool. But there's not a lot of sense in it unless you have... asynchronous IO operations to coordinate.
In this answer to "c++ work queues with blocking" I show two thread_pool implementations:
- boost::asio::io_service
- boost::thread primitives

Both accept any void() signature compatible task. This means you could wrap your function-that-returns-the-important-results in a packaged_task<...> and get the future from it.
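To make the packaged_task idea concrete, here is a minimal sketch using only the standard library. Note that `simple_pool` and `parallel_sum` are hypothetical names made up for illustration: `simple_pool` is a stand-in for any pool that accepts void() tasks, not the implementation from the linked answer, and I'm assuming a `std::vector<int>` summed into a `long long`. Because `std::packaged_task` is move-only while `std::function<void()>` requires a copyable callable, the task is held in a `shared_ptr` and invoked through a small void() lambda.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <numeric>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for a thread pool that runs any void() task.
class simple_pool {
public:
    explicit simple_pool(std::size_t n) {
        assert(n > 0 && "pool needs at least one worker");
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }

    ~simple_pool() {
        {
            std::lock_guard<std::mutex> lk(mx_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();  // workers drain the queue first
    }

    void enqueue(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lk(mx_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(mx_);
                cv_.wait(lk, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // shutting down, nothing left
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock
        }
    }

    std::mutex mx_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool done_ = false;
};

// Splits the vector into roughly equal chunks, sums each chunk as a
// packaged_task on the pool, then adds up the partial results.
long long parallel_sum(const std::vector<int>& data, std::size_t chunks) {
    if (chunks == 0) chunks = 1;
    simple_pool pool(chunks);
    std::vector<std::future<long long>> parts;

    const std::size_t step = (data.size() + chunks - 1) / chunks;
    for (std::size_t begin = 0; begin < data.size(); begin += step) {
        const std::size_t end = std::min(begin + step, data.size());
        // shared_ptr makes the move-only packaged_task copyable for std::function
        auto task = std::make_shared<std::packaged_task<long long()>>(
            [&data, begin, end] {
                return std::accumulate(data.begin() + begin,
                                       data.begin() + end, 0LL);
            });
        parts.push_back(task->get_future());
        pool.enqueue([task] { (*task)(); });
    }

    long long total = 0;
    for (auto& f : parts) total += f.get();  // blocks until each chunk is done
    return total;
}
```

You'd call it as `parallel_sum(v, std::thread::hardware_concurrency())`. Whether this actually beats a plain std::accumulate depends on your hardware, since a simple sum is often limited by memory bandwidth, so measure before committing to it.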