If one intends to follow c-based path, it is a good idea to add the compiler flag -msse4.2 to your makefile. This allows the compiler to generate hardware based popcnt instructions instead of using a table to generate the popcount. On my system this was approximately 2.5x faster.