There are only two options that do much better than O(N) in the total number of bits:
Use the specialized bit-scan instructions available on certain architectures, such as BSF on x86.
Use one of the O(log2(N)) algorithms for finding the lowest set bit in a word of N bits. This, of course, does not scale well when the bitset is dense rather than sparse, because the search has to be repeated once for every set bit. Resurrecting some foggy memory of mine, I found the source in the FXT library. Details can be found in the FXT book (PDF), section 1.3.2.
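A minimal C sketch of both ideas follows. It is not the FXT code, and it assumes the bitset is stored as an array of `uint64_t` words, which the answer does not specify. `lowest_bit_binary` (a hypothetical helper) does the O(log2(N)) halving search for N = 64, which takes six steps. `lowest_bit` uses `__builtin_ctzll`, a GCC/Clang extension that on x86 typically compiles to BSF or TZCNT, and falls back to the binary search on other compilers. `collect_set_bits` (also hypothetical) clears the lowest set bit with `w &= w - 1` after each hit. Its cost therefore grows with the number of set bits, which is why a dense bitset gains little.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable O(log2(N)) search for the lowest set bit of a 64-bit word:
   halve the candidate range six times (32, 16, 8, 4, 2, 1 bits). */
unsigned lowest_bit_binary(uint64_t w)
{
    assert(w != 0);                 /* undefined for an all-zero word */
    unsigned idx = 0;
    if ((w & 0xFFFFFFFFu) == 0) { w >>= 32; idx += 32; }
    if ((w & 0xFFFFu) == 0)     { w >>= 16; idx += 16; }
    if ((w & 0xFFu) == 0)       { w >>= 8;  idx += 8; }
    if ((w & 0xFu) == 0)        { w >>= 4;  idx += 4; }
    if ((w & 0x3u) == 0)        { w >>= 2;  idx += 2; }
    if ((w & 0x1u) == 0)        { idx += 1; }
    return idx;
}

/* Lowest set bit, preferring the compiler builtin (assumed GCC/Clang),
   which maps to a hardware bit-scan instruction where one exists. */
unsigned lowest_bit(uint64_t w)
{
    assert(w != 0);
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_ctzll(w);
#else
    return lowest_bit_binary(w);
#endif
}

/* Write the index of every set bit into out (caller sizes it to at least
   64 * nwords) and return how many were found. Each word costs one test
   plus one lowest_bit() call per set bit. */
size_t collect_set_bits(const uint64_t *words, size_t nwords, size_t *out)
{
    size_t count = 0;
    for (size_t i = 0; i < nwords; i++) {
        uint64_t w = words[i];
        while (w != 0) {
            out[count++] = i * 64 + lowest_bit(w);
            w &= w - 1;             /* clear the lowest set bit */
        }
    }
    return count;
}
```

For example, with the words `{0x8000000000000011, 0x4}`, `collect_set_bits` reports bits 0, 4, 63 and 66.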