Fast disk-based hashtables?
Question: I have sets of hashes (the first 64 bits of MD5, so they are effectively uniformly distributed), and I want to be able to check whether a new hash is in a set and to add it to a set. The sets aren't too big (the largest will have millions of elements), but there are hundreds of them, so I cannot hold them all in memory. Some ideas I have had so far: I tried just keeping everything in an SQLite table, but it becomes really slow once the data no longer fits in memory. Bloom filters sound like they would have very high error