Suppose you have N (~100k-1m) integers/bitstrings each K (e.g. 256) bits long. The algorithm should return the k pairs with the lowest pairwise Hamming dis
The recent paper "The Closest Pair Problem under the Hamming Metric" has only algorithms involving an n^2 factor (unless K is very large). That is even for finding only a single pair. So it seems that it is hard to improve this unless you make further assumptions about the structure of your instances. For example, if you assume the Hamming distance is not very large, you could sample a few columns, hash the strings into buckets according to these under the assumptions that these columns match exactly, and then do pairwise comparison in each bucket separately. Repeat this for another set of random columns to minimize the probability you miss some pairs.