Optimal way to cluster set of strings with hamming distance [duplicate]
问题 This question already has answers here : Fast computation of pairs with least hamming distance (1 answer) Finding Minimum hamming distance of a set of strings in python (4 answers) Closed 4 years ago . I have a database with n strings (n > 1 million), each string has 100 chars, each char is either a , b , c or d . I would like to find the closest strings for each one , closest defines as having the smallest hamming distance . I would like to find the k-nearest strings for each one (k < 5).