Efficient way of calculating likeness scores of strings when sample size is large?

轻奢々 2020-12-25 15:10

Let's say that you have a list of 10,000 email addresses, and you'd like to find what some of the closest "neighbors" in this list are - defined as email addresses that

8 answers
  •  Happy的楠姐
    2020-12-25 15:42

    Yup - you can find all strings within a given distance of a string in roughly O(log n) time by using a BK-tree, at least for small distance thresholds; as the allowed distance grows, more of the tree has to be visited. Alternative solutions that generate every string within distance k of the query may be faster for a Levenshtein distance of 1, but the amount of work rapidly balloons out of control for longer distances.
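
To make the BK-tree idea concrete, here is a minimal sketch in Python. It is an illustration under assumptions rather than a tuned implementation: the names `levenshtein`, `BKTree`, `add`, and `search` are made up for this example, and the distance function is a plain dynamic-programming Levenshtein. The search relies on the triangle inequality: if the query is at distance d from a node, any match within `max_dist` can only sit under child edges labeled between d - max_dist and d + max_dist.

```python
from typing import Dict, List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


class BKTree:
    """Hypothetical minimal BK-tree keyed on Levenshtein distance."""

    def __init__(self) -> None:
        # A node is (word, children); children maps edge distance -> child node.
        self.root: Optional[Tuple[str, Dict[int, tuple]]] = None

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = levenshtein(word, node[0])
            if d == 0:
                return  # word is already in the tree
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child  # descend along the edge labeled d

    def search(self, query: str, max_dist: int) -> List[Tuple[int, str]]:
        """Return sorted (distance, word) pairs with distance <= max_dist."""
        results: List[Tuple[int, str]] = []
        if self.root is None:
            return results
        stack = [self.root]
        while stack:
            word, children = stack.pop()
            d = levenshtein(query, word)
            if d <= max_dist:
                results.append((d, word))
            # Triangle inequality: only these edges can lead to a match.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


# Example addresses are invented for illustration.
tree = BKTree()
for email in ["alice@example.com", "alice@exampel.com",
              "alicia@example.com", "bob@example.org"]:
    tree.add(email)
print(tree.search("alice@example.com", 2))
```

With a threshold of 2, the query above should return the exact match plus the two near-miss addresses and skip `bob@example.org`. To find neighbors for every address in a 10,000-entry list, one would build the tree once and call `search` per address; how much of the tree each query prunes depends on the threshold and on how the distances are distributed.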
