Efficient way of calculating likeness scores of strings when sample size is large?

轻奢々 2020-12-25 15:10

Let's say that you have a list of 10,000 email addresses, and you'd like to find what some of the closest "neighbors" in this list are - defined as email addresses that

8 answers

爱一瞬间的悲伤 (original poster)

2020-12-25 16:07
I don't think you can do better than O(n^2), but you can make some smaller optimizations which could be just enough of a speedup to make your application usable:
- You could first sort all email addresses by the part after the @ and only compare addresses where that part is the same
- You can stop calculating the distance between two addresses as soon as it becomes bigger than n (the largest distance you still count as close)
EDIT: Actually you can do better than O(n^2), just look at Nick Johnson's answer below.
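
To make the two optimizations above concrete, here is a minimal Python sketch. It assumes the likeness score is Levenshtein (edit) distance, which the post does not state explicitly, and the names `bounded_levenshtein`, `close_pairs` and `limit` are made up for illustration. Note that it is still quadratic within each domain group, and that it will miss pairs whose domains differ.

```python
from collections import defaultdict
from itertools import combinations


def bounded_levenshtein(a, b, limit):
    """Return the edit distance between a and b, or None if it exceeds limit."""
    if abs(len(a) - len(b)) > limit:
        return None  # the length gap alone already exceeds the limit
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        if min(current) > limit:
            return None  # no cell can recover: stop early
        previous = current
    return previous[-1] if previous[-1] <= limit else None


def close_pairs(emails, limit=2):
    """Compare only addresses that share a domain; return (a, b, distance) tuples."""
    by_domain = defaultdict(list)
    for email in emails:
        local, _, domain = email.rpartition("@")
        by_domain[domain.lower()].append((local, email))
    pairs = []
    for group in by_domain.values():
        for (local_a, a), (local_b, b) in combinations(group, 2):
            # Domains are identical here, so only the local parts are compared.
            distance = bounded_levenshtein(local_a, local_b, limit)
            if distance is not None:
                pairs.append((a, b, distance))
    return pairs
```

For example, `close_pairs(["john.smith@example.com", "jon.smith@example.com"], limit=1)` would report the two addresses as a pair at distance 1, while an address at a different domain is never compared against them.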