Longest Prefix Matches for URLs

前端未结

关注

 4  2032

醉话见心 2020-12-13 01:21

I need information about any standard python package which can be used for \"longest prefix match\" on URLs. I have gone through the two standard packages http://packages.py

4条回答

感动是毒 (楼主)

2020-12-13 02:03
If you are willing to trade RAM for the time performance then SuffixTree might be useful. It has nice algorithmic properties such as it allows to solve the longest common substring problem in a linear time.

If you always search for a prefix rather than an arbitrary substring then you could add a unique prefix while populating SubstringDict():
```
from SuffixTree import SubstringDict

substr_dict = SubstringDict()
for url in URLS: # urls must be ascii (valid urls are)
    assert '\n' not in url
    substr_dict['\n'+url] = url #NOTE: assume that '\n' can't be in a url

def longest_match(url_prefix, _substr_dict=substr_dict):
    matches = _substr_dict['\n'+url_prefix]
    return max(matches, key=len) if matches else ''
```
Such usage of SuffixTree seems suboptimal but it is 20-150 times faster (without SubstringDict()'s construction time) than @StephenPaulger's solution [which is based on .startswith()] on the data I've tried and it could be good enough.

To install SuffixTree, run:
```
pip install SuffixTree -f https://hkn.eecs.berkeley.edu/~dyoo/python/suffix_trees
```
0 讨论(0)

查看其它4个回答
发布评论:

提交评论
- 加载中...