re.findall() where I want all unique instances of the regex on the page

Submitted by 拟墨画扇 on 2021-01-27 16:17:38

Question


As the title suggests, I want to run code like this (top_url_list is just a list of URLs I'm looping through to find filenames that follow a naming convention I match with a regex):

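    import re
    from urllib.request import urlopen  # Python 3 location of urlopen
    name_files = []
    for i in top_url_list:
        result = re.findall(r"/([a-z]+[0-9][0-9]\W[a-z]+)", str(urlopen(i).read()))
        name_files.extend(result)  # collect the matches from every page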

The objective is to grab every instance where the regex matches, hence the findall() function. The problem is that I need only the distinct/unique values among those matches. Is this possible?


Answer 1:


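re.findall() returns all non-overlapping matches of the pattern in the string as a list of strings (with a single capture group, as in your pattern, the list holds the captured text). To keep only the unique values, pass that list to set(). Note that a set does not preserve the original order. A quick example of how set() works: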

>>> my_list = [1, 5, 2, 5, 2, 7]
>>> set(my_list)
{1, 2, 5, 7}  # Duplicate entries of 5 and 2 are removed (Python 2 prints set([1, 2, 5, 7]))
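
Applied to the question's loop, the same idea could look roughly like the sketch below. It is only an illustration: unique_matches and sample_pages are hypothetical names, the sample strings stand in for the text each urlopen() call would return, and a dict is used in place of set() so that the first-seen order is kept (Python 3.7+).

```python
import re

# Pattern from the question, written as a raw string; "/" needs no escaping.
PATTERN = re.compile(r"/([a-z]+[0-9][0-9]\W[a-z]+)")


def unique_matches(pages):
    """Return the distinct matches across all pages, in first-seen order."""
    seen = {}
    for text in pages:
        for name in PATTERN.findall(text):
            # Dict keys are unique and keep insertion order (Python 3.7+).
            seen.setdefault(name, None)
    return list(seen)


# Hypothetical page contents standing in for the downloaded HTML.
sample_pages = [
    '<a href="/report01.csv">a</a> <a href="/report01.csv">b</a>',
    '<a href="/data22.txt">c</a> <a href="/report01.csv">d</a>',
]

names = unique_matches(sample_pages)
print(names)
```

If order does not matter, set(PATTERN.findall(text)) per page, merged with set.update(), would do the same job with less code.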


Source: https://stackoverflow.com/questions/40165530/re-findall-where-i-want-all-unique-instances-of-the-regex-on-the-page
