Python regex alternation

北荒 2020-12-22 01:01

I'm trying to find all links on a webpage in the form of "http://something" or "https://something". I made a regex and it works:

L         


        
1 Answer
  • 2020-12-22 01:25

    You are using capturing groups, and .findall() changes its behaviour when the pattern contains them: it returns only the contents of the capturing groups rather than the whole match. Your regex can be simplified, but your versions will work if you use non-capturing groups instead:

    L = re.findall(r"(?:http|https)://[^/\"]+/", site_str)
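
To make the difference concrete, here is a minimal sketch comparing the two group styles. The capturing pattern is only an assumption about what the question's regex looked like (the original is not shown), and `site_str` is a hypothetical sample string.

```python
import re

# Hypothetical sample input, modelled on the demo strings in this answer.
site_str = '"http://someserver.com/" "https://anotherserver.com/with/path"'

# Assumed shape of the asker's pattern: a capturing group around the alternation.
# findall() returns only what the group captured.
capturing = re.findall(r'(http|https)://[^/"]+/', site_str)

# Same pattern with a non-capturing group: findall() returns the whole match.
non_capturing = re.findall(r'(?:http|https)://[^/"]+/', site_str)

print(capturing)      # ['http', 'https']
print(non_capturing)  # ['http://someserver.com/', 'https://anotherserver.com/']
```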
    

    You don't need to escape the double quote if you use single quotes around the expression, and you only need to vary the s in the expression, so s? would work too:

    L = re.findall(r'https?://[^/"]+/', site_str)
    

    Demo:

    >>> import re
    >>> example = '''
    ... "http://someserver.com/"
    ... "https://anotherserver.com/with/path"
    ... '''
    >>> re.findall(r'https?://[^/"]+/', example)
    ['http://someserver.com/', 'https://anotherserver.com/']
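
If you do want to keep a capturing group (for example, to read the scheme separately), one alternative sketch is to use re.finditer() and take group(0), which is always the whole match. The names `pattern`, `links`, and `schemes` are illustrative and not from the question.

```python
import re

example = '''
"http://someserver.com/"
"https://anotherserver.com/with/path"
'''

# The capturing group is kept on purpose here.
pattern = re.compile(r'(https?)://[^/"]+/')

# group(0) is the full match; group(1) is the captured scheme.
links = [m.group(0) for m in pattern.finditer(example)]
schemes = [m.group(1) for m in pattern.finditer(example)]

print(links)    # ['http://someserver.com/', 'https://anotherserver.com/']
print(schemes)  # ['http', 'https']
```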
    