Emulation of lex-like functionality in Perl or Python

梦毁少年i 2021-01-13 23:46

Here's the deal. Is there a way to tokenize the strings in a line based on multiple regexes?

One example:

I have to get all href tags, their corresponding

8 answers
  •  清歌不尽
    2021-01-14 00:13

    It sounds like you really just want to parse HTML. I recommend looking at any of the wonderful packages for doing so:

    • BeautifulSoup
    • lxml.html
    • html5lib

    Or you can use a general-purpose parser like one of the following:

    • PyParsing
    • DParser - A GLR parser with good Python bindings.
    • ANTLR - A recursive descent parser generator that can generate Python code.
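
    If a full parser generator is more than the job needs, the "multiple regexes" idea from the question can be sketched with the standard `re` module alone: join the patterns into one alternation of named groups and walk the line with `finditer`. The token names and patterns below are made up for illustration and are not from the question. Note one difference from lex: the first alternative that matches wins, not the longest match, so the order of the list matters.

```python
import re

# Hypothetical token classes, for illustration only.
# Order matters: the first alternative that matches at a position wins.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-*/]"),
    ("SKIP", r"\s+"),       # whitespace, dropped below
    ("MISMATCH", r"."),     # any other single character
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(line):
    """Yield (kind, text) pairs for one line of input."""
    for match in MASTER.finditer(line):
        kind = match.lastgroup  # name of the group that matched
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {match.group()!r}")
        yield kind, match.group()


tokens = list(tokenize("x = 42 + y1"))
```

    Here `tokens` is a list of `(kind, text)` tuples in input order. The catch-all `MISMATCH` entry is there so that unknown input raises an error instead of being skipped silently.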

    The following example is from the BeautifulSoup documentation:

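    # BeautifulSoup 3 API (Python 2). In bs4 the import is
    # `from bs4 import BeautifulSoup, SoupStrainer` and the keyword is `parse_only`.
    from BeautifulSoup import BeautifulSoup, SoupStrainer
    import re

    # `doc` is assumed to hold the HTML source as a string; it is not defined in this excerpt.
    # Parse only the <a> tags and skip everything else.
    links = SoupStrainer('a')
    [tag for tag in BeautifulSoup(doc, parseOnlyThese=links)]
    # [success,
    #  experiments,
    #  BoogaBooga]

    # Parse only the <a> tags whose href matches the regex.
    linksToBob = SoupStrainer('a', href=re.compile('bob.com/'))
    [tag for tag in BeautifulSoup(doc, parseOnlyThese=linksToBob)]
    # [success,
    #  experiments]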
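
    If installing a third-party package is not an option, the same kind of link extraction can be sketched with the standard library's `html.parser`. This is only a minimal sketch: the `LinkCollector` class and the sample markup are hypothetical, and pairing each href with its link text is an assumption, since the question is cut off before it says what should accompany each href.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags that have an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical sample input, not taken from the question.
sample = (
    '<p>See <a href="http://example.com/a">first</a> and '
    '<a href="http://example.org/b">second</a>.</p>'
)
collector = LinkCollector()
collector.feed(sample)
collector.close()
links = collector.links
```

    Unlike a regex over the raw text, this approach lets the parser deal with attribute quoting and tag boundaries, which is the main reason to prefer an HTML parser here.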
    
