Emulation of lex like functionality in Perl or Python

梦毁少年i 2021-01-13 23:46

Here's the deal: is there a way to tokenize the strings in a line based on multiple regexes?

One example:

I have to get all href tags, their corresponding

8 answers

清歌不尽 (original poster)

2021-01-14 00:13
Sounds like you really just want to parse HTML. I recommend looking at any of the wonderful packages for doing so:
- BeautifulSoup
- lxml.html
- html5lib
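If you would rather not install anything, Python's standard library ships a simple event-driven HTML parser, `html.parser.HTMLParser`. It is a swapped-in alternative, not one of the packages listed above. The sketch below collects each link's `href` together with its link text. Pairing each href with its text is an assumption made for illustration, since the question is cut off, and `LinkCollector` and `extract_links` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collect (href, link_text) pairs from <a> elements.

    A minimal sketch: it ignores <a> tags without an href and does not
    handle nested <a> tags (which are invalid HTML anyway)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None  # href of the <a> we are inside, if any
        self._text: List[str] = []        # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of (name, value).
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str) -> List[Tuple[str, str]]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `extract_links('<p>See <a href="http://example.com/">home</a> and <A HREF="/about">about us</A>.</p>')` returns `[('http://example.com/', 'home'), ('/about', 'about us')]`; because the parser lowercases names, the uppercase `<A HREF>` is still found. It does not build a tree or repair malformed markup, which is where the packages above earn their keep.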
Alternatively, you can use a general-purpose parsing tool such as one of the following:
- PyParsing
- DParser - A GLR parser with good Python bindings.
- ANTLR - A recursive descent parser generator that can generate Python code.
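The question itself asks for lex-style tokenizing with several regexes. Without any of the tools above, lex's matching rule can be emulated in plain Python with the standard `re` module: at each position, try every rule, keep the longest match, and on a tie let the rule listed first win. The sketch below assumes that rule; the token names and regexes in `RULES` are hypothetical, chosen only to pick `href` attributes out of an HTML-ish line.

```python
import re
from typing import Iterator, List, Tuple

# Hypothetical token rules, in priority order (earlier rules win ties, as in lex).
RULES: List[Tuple[str, str]] = [
    ("HREF", r'href\s*=\s*"[^"]*"'),
    ("TAG_OPEN", r"<[A-Za-z]+"),
    ("TAG_CLOSE", r"</[A-Za-z]+\s*>"),
    ("GT", r">"),
    ("WORD", r"[^<>\s]+"),
    ("WS", r"\s+"),
]
SKIP = {"WS"}  # tokens that are matched but not emitted

COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (token_name, lexeme) pairs using lex's rule:
    the longest match at the current position wins; on a tie,
    the rule listed first wins."""
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, regex in COMPILED:
            m = regex.match(text, pos)  # anchored at pos
            # Only a strictly longer match replaces the current best,
            # so earlier rules keep ties and empty matches are ignored.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}: {text[pos:pos + 10]!r}")
        if best_name not in SKIP:
            yield best_name, text[pos:best_end]
        pos = best_end


if __name__ == "__main__":
    line = '<a href="http://example.com/">home</a> and <b>bold</b>'
    for token in tokenize(line):
        print(token)
```

In the sample line, `HREF` and `WORD` both match `href="http://example.com/"` with the same length, so `HREF` wins because it is listed first. A simpler and faster variant joins all rules into one alternation of named groups and reads `m.lastgroup` from `re.finditer`, but Python's `|` takes the first alternative that matches rather than the longest, so rule order alone decides.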
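This example is adapted from the BeautifulSoup documentation. The original was written for Beautiful Soup 3 (`from BeautifulSoup import ...`, `parseOnlyThese`); here it is updated for the current `bs4` package, and `doc` is a sample string assumed for illustration, since the snippet never defined it:
```
import re
from bs4 import BeautifulSoup, SoupStrainer  # Beautiful Soup 4: pip install beautifulsoup4

# Sample input (assumed; the original snippet left `doc` undefined).
doc = '''Bob reports <a href="http://www.bob.com/">success</a>
with his plasma breeding <a href="http://www.bob.com/plasma">experiments</a>.
The folks at <a href="http://www.boogabooga.net/">BoogaBooga</a> sure seem obsessed with it.'''

# Parse only the <a> tags.
links = SoupStrainer('a')
print(BeautifulSoup(doc, 'html.parser', parse_only=links).find_all('a'))
# [<a href="http://www.bob.com/">success</a>,
#  <a href="http://www.bob.com/plasma">experiments</a>,
#  <a href="http://www.boogabooga.net/">BoogaBooga</a>]

# Parse only the <a> tags whose href matches a regex.
links_to_bob = SoupStrainer('a', href=re.compile('bob.com/'))
print(BeautifulSoup(doc, 'html.parser', parse_only=links_to_bob).find_all('a'))
# [<a href="http://www.bob.com/">success</a>,
#  <a href="http://www.bob.com/plasma">experiments</a>]
```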