Python regex look-behind requires fixed-width pattern

甜味超标 2020-12-19 03:28

When trying to extract the title of an HTML page, I have always used the following regex:

(?<=<title.*>)([\s\S]*)(?=</title>)


5 answers

-上瘾入骨i (original poster)

2020-12-19 04:02
The regex for extracting the content of non-nested HTML/XML tags is actually very simple:
```
import re
r = re.compile(r'<title[^>]*>(.*?)</title>')
```
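As a rough illustration of why the capture-group form sidesteps the error in the thread title, the sketch below first tries to compile a variable-width look-behind (the exact pattern is an assumption, since the question's regex is only partly preserved) and then extracts the title with a capture group. The sample `html` string and the `re.DOTALL | re.IGNORECASE` flags are additions for the example, not part of the original answer.

```python
import re

# Hypothetical sample document; the title spans two lines on purpose.
html = "<html><head><title lang='en'>Example\nPage</title></head><body></body></html>"

# A look-behind containing `.*` has no fixed width, so `re` refuses to compile it.
try:
    re.compile(r"(?<=<title.*>)([\s\S]*)(?=</title>)")
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = str(exc)

# Capture group instead of look-arounds: match the whole element, keep group 1.
# re.DOTALL lets `.` cross newlines; re.IGNORECASE also accepts <TITLE>.
title_re = re.compile(r"<title[^>]*>(.*?)</title>", re.DOTALL | re.IGNORECASE)

match = title_re.search(html)
title = match.group(1).strip() if match else None

print(lookbehind_error)
print(repr(title))
```

The non-greedy `(.*?)` stops at the first closing tag, which is what makes this safe only for non-nested elements.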
However, for anything more complex, you should really use a proper HTML parser such as BeautifulSoup or the standard library's html.parser (urllib only fetches pages, it does not parse them).
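For the parser route, here is a minimal sketch using the standard library's `html.parser`, chosen only so the example runs without third-party packages. The `TitleParser` class and `extract_title` helper are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <TITLE> is handled too.
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        return "".join(self._parts).strip()


def extract_title(html):
    """Return the page title, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


print(extract_title("<html><head><TITLE id='t'>Hello world</TITLE></head></html>"))
```

Unlike the regex, this keeps working when the markup has attributes, odd casing, or line breaks in unexpected places.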