Python regex look-behind requires fixed-width pattern

甜味超标 2020-12-19 03:28

When trying to extract the title of an HTML page, I have always used the following regex:

(?<=<title.*>)([\s\S]*)(?=</title>)
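The error in the title can be reproduced with a short sketch. It assumes the look-behind contained a variable-width pattern such as `<title.*>`; the tags appear to have been stripped from this page, so the exact pattern is an assumption.

```python
import re

# Assumed form of the pattern: the look-behind contains ".*",
# which can match any number of characters (variable width).
pattern = r"(?<=<title.*>)([\s\S]*)(?=</title>)"

try:
    re.compile(pattern)
    error_message = None
except re.error as exc:
    # Python's re module rejects look-behinds that are not fixed-width.
    error_message = str(exc)

print(error_message)
```

Python's `re` module only accepts look-behind assertions that match a fixed number of characters, which is why the pattern fails at compile time.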
5 Answers
  •  -上瘾入骨i
    2020-12-19 04:02

    The regex for extracting the content of non-nested HTML/XML tags is actually very simple:

    r = re.compile('<title[^>]*>(.*?)</title>')
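A minimal usage sketch of this approach, using a capturing group instead of a look-behind. The sample HTML, the `extract_title` helper and the `re.DOTALL | re.IGNORECASE` flags are illustrative assumptions, not part of the original answer.

```python
import re

# Hypothetical sample input for illustration only.
html = "<html><head><title lang='en'>Example\nPage</title></head><body></body></html>"

# [^>]* skips any attributes on the opening tag; (.*?) captures lazily
# up to the first closing tag. DOTALL lets '.' span line breaks.
title_re = re.compile(r"<title[^>]*>(.*?)</title>", re.DOTALL | re.IGNORECASE)

def extract_title(text):
    """Return the text of the first <title> element, or None if absent."""
    match = title_re.search(text)
    return match.group(1).strip() if match else None

print(extract_title(html))
```

Because the opening tag is matched as ordinary pattern text rather than inside a look-behind, its width no longer matters.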
    

    However, for anything more complex, you should really use a proper HTML parser such as html.parser, lxml or BeautifulSoup; urllib only fetches the page and does not parse it.
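As a rough sketch of the parser-based route, here is one way it might look with the standard library's `html.parser`. The `TitleParser` class and `extract_title_with_parser` helper are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text content of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._parts = []
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; only the first title is recorded.
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._parts).strip()

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

def extract_title_with_parser(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title

print(extract_title_with_parser("<html><head><title>Example Page</title></head></html>"))
```

A parser handles attributes, mixed case and line breaks without any pattern tuning, at the cost of a few more lines of code.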
