Python regex look-behind requires fixed-width pattern

甜味超标 2020-12-19 03:28

When trying to extract the title of an HTML page, I have always used the following regex:

(?<=<title.*>)([\s\S]*)(?=</title>)
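A minimal sketch of what the error in the title refers to, assuming a look-behind that contains a variable-width part such as `<title.*>`. Python's `re` module rejects such a pattern at compile time. The helper names (`compile_error`, `title_via_regex`) and the capturing-group pattern are illustrative, not part of the original question:

```python
import re

# Assumed pattern: the ".*" inside the look-behind can match any length,
# which Python's re module does not allow.
VARIABLE_WIDTH = r"(?<=<title.*>)([\s\S]*)(?=</title>)"


def compile_error(pattern):
    """Return the re.error message for a pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return exc.msg
    return None


# One possible workaround: match the opening tag normally and capture
# only the text between the tags, so no look-behind is needed.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def title_via_regex(html):
    """Return the text between <title> tags, or None if there is no match."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None
```

The capturing group removes the error, but it is still a regex applied to HTML and can be fooled by comments, scripts, or malformed markup.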
5 Answers
  •  离开以前
    2020-12-19 04:01

    Toss out the idea of parsing HTML with regular expressions and use an actual HTML parsing library instead. After a quick search I found this one. It's a much safer way to extract information from an HTML file.

    Remember, HTML is not a regular language, so regular expressions are fundamentally the wrong tool for extracting information from it.
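To illustrate the parser-based approach, here is a small sketch using `html.parser` from the Python standard library. This is not necessarily the library the answer linked to (that link did not survive), and `TitleParser` and `extract_title` are hypothetical names chosen for this example:

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element it encounters."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._parts = []
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; ignore any title after the first.
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._parts).strip()


def extract_title(html):
    """Return the page title, or None if the document has no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

For example, `extract_title("<html><head><title>My page</title></head></html>")` returns `"My page"`. Attributes on the tag, or upper-case tag names, need no special handling here because the parser deals with them.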
