Python Regex doesn't work as expected

此生再无相见时, submitted on 2019-12-02 03:14:31

Before the regex compiler sees a string, Python has already processed the backslash escapes, so you would have to escape each backslash twice (e.g. \\\\n for \\n). However, Python has a handy notation for exactly this sort of thing: just put an r before the string to make it a raw string:

regex = re.compile(r"""<entry>\\n<(\w+)>(.+?)</\w+>\\n</entry>""")
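To make the difference concrete, here is a minimal sketch comparing the doubly escaped form with the raw-string form. The sample text is hypothetical: it assumes the input contains a literal backslash followed by n between the tags, which is what the \\n in the pattern matches.

```python
import re

# Hypothetical sample: the text holds a literal backslash followed by "n",
# not an actual newline character.
sample = r"<entry>\n<title>Hello</title>\n</entry>"

# Without the r prefix, every backslash meant for the regex engine is doubled.
plain = re.compile("<entry>\\\\n<(\\w+)>(.+?)</\\w+>\\\\n</entry>")

# With the r prefix, Python passes the backslashes through untouched.
raw = re.compile(r"<entry>\\n<(\w+)>(.+?)</\w+>\\n</entry>")

# Both spellings produce the same pattern text for the regex compiler.
assert plain.pattern == raw.pattern

match = raw.search(sample)
print(match.groups())
```

The raw-string version is the same pattern with half as many backslashes, which is why it is the usual choice for regular expressions.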

By the way, I agree with the others here: do not use regexes to parse XML. Still, I hope you find this string notation helpful in future regular expressions.
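As a rough illustration of the parser-based alternative, the sketch below uses the standard library's xml.etree.ElementTree rather than any feed-specific library. The document and its element names are made up for the example and assume well-formed XML with real newlines between tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are illustrative only.
xml_text = """<feed>
<entry>
<title>First post</title>
</entry>
<entry>
<title>Second post</title>
</entry>
</feed>"""

root = ET.fromstring(xml_text)

# Collect (tag, text) for each child of every <entry>,
# roughly what the regex was trying to capture.
pairs = []
for entry in root.iter("entry"):
    for child in entry:
        pairs.append((child.tag, child.text))

print(pairs)
```

The parser handles whitespace, attributes, and nesting on its own, so none of the escaping questions above come up.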

You shouldn't parse XML with regex; instead, use the Universal Feed Parser for Python. Using this library instead of regex will make your life easier, and it has been battle-tested.

I have personally used this library many times, and it works like a charm.

DON'T PARSE XML/HTML WITH REGEX!

Use one of the following:

Enjoy!

EDIT: Oh yeah it's RSS. What the other people said... I'll be here all week.

Do not try to reinvent the wheel or play the smart RSS parser guy. Reuse existing modules: http://www.feedparser.org/
