How to match a paragraph using regex

误落风尘 2020-12-14 13:11

I have been struggling with Python regex for a while, trying to match paragraphs within a text, but I haven't been successful. I need to obtain the start and end positions o

5 answers
  •  再見小時候
    2020-12-14 13:32

    Using split is one way. You can also do it with a regular expression, like this:

    import re
    paragraphs = list(re.finditer(r'(.+?\n\n|.+?$)', TEXT, re.DOTALL))  # re.search would return only the first paragraph
    

    The .+? is a lazy match: it matches the shortest substring that still lets the whole regex match. A greedy .+ would instead consume as much as possible, up to the last blank line or the end of the string.
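To make the lazy versus greedy difference concrete, here is a minimal sketch. The `sample` string is a made-up two-paragraph text, not taken from the question.

```python
import re

# Hypothetical sample text: two paragraphs, each followed by a blank line.
sample = "one\n\ntwo\n\n"

# Lazy: stops at the first blank line it can reach.
lazy = re.search(r".+?\n\n", sample, re.DOTALL).group(0)

# Greedy: runs to the last blank line, swallowing both paragraphs.
greedy = re.search(r".+\n\n", sample, re.DOTALL).group(0)

print(repr(lazy))
print(repr(greedy))
```

With this sample, the lazy pattern should return only the first paragraph, while the greedy one returns the entire string.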

    So here we want to find a sequence of characters (.+?) that ends with a blank line (\n\n) or at the end of the string ($). The re.DOTALL flag makes the dot match newlines as well, because a paragraph may span several lines with no blank line between them (for example, three consecutive lines).
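Since the question asks for start and end positions, the same pattern can be sketched with `re.finditer`, which yields one match object per paragraph (`re.search` returns only the first match). The `TEXT` value and the `paragraph_spans` helper below are hypothetical, made up for illustration.

```python
import re

# Hypothetical sample text: three paragraphs separated by blank lines.
TEXT = "first line\nsecond line\n\nsecond paragraph\n\nlast paragraph"

# Same pattern as above: lazily match up to a blank line,
# or up to the end of the string for the final paragraph.
PATTERN = re.compile(r"(.+?\n\n|.+?$)", re.DOTALL)


def paragraph_spans(text):
    """Return (start, end, paragraph) for every match; end is exclusive."""
    return [(m.start(), m.end(), m.group(0)) for m in PATTERN.finditer(text)]


for start, end, paragraph in paragraph_spans(TEXT):
    print(start, end, repr(paragraph))
```

Note that each matched paragraph except the last includes its trailing blank line, so `end` points just past the `\n\n`. Call `.rstrip("\n")` on the paragraph text if only the content is wanted.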
