Python Regex - Extract text between (multiple) expressions in a textfile

孤者浪人 提交于 2019-12-02 03:44:23

You may use

regex = r"(?:{})[\s\S]*?(?:{}).*(?:\n.*){{0,2}}".format(openings, closings)

This pattern will result in a regex like

(?:dear|to our|estimated)[\s\S]*?(?:sincerely|yours|best regards).*(?:\n.*){0,2}

See the regex demo. Note you should not use re.DOTALL with this pattern, and the re.MULTILINE option is also redundant.

Details

  • (?:dear|to our|estimated) - any of the three values
  • [\s\S]*? - any 0+ chars, as few as possible
  • (?:sincerely|yours|best regards) - any of the three values
  • .* - any 0+ chars other than newline
  • (?:\n.*){0,2} - zero, one or two repetitions of a newline followed with any 0+ chars other than newline.

Python demo code:

import re
text="""Some random text here

Dear Shareholders We
are pleased to provide you with this semiannual report for Fund for the six-month period ended April 30, 2018. For additional information about the Fund, please visit our website a, where you can access quarterly commentaries. We value the trust that you place in us and look forward to serving your investment needs in the years to come.
Best regards 
Douglas

Random text here as well"""
letter_begin = ["dear", "to our", "estimated"] # All expressions for "beginning" of letter 
openings = "|".join(letter_begin)
letter_end = ["sincerely", "yours", "best regards"] # All expressions for "ending" of Letter 
closings = "|".join(letter_end)
regex = r"(?:{})[\s\S]*?(?:{}).*(?:\n.*){{0,2}}".format(openings, closings)
print(regex)
print(re.findall(regex, text, re.IGNORECASE))

Output:

['Dear Shareholders We\nare pleased to provide you with this semiannual report for Fund for the six-month period ended April 30, 2018. For additional information about the Fund, please visit our website a, where you can access quarterly commentaries. We value the trust that you place in us and look forward to serving your investment needs in the years to come.\nBest regards \nDouglas\n']
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!