Regular expression to extract chunks of text from a text file?
I need to extract headings, and the chunk of text beneath each of them, from a text file in Python using regular expressions, but I'm finding it difficult. I converted this PDF to text so that it now looks like this:

So far I have been able to get all the numerical headers (12.4.5.4, 12.4.5.6, 13, 13.1, 13.1.1, 13.1.12) using the following regex:

    import re

    with open('data/single.txt', encoding='UTF-8') as file:
        for line in file:
            # Match a dotted section number at the start of the line, e.g. 13.1.1
            headings = re.findall(r'^\d+(?:\.\d+)*\.?', line)
            print(headings)

I just don't know how to get the worded part of those headings or the paragraph of text beneath them.

EDIT -
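For reference, here is a minimal sketch of one way the section number, the worded part of the heading, and the text beneath it could be captured together. It rests on assumptions the post does not confirm: that each heading sits on its own line as a dotted number followed by whitespace and a title, and that everything up to the next such line belongs to that heading. The names `HEADING`, `split_sections`, and the `sample` text are hypothetical and only there for illustration.

```python
import re

# Assumed heading shape: dotted number, optional trailing dot, whitespace, title.
# re.MULTILINE lets ^ and $ anchor at every line, so the whole file can be scanned at once.
HEADING = re.compile(r'^(\d+(?:\.\d+)*)\.?[ \t]+(\S.*)$', re.MULTILINE)


def split_sections(text):
    """Return a list of (number, title, body) tuples, one per heading found."""
    matches = list(HEADING.finditer(text))
    sections = []
    for i, m in enumerate(matches):
        # The body runs from the end of this heading line to the start of the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.end():end].strip()
        sections.append((m.group(1), m.group(2).strip(), body))
    return sections


# Hypothetical sample text, not taken from the real file.
sample = (
    "13 Introduction\n"
    "Some intro text.\n"
    "13.1 Scope\n"
    "Line one.\n"
    "Line two.\n"
    "13.1.1 Details\n"
)

for number, title, body in split_sections(sample):
    print(number, '|', title, '|', repr(body))
```

To run this on the real file, the whole text would be read in one go (for example `text = file.read()`) instead of looping line by line, because a body can span several lines. One caveat: any body line that itself starts with a number followed by a space would be mistaken for a heading, so the pattern may need tightening for the actual document.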