I have a very big file, like this:
[PATTERN1]
line1
line2
line3
...
...
[END PATTERN]
[PATTERN2]
line1
line2
...
...
[END PATTERN]
I need to extract
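Use something like this:

import re

START_PATTERN = '^START-PATTERN$'
END_PATTERN = '^END-PATTERN$'

with open('myfile') as file:
    match = False
    newfile = None
    for line in file:
        if re.match(START_PATTERN, line):
            # Start of a block: open the output file.
            # Note: 'w' truncates, so a later block overwrites an earlier one;
            # use 'a' or a per-block filename to keep every block.
            match = True
            newfile = open('my_new_file.txt', 'w')
            continue
        elif match and re.match(END_PATTERN, line):
            # End of the current block: stop copying and close the output.
            match = False
            newfile.close()
            continue
        elif match:
            # line already ends with '\n', so no extra newline is written.
            newfile.write(line)
    if match:
        # The input ended inside a block; close the output anyway.
        newfile.close()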
This iterates over the file line by line without reading it all into memory. It also writes directly to your new file rather than appending to a list in memory, which could itself become a problem if your source is large enough.
Obviously there are numerous modifications you may need to make to this code. Perhaps a regex pattern is not required to match a start/end line, in which case replace the re.match call with something like if 'xyz' in line.
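As a rough sketch of that substring variant, the same loop can be written as a generator that still handles one line at a time. The function name extract_blocks and the marker strings below are assumptions made for illustration, not something from the question; adjust them to the real markers in your file.

```python
def extract_blocks(lines, start_marker, end_marker):
    """Yield the lines found between a start-marker line and the next end-marker line.

    `lines` can be any iterable of strings, including an open file object,
    so the input is never loaded into memory as a whole.
    """
    inside = False
    for line in lines:
        if not inside and start_marker in line:
            # Plain substring test instead of a regex match.
            inside = True
        elif inside and end_marker in line:
            inside = False
        elif inside:
            yield line


# Small in-memory demonstration; the marker strings are hypothetical.
sample = [
    "[PATTERN1]\n", "line1\n", "line2\n", "[END PATTERN]\n",
    "[PATTERN2]\n", "line1\n", "[END PATTERN]\n",
]
for extracted in extract_blocks(sample, "[PATTERN1]", "[END PATTERN]"):
    print(extracted, end="")
```

With real files, something like with open('myfile') as src, open('my_new_file.txt', 'w') as dst: dst.writelines(extract_blocks(src, '[PATTERN1]', '[END PATTERN]')) should stream the matching lines straight to disk. The marker lines themselves are not copied, which matches the behaviour of the loop above.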