I have a text file with the following contents, and I want to split it into multiple files (1.txt, 2.txt, 3.txt, ...). Each new output file should look as follows
Try the re.findall() function:
import re
with open('input.txt', 'r') as f:
    data = f.read()
found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)
# write each block to its own numbered file, closing every file explicitly
for i, block in enumerate(found, start=1):
    with open(str(i) + '.txt', 'w') as out:
        out.write(block)
Minimalistic approach for the first 3 occurrences:
import re
found = re.findall(r'\n*(A.*?\n\$\$)\n*', open('input.txt', 'r').read(), re.M | re.S)
# enumerate() numbers the blocks by position, so two identical blocks still get separate files
[open(str(i)+'.txt', 'w').write(f) for i, f in enumerate(found[:3], 1)]
Some explanations:
found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)
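finds all occurrences matching the regex and puts them into a list called found. The re.S flag lets . match newlines, so one match can span several lines, and the non-greedy .*? stops at the first following line that starts with $$. The re.M flag has no effect here, because the pattern contains no ^ or $ anchors.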
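To see what the pattern captures, here is a minimal sketch run against an in-memory string instead of a file. The sample text is made up for illustration (the real input is not shown here), and re.M is left out on the assumption that it is not needed without ^ or $ anchors.

```python
import re

# Hypothetical sample input: two blocks, each opened by 'A' and closed by '$$'.
sample = "A\nline1\n$$\n\nA\nline2\nline3\n$$\n"

# re.S lets '.' cross newlines; the lazy '.*?' stops at the first
# newline that is immediately followed by '$$'.
PATTERN = re.compile(r'\n*(A.*?\n\$\$)\n*', re.S)

blocks = PATTERN.findall(sample)
for number, block in enumerate(blocks, start=1):
    print(number, repr(block))
```

Note that the pattern starts a match at any 'A' in the text, not only at one standing alone on a line, so input that contains a capital A outside of a block marker may need a stricter pattern.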
[open(str(i)+'.txt', 'w').write(f) for i, f in enumerate(found, 1)]
iterates (using a list comprehension) over all elements of the found list and, for each element, creates a text file named after the element's 1-based position (1.txt, 2.txt, ...) and writes that element (occurrence) to the file.
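If the one-liner feels too dense, the same steps can be wrapped in a small function. This is only a sketch: the name split_blocks and the out_dir and limit parameters are hypothetical, introduced here for illustration.

```python
import re
from pathlib import Path

BLOCK_RE = re.compile(r'\n*(A.*?\n\$\$)\n*', re.S)

def split_blocks(text, out_dir='.', limit=None):
    """Write each matched block to <n>.txt in out_dir and return the paths written."""
    out_dir = Path(out_dir)
    found = BLOCK_RE.findall(text)[:limit]  # limit=None keeps every block
    paths = []
    for number, block in enumerate(found, start=1):
        path = out_dir / f'{number}.txt'
        path.write_text(block)  # write_text opens, writes and closes the file
        paths.append(path)
    return paths
```

For example, split_blocks(open('input.txt').read(), limit=3) would mirror the three-occurrence variant above.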
Another version, without regular expressions:
blocks_to_read = 3
blk_begin = 'A'
blk_end = '$$'
with open('35916503.txt', 'r') as f:
    fn = 1
    data = []
    write_block = False
    for line in f:
        if fn > blocks_to_read:
            break
        line = line.strip()
        if line == blk_begin:
            write_block = True
        if write_block:
            data.append(line)
        if line == blk_end:
            write_block = False
            with open(str(fn) + '.txt', 'w') as fout:
                fout.write('\n'.join(data))
            data = []
            fn += 1
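The same line-by-line state machine can also be sketched as a generator, which makes it easy to try on in-memory lines before touching any files. The name iter_blocks and the sample lines are hypothetical. One deliberate difference from the loop above: an end marker seen outside of a block is ignored here rather than producing an empty block.

```python
from itertools import islice

def iter_blocks(lines, begin='A', end='$$'):
    """Yield each block as a list of stripped lines, markers included."""
    block = None  # None means we are currently outside a block
    for raw in lines:
        line = raw.strip()
        if block is None:
            if line == begin:
                block = [line]  # a begin marker opens a new block
        else:
            block.append(line)
            if line == end:
                yield block  # an end marker closes the current block
                block = None

# Hypothetical sample lines; 'ignored' sits between two blocks and is skipped.
sample_lines = ['A\n', 'one\n', '$$\n', 'ignored\n', 'A\n', 'two\n', '$$\n']

# islice caps the number of blocks, like blocks_to_read in the loop version.
for number, block in enumerate(islice(iter_blocks(sample_lines), 3), start=1):
    print(number, block)
```

Since a file object iterates over its lines, iter_blocks(f) should work the same way on an open file, and each yielded block can be written out with '\n'.join(block).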
PS: Personally, I don't like the non-regex version and would use the one based on regular expressions.