I am trying to read sections of a file into numpy arrays that have similar start and stop flags for the different sections of the file. At the moment I have found a method
Let's say this is your file to read:
**starting**
blabla
blabla
**starting**
bleble
bleble
**starting**
bumbum
bumbum
This is the code of the program:
# the with block closes the file automatically
with open("testfile.txt", "r") as file:
    data = file.read()
data = data.split("**starting**")
print(data)
And this is the output:
['', '\nblabla\nblabla\n', '\nbleble\nbleble\n', '\nbumbum\nbumbum']
Later you can del the empty element, or do other operations on your data. split is a built-in method of string objects and can take longer, more complicated strings as its separator argument.
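As a rough sketch of that cleanup step, the snippet below drops the empty leading element and turns each section into a list of lines. The helper name split_sections and the inline sample text are made up for illustration; they mirror the example file above rather than read it from disk.

```python
# Sample text with the same layout as the example file (an assumption for this sketch).
text = (
    "**starting**\nblabla\nblabla\n"
    "**starting**\nbleble\nbleble\n"
    "**starting**\nbumbum\nbumbum"
)

def split_sections(text, flag="**starting**"):
    """Split text on flag and return each non-empty section as a list of lines."""
    sections = []
    for chunk in text.split(flag):
        lines = chunk.strip().splitlines()
        if lines:  # skips the empty chunk that precedes the first flag
            sections.append(lines)
    return sections

sections = split_sections(text)
print(sections)
# [['blabla', 'blabla'], ['bleble', 'bleble'], ['bumbum', 'bumbum']]
```

Filtering inside the loop avoids having to del the empty element afterwards, and it also copes with a file that does not begin with the flag.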
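You have an indentation problem; your code should look like this:
with open("myFile.txt") as f:
    array = []
    parsing = False
    for line in f:
        # check the stop flag first so the stop line itself is not processed
        if line.startswith('stop flag'):
            parsing = False
        if parsing:
            # do things to the data, e.g. collect the line
            array.append(line)
        # check the start flag last so the start line itself is not processed
        if line.startswith('start flag'):
            parsing = True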
You can use itertools.takewhile each time you reach the start flag to take lines until the stop flag:
from itertools import takewhile

with open("myFile.txt") as f:
    array = []
    for line in f:
        if line.startswith('start flag'):
            # takewhile is lazy: it only reads from f as it is consumed
            data = takewhile(lambda x: not x.startswith("stop flag"), f)
            # use data (consume it fully here), then the loop repeats
            array.append(list(data))
Or just use an inner loop:
with open("myFile.txt") as f:
    array = []
    for line in f:
        if line.startswith('start flag'):
            # beginning of section, use the first line here if needed
            for line in f:
                # check for the end of the section, breaking if we find the stop line
                if line.startswith("stop flag"):
                    break
                # else process lines from the section
A file object is its own iterator, so the position keeps advancing as you iterate over f. When you reach the start flag, process the section until you hit the stop flag. There is no reason to re-open the file at all; just use the sections as you iterate once over the lines of the file. If the start and stop flag lines are considered part of the section, make sure to process those as well.
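To illustrate the shared-iterator behaviour, here is a minimal sketch that collects every section into its own list. It uses io.StringIO as a stand-in for a real file so it runs as-is; the function name read_sections and the sample contents are hypothetical.

```python
import io
from itertools import takewhile

def read_sections(f, start="start flag", stop="stop flag"):
    """Return a list of sections, each a list of the lines between start and stop."""
    sections = []
    for line in f:
        if line.startswith(start):
            # takewhile pulls from the same iterator as the outer loop, so the
            # outer loop resumes right after the stop line (which is discarded).
            body = takewhile(lambda x: not x.startswith(stop), f)
            sections.append([x.rstrip("\n") for x in body])
    return sections

# Stand-in for an open file; the contents are invented for this example.
sample = io.StringIO(
    "header\n"
    "start flag\n1 2\n3 4\nstop flag\n"
    "noise\n"
    "start flag\n5 6\nstop flag\n"
)
print(read_sections(sample))
# [['1 2', '3 4'], ['5 6']]
```

Each section is consumed fully inside the list comprehension before the outer loop advances, which matters because takewhile is lazy. Passing a real file opened with open("myFile.txt") in place of sample should behave the same way.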
A solution similar to yours would be:
result = []
parse = False
with open("myFile.txt") as f:
    for line in f:
        if line.startswith('stop flag'):
            parse = False
        elif line.startswith('start flag'):
            parse = True
        elif parse:
            result.append(line)
        else:  # not needed, but I like to always add an else clause
            continue
print(result)
But you might also use an inner loop or itertools.takewhile, as other answers suggest. Using takewhile in particular may be faster for really big files, though it is worth timing on your own data.