python - Read file from and to specific lines of text

痴心易碎 submitted on 2019-11-26 18:52:23

If you simply want the block of text between Start and End, you can do something simple like:

with open('test.txt') as input_data:
    # Skips text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Start':  # Or whatever test is needed
            break
    # Reads text until the end of the block:
    for line in input_data:  # This keeps reading the file
        if line.strip() == 'End':
            break
        print(line, end="")  # Line is extracted (or block_of_lines.append(line), etc.); it already ends with a newline

In fact, you do not need to manipulate line numbers in order to read the data between the Start and End markers.

The logic ("read until…") is repeated in both blocks, but it is quite clear and efficient (other methods typically involve checking some state [before block/within block/end of block reached], which incurs a time penalty).
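The same two-loop idea can be wrapped in a reusable generator. This is only a minimal sketch: the name lines_between and its start/end parameters are hypothetical, not part of the original answer. It calls iter() first so that the second loop resumes where the first one stopped, even when the input is a list rather than a file object.

```python
def lines_between(lines, start="Start", end="End"):
    """Yield the lines found strictly between the start and end markers."""
    lines = iter(lines)  # One shared iterator, so the second loop resumes where the first stopped
    # Skip everything up to and including the start marker
    for line in lines:
        if line.strip() == start:
            break
    # Yield lines until the end marker (or the end of the input) is reached
    for line in lines:
        if line.strip() == end:
            return
        yield line


sample = ["header\n", "Start\n", "alpha\n", "beta\n", "End\n", "footer\n"]
block = list(lines_between(sample))
print(block)  # ['alpha\n', 'beta\n']
```

With a real file, the call would look like list(lines_between(open_file)) inside a with block; nothing after the end marker is read.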

orlp

Here's something that will work:

block = ""
found = False

# The with statement closes the file even if an error occurs
with open("test.txt") as data_file:
    for line in data_file:
        if found:
            block += line
            if line.strip() == "End":
                break
        elif line.strip() == "Start":
            found = True
            block = line  # Keep the marker line, including its newline

pyInTheSky

You can do this fairly easily with a regex. The example below is simple; make it more robust as needed.

>>> import re
>>> START = "some"
>>> END = "Hello"
>>> test = "this is some\nsample text\nthat has the\nwords Hello World\n"
>>> m = re.compile(r'%s.*?%s' % (re.escape(START), re.escape(END)), re.S)
>>> m.search(test).group(0)
'some\nsample text\nthat has the\nwords Hello'
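One way to make the regex approach more robust is to require each marker to sit on a line of its own and to escape the marker text. The sketch below assumes the whole file fits in memory as one string; extract_block is a hypothetical helper name, and the "Start"/"End" defaults are borrowed from the other answers.

```python
import re


def extract_block(text, start="Start", end="End"):
    """Return the text between the marker lines, or None if no block is found."""
    pattern = re.compile(
        # re.M lets ^ and $ match at line boundaries; re.S lets . match newlines
        r"^%s[ \t]*\n(.*?)^%s[ \t]*$" % (re.escape(start), re.escape(end)),
        re.S | re.M,
    )
    match = pattern.search(text)
    return match.group(1) if match else None


text = "intro\nStart\nfirst line\nsecond line\nEnd\noutro\n"
body = extract_block(text)
print(repr(body))  # 'first line\nsecond line\n'
```

Because the markers are anchored with ^, a line that merely contains the word Start somewhere in the middle does not open a block.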

This should be a start for you:

started = False
collected_lines = []
with open(path, "r") as fp:  # path: the file to read
    for i, line in enumerate(fp):  # Iterating the file avoids loading it all with readlines()
        if line.rstrip() == "Start":
            started = True
            print("started at line %d" % i)  # counts from zero!
            continue
        if started and line.rstrip() == "End":
            print("end at line %d" % i)
            break
        if started:
            # process line (lines before "Start" are skipped)
            collected_lines.append(line.rstrip())

enumerate takes any iterable and yields (index, item) pairs, counting from zero by default. E.g.

  print(list(enumerate("a b c".split())))

prints

   [(0, 'a'), (1, 'b'), (2, 'c')]
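If one-based line numbers are preferred (the way a text editor shows them), enumerate accepts a start argument. A small sketch with made-up sample lines:

```python
lines = ["intro\n", "Start\n", "payload\n", "End\n"]

# start=1 makes the counter match the line numbers a text editor would show
for number, line in enumerate(lines, start=1):
    if line.rstrip() == "Start":
        start_line = number
    elif line.rstrip() == "End":
        end_line = number

print(start_line, end_line)  # 2 4
```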

UPDATE:

The poster asked for a regex that matches lines like "===" and "======":

import re
print(re.match("^=+$", "===") is not None)     # True
print(re.match("^=+$", "======") is not None)  # True
print(re.match("^=+$", "=") is not None)       # True
print(re.match("^=+$", "=abc") is not None)    # False
print(re.match("^=+$", "abc=") is not None)    # False
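The same pattern can act as the block delimiter when reading a file. The sketch below is an assumption about how the two pieces might be combined, not something from the original answer; split_on_separators is a hypothetical name and the sample lines are invented.

```python
import re

SEPARATOR = re.compile(r"^=+$")  # A line made up only of "=" characters


def split_on_separators(lines):
    """Group lines into blocks, starting a new block at each separator line."""
    blocks, current = [], []
    for line in lines:
        text = line.rstrip("\n")
        if SEPARATOR.match(text):
            if current:  # Ignore empty blocks between consecutive separators
                blocks.append(current)
            current = []
        else:
            current.append(text)
    if current:  # Keep a trailing block that has no closing separator
        blocks.append(current)
    return blocks


sample = ["===\n", "first\n", "block\n", "======\n", "second\n", "=abc\n", "===\n"]
blocks = split_on_separators(sample)
print(blocks)  # [['first', 'block'], ['second', '=abc']]
```

Note that "=abc" stays inside its block, because the pattern only matches lines consisting entirely of "=" characters.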