Python to extract data from a file

暖寄归人 2020-12-06 16:09

I am trying to extract the blocks of text between ---- lines that contain a specific line (extractme) from a text file:

----
data1
data1
data1
extractme
----
data2
data2
data2
----
data3
data3
extractme
---         


        
4 Answers
  • 2020-12-06 16:20

    This works well enough for me. It assumes your sample data is in a file called "data.txt" and writes the output to "result.txt".

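    # Copy only the blocks that contain an "extractme" line.
    with open("data.txt") as inFile, open("result.txt", "w") as outFile:
        buffer = []
        # Start as True so the opening "----" line is written out
        keepCurrentSet = True
        for line in inFile:
            buffer.append(line)
            if line.startswith("----"):
                # ---- ends the current data set and starts a new one
                if keepCurrentSet:
                    outFile.write("".join(buffer))
                # now reset our state
                keepCurrentSet = False
                buffer = []
            elif line.startswith("extractme"):
                keepCurrentSet = True
        # Flush a final kept block that has no closing ---- line
        if keepCurrentSet and buffer:
            outFile.write("".join(buffer))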
    
  • 2020-12-06 16:34

    For Python 2:

    #!/usr/bin/env python
    
    with open("infile.txt") as infile:
        with open("outfile.txt","w") as outfile:
            collector = []
            for line in infile:
                if line.startswith("----"):
                    collector = []
                collector.append(line)
                if line.startswith("extractme"):
                    for outline in collector:
                        outfile.write(outline)
    

    For Python 3:

    #!/usr/bin/env python3
    
    with open("infile.txt") as infile, open("outfile.txt","w") as outfile:
        collector = []
        for line in infile:
            if line.startswith("----"):
                collector = []
            collector.append(line)
            if line.startswith("extractme"):
                for outline in collector:
                    outfile.write(outline)
    
  • 2020-12-06 16:35

    I imagine the change in number of dashes (4 in the input, sometimes 4 and sometimes 3 in the output) is an error and not actually desired (since no algorithm is even hinted at, to explain how many dashes are to be output on different occasions).

    I would structure the task in terms of reading and yielding one block of lines at a time:

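    def readbyblock(f):
        while True:
            block = []
            for line in f:
                if line == '----\n':
                    break
                block.append(line)
            else:
                # End of file: yield any trailing lines, then stop
                if block:
                    yield block
                return
            # Skip empty blocks (e.g. when the file starts with a separator)
            if block:
                yield block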
    

    so that the (selective) output can be neatly separated from the input:

    with open('infile.txt') as fin:
        with open('outfile.txt', 'w') as fou:
            for block in readbyblock(fin):
                if 'extractme\n' in block:
                    fou.writelines(block)
                    fou.write('----\n')
    

    This is not optimal, performance-wise, if the blocks are large, since the membership test in the if clause implies a separate loop over all lines in the block. So, a good refactoring might be:

    def selectivereadbyblock(f, marker='extractme\n'):
        while True:
            block = []
            extract = False
            for line in f:
                if line == '----\n':
                    break
                block.append(line)
                if line == marker:
                    extract = True
            else:
                # End of file: yield a trailing marked block, then stop
                if extract:
                    yield block
                return
            if extract:
                yield block
    
    with open('infile.txt') as fin:
        with open('outfile.txt', 'w') as fou:
            for block in selectivereadbyblock(fin):
                fou.writelines(block)
                fou.write('----\n')
    

    Parameterizing the separators (now hard-coded as '----\n' for both input and output) is another reasonable coding tweak.
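
A minimal sketch of that tweak, not taken from the answer itself: the function names select_blocks and copy_selected and the parameters in_sep and out_sep are hypothetical, and in-memory io.StringIO objects stand in for the real input and output files so the snippet can be run as-is (Python 3).

```python
import io

def select_blocks(f, marker='extractme\n', in_sep='----\n'):
    """Yield each block of lines (delimited by in_sep) that contains marker."""
    block, extract = [], False
    for line in f:
        if line == in_sep:
            # Separator reached: emit the finished block if it was marked
            if extract:
                yield block
            block, extract = [], False
        else:
            block.append(line)
            if line == marker:
                extract = True
    # Emit a trailing marked block that has no closing separator
    if extract:
        yield block

def copy_selected(fin, fou, marker='extractme\n',
                  in_sep='----\n', out_sep='----\n'):
    """Write every marked block from fin to fou, each followed by out_sep."""
    for block in select_blocks(fin, marker, in_sep):
        fou.writelines(block)
        fou.write(out_sep)

# Hypothetical sample input, shaped like the data in the question
sample = io.StringIO(
    '----\ndata1\nextractme\n----\ndata2\n----\ndata3\nextractme\n----\n'
)
result = io.StringIO()
copy_selected(sample, result, out_sep='====\n')
output = result.getvalue()
```

Keeping the input and output separators as independent arguments means the output can use a different delimiter than the input, which is one way to handle the differing dash counts in the question if they turn out to be intended.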

  • 2020-12-06 16:42
    with open("file") as f:
        data = f.read().split("----")
    print('----'.join(i for i in data if "extractme" in i))
    