Remove specific lines from a large text file in python

情歌与酒 2020-12-17 07:01

I have several large text files that all have the same structure, and I want to delete the first 3 lines and then remove illegal characters from the 4th line. I don't w…

3 Answers
  • 2020-12-17 07:51

    As wim said in the comments, sed is the right tool for this. The following command should do what you want:

    sed -i -e '4 s/(dB)//' -e '4 s/Best Unit/Best_Unit/' -e '1,3 d' yourfile.whatever
    

    To explain the command a little:

    -i edits the file in place; that is, it writes the output back into the input file

    -e adds a command (expression) to the script that sed executes

    '4 s/(dB)//' on line 4, substitute '' (the empty string) for '(dB)'; in sed's basic regular expressions, plain parentheses are literal characters

    '4 s/Best Unit/Best_Unit/' same as above, but with different search and replace strings

    '1,3 d' from line 1 to line 3 (inclusive) delete the entire line

    sed is a really powerful tool that can do much more than this; it is well worth learning.
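If you would rather stay in Python, the standard library's `fileinput` module with `inplace=True` gives a rough equivalent of `sed -i`: whatever is printed while iterating is written back to the current file, and `filelineno()` restarts at 1 for each file, much like `sed -i` treats each file separately. A minimal sketch, assuming the 4th line contains the literal strings `(dB)` and `Best Unit` as in the sed command above; the function name `clean_files` is hypothetical:

```python
import fileinput


def clean_files(paths):
    """Rough Python counterpart of:
    sed -i -e '4 s/(dB)//' -e '4 s/Best Unit/Best_Unit/' -e '1,3 d' FILE...
    """
    paths = list(paths)
    if not paths:
        return  # fileinput would otherwise fall back to sys.argv / stdin
    # inplace=True redirects print() output into the file currently being read.
    with fileinput.input(files=paths, inplace=True) as f:
        for line in f:
            n = f.filelineno()  # 1-based line number within the current file
            if n <= 3:
                continue  # '1,3 d': drop lines 1-3
            if n == 4:
                # sed's s/// without the g flag replaces only the first match
                line = line.replace('(dB)', '', 1)
                line = line.replace('Best Unit', 'Best_Unit', 1)
            print(line, end='')  # the line already ends with its newline
```

For example, `clean_files(['a.txt', 'b.txt'])` rewrites both files in place. As with `sed -i`, there is no undo, so try it on a copy first.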

  • 2020-12-17 07:52

    You can use file.readlines() with an additional argument to read just the first few lines of the file. From the (Python 2) docs:

    f.readlines() returns a list containing all the lines of data in the file. If given an optional parameter sizehint, it reads that many bytes from the file and enough more to complete a line, and returns the lines from that. This is often used to allow efficient reading of a large file by lines, but without having to load the entire file in memory. Only complete lines will be returned.
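To see what the size hint does in practice, here is a small sketch using `io.StringIO` as a stand-in for a real file (the sample text is made up). In Python 3 the hint works as a running total: `readlines(hint)` reads whole lines and stops once the lines read so far exceed `hint` characters. The file position is left just after the last line returned, so a later call continues from there rather than starting over.

```python
import io

# Four 5-character lines ("aaaa\n", ...) standing in for a real file.
f = io.StringIO("aaaa\nbbbb\ncccc\ndddd\n")

head = f.readlines(7)  # stops after 'bbbb\n', since 5 + 5 = 10 > 7
rest = f.readlines()   # continues where the first call stopped

print(head)  # ['aaaa\n', 'bbbb\n']
print(rest)  # ['cccc\n', 'dddd\n']
```

This is why, after a `readlines(hint)` call, the remaining lines should not be sliced again by their original line numbers: the lines already returned are no longer in what is left of the file.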

    Then, the most robust way to manipulate generic strings is with regular expressions. In Python, this means the re module, for example the re.sub() function.

    My suggestion, which should be adapted to suit your needs:

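    import re
    import shutil
    
    with open('somefile.txt') as f, open('someotherfile.txt', 'w') as newfile:
        # Whole lines, stopping once their total length exceeds 100 characters;
        # raise the hint if lines 1-3 are longer than that in total.
        head = f.readlines(100)
        line4 = re.sub(r'\([^)]*\)', '', head[3])  # drop parenthesised text, e.g. "(dB)"
        line4 = re.sub(r'Best\s+', 'Best_', line4)  # "Best Unit" -> "Best_Unit"
        newfile.write(line4)
        newfile.writelines(head[4:])    # lines that readlines(100) read past line 4
        shutil.copyfileobj(f, newfile)  # copy the rest in chunks, not all at once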
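Before running substitutions like these on a large file, it is worth trying them on a single header line. The sample line below is made up for illustration; the question's actual header is not shown here.

```python
import re

# Hypothetical header line; replace it with the real 4th line of your file.
header = "Freq(MHz) Rx(dB) Best Unit\n"

# Remove every parenthesised group, e.g. "(MHz)" and "(dB)".
no_units = re.sub(r'\([^)]*\)', '', header)
# Replace the whitespace after "Best" with a single underscore.
fixed = re.sub(r'Best\s+', 'Best_', no_units)

print(repr(no_units))  # 'Freq Rx Best Unit\n'
print(repr(fixed))     # 'Freq Rx Best_Unit\n'
```

Note that the first pattern removes every parenthesised group, not just `(dB)`. If other columns carry units you want to keep, match the literal text instead, e.g. `line.replace('(dB)', '')`.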
    
  • 2020-12-17 07:54

    Just try it on each file. At 100 MB per file, the files are not that big, and as you can see, the code for a first attempt doesn't take long to write.

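    with open('file.txt') as f:
        lines = f.readlines()
    del lines[:3]  # drop the first three lines; the old line 4 is now lines[0]
    lines[0] = lines[0].replace('Rx(dB)', 'Rx')  # str.replace is case-sensitive
    lines[0] = lines[0].replace('Best Unit', 'Best_Unit')
    with open('output.txt', 'w') as f:
        f.writelines(lines)  # each line already ends in '\n', so don't join with another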
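Since the question mentions several files, the same steps can be applied to each one in a loop. This sketch also streams each file line by line instead of reading it all with `readlines()`, so only one line is held in memory at a time. The names `clean_file` and `clean_all`, the `*.txt` pattern and the separate output directory are hypothetical; it assumes the header contains `Rx(dB)` and `Best Unit`.

```python
from itertools import islice
from pathlib import Path


def clean_file(src, dst):
    """Write src to dst without its first 3 lines, fixing up the old line 4."""
    with open(src) as fin, open(dst, 'w') as fout:
        rest = islice(fin, 3, None)  # lazily skip lines 1-3
        header = next(rest, None)    # old line 4 (None if the file is too short)
        if header is None:
            return
        header = header.replace('Rx(dB)', 'Rx').replace('Best Unit', 'Best_Unit')
        fout.write(header)
        fout.writelines(rest)        # copy the remaining lines one at a time


def clean_all(src_dir, out_dir, pattern='*.txt'):
    """Apply clean_file to every matching file in src_dir, writing into out_dir."""
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    if out.resolve() == src.resolve():
        # Opening dst with 'w' would truncate the file before it is read.
        raise ValueError("out_dir must differ from src_dir")
    for path in sorted(src.glob(pattern)):  # build the full list before writing
        clean_file(path, out / path.name)
```

For example, `clean_all('data', 'cleaned')` writes a cleaned copy of each `data/*.txt` file into `cleaned/`, leaving the originals untouched.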
    