Remove specific lines from a large text file in Python

I have several large text files that all have the same structure. I want to delete the first 3 lines and then remove illegal characters from the 4th line. I don't w

3 answers

一个人的身影

2020-12-17 07:51
As wim said in the comments, sed is the right tool for this. The following command should do what you want:
```
sed -i -e '4 s/(dB)//' -e '4 s/Best Unit/Best_Unit/' -e '1,3 d' yourfile.whatever
```
To explain the command a little:

-i edits the file in place, that is, it writes the output back into the input file

-e adds a command to execute

'4 s/(dB)//' on line 4, substitute '' for '(dB)'

'4 s/Best Unit/Best_Unit/' same as above, with different find and replace strings

'1,3 d' delete lines 1 to 3 (inclusive)

sed is a really powerful tool that can do much more than just this, and it is well worth learning.
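
Since the question asks about Python, the same three steps can be roughly sketched with the standard-library fileinput module, which also supports in-place editing. The function name clean_in_place is hypothetical, and the search strings are simply the ones from the sed command above; adapt both to your files.

```python
import fileinput

def clean_in_place(path):
    """Drop lines 1-3 and tidy line 4 of `path`, rewriting the file in place."""
    # With inplace=True, anything printed inside the loop is written to the file
    with fileinput.input(files=[path], inplace=True) as lines:
        for line in lines:
            lineno = lines.filelineno()
            if lineno <= 3:
                continue  # same effect as sed's '1,3 d'
            if lineno == 4:
                # same effect as the two '4 s/.../.../' commands
                line = line.replace('(dB)', '').replace('Best Unit', 'Best_Unit')
            print(line, end='')  # lines keep their own newline
```

Like sed -i, this processes the file one line at a time, so memory use stays small even for large files.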

野的像风

2020-12-17 07:52
You can use file.readlines() with an additional argument to read just the first few lines of the file. From the docs:

f.readlines() returns a list containing all the lines of data in the file. If given an optional parameter sizehint, it reads that many bytes from the file and enough more to complete a line, and returns the lines from that. This is often used to allow efficient reading of a large file by lines, but without having to load the entire file in memory. Only complete lines will be returned.
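
To make the quoted behaviour concrete, here is a small sketch using an in-memory file (the sample text is made up for illustration). It shows that the argument, called hint in Python 3, is a size rather than a line count, and that itertools.islice is one alternative when an exact number of lines is wanted.

```python
import io
from itertools import islice

# Hypothetical sample data: five lines of six characters each
sample = io.StringIO("line1\nline2\nline3\nline4\nline5\n")

# The hint is a size, not a line count: reading stops once the
# lines collected so far exceed 8 characters, i.e. after two lines.
by_hint = sample.readlines(8)

sample.seek(0)
# islice takes exactly the first four lines, whatever their length
first_four = list(islice(sample, 4))
```

Because the number of lines returned depends on line length, indexing the result of readlines(100) with [3] only works when the first three lines are short enough.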

The most robust way to manipulate generic strings is with regular expressions. In Python, this means the re module with, for example, the re.sub() function.

My suggestion, which should be adapted to suit your needs:
```
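import re

with open('somefile.txt') as f, open('someotherfile.txt', 'w') as newfile:
    # Skip the first three lines. readline() is used here rather than
    # readlines(100), because the size hint counts bytes, not lines.
    for _ in range(3):
        f.readline()
    line4 = f.readline()
    # Remove any parenthesised text, such as '(dB)'
    line4 = re.sub(r'\([^)]*\)', '', line4)
    # Replace the whitespace after 'Best' with an underscore
    line4 = re.sub(r'Best\s+', 'Best_', line4)
    newfile.write(line4)
    # Copy the remaining lines without loading the whole file into memory
    for line in f:
        newfile.write(line)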
```


时光取名叫无心

2020-12-17 07:54

Just try it for each file. 100 MB per file is not that big, and as you can see, the code to just make an attempt is not time-consuming to write.

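```
with open('file.txt') as f:
    lines = f.readlines()  # each line keeps its trailing '\n'
# Drop the first three lines
lines = lines[3:]
# Clean up the new first line (originally line 4)
lines[0] = lines[0].replace('Rx(db)', 'Rx')
lines[0] = lines[0].replace('Best Unit', 'Best_Unit')
with open('output.txt', 'w') as f:
    # The lines already end in '\n', so write them without a separator
    f.writelines(lines)
```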
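
Since the question mentions several files with the same structure, the same idea can be wrapped in a function and applied to each file in turn. The following is only a sketch: the names clean_file and clean_all, the directory arguments, and the '*.txt' pattern are assumptions, and this variant streams the data instead of reading everything into memory.

```python
import shutil
from pathlib import Path

def clean_file(src, dst):
    """Copy src to dst without its first three lines, tidying the new first line."""
    with open(src) as fin, open(dst, 'w') as fout:
        for _ in range(3):
            fin.readline()  # discard lines 1-3
        header = fin.readline()
        header = header.replace('Rx(db)', 'Rx').replace('Best Unit', 'Best_Unit')
        fout.write(header)
        shutil.copyfileobj(fin, fout)  # stream the rest of the file in chunks

def clean_all(in_dir, out_dir, pattern='*.txt'):
    """Apply clean_file to every file in in_dir that matches pattern."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob(pattern)):
        clean_file(src, out_dir / src.name)
```

Writing the results to a separate directory keeps the originals untouched, which makes it easy to check the output before replacing anything.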
