Question
So, my first question was answered correctly. For reference you can go here...
How to fill the white-space with info while leaving the rest unchanged?
In short, I needed this...
POLYGON_POINT -79.750000000217,42.017498354525,0
POLYGON_POINT -79.750000000217,42.016478251402,0
POLYGON_POINT -79.750598748133,42.017193264943,0
POLYGON_POINT -79.750000000217,42.017498354525,0

POLYGON_POINT -79.750000000217,42.085882815878,0
POLYGON_POINT -79.750000000217,42.082008734634,0
POLYGON_POINT -79.751045507507,42.082126409633,0
POLYGON_POINT -79.750281907508,42.083166574215,0
POLYGON_POINT -79.750781149174,42.084212672130,0
POLYGON_POINT -79.750000000217,42.085882815878,0
To become this...
BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.017498354525,0
POLYGON_POINT -79.750000000217,42.016478251402,0
POLYGON_POINT -79.750598748133,42.017193264943,0
POLYGON_POINT -79.750000000217,42.017498354525,0
END_POLY
BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.085882815878,0
POLYGON_POINT -79.750000000217,42.082008734634,0
POLYGON_POINT -79.751045507507,42.082126409633,0
POLYGON_POINT -79.750281907508,42.083166574215,0
POLYGON_POINT -79.750781149174,42.084212672130,0
POLYGON_POINT -79.750000000217,42.085882815878,0
END_POLY
Which was successfully accomplished with a Python script. Now I have found that I need to remove duplicate lines, specifically the last point line of each block. That line closes the polygon, but the building batch gives an error because it closes the polygon on its own. Basically, I need it to be this at the end of it all...
BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.017498354525,0
POLYGON_POINT -79.750000000217,42.016478251402,0
POLYGON_POINT -79.750598748133,42.017193264943,0
END_POLY
BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.085882815878,0
POLYGON_POINT -79.750000000217,42.082008734634,0
POLYGON_POINT -79.751045507507,42.082126409633,0
POLYGON_POINT -79.750281907508,42.083166574215,0
POLYGON_POINT -79.750781149174,42.084212672130,0
END_POLY
and there are 3,415,978 lines to go through. Every other duplicate remover I have tried also takes away the white space and all the wording. Hmmm
Answer 1:
As pointed out in the comments, keep a reference to the previous line:
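with open('in.txt') as fin, open('out.txt', 'w') as fout:
    prev = None  # line held back until we know what follows it
    for i, line in enumerate(fin):
        # Write the held-back line unless the current line is END_POLY;
        # in that case it is the duplicate closing point and is dropped.
        if line.strip() != 'END_POLY' and prev is not None:
            fout.write(prev)
        prev = line
        if not i % 10000:
            print('Processing line {}'.format(i))
    # Flush the final line (the last END_POLY), unless the file was empty.
    if prev is not None:
        fout.write(prev)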
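The loop above drops whatever line precedes END_POLY. If some polygons in the file might not repeat their first point, a slightly more defensive variant is to drop that line only when it really equals the block's first point. The following is a minimal sketch under that assumption; the function name drop_closing_points is made up for illustration, and it assumes blocks are delimited by BEGIN_POLYGON and END_POLY exactly as in the question.

```python
def drop_closing_points(lines):
    """Yield lines, skipping a block's last point when it repeats the first.

    Assumes BEGIN_POLYGON / END_POLY marker lines as shown in the question.
    """
    first = None   # first point of the current block (stripped text)
    count = 0      # number of point lines seen in the current block
    prev = None    # line held back until its successor has been seen
    for line in lines:
        text = line.strip()
        if prev is not None:
            # Drop prev only if it is the last point before END_POLY,
            # the block has more than one point, and it equals the first.
            is_dup_close = (text == 'END_POLY' and count > 1
                            and prev.strip() == first)
            if not is_dup_close:
                yield prev
        if text == 'BEGIN_POLYGON':
            first, count = None, 0
        elif text and text != 'END_POLY':
            count += 1
            if first is None:
                first = text
        prev = line
    # Flush the final held-back line (normally the last END_POLY).
    if prev is not None:
        yield prev
```

Because it is a generator, it streams the 3.4 million lines without loading them into memory: open the input and output files as in the answer and pass the result to fout.writelines(drop_closing_points(fin)).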
Answer 2:
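Although it is not Python, this type of editing is quite straightforward with sed:
sed '$!N;/\nEND_POLY$/s/.*\n//;P;D' file.txt
This keeps a two-line window in the pattern space. N appends the next line; if that second line is END_POLY, the substitution removes the first line, leaving only END_POLY. P then prints the first line of the window and D discards it, so the window slides forward one line at a time. The shorter form sed 'N;s/.*\n\(END_POLY\)/\1/' reads lines in fixed pairs, so it only works while END_POLY always falls on the second line of a pair, that is, while every block has an even number of lines.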
Answer 3:
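If you don't want duplicated data, you can pass each block through collections.OrderedDict.fromkeys, which drops repeated lines while keeping their order. This takes @Jean-François Fabre's code from the other question with a small modification, so it reads the original file in which blocks are separated by blank lines:
import collections
import itertools

with open("file.txt") as f, open("fileout.txt", "w") as fw:
    # Each run of consecutive non-blank lines is one polygon.
    for nonblank, group in itertools.groupby(f, key=lambda l: bool(l.strip())):
        if nonblank:
            fw.write("BEGIN_POLYGON\n")
            # fromkeys keeps the first occurrence of each line, in order.
            fw.writelines(collections.OrderedDict.fromkeys(group))
            fw.write("END_POLY\n")  # marker name as shown in the question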
As you can see, if you run:
print(list(collections.OrderedDict.fromkeys([1,1,1,1,1,1,2,2,2,2,5,3,3,3,3,3]).keys()))
it prints [1, 2, 5, 3], so the order is preserved.
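To see what this does to a single polygon, here is a small sketch applied to the first block from the question. It uses a plain dict instead of OrderedDict, which keeps insertion order on Python 3.7 and later; the helper name dedupe_block is hypothetical. Note that this approach removes every repeated line in a block, not only the closing point, so it assumes a polygon never legitimately visits the same point twice.

```python
def dedupe_block(lines):
    """Return one block's lines with repeats removed, first-seen order kept."""
    # dict preserves insertion order (guaranteed since Python 3.7), so
    # fromkeys keeps the first occurrence of each line and drops later ones.
    return list(dict.fromkeys(lines))


block = [
    "POLYGON_POINT -79.750000000217,42.017498354525,0\n",
    "POLYGON_POINT -79.750000000217,42.016478251402,0\n",
    "POLYGON_POINT -79.750598748133,42.017193264943,0\n",
    "POLYGON_POINT -79.750000000217,42.017498354525,0\n",  # repeats the first point
]

# Prints the three distinct point lines in their original order.
print("".join(dedupe_block(block)), end="")
```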
Source: https://stackoverflow.com/questions/46976157/part-2-of-a-successful-outcome-regarding-white-space-filling