Part 2 of a successful outcome regarding white-space filling

Submitted by 魔方 西西 on 2019-12-11 05:40:07

Question


So, my first question was answered correctly. For reference, here it is:

How to fill the white-space with info while leaving the rest unchanged?

In short, I needed this...

POLYGON_POINT -79.750000000217,42.017498354525,0
POLYGON_POINT -79.750000000217,42.016478251402,0
POLYGON_POINT -79.750598748133,42.017193264943,0
POLYGON_POINT -79.750000000217,42.017498354525,0


POLYGON_POINT -79.750000000217,42.085882815878,0
POLYGON_POINT -79.750000000217,42.082008734634,0
POLYGON_POINT -79.751045507507,42.082126409633,0
POLYGON_POINT -79.750281907508,42.083166574215,0
POLYGON_POINT -79.750781149174,42.084212672130,0
POLYGON_POINT -79.750000000217,42.085882815878,0

To become this...

BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.017498354525,0
POLYGON_POINT -79.750000000217,42.016478251402,0
POLYGON_POINT -79.750598748133,42.017193264943,0
POLYGON_POINT -79.750000000217,42.017498354525,0
END_POLY
BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.085882815878,0
POLYGON_POINT -79.750000000217,42.082008734634,0
POLYGON_POINT -79.751045507507,42.082126409633,0
POLYGON_POINT -79.750281907508,42.083166574215,0
POLYGON_POINT -79.750781149174,42.084212672130,0
POLYGON_POINT -79.750000000217,42.085882815878,0
END_POLY

This was successfully accomplished with a Python script. Now I have found that I need to remove duplicate lines, specifically the last line of each block. That line closes the polygon, but the building batch gives an error because it closes the polygon on its own. Basically, I need the end result to be this...

BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.017498354525,0
POLYGON_POINT -79.750000000217,42.016478251402,0
POLYGON_POINT -79.750598748133,42.017193264943,0
END_POLY
BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.085882815878,0
POLYGON_POINT -79.750000000217,42.082008734634,0
POLYGON_POINT -79.751045507507,42.082126409633,0
POLYGON_POINT -79.750281907508,42.083166574215,0
POLYGON_POINT -79.750781149174,42.084212672130,0
END_POLY

and there are 3,415,978 lines to go through. Every other duplicate remover I have tried also strips out the blank lines and the BEGIN_POLYGON/END_POLY keywords, since those repeat too. Hmmm
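Because each block is small, both steps (wrapping and dropping the closing point) could also be done in a single streaming pass over the original blank-line-separated file, holding only one block in memory at a time. This is a minimal sketch, assuming every block ends with a repeat of its first point as in the samples above; wrap_polygons and _close_block are hypothetical names, not from any answer below.

```python
from typing import Iterable, Iterator, List


def _close_block(block: List[str]) -> List[str]:
    """Wrap one polygon block, dropping a closing point that repeats the first."""
    # Normalize endings so a final line without "\n" still compares equal
    points = [p.rstrip("\n") + "\n" for p in block]
    if len(points) > 1 and points[-1].strip() == points[0].strip():
        points.pop()  # assumption: the repeated closing point is always last
    return ["BEGIN_POLYGON\n", *points, "END_POLY\n"]


def wrap_polygons(lines: Iterable[str]) -> Iterator[str]:
    """Stream lines; only the current block is kept in memory."""
    block: List[str] = []
    for line in lines:
        if line.strip():
            block.append(line)
        elif block:  # a blank line ends the current block
            yield from _close_block(block)
            block = []
    if block:  # the file may not end with a blank line
        yield from _close_block(block)
```

On the real file it would be used as fout.writelines(wrap_polygons(fin)), with fin and fout opened as in the answers below.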


Answer 1:


As pointed out in the comments, keep a reference to the previous line and write it out only if the current line is not END_POLY:

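with open('in.txt') as fin, open('out.txt', 'w') as fout:
    prev = None
    for i, line in enumerate(fin):
        # Write the previous line unless the current line closes the block;
        # this drops the duplicated closing point just before END_POLY
        if prev is not None and line.strip() != 'END_POLY':
            fout.write(prev)
        prev = line
        if not i % 10000:
            print('Processing line {}'.format(i))
    # The last line (normally END_POLY) has no successor, so write it here
    if prev is not None:
        fout.write(prev)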
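The same keep-the-previous-line idea can be written as a generator, which makes it easy to check on a small list before running it on the full file. This is a sketch; drop_line_before_end_poly is a hypothetical name, and it assumes the input already has the BEGIN_POLYGON/END_POLY wrapping from the first question.

```python
from typing import Iterable, Iterator, Optional


def drop_line_before_end_poly(lines: Iterable[str]) -> Iterator[str]:
    """Yield every line except the one immediately preceding END_POLY."""
    prev: Optional[str] = None
    for line in lines:
        # Hold each line back one step; skip it if the next line is END_POLY
        if prev is not None and line.strip() != "END_POLY":
            yield prev
        prev = line
    if prev is not None:
        yield prev  # the final line has no successor, so it is always kept
```

With files, fout.writelines(drop_line_before_end_poly(fin)) does the same job as the loop above, minus the progress output.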



Answer 2:


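Although it isn't Python, this kind of edit is quite straightforward with sed:

sed '$!N;/\nEND_POLY$/s/^.*\n//;P;D' file.txt

It keeps a two-line window: N appends the next line (except on the last line), and if that appended line is END_POLY, the s command deletes the line before it. P then prints the first line of the window and D deletes it, so the window slides forward one line at a time and every END_POLY is checked against the line directly before it.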
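In Python, a similar two-line look-ahead can be written with itertools.tee and itertools.zip_longest (the usual pairwise pattern): each line is paired with the line after it and kept unless that next line is END_POLY. A sketch under that assumption; drop_before_marker and its marker parameter are hypothetical names.

```python
import itertools
from typing import Iterable, Iterator


def drop_before_marker(lines: Iterable[str], marker: str = "END_POLY") -> Iterator[str]:
    """Emit each line unless the line right after it is `marker`."""
    current, ahead = itertools.tee(lines)
    next(ahead, None)  # shift the look-ahead copy forward by one line
    # zip_longest pads the final pair with None, so the last line is kept
    for line, nxt in itertools.zip_longest(current, ahead):
        if nxt is None or nxt.strip() != marker:
            yield line
```

Because tee only buffers the one-line gap between the two iterators, this streams over a large file just like the sed version.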




Answer 3:


If you don't want duplicated data, you can remove repeated lines within each block. A set would lose the line order, so this uses collections.OrderedDict.fromkeys instead (@Jean-François Fabre's code from the other question, slightly modified):

import itertools, collections

with open("file.txt") as f, open("fileout.txt", "w") as fw:
    # groupby splits on blank lines; OrderedDict.fromkeys drops repeats in order
    fw.writelines(itertools.chain.from_iterable(
        ["BEGIN_POLYGON\n"] + list(collections.OrderedDict.fromkeys(v)) + ["END_POLY\n"]
        for k, v in itertools.groupby(f, key=lambda l: bool(l.strip())) if k))
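The same groupby + OrderedDict.fromkeys approach, written as a function over a list so its behavior can be checked without files. A sketch; wrap_and_dedupe is a hypothetical name. Note that it removes every repeated line within a block, not only the closing point, so a polygon that legitimately revisits a vertex would lose that point too.

```python
import collections
import itertools
from typing import Iterable, List


def wrap_and_dedupe(lines: Iterable[str]) -> List[str]:
    out: List[str] = []
    # groupby yields alternating runs of non-blank (True) and blank (False) lines
    for is_block, group in itertools.groupby(lines, key=lambda l: bool(l.strip())):
        if is_block:
            out.append("BEGIN_POLYGON\n")
            # fromkeys keeps the first occurrence of each line, in order
            out.extend(collections.OrderedDict.fromkeys(group))
            out.append("END_POLY\n")
    return out
```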

As you can see, if you run:

print(list(collections.OrderedDict.fromkeys([1,1,1,1,1,1,2,2,2,2,5,3,3,3,3,3]).keys()))

the result is [1, 2, 5, 3]: the duplicates are removed and the original order is preserved.
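On Python 3.7 and later a plain dict also preserves insertion order, so dict.fromkeys gives the same result without importing collections, whereas list(set(data)) makes no ordering guarantee. A small sketch:

```python
# Python 3.7+: dict keeps insertion order, so OrderedDict is optional here
data = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 5, 3, 3, 3, 3, 3]
unique = list(dict.fromkeys(data))
print(unique)  # [1, 2, 5, 3]
```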



Source: https://stackoverflow.com/questions/46976157/part-2-of-a-successful-outcome-regarding-white-space-filling
