Parsing a large .bz2 file (40 GB) with lxml iterparse in Python: error that does not appear with the uncompressed file

孤城傲影 2020-12-18 00:49

I am trying to parse OpenStreetMap's planet.osm, compressed in bz2 format. Because it is already 41 GB, I don't want to decompress the file completely.

So I figured

2 Answers
  •  攒了一身酷
    2020-12-18 01:12

    As an alternative, you can feed the output of the bzcat command (which can handle multistream files too) to iterparse:

    import subprocess
    from lxml import etree as et

    p = subprocess.Popen(["bzcat", "data.bz2"], stdout=subprocess.PIPE)
    parser = et.iterparse(p.stdout, ...)
    # after parsing, call p.wait() and then check that p.returncode == 0 so there were no errors
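
A slightly fuller sketch of the same idea, in case it helps. It is not a definitive implementation. To keep it self-contained it uses the standard library's xml.etree.ElementTree.iterparse as a stand-in for lxml's iterparse, which is called in a similar way. The function name count_elements and the file name planet.osm.bz2 in the usage line are made up for illustration.

```python
import subprocess
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)


def count_elements(cmd, tag):
    """Stream XML from a decompressor's stdout and count elements named `tag`.

    `cmd` is the decompressor command line, e.g. ["bzcat", "data.bz2"].
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    count = 0
    try:
        # "end" events fire once an element has been read completely.
        for _event, elem in ET.iterparse(proc.stdout, events=("end",)):
            if elem.tag == tag:
                count += 1
            # Drop the element's children, text and attributes to limit memory use.
            elem.clear()
    finally:
        proc.stdout.close()
        # returncode is None until the process has been waited for.
        returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError(f"{cmd[0]} exited with status {returncode}")
    return count
```

Usage would look like count_elements(["bzcat", "planet.osm.bz2"], "node"). Note that the return code is only checked after the parse loop, so a failure in bzcat is reported late. The cleared elements also stay attached to the root as empty shells, so for a file of this size you would probably want to detach them as well.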
    
