Dumping JSON directly into a tarfile

Posted by 萝らか妹 on 2021-01-29 07:01:24

Question


I have a large list of dict objects. I would like to store this list in a tar file to exchange remotely. I have done that successfully by writing a json.dumps() string to a tarfile object opened in 'w:gz' mode.

I am now trying to get a piped (streaming) implementation working by opening the tarfile object in 'w|gz' mode. Here is my code so far:

from json import dump
from io import StringIO
import tarfile

with StringIO() as out_stream, tarfile.open(filename, 'w|gz', out_stream) as tar_file:
    for packet in json_io_format(data):
        dump(packet, out_stream)

This code is in a function 'write_data'. 'json_io_format' is a generator that yields one dict object at a time from the dataset (so packet is a dict).

Here is my error:

Traceback (most recent call last):
  File "pdml_parser.py", line 35, in write_data
    dump(packet, out_stream)
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 2397, in __exit__
    self.close()
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 1733, in close
    self.fileobj.close()
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 459, in close
    self.fileobj.write(self.buf)
TypeError: string argument expected, got 'bytes'

After some troubleshooting with help from the comments, I found that the error is raised when the 'with' statement exits and calls the context manager's __exit__. I believe that this in turn calls TarFile.close(). If I take the tarfile.open() call out of the 'with' statement and deliberately leave out TarFile.close(), I get this code:

with StringIO() as out_stream:
    tar_file = tarfile.open(filename, 'w|gz', out_stream)
    for packet in json_io_format(data):
        dump(packet, out_stream)

This version of the program completes, but does not produce the output file 'filename' and yields this error:

Exception ignored in: <bound method _Stream.__del__ of <tarfile._Stream object at 0x7fca7a352b00>>
Traceback (most recent call last):
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 411, in __del__
    self.close()
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 459, in close
    self.fileobj.write(self.buf)
TypeError: string argument expected, got 'bytes'

I believe that is caused by the garbage collector. Something is preventing the TarFile object from closing.

Can anyone help me figure out what is going on here?


Answer 1:


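Why do you think you can write a tarfile to a StringIO? That doesn't work like you think it does. tarfile writes bytes to the file object you hand it, and StringIO only accepts str, which is exactly what the TypeError in your traceback is telling you.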
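To see the mismatch in isolation, here is a minimal sketch that leaves tarfile out entirely. The variable names and the sample payload are made up for illustration; the only point is how the two io classes react to a bytes argument.

```python
import io

payload = b"compressed tar bytes"  # stand-in for what tarfile flushes on close

# A text stream refuses bytes outright.
text_stream = io.StringIO()
try:
    text_stream.write(payload)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# A binary stream accepts the same payload and reports how many bytes it took.
binary_stream = io.BytesIO()
written = binary_stream.write(payload)

print(error_message)
print(written)
```

So whatever file object is passed to tarfile.open() has to be a binary one, such as io.BytesIO or a file opened in 'wb' mode.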

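The version below swaps StringIO for BytesIO. It doesn't raise an error, but it's not actually how you create a tarfile in memory from in-memory objects: the JSON is written straight to the underlying stream instead of being added to the archive as a member.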

from json import dumps
from io import BytesIO
import tarfile

data = [{'foo': 'bar'},
        {'cheese': None},
        ]

filename = 'fnord'
with BytesIO() as out_stream, tarfile.open(filename, 'w|gz', out_stream) as tar_file:
    for packet in data:
        out_stream.write(dumps(packet).encode())
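For comparison, here is a sketch of one way to get a real archive out of in-memory objects: encode each dict to bytes, describe it with a TarInfo, and hand it to TarFile.addfile(). The helper name write_packets and the member names packet_0.json, packet_1.json, ... are assumptions made for this example, not anything from the question; storing one member per packet is also just one possible layout.

```python
import io
import json
import tarfile


def write_packets(packets, fileobj):
    """Write each dict in `packets` as its own JSON member of a gzipped tar stream."""
    # 'w|gz' emits a gzip-compressed stream of tar blocks; fileobj must accept bytes.
    with tarfile.open(fileobj=fileobj, mode="w|gz") as tar:
        for index, packet in enumerate(packets):
            payload = json.dumps(packet).encode("utf-8")
            # Hypothetical member naming scheme, chosen only for this sketch.
            info = tarfile.TarInfo(name="packet_{}.json".format(index))
            info.size = len(payload)  # the header must carry the exact byte length
            tar.addfile(info, io.BytesIO(payload))


data = [{"foo": "bar"}, {"cheese": None}]

buffer = io.BytesIO()
write_packets(data, buffer)

# Read the archive back to check the round trip.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as tar:
    restored = [
        json.loads(tar.extractfile(member).read().decode("utf-8"))
        for member in tar.getmembers()
    ]

print(restored == data)
```

To write to disk instead of memory, the same helper should work with a file opened in binary mode, for example open(filename, 'wb'), in place of the BytesIO buffer.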


Source: https://stackoverflow.com/questions/39109180/dumping-json-directly-into-a-tarfile
