Question
I have a large list of dict objects. I would like to store this list in a tar file to exchange remotely. I have done that successfully by writing a json.dumps() string to a tarfile object opened in 'w:gz' mode.
I am trying for a piped implementation, opening the tarfile object in 'w|gz' mode. Here is my code so far:
from json import dump
from io import StringIO
import tarfile

with StringIO() as out_stream, tarfile.open(filename, 'w|gz', out_stream) as tar_file:
    for packet in json_io_format(data):
        dump(packet, out_stream)
This code is in a function 'write_data'. 'json_io_format' is a generator that yields one dict object at a time from the dataset (so packet is a dict).
Here is my error:
Traceback (most recent call last):
  File "pdml_parser.py", line 35, in write_data
    dump(packet, out_stream)
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 2397, in __exit__
    self.close()
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 1733, in close
    self.fileobj.close()
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 459, in close
    self.fileobj.write(self.buf)
TypeError: string argument expected, got 'bytes'
After some troubleshooting with help from the comments, I found that the error occurs when the 'with' statement exits and calls the context manager's __exit__. I BELIEVE that this in turn calls TarFile.close(). If I remove the tarfile.open() call from the 'with' statement, and purposefully leave out the TarFile.close(), I get this code:
with StringIO() as out_stream:
    tar_file = tarfile.open(filename, 'w|gz', out_stream)
    for packet in json_io_format(data):
        dump(packet, out_stream)
This version of the program completes, but does not produce the output file 'filename' and yields this error:
Exception ignored in: <bound method _Stream.__del__ of <tarfile._Stream object at 0x7fca7a352b00>>
Traceback (most recent call last):
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 411, in __del__
    self.close()
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 459, in close
    self.fileobj.write(self.buf)
TypeError: string argument expected, got 'bytes'
I believe that is caused by the garbage collector. Something is preventing the TarFile object from closing.
Can anyone help me figure out what is going on here?
Answer 1:
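Why do you think you can write a tarfile to a StringIO? That doesn't work like you think it does. tarfile emits bytes (gzip-compressed here), and StringIO only accepts str, which is exactly what the TypeError is telling you.
This approach doesn't error, but it's not actually how you create a tarfile in memory from in-memory objects: the JSON goes straight into the BytesIO buffer instead of being added as an archive member, and because a fileobj is passed, nothing is written to a file called filename on disk.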
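from json import dumps
from io import BytesIO
import tarfile

data = [{'foo': 'bar'},
        {'cheese': None},
        ]
filename = 'fnord'

# BytesIO, not StringIO: the stream that tarfile writes is bytes.
with BytesIO() as out_stream, tarfile.open(filename, 'w|gz', out_stream) as tar_file:
    for packet in data:
        # dumps() returns str, so encode it before writing to the bytes buffer.
        out_stream.write(dumps(packet).encode())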
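For comparison, here is a minimal sketch of one way to add each packet as a real archive member while still using the 'w|gz' stream mode. It assumes one member per packet is acceptable. The function names write_packets and read_packets and the packet_NNNNNN.json member names are made up for illustration and are not from the original post. In stream mode the size of each member has to be known up front, which is why each packet is serialized to bytes before it is added.

```python
import io
import json
import tarfile


def write_packets(packets, fileobj):
    """Write each dict as its own JSON member of a gzip-compressed tar stream."""
    # fileobj must accept bytes (a BytesIO, or a file opened with 'wb').
    with tarfile.open(fileobj=fileobj, mode='w|gz') as tar:
        for index, packet in enumerate(packets):
            payload = json.dumps(packet).encode('utf-8')
            # Hypothetical naming scheme for the members.
            info = tarfile.TarInfo(name='packet_{:06d}.json'.format(index))
            info.size = len(payload)  # the tar header needs the size in advance
            tar.addfile(info, io.BytesIO(payload))


def read_packets(fileobj):
    """Read the members back, in archive order, as a list of dicts."""
    with tarfile.open(fileobj=fileobj, mode='r:gz') as tar:
        return [
            json.loads(tar.extractfile(member).read().decode('utf-8'))
            for member in tar.getmembers()
        ]


data = [{'foo': 'bar'}, {'cheese': None}]

buffer = io.BytesIO()
write_packets(data, buffer)

buffer.seek(0)  # rewind before reading the archive back
restored = read_packets(buffer)
```

To produce a file on disk instead of an in-memory buffer, the same write_packets call should work with a file object opened in 'wb' mode in place of the BytesIO.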
Source: https://stackoverflow.com/questions/39109180/dumping-json-directly-into-a-tarfile