I have a really big JSON object that I want to dump into a pickle file. Is there a way to display a progress bar while using pickle.dump?
The only way that I know of is to define __getstate__/__setstate__ methods that return "sub-objects", which can refresh the GUI when they get pickled/unpickled. For example, if your object is a list, you could use something like this:
import pickle

class SubList:
    on_pickling = None

    def __init__(self, sublist):
        print('SubList', sublist)
        self.data = sublist

    def __getstate__(self):
        if SubList.on_pickling is not None:
            print('SubList pickle state fetch: calling sub callback')
            SubList.on_pickling()
        return self.data

    def __setstate__(self, obj):
        if SubList.on_pickling is not None:
            print('SubList pickle state restore: calling sub callback')
            SubList.on_pickling()
        self.data = obj

class ListSubPickler:
    def __init__(self, data: list):
        self.data = data

    def __getstate__(self):
        print('creating SubLists for pickling long list')
        num_chunks = 10
        # Ceiling division, clamped to 1: a span of 0 would make range() raise ValueError
        span = max(1, -(-len(self.data) // num_chunks))
        sublists = [SubList(self.data[i:i + span]) for i in range(0, len(self.data), span)]
        return sublists

    def __setstate__(self, subpickles):
        self.data = []
        print('restoring Pickleable(list)')
        for subpickle in subpickles:
            self.data.extend(subpickle.data)
        print('final', self.data)

def refresh():
    # do something: refresh GUI (for example, qApp.processEvents() for Qt), show progress, etc.
    print('refreshed')

If you run the following in that script,
data = list(range(100)) # your large data object
list_pickler = ListSubPickler(data)
SubList.on_pickling = refresh
print('\ndumping pickle of', list_pickler)
pickled = pickle.dumps(list_pickler)
print('\nloading from pickle')
new_list_pickler = pickle.loads(pickled)
assert new_list_pickler.data == data
print('\nloading from pickle, without on_pickling')
SubList.on_pickling = None
new_list_pickler = pickle.loads(pickled)
assert new_list_pickler.data == data
You will see that the refresh callback gets called 10 times during the dump and 10 more times during the first load, once per sub-list. So if you have a 2 GB list that takes about 1 minute to dump, and you want roughly 10 GUI refreshes per second, that is 60 * 10 = 600 refreshes, so you would set the number of chunks to 600.
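To turn those callbacks into an actual progress readout, the callback can count how many chunks have been handled so far. The following is a minimal, self-contained sketch rather than part of the approach above: `Chunk`, `ChunkedList`, and `make_progress_callback` are hypothetical names, the 60-second dump time and 10 refreshes per second are assumed figures, and the text bar stands in for whatever GUI widget you would update.

```python
import pickle
import sys


class Chunk:
    """Holds one slice of the data; fires the callback when its state is read."""
    on_pickling = None  # class-level callback, set before dumping

    def __init__(self, items):
        self.items = items

    def __getstate__(self):
        if Chunk.on_pickling is not None:
            Chunk.on_pickling()
        return self.items

    def __setstate__(self, state):
        self.items = state


class ChunkedList:
    """Pickles a list as a series of Chunk objects."""

    def __init__(self, data, num_chunks=10):
        self.data = data
        self.num_chunks = num_chunks

    def __getstate__(self):
        # Ceiling division, at least 1, so range() never gets a zero step.
        span = max(1, -(-len(self.data) // self.num_chunks))
        chunks = [Chunk(self.data[i:i + span])
                  for i in range(0, len(self.data), span)]
        # A dict keeps the state non-empty even when the list is empty.
        return {"num_chunks": self.num_chunks, "chunks": chunks}

    def __setstate__(self, state):
        self.num_chunks = state["num_chunks"]
        self.data = []
        for chunk in state["chunks"]:
            self.data.extend(chunk.items)


def make_progress_callback(total, width=30, out=sys.stdout):
    """Return (callback, state); each call advances a text progress bar."""
    state = {"done": 0}

    def callback():
        state["done"] += 1
        filled = min(width, width * state["done"] // total)
        bar = "#" * filled + "-" * (width - filled)
        out.write(f"\r[{bar}] {state['done']}/{total}")
        out.flush()

    return callback, state


estimated_seconds = 60      # assumed dump time
refreshes_per_second = 10   # assumed refresh rate
num_chunks = estimated_seconds * refreshes_per_second  # 600 chunks

# 6000 items split evenly into 600 chunks of 10; if the length is not a
# multiple of num_chunks, fewer chunks may be produced and the bar stops short.
data = list(range(6000))
callback, progress = make_progress_callback(total=num_chunks)

Chunk.on_pickling = callback
payload = pickle.dumps(ChunkedList(data, num_chunks))
Chunk.on_pickling = None

restored = pickle.loads(payload)
```

The bar advances once per chunk, so it measures chunks handed to the pickler rather than bytes written. That is usually close enough for a progress display when the chunks are of similar size.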
The code is easily adapted for a dict, NumPy array, etc.
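As one illustration of such an adaptation, here is a hedged sketch for a dict. `DictChunk` and `ChunkedDict` are hypothetical names, and splitting the items with `itertools.islice` is just one reasonable choice. Each chunk stores a list of (key, value) pairs, which are merged back into a dict on load.

```python
import pickle
from itertools import islice


class DictChunk:
    """Holds a list of (key, value) pairs; fires the callback when pickled."""
    on_pickling = None

    def __init__(self, items):
        self.items = items

    def __getstate__(self):
        if DictChunk.on_pickling is not None:
            DictChunk.on_pickling()
        return self.items

    def __setstate__(self, state):
        self.items = state


class ChunkedDict:
    """Pickles a dict as a series of DictChunk objects."""

    def __init__(self, data, num_chunks=10):
        self.data = data
        self.num_chunks = num_chunks

    def __getstate__(self):
        # Ceiling division, at least 1 item per chunk.
        span = max(1, -(-len(self.data) // self.num_chunks))
        items = iter(self.data.items())
        chunks = []
        while True:
            part = list(islice(items, span))  # next `span` pairs, if any
            if not part:
                break
            chunks.append(DictChunk(part))
        return {"num_chunks": self.num_chunks, "chunks": chunks}

    def __setstate__(self, state):
        self.num_chunks = state["num_chunks"]
        self.data = {}
        for chunk in state["chunks"]:
            self.data.update(chunk.items)


calls = []  # records one entry per callback invocation
DictChunk.on_pickling = lambda: calls.append(1)

data = {f"key{i}": i for i in range(100)}
payload = pickle.dumps(ChunkedDict(data))
DictChunk.on_pickling = None

restored = pickle.loads(payload)
```

With 100 entries and the default of 10 chunks, the callback fires once per chunk during the dump, and the restored dict compares equal to the original.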
Source: https://stackoverflow.com/questions/30611840/pickle-dump-with-progress-bar