Convert MNIST data from numpy arrays to original ubyte data

Question

I used this code almost exactly as written, only changing these lines:

f = gzip.open("../data/mnist.pkl.gz", 'rb')
training_data, validation_data, test_data = cPickle.load(f)

to these lines:

import gzip
import pickle
# encoding='latin1' lets Python 3 decode the numpy arrays in this Python 2 pickle
with gzip.open("mnist.pkl.gz", 'rb') as f:
    training_data, validation_data, test_data = pickle.load(f, encoding='latin1')

to work around pickle compatibility issues (the file was pickled under Python 2). The original mnist.pkl.gz was downloaded from his repo (available here), or it can be generated with the code here. The output looks right: it's a pickled set of numpy arrays holding the training and test data, and when I print the length of the training data, it contains 250,000 numpy arrays.
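The pickled arrays and the IDX files store pixels differently. In the pickle, as is usual for mnist.pkl.gz, each image is a flat row of 784 float intensities in [0, 1]. In the IDX file it is 28x28 unsigned bytes. A minimal sketch of the conversion follows. It assumes `training_data` is an `(images, labels)` pair. The `scale` factor is also an assumption: check `images.max()`. A value of 0.99609375 suggests the bytes were divided by 256, and 1.0 suggests 255. Both helper names are hypothetical.

```python
import numpy as np


def images_to_uint8(images, scale=256.0):
    """Turn flat float images into an (N, 28, 28) uint8 array (hypothetical helper).

    `scale` is an assumption: 256.0 if the floats were byte/256
    (max 0.99609375), 255.0 if they were byte/255 (max 1.0).
    """
    arr = np.asarray(images, dtype=np.float64)
    # Undo the scaling, round to the nearest integer, clamp to 0..255.
    pixels = np.clip(np.rint(arr * scale), 0, 255).astype(np.uint8)
    # Each flat 784-value row becomes one 28x28 image.
    return pixels.reshape(-1, 28, 28)


def labels_to_uint8(labels):
    """Store digit labels as one unsigned byte each (hypothetical helper)."""
    arr = np.asarray(labels)
    if arr.size and (arr.min() < 0 or arr.max() > 9):
        raise ValueError("expected digit labels in the range 0..9")
    return arr.astype(np.uint8)


# Usage, with training_data loaded as in the code above:
# images_u8 = images_to_uint8(training_data[0])
# labels_u8 = labels_to_uint8(training_data[1])
```

Rounding with `np.rint` before casting avoids off-by-one pixel values caused by floating-point truncation.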

I need to get the data back into exactly the same format as the original MNIST data (i.e. ubyte IDX files, with the training and test images and labels in separate files). The files will be fed into an external pipeline that I have no control over, so they must match the original format exactly.

I'm really stuck on how to do this. I've found things like this that might help, but I can't see how they apply to this problem. I'd really appreciate help converting this pickled numpy output back to the original MNIST format (i.e. ubyte, with the training and test data and labels in separate files).
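The original MNIST files use the IDX format described on Yann LeCun's MNIST page. Each file starts with a 4-byte magic number made of two zero bytes, the type code 0x08 (unsigned byte), and the number of dimensions. Next comes each dimension size as a big-endian 32-bit integer, then the raw bytes in row-major order. This gives a magic number of 2051 for image files and 2049 for label files. A minimal sketch of a writer for uint8 arrays follows. `write_idx_ubyte` is a hypothetical helper, and the sketch assumes the pipeline reads uncompressed files. The originals are distributed gzipped, so gzip the output if the pipeline expects `.gz`.

```python
import struct

import numpy as np


def write_idx_ubyte(path, array):
    """Write a uint8 numpy array to `path` in IDX format (hypothetical helper)."""
    arr = np.ascontiguousarray(array)
    if arr.dtype != np.uint8:
        raise TypeError("convert the data to uint8 before writing")
    # Magic number: two zero bytes, type code 0x08 (unsigned byte), ndim.
    header = struct.pack(">BBBB", 0, 0, 0x08, arr.ndim)
    # Each dimension size as a big-endian unsigned 32-bit integer.
    header += struct.pack(">" + "I" * arr.ndim, *arr.shape)
    with open(path, "wb") as f:
        f.write(header)
        f.write(arr.tobytes())  # row-major (C order) data bytes


# Usage, with hypothetical arrays images_u8 of shape (N, 28, 28)
# and labels_u8 of shape (N,):
# write_idx_ubyte("train-images-idx3-ubyte", images_u8)  # magic 2051
# write_idx_ubyte("train-labels-idx1-ubyte", labels_u8)  # magic 2049
```

One generic writer covers both file types, because the image file is simply a 3-D array (count, rows, columns) and the label file is a 1-D array.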

Edit 1: I've just realised something that might make this easier. I only need to convert the training data to ubyte format, not the test data, since I already have the test data in the original ubyte format.
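The original test files are already available, so one way to check converted training files is to parse both sets with the same reader and compare their headers. A minimal sketch follows. It assumes the standard names `t10k-images-idx3-ubyte` and `train-images-idx3-ubyte` for uncompressed files in the working directory. For `.gz` copies, gunzip them first or swap `open` for `gzip.open`. `read_idx_ubyte` is a hypothetical helper, not a library function.

```python
import struct

import numpy as np


def read_idx_ubyte(path):
    """Parse an uncompressed unsigned-byte IDX file into a numpy array (hypothetical helper)."""
    with open(path, "rb") as f:
        zero1, zero2, type_code, ndim = struct.unpack(">BBBB", f.read(4))
        if zero1 != 0 or zero2 != 0 or type_code != 0x08:
            raise ValueError(f"{path}: not an unsigned-byte IDX file")
        # One big-endian 32-bit size per dimension follows the magic number.
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = np.frombuffer(f.read(), dtype=np.uint8)
    if data.size != int(np.prod(shape)):
        raise ValueError(f"{path}: header shape {shape} does not match data size")
    return data.reshape(shape)


# Usage: converted training images should match the original test images
# in everything except the number of items.
# test_imgs = read_idx_ubyte("t10k-images-idx3-ubyte")
# train_imgs = read_idx_ubyte("train-images-idx3-ubyte")
# assert test_imgs.shape[1:] == train_imgs.shape[1:] == (28, 28)
# assert test_imgs.dtype == train_imgs.dtype == np.uint8
```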

Source: https://stackoverflow.com/questions/65156592/convert-mnist-data-from-numpy-arrays-to-original-ubyte-data

Tags

python

numpy

mnist

idx