I've got a NumPy array (130,000 x 3) that I would like to save using Pickle, with the following code. However, I keep getting the error "EOFError: Ra
I have no problems using pickle:
In [126]: arr = np.zeros((1000,2))
In [127]: with open('test.pkl','wb') as f:
     ...:     pickle.dump(arr, f)
     ...:
In [128]: with open('test.pkl','rb') as f:
     ...:     x = pickle.load(f)
     ...:     print(x.shape)
     ...:
(1000, 2)
pickle and np.save/load have a deep reciprocity. For example, I can load this pickle with np.load (recent NumPy versions require allow_pickle=True for this):
In [129]: np.load('test.pkl', allow_pickle=True).shape
Out[129]: (1000, 2)
If I open the pickle file in the wrong mode, I do get an error:
In [130]: with open('test.pkl','wb') as f:
     ...:     x = pickle.load(f)
     ...:     print(x.shape)
     ...:
UnsupportedOperation: read
But that shouldn't be surprising - you can't read a freshly opened write file. It will be empty.
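Here is a minimal sketch of what the wrong mode does, assuming a throwaway file in a temporary directory (the path and variable names are illustrative, not from the question). Opening with 'wb' truncates the file, so the read fails right away, and a later load in the correct 'rb' mode finds an empty file:

```python
import io
import os
import pickle
import tempfile

import numpy as np

arr = np.zeros((1000, 2))
path = os.path.join(tempfile.mkdtemp(), "test.pkl")  # illustrative temp file

# Write a valid pickle first.
with open(path, "wb") as f:
    pickle.dump(arr, f)
size_before = os.path.getsize(path)

# Re-opening in 'wb' mode truncates the file before anything is read,
# and the write-only handle cannot be read from.
wrong_mode_error = None
try:
    with open(path, "wb") as f:
        pickle.load(f)
except io.UnsupportedOperation as exc:
    wrong_mode_error = type(exc).__name__

size_after = os.path.getsize(path)  # the pickled data is gone

# Loading the now-empty file in the correct mode raises EOFError.
empty_file_error = None
try:
    with open(path, "rb") as f:
        pickle.load(f)
except EOFError as exc:
    empty_file_error = type(exc).__name__

print(size_before, size_after, wrong_mode_error, empty_file_error)
```

So a single accidental 'wb' open is enough to destroy the saved data, and every later attempt to load it fails until the file is written again.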
np.save/load is the usual pair for writing numpy arrays. But pickle uses save to serialize arrays, and save uses pickle to serialize non-array objects (in the array). Resulting file sizes are similar. Curiously in timings the pickle version is faster.
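As a rough illustration of the point that save falls back on pickle for non-array objects, here is a sketch using an object-dtype array (the file names and contents are made up for the example). In recent NumPy versions, np.load refuses such files unless allow_pickle=True is passed:

```python
import os
import tempfile

import numpy as np

tmp_dir = tempfile.mkdtemp()

# A plain numeric array round-trips through np.save/np.load directly.
numeric = np.zeros((1000, 2))
numeric_path = os.path.join(tmp_dir, "numeric.npy")
np.save(numeric_path, numeric)
numeric_loaded = np.load(numeric_path)

# An object array holds arbitrary Python objects, so np.save pickles them.
objects = np.empty(3, dtype=object)
objects[0] = {"a": 1}
objects[1] = [1, 2, 3]
objects[2] = "text"
objects_path = os.path.join(tmp_dir, "objects.npy")
np.save(objects_path, objects)

# Loading pickled contents is refused unless explicitly allowed.
try:
    np.load(objects_path)
    refused = False
except ValueError:
    refused = True

objects_loaded = np.load(objects_path, allow_pickle=True)
print(refused, objects_loaded[0])
```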
Don't use pickle for NumPy arrays. For an extended discussion that links to all the resources I could find, see my answer here.
Short reasons:
np.save, np.load, np.savez have pretty good performance in most metrics (see this), which is to be expected since NumPy is an established library and its developers wrote those functions.
... b and it stopped working, took time to debug)
Avoid repeating code at all costs if a solution already exists!
Anyway, here are all the interfaces I tried, hopefully it saves someone time (probably my future self):
import numpy as np
import pickle
from pathlib import Path

path = Path('~/data/tmp/').expanduser()
path.mkdir(parents=True, exist_ok=True)

lb, ub = -1, 1
num_samples = 5
x = np.random.uniform(low=lb, high=ub, size=(1, num_samples))
y = x**2 + x + 2

# save with np.save (to .npy), np.savez (to .npz), and pickle (to .pkl)
np.save(path/'x', x)
np.save(path/'y', y)
np.savez(path/'db', x=x, y=y)
with open(path/'db.pkl', 'wb') as db_file:
    pickle.dump(obj={'x':x, 'y':y}, file=db_file)

# load the .npy, .npz, and .pkl files back
x_loaded = np.load(path/'x.npy')
y_load = np.load(path/'y.npy')
db = np.load(path/'db.npz')
with open(path/'db.pkl', 'rb') as db_file:
    db_pkl = pickle.load(db_file)

# the loaded array is a new object, but its values match elementwise
print(x is x_loaded)
print(x == x_loaded)
print(x == db['x'])
print(x == db_pkl['x'])
print('done')
But most useful of all, see my answer here.
It's been a while, but if you're finding this now: Pickle completes in a fraction of the time.
with open('filename','wb') as f: pickle.dump(arrayname, f)
with open('filename','rb') as f: arrayname1 = pickle.load(f)
numpy.array_equal(arrayname,arrayname1) #sanity check
On the other hand, with default settings, numpy compress took my 5.2 GB down to 0.4 GB, while Pickle only got it down to 1.7 GB.
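To check the size trade-off on your own data, something like the following sketch may help. It assumes that "numpy compress" refers to np.savez_compressed, which is my guess rather than something stated above, and it uses an all-zeros stand-in array, so the ratios will differ from the figures quoted:

```python
import os
import pickle
import tempfile

import numpy as np

tmp_dir = tempfile.mkdtemp()
arr = np.zeros((130_000, 3))  # highly compressible stand-in data

paths = {
    "npy": os.path.join(tmp_dir, "arr.npy"),
    "npz_compressed": os.path.join(tmp_dir, "arr.npz"),
    "pickle": os.path.join(tmp_dir, "arr.pkl"),
}

# Write the same array three ways.
np.save(paths["npy"], arr)
np.savez_compressed(paths["npz_compressed"], arr=arr)
with open(paths["pickle"], "wb") as f:
    pickle.dump(arr, f)

# Compare the resulting file sizes on disk.
sizes = {name: os.path.getsize(p) for name, p in paths.items()}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

How much compression helps depends heavily on the data; random floats shrink far less than repetitive values.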
In your code, you're using:
if load:
    fileObject2 = open(fileName, 'wb')
    modelInput = pkl.load(fileObject2)
    fileObject2.close()
The second argument to the open function is the mode. w stands for writing, r for reading. The second character, b, denotes that bytes will be read/written. A file opened only for writing cannot be read, and vice versa. Therefore, opening the file with fileObject2 = open(fileName, 'rb') will do the trick.
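A minimal sketch of the corrected loading step, written with a with block so the file is closed automatically. The function name load_model_input, the file name, and the stand-in array are hypothetical, chosen only for illustration:

```python
import os
import pickle
import tempfile

import numpy as np


def load_model_input(file_name):
    """Load a pickled object, opening the file in binary read mode."""
    with open(file_name, "rb") as file_object:
        return pickle.load(file_object)


# Usage with a stand-in array of the shape mentioned in the question.
file_name = os.path.join(tempfile.mkdtemp(), "model_input.pkl")
model_input = np.zeros((130_000, 3))
with open(file_name, "wb") as file_object:
    pickle.dump(model_input, file_object)

restored = load_model_input(file_name)
print(restored.shape)
```

Note that if the file was already opened once with 'wb' by mistake, it has been emptied and must be written again before loading can succeed.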
You should use numpy.save and numpy.load.
You should use numpy.save() for saving NumPy matrices.