Save Numpy Array using Pickle

╄→尐↘猪︶ㄣ submitted on 2020-08-24 05:35:15

Question


I've got a Numpy array (130,000 x 3) that I would like to save using Pickle, with the following code. However, I keep getting the error "EOFError: Ran out of input" or "UnsupportedOperation: read" at the pkl.load line. This is my first time using Pickle. Any ideas?

Thanks,

Anant

import pickle as pkl
import numpy as np

arrayInput = np.zeros((1000,2)) #Trial input
save = True
load = True

fileName = path + 'CNN_Input'
fileObject = open(fileName, 'wb')

if save:
    pkl.dump(arrayInput, fileObject)
    fileObject.close()

if load:
    fileObject2 = open(fileName, 'wb')
    modelInput = pkl.load(fileObject2)
    fileObject2.close()

if np.array_equal(arrayInput, modelInput):
    print(True)

Answer 1:


You should use numpy.save and numpy.load.
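
As a minimal sketch of that suggestion, the round trip below saves the trial array from the question with numpy.save and reads it back with numpy.load. The temporary directory and the file name CNN_Input.npy are assumptions standing in for the asker's own path.

```python
import os
import tempfile

import numpy as np

array_input = np.zeros((1000, 2))  # same trial shape as in the question

# Hypothetical location: a throwaway temp directory stands in for the asker's `path`.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, "CNN_Input.npy")
    np.save(file_name, array_input)   # writes the array in NumPy's .npy format
    model_input = np.load(file_name)  # reads it back as an ndarray

# Compare whole arrays with array_equal rather than `==`, which is elementwise.
round_trip_ok = np.array_equal(array_input, model_input)
print(round_trip_ok)
```

No file object or mode string is involved here, which is why this route avoids the error in the question.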




Answer 2:


I have no problems using pickle:

In [126]: arr = np.zeros((1000,2))
In [127]: with open('test.pkl','wb') as f:
     ...:     pickle.dump(arr, f)
     ...:     
In [128]: with open('test.pkl','rb') as f:
     ...:     x = pickle.load(f)
     ...:     print(x.shape)
     ...:     
     ...:     
(1000, 2)

pickle and np.save/load are closely intertwined. For example, I can load this pickle with np.load (NumPy 1.16.3 and later require allow_pickle=True for this):

In [129]: np.load('test.pkl', allow_pickle=True).shape
Out[129]: (1000, 2)

If I open the pickle file in the wrong mode, I do get your error:

In [130]: with open('test.pkl','wb') as f:
     ...:     x = pickle.load(f)
     ...:     print(x.shape)
     ...:    
UnsupportedOperation: read

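But that shouldn't be surprising: you can't read from a file freshly opened for writing. Opening with 'wb' also truncates the file, so it will be empty, and a later attempt to load it in 'rb' mode fails with "EOFError: Ran out of input".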

np.save/load is the usual pair for writing numpy arrays. But pickle uses save to serialize arrays, and save uses pickle to serialize non-array objects (in the array). Resulting file sizes are similar. Curiously, in my timings the pickle version is faster.
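
The point that save falls back on pickle for non-array objects can be seen with an object-dtype array. This is only a sketch: the array contents, the temporary directory and the file name objects.npy are made up for illustration, and the refusal shown assumes NumPy 1.16.3 or later, where np.load defaults to allow_pickle=False.

```python
import os
import tempfile

import numpy as np

# An object-dtype array: its elements are arbitrary Python objects, not raw numbers.
obj_arr = np.empty(2, dtype=object)
obj_arr[0] = {"a": 1}
obj_arr[1] = "text"

# Hypothetical location: a throwaway temp directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, "objects.npy")
    np.save(file_name, obj_arr)  # the object elements are pickled inside the .npy file

    try:
        np.load(file_name)  # refused, because the payload is pickled
        refused = False
    except ValueError:
        refused = True

    loaded = np.load(file_name, allow_pickle=True)  # explicit opt-in to unpickling

print(refused, loaded[0], loaded[1])
```

A purely numeric array like the one in the question never touches this path, so np.load reads it without allow_pickle.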




Answer 3:


It's been a while, but if you're finding this now: in my experience, Pickle completes in a fraction of the time.

import pickle
import numpy

with open('filename', 'wb') as f:
    pickle.dump(arrayname, f)

with open('filename', 'rb') as f:
    arrayname1 = pickle.load(f)

numpy.array_equal(arrayname, arrayname1)  # sanity check

On the other hand, with default settings numpy compress took my 5.2 GB down to 0.4 GB, while Pickle only got it down to 1.7 GB.
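
A rough way to check the size trade-off on your own data is sketched below. It assumes that "numpy compress" refers to np.savez_compressed, which the answer does not state. The all-zeros toy array compresses far better than real data would, so the numbers say nothing about the 5.2 GB case.

```python
import os
import pickle
import tempfile

import numpy as np

arr = np.zeros((1000, 2))  # highly compressible toy data; real data will differ

# Hypothetical location: a throwaway temp directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "arr.pkl")
    npz_path = os.path.join(tmp_dir, "arr.npz")

    with open(pkl_path, "wb") as f:
        pickle.dump(arr, f)
    np.savez_compressed(npz_path, arr=arr)  # assumed meaning of "numpy compress"

    pkl_size = os.path.getsize(pkl_path)
    npz_size = os.path.getsize(npz_path)

    # Read the compressed archive back to confirm nothing was lost.
    with np.load(npz_path) as data:
        restored = data["arr"]

print(pkl_size, npz_size)
```

Wrapping each save in time.perf_counter() calls would give a similar rough comparison for speed.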




Answer 4:


You should use numpy.save() for saving numpy matrices.




Answer 5:


In your code, you're using

if load:
    fileObject2 = open(fileName, 'wb')
    modelInput = pkl.load(fileObject2)
    fileObject2.close()

The second argument of the open function is the mode: w stands for writing, r for reading. The second character, b, means that bytes will be read or written. A file opened only for writing cannot be read from, and vice versa. Therefore, opening the file with fileObject2 = open(fileName, 'rb') will do the trick.
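
Putting that fix into the question's code gives something like the sketch below. The temporary directory is an assumption standing in for the asker's undefined path, and the with blocks are a stylistic choice that closes each file automatically.

```python
import os
import pickle as pkl
import tempfile

import numpy as np

arrayInput = np.zeros((1000, 2))  # trial input, as in the question

# Hypothetical location: a throwaway temp directory stands in for `path`.
with tempfile.TemporaryDirectory() as tmp_dir:
    fileName = os.path.join(tmp_dir, "CNN_Input")

    # 'wb': write bytes. The file is created, or truncated if it already exists.
    with open(fileName, "wb") as fileObject:
        pkl.dump(arrayInput, fileObject)

    # 'rb': read bytes. This is the mode pkl.load needs.
    with open(fileName, "rb") as fileObject2:
        modelInput = pkl.load(fileObject2)

# array_equal returns a single bool, unlike `==`, which compares elementwise.
arrays_match = np.array_equal(arrayInput, modelInput)
print(arrays_match)
```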




Answer 6:


Don't use pickle for numpy arrays. For an extended discussion that links to all the resources I could find, see my answer here.

Short reasons:

  • the developers of numpy already provide a nice interface, and using it will save you lots of debugging time (most important reason)
  • np.save, np.load, and np.savez perform pretty well on most metrics (see this), which is to be expected since numpy is an established library and its developers wrote those functions.
  • Unpickling can execute arbitrary code, so loading a pickle file from an untrusted source is a security issue
  • to use pickle you have to open a file yourself, which can lead to bugs (e.g. I wasn't aware of the b mode flag and my code stopped working, which took time to debug)
  • if you refuse to accept this advice, at least articulate clearly why you need to use something else. Make sure it's crystal clear in your head.
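
To make the security bullet concrete, here is a deliberately harmless sketch of why loading untrusted pickle data is risky. The Payload class is hypothetical and invented for this illustration. Its __reduce__ method tells pickle which callable to invoke at load time, and a hostile file could name any callable.

```python
import pickle


class Payload:
    """Hypothetical class; __reduce__ controls what pickle calls on load."""

    def __reduce__(self):
        # On unpickling, pickle calls eval("6 * 7") instead of rebuilding a Payload.
        # A malicious file could name a far more dangerous callable here.
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())
result = pickle.loads(data)  # runs the embedded call during loading

print(result)
```

The loaded object is whatever the embedded call returned, not a Payload instance. This is also why np.load refuses pickled content unless allow_pickle=True is passed explicitly.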

Avoid repeating code at all costs if a solution already exists!

Anyway, here are all the interfaces I tried, hopefully it saves someone time (probably my future self):

import numpy as np
import pickle
from pathlib import Path

path = Path('~/data/tmp/').expanduser()
path.mkdir(parents=True, exist_ok=True)

lb,ub = -1,1
num_samples = 5
x = np.random.uniform(low=lb,high=ub,size=(1,num_samples))
y = x**2 + x + 2

# using save (to npy), savez (to npz)
np.save(path/'x', x)
np.save(path/'y', y)
np.savez(path/'db', x=x, y=y)
with open(path/'db.pkl', 'wb') as db_file:
    pickle.dump(obj={'x':x, 'y':y}, file=db_file)

# loading the npy, npz, and pkl files
x_loaded = np.load(path/'x.npy')
y_load = np.load(path/'y.npy')
db = np.load(path/'db.npz')
with open(path/'db.pkl', 'rb') as db_file:
    db_pkl = pickle.load(db_file)

print(x is x_loaded)  # False: loading creates a new array object
print(x == x_loaded)  # elementwise comparison of the values
print(x == db['x'])
print(x == db_pkl['x'])
print('done')

But most useful of all, see my answer here.



Source: https://stackoverflow.com/questions/52444921/save-numpy-array-using-pickle
