MemoryError in Python while converting to array

Submitted by 我们两清 on 2019-12-05 03:40:25

Question


My code is shown below:

from sklearn.datasets import load_svmlight_files
import numpy as np

perm1 = np.random.permutation(25000)
perm2 = np.random.permutation(25000)

X_tr, y_tr, X_te, y_te = load_svmlight_files(("dir/file.feat", "dir/file.feat"))

#randomly shuffle data
X_train = X_tr[perm1,:].toarray()[:,0:2000]
y_train = y_tr[perm1]>5 #turn into binary problem

The code works fine until here, but when I try to convert one more object to an array, my program returns a memory error.

Code:

X_test = X_te[perm2,:].toarray()[:,0:2000]

Error:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-7-31f5e4f6b00c> in <module>()
----> 1 X_test = X_test.toarray()

C:\Users\Asq\AppData\Local\Enthought\Canopy\User\lib\site-packages\scipy\sparse\compressed.pyc in toarray(self, order, out)
    788     def toarray(self, order=None, out=None):
    789         """See the docstring for `spmatrix.toarray`."""
--> 790         return self.tocoo(copy=False).toarray(order=order, out=out)
    791 
    792     ##############################################################

C:\Users\Asq\AppData\Local\Enthought\Canopy\User\lib\site-packages\scipy\sparse\coo.pyc in toarray(self, order, out)
    237     def toarray(self, order=None, out=None):
    238         """See the docstring for `spmatrix.toarray`."""
--> 239         B = self._process_toarray_args(order, out)
    240         fortran = int(B.flags.f_contiguous)
    241         if not fortran and not B.flags.c_contiguous:

C:\Users\Asq\AppData\Local\Enthought\Canopy\User\lib\site-packages\scipy\sparse\base.pyc in _process_toarray_args(self, order, out)
    697             return out
    698         else:
--> 699             return np.zeros(self.shape, dtype=self.dtype, order=order)
    700 
    701 

MemoryError: 

I'm new to Python, and I don't know whether a memory error is something I need to fix manually.

Other parts of my code raise the same error (for example, training with KNN or an ANN).

How can I fix this?


Answer 1:


In cases like these, it's often possible to avoid converting your sparse matrices to dense format.

For example, you can do the permutation and slice easily with CSR or CSC sparse formats.
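A minimal sketch of that idea follows. It assumes a small random CSR matrix as a stand-in for the real X_tr (the shape and density are made up for illustration), and dense_nbytes is a hypothetical helper that is not part of NumPy or SciPy. The row permutation and column slice are applied while the matrix is still sparse, so only the retained 2000 columns are ever densified:

```python
import numpy as np
from scipy import sparse

# Stand-in for X_tr from load_svmlight_files; shape and density are assumptions.
rng = np.random.RandomState(0)
X_tr = sparse.random(500, 10000, density=0.01, format="csr", random_state=rng)

perm1 = rng.permutation(X_tr.shape[0])

# Permute rows and slice columns while the matrix is still sparse (CSR).
X_small = X_tr[perm1, :][:, 0:2000]

# Densify only the small slice (500 x 2000 here), never the full matrix.
X_train = X_small.toarray()


def dense_nbytes(shape, dtype=np.float64):
    """Hypothetical helper: bytes a dense array of this shape and dtype needs."""
    return int(np.prod(shape, dtype=np.int64)) * np.dtype(dtype).itemsize
```

With the question's dimensions, dense_nbytes((25000, 2000)) comes to about 400 MB for float64, whereas calling toarray() before slicing has to allocate 25000 rows times the full feature count.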

You haven't posted the code that follows, but I suspect that can be made to handle sparse inputs as well. If that's true, your memory issues will no longer be a problem.
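As a hedged illustration of handling sparse inputs downstream: many scikit-learn estimators accept SciPy sparse matrices directly. The sketch below uses KNeighborsClassifier because the question mentions KNN. The data is random and purely illustrative, and the choice of algorithm="brute" and n_neighbors=5 is an assumption, not something taken from the original code.

```python
import numpy as np
from scipy import sparse
from sklearn.neighbors import KNeighborsClassifier

# Random sparse features and ratings; sizes are made up for illustration.
rng = np.random.RandomState(0)
X = sparse.random(200, 5000, density=0.01, format="csr", random_state=rng)
y = rng.randint(1, 11, size=200) > 5  # binary labels, as in the question

# Row slicing keeps the CSR format; nothing is converted to dense.
X_train, y_train = X[:150], y[:150]
X_test = X[150:]

# Brute-force neighbour search works on sparse input.
clf = KNeighborsClassifier(n_neighbors=5, algorithm="brute")
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
```

Here neither X_train nor X_test is passed through toarray(), so memory use stays proportional to the number of stored nonzero entries.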




Answer 2:


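One suggestion is to use numpy.asarray() instead of toarray(), on the idea that it converts in place without allocating new memory. Note, however, that numpy.asarray() applied to a SciPy sparse matrix returns a 0-d object array wrapping the matrix rather than a dense numeric array, so it does not replace toarray().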



Source: https://stackoverflow.com/questions/23879139/memory-error-at-python-while-converting-to-array
