MemoryError in Python but not IPython

Posted by 寵の児 on 2019-12-13 00:03:51

Question


In general, can you think of any reason why a MemoryError would occur in Python but not in IPython (the console, not the notebook)?

To be more specific, I'm using sklearn's SGDClassifier in the multiclass and multilabel case. The following code raises the error:

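from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier

# niter and alpha are hyperparameters defined elsewhere in the script
model = SGDClassifier(
    loss='hinge',
    penalty='l2',
    n_iter=niter,
    alpha=alpha,
    fit_intercept=True,
    n_jobs=1)

# Fit one binary SGD classifier per class
mc = OneVsRestClassifier(model)
mc.fit(X, y)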

On calling mc.fit(X, y), the following error occurs:

  File "train12-3b.py", line 411, in buildmodel
    mc.fit(X, y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/multiclass.py", line 201, in fit
    n_jobs=self.n_jobs)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/multiclass.py", line 88, in fit_ovr
    Y = lb.fit_transform(y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 408, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/preprocessing/label.py", line 272, in transform
    neg_label=self.neg_label)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/preprocessing/label.py", line 394, in label_binarize
    Y = np.zeros((len(y), len(classes)), dtype=np.int)
MemoryError

Y is a matrix with 6 million rows and k columns, where the gold labels are 1 and the rest are 0 (in this case k = 21, but I'd like to go above 2000). sklearn converts Y to a dense matrix even if it is passed in as sparse, which is why the MemoryError is raised at Y = np.zeros((len(y), len(classes)), dtype=np.int).
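To make the dense-versus-sparse distinction concrete, here is a minimal sketch of a sparse label indicator matrix. It is not sklearn's implementation: the helper name sparse_indicator is hypothetical, and it assumes one integer gold label per row (the simpler multiclass case), so that only one entry per row is ever stored.

```python
import numpy as np
from scipy import sparse


def sparse_indicator(labels, n_classes):
    """Build an (n_samples, n_classes) CSR matrix with a 1 at each gold label.

    Hypothetical helper for illustration; assumes one integer label per row.
    """
    labels = np.asarray(labels)
    n_samples = labels.shape[0]
    rows = np.arange(n_samples)
    data = np.ones(n_samples, dtype=np.int8)
    # Only n_samples nonzero entries are stored, regardless of n_classes.
    return sparse.csr_matrix((data, (rows, labels)),
                             shape=(n_samples, n_classes))


y = np.array([0, 2, 1, 2])
Y = sparse_indicator(y, n_classes=3)
print(Y.toarray())
```

The stored data grows with the number of rows rather than with rows times classes, which is what matters when k goes above 2000.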

I have 60 GB of RAM, and with 21 columns it shouldn't take more than 8 GB at most (6 million * 21 * 64 bits is about 8 gigabits, i.e. roughly 1 GB), so I'm confused. I rewrote the Y = np.zeros((len(y), len(classes)), dtype=np.int) call to use dtype=bool, but no luck.
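The arithmetic above can be checked with a short sketch. The helper name dense_label_matrix_bytes is hypothetical, and the calculation assumes np.int resolved to a 64-bit integer on this platform (np.int64 is used here because the np.int alias no longer exists in recent NumPy releases).

```python
import numpy as np


def dense_label_matrix_bytes(n_samples, n_classes, dtype):
    """Bytes needed for a dense (n_samples, n_classes) array of dtype."""
    return n_samples * n_classes * np.dtype(dtype).itemsize


n_samples = 6000000
for n_classes in (21, 2000):
    for dtype in (np.int64, np.bool_):
        size = dense_label_matrix_bytes(n_samples, n_classes, dtype)
        # Report in decimal gigabytes (1 GB = 1e9 bytes).
        print("k=%d, dtype=%s: %.2f GB"
              % (n_classes, np.dtype(dtype).name, size / 1e9))
```

Under these assumptions the 21-column int64 matrix needs about 1 GB, while a 2000-column int64 matrix would need about 96 GB, more than the 60 GB available.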

Any thoughts?


Answer 1:


It sounds like you are hitting a limitation of the current implementation of the label binarizer: see issue #2441. There is PR #2458 to fix it.

Please feel free to try that branch and report your results as a comment to that PR.



Source: https://stackoverflow.com/questions/20382599/memoryerror-in-python-but-not-ipython
