Python MemoryError when doing fitting with Scikit-learn

天命终不由人 2020-12-15 07:45

I am running Python 2.7 (64-bit) on a Windows 8 64-bit system with 24GB memory. When doing the fitting of the usual Sklearn.linear_models.Ridge, the code runs f

2 answers

一向 (original poster)

2020-12-15 08:29

The relevant option here is gcv_mode, which accepts three values: "auto", "svd" and "eigen". The default, "auto", uses the svd mode if n_samples > n_features and the eigen mode otherwise.

Since in your case n_samples > n_features, the svd mode is chosen. However, the svd mode currently doesn't handle sparse data properly. scikit-learn should be fixed to use proper sparse SVD instead of the dense SVD.

As a workaround, I would force the eigen mode by passing gcv_mode="eigen", since this mode should handle sparse data properly. However, n_samples is quite large in your case. Because the eigen mode builds a kernel matrix, and thus has n_samples ** 2 memory complexity, that matrix may not fit in memory. In that case, I would just reduce the number of samples (the eigen mode can handle a very large number of features without problems, though).
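To make the workaround concrete, here is a minimal sketch, assuming RidgeCV is the estimator in use (gcv_mode is a RidgeCV parameter and only matters when cv is left at None). The synthetic data, the alphas grid and the max_samples budget are all made up for illustration; they are not taken from the question.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import RidgeCV

# Synthetic sparse data standing in for the real problem (assumption).
rng = np.random.RandomState(0)
X = sp.random(2000, 5000, density=0.001, format="csr", random_state=rng)
y = rng.randn(X.shape[0])

# Subsample rows so that the n_samples x n_samples kernel matrix
# built by the eigen mode stays small. max_samples is a hypothetical budget.
max_samples = 500
idx = rng.choice(X.shape[0], size=max_samples, replace=False)
X_sub, y_sub = X[idx], y[idx]

# Force the eigen mode instead of letting "auto" choose.
model = RidgeCV(alphas=[0.1, 1.0, 10.0], gcv_mode="eigen")
model.fit(X_sub, y_sub)

print(model.alpha_, model.coef_.shape)
```

Random subsampling is only one way to cut n_samples; whether it is acceptable depends on the data.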

In any case, since both n_samples and n_features are quite large, you are pushing this implementation to its limits (even with a proper sparse SVD).
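For a rough sense of scale, the back-of-the-envelope estimate below compares the dense kernel matrix (n_samples x n_samples) with a dense copy of the data (n_samples x n_features), assuming 8-byte floats. The sizes are hypothetical, since the question's actual dimensions are not shown here, and dense_gib is a helper name invented for this sketch.

```python
def dense_gib(n_rows, n_cols, bytes_per_value=8):
    """Memory needed for a dense n_rows x n_cols float64 array, in GiB."""
    return n_rows * n_cols * bytes_per_value / 2.0 ** 30

# Hypothetical problem size (not from the question).
n_samples, n_features = 100000, 50000

kernel_gib = dense_gib(n_samples, n_samples)   # what the eigen mode builds
design_gib = dense_gib(n_samples, n_features)  # a dense copy of the data

print("kernel matrix: %.1f GiB" % kernel_gib)
print("dense data:    %.1f GiB" % design_gib)
```

With numbers of this order, either matrix alone exceeds 24GB, which is why cutting n_samples is the practical lever for the eigen mode.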

Also see https://github.com/scikit-learn/scikit-learn/issues/1921
