sklearn and large datasets

无人及你 2021-01-30 09:11

I have a dataset of 22 GB. I would like to process it on my laptop. Of course I can't load it in memory.

I use sklearn a lot, but only for much smaller datasets.

In

4 Answers
  •  感动是毒
    2021-01-30 09:30

    I've used several scikit-learn classifiers with out-of-core capabilities to train linear models: Stochastic Gradient Descent, Perceptron, and Passive-Aggressive, as well as Multinomial Naive Bayes, on a Kaggle dataset of over 30 GB. All of these classifiers share the partial_fit method you mention, though some behave better than others.

    You can find the methodology, the case study, and some good resources in this post: http://www.opendatascience.com/blog/riding-on-large-data-with-scikit-learn/
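    The out-of-core pattern the answer describes can be sketched roughly as follows: read the data in chunks and feed each chunk to `partial_fit`, so only one chunk is ever in memory. This is a minimal sketch, not the answerer's actual code; the random chunks here stand in for batches you would stream from disk (e.g. with `pandas.read_csv(..., chunksize=...)`).

    ```python
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.RandomState(0)
    classes = np.array([0, 1])  # partial_fit needs the full label set up front
    clf = SGDClassifier(random_state=0)

    for chunk_idx in range(10):  # each iteration = one chunk streamed from disk
        # Hypothetical stand-in data: label depends on the first feature
        X = rng.randn(500, 20)
        y = (X[:, 0] + 0.1 * rng.randn(500) > 0).astype(int)
        # Incremental update on this chunk only; classes is required
        # on the first call and harmless on later ones
        clf.partial_fit(X, y, classes=classes)

    # Evaluate on a held-out batch
    X_val = rng.randn(200, 20)
    y_val = (X_val[:, 0] > 0).astype(int)
    print(clf.score(X_val, y_val))
    ```

    The same loop works for the other estimators the answer mentions (Perceptron, PassiveAggressiveClassifier, MultinomialNB), since they all expose `partial_fit`.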
