I have a dataset of 22 GB. I would like to process it on my laptop. Of course I can\'t load it in memory.
I use a lot sklearn but for much smaller datasets.
In
I've used several scikit-learn classifiers with out-of-core capabilities to train linear models: Stochastic Gradient, Perceptron and Passive Agressive and also Multinomial Naive Bayes on a Kaggle dataset of over 30Gb. All these classifiers share the partial_fit method which you mention. Some behave better than others though.
You can find the methodology, the case study and some good resources in of this post: http://www.opendatascience.com/blog/riding-on-large-data-with-scikit-learn/