I got linearsvc working against training set and test set using load_file
method i am trying to get It working on Multiprocessor enviorment.
How can i g
I think using SGDClassifier instead of LinearSVC for this kind of data would be a good idea, as it is much faster. For the vectorization, I suggest you look into the hash transformer PR.
For the multiprocessing: You can distribute the data sets across cores, do partial_fit
, get the weight vectors, average them, distribute them to the estimators, do partial fit again.
Doing parallel gradient descent is an area of active research, so there is no ready-made solution there.
How many classes does your data have btw? For each class, a separate will be trained (automatically). If you have nearly as many classes as cores, it might be better and much easier to just do one class per core, by specifying n_jobs
in SGDClassifier.