Multiprocessing scikit-learn

一生所求 2020-12-13 16:04

I got LinearSVC working on a training set and a test set loaded with the load_files function. I am now trying to get it working in a multiprocessor environment.

How can I g

2 Answers
  •  既然无缘
    2020-12-13 17:09

    I think using SGDClassifier instead of LinearSVC for this kind of data would be a good idea, as it is much faster. For the vectorization, I suggest you look into the hash transformer PR.
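
    A minimal sketch of this first suggestion, assuming scikit-learn's HashingVectorizer (in sklearn.feature_extraction.text) plays the role of the hashing transformer from that PR, and using a tiny made-up corpus in place of the load_files data. SGDClassifier with loss="hinge" fits a linear SVM with stochastic gradient descent; LinearSVC defaults to the squared hinge loss, so the two models are close but not identical.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Made-up toy corpus standing in for documents loaded with load_files.
train_texts = [
    "cheap pills buy now",
    "limited offer buy cheap",
    "meeting agenda for monday",
    "project status meeting notes",
]
train_labels = ["spam", "spam", "ham", "ham"]

# Stateless vectorizer: tokens are hashed into a fixed number of columns,
# so there is no vocabulary to fit or to share between processes.
vectorizer = HashingVectorizer(n_features=2**18)

# Hinge loss gives a linear SVM objective, optimized with SGD.
clf = SGDClassifier(loss="hinge", random_state=0)

model = make_pipeline(vectorizer, clf)
model.fit(train_texts, train_labels)
predictions = model.predict(["buy cheap pills", "monday meeting notes"])
print(predictions)
```

    Because the vectorizer keeps no fitted state, each worker process can transform its own chunk of documents independently.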

    For the multiprocessing, you can split the data set across cores, call partial_fit on each chunk, collect the resulting weight vectors, average them, send the averaged weights back to the estimators, and call partial_fit again.
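
    The averaging loop just described can be sketched as below. This is one possible implementation under stated assumptions, not a built-in scikit-learn feature: it uses joblib (installed alongside scikit-learn) to run the workers, made-up binary data, and hypothetical helper names (partial_fit_chunk, train_with_parameter_averaging). It writes the averaged weights back by assigning coef_ and intercept_ directly, which works with current SGDClassifier releases but is not a documented API.

```python
import copy

import numpy as np
from joblib import Parallel, delayed
from sklearn.linear_model import SGDClassifier


def partial_fit_chunk(estimator, X_chunk, y_chunk, classes):
    """Hypothetical worker: one SGD epoch on this chunk, from the shared weights."""
    estimator.partial_fit(X_chunk, y_chunk, classes=classes)
    return estimator


def train_with_parameter_averaging(X, y, n_workers=4, n_rounds=5):
    """Hypothetical driver: partial_fit per chunk, then average the weights."""
    classes = np.unique(y)  # partial_fit must see every class on its first call
    X_chunks = np.array_split(X, n_workers)
    y_chunks = np.array_split(y, n_workers)
    model = SGDClassifier(loss="hinge", random_state=0)

    for _ in range(n_rounds):
        # Each worker trains its own copy of the current (averaged) model.
        fitted = Parallel(n_jobs=n_workers)(
            delayed(partial_fit_chunk)(copy.deepcopy(model), Xc, yc, classes)
            for Xc, yc in zip(X_chunks, y_chunks)
        )
        # Average weight vectors and intercepts, then reuse one fitted
        # estimator (it already has classes_ etc.) as the next starting point.
        coef = np.mean([est.coef_ for est in fitted], axis=0)
        intercept = np.mean([est.intercept_ for est in fitted], axis=0)
        model = fitted[0]
        model.coef_ = coef
        model.intercept_ = intercept
    return model


# Made-up data: the label depends linearly on the first two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = train_with_parameter_averaging(X, y)
accuracy = model.score(X, y)
print(f"training accuracy: {accuracy:.3f}")
```

    The copy.deepcopy call keeps the workers from sharing a single estimator object if joblib ends up running them sequentially in one process.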

    Doing parallel gradient descent is an area of active research, so there is no ready-made solution there.

    How many classes does your data have, by the way? A separate binary classifier is trained for each class (automatically, in a one-vs-all scheme). If you have nearly as many classes as cores, it might be better, and much easier, to train one class per core by setting n_jobs in SGDClassifier.
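
    A minimal sketch of the per-class route, assuming a made-up 4-class dataset: with n_jobs=-1, SGDClassifier fits its one-vs-all binary problems in parallel on all available CPUs, and coef_ ends up with one row of weights per class.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Made-up 4-class problem; one-vs-all yields 4 binary classifiers.
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=6,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)

# n_jobs=-1: fit the per-class binary problems on all available CPUs.
clf = SGDClassifier(loss="hinge", n_jobs=-1, random_state=0)
clf.fit(X, y)
print(clf.coef_.shape)  # one weight vector per class
```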
