How do I restrict the number of processors used by the ridge regression model in sklearn?

Posted by 坚强是说给别人听的谎言 on 2021-01-28 03:21:59

Question


I want to make a fair comparison between different machine learning models. However, I find that the ridge regression model automatically uses multiple processors, and it has no parameter (such as n_jobs) for restricting the number of processors used. Is there any way to solve this problem?

A minimal example:

from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

features, target = make_regression(n_samples=10000, n_features=1000)
r = RidgeCV()
r.fit(features, target)
print(r.score(features, target))

Answer 1:


If you set the environment variable OMP_NUM_THREADS to n, you will get the expected behaviour. For example, on Linux, run export OMP_NUM_THREADS=1 in the terminal to restrict usage to one CPU.

Depending on your system, you can also set it directly in Python. See e.g. How to set environment variables in Python?
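Setting the variable from Python could look like the minimal sketch below. It assumes the variable is read when the numerical libraries are first loaded, so the assignment has to come before NumPy or scikit-learn are imported. The two extra variables (OPENBLAS_NUM_THREADS and MKL_NUM_THREADS) are not mentioned in the answer; they are added here on the assumption that your NumPy build may use OpenBLAS or MKL instead of an OpenMP-based backend.

```python
import os

# Number of threads to allow; "1" mirrors the export example above.
n_threads = "1"

# Set the limit BEFORE importing numpy / sklearn, since the underlying
# libraries typically read these variables once, at load time.
os.environ["OMP_NUM_THREADS"] = n_threads

# Assumption: depending on how NumPy was built, one of these may be the
# variable that actually matters, so they are set as well.
os.environ["OPENBLAS_NUM_THREADS"] = n_threads
os.environ["MKL_NUM_THREADS"] = n_threads

# Only import numpy / sklearn after this point.
print(os.environ["OMP_NUM_THREADS"])
```

If the libraries were already imported earlier in the same process (for example in a notebook), restarting the interpreter is probably needed for the new value to take effect.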




Answer 2:


Take a look at sklearn.utils.parallel_backend. I think you can set the number of cores used for the calculation with its n_jobs parameter.
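A rough sketch of this idea follows. It uses joblib.parallel_backend directly rather than the sklearn.utils.parallel_backend name given in the answer, on the assumption that the scikit-learn helper is not available in every version. The small dataset, the "loky" backend and alpha=1.0 are illustrative choices only. Note that this limits work dispatched through joblib; it may not affect threads started by the numerical libraries underneath NumPy.

```python
from joblib import parallel_backend
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Small illustrative dataset (sizes chosen only to keep the example fast).
features, target = make_regression(n_samples=200, n_features=20, random_state=0)

# Inside this block, joblib-based parallelism in scikit-learn is capped
# at one worker unless a call sets n_jobs explicitly.
with parallel_backend("loky", n_jobs=1):
    scores = cross_val_score(Ridge(alpha=1.0), features, target, cv=5)

print(scores)
```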




Answer 3:


To expand on @PV8's answer: whenever you instantiate RidgeCV() without explicitly setting the cv parameter (as in your case), an efficient Leave-One-Out cross-validation is run (according to the algorithms referenced here, implementation here).

On the other hand, when you explicitly pass the cv parameter to RidgeCV(), roughly this happens:

  from sklearn.linear_model import Ridge
  from sklearn.model_selection import GridSearchCV

  model = Ridge()
  parameters = {'alpha': [0.1, 1.0, 10.0]}
  # n_jobs is left at its default (None)
  gs = GridSearchCV(model, param_grid=parameters)
  gs.fit(features, target)
  print(gs.best_score_)

(as you can see here), namely, you'll use GridSearchCV with the default n_jobs=None.

Most importantly, as pointed out by one of the sklearn core developers here, the issue you are experiencing might not depend on sklearn, but rather on

[...] your numpy setup performing vectorized operations with parallelism.

(where vectorized operations are performed within the computationally efficient LOO cross-validation procedure that you are implicitly calling by not passing cv to RidgeCV()).
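If the extra threads do come from the libraries underneath NumPy, one way to cap them at runtime is the threadpoolctl package. This is not mentioned in the answer; it is suggested here as an assumption about what fits the situation, and the small dataset is illustrative only. A minimal sketch:

```python
from threadpoolctl import threadpool_limits
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Small illustrative dataset (sizes chosen only to keep the example fast).
features, target = make_regression(n_samples=500, n_features=50, random_state=0)

# Limit the native thread pools (e.g. BLAS, OpenMP) to a single thread
# for everything executed inside this block.
with threadpool_limits(limits=1):
    model = RidgeCV().fit(features, target)
    score = model.score(features, target)

print(score)
```

Unlike an environment variable, the limit here applies only inside the with block, which may be convenient when timing several models in one process.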




Answer 4:


Based on the docs for RidgeCV:

Ridge regression with built-in cross-validation.

By default, it performs Generalized Cross-Validation, which is a form of efficient Leave-One-Out cross-validation.

The cv parameter defaults to None, which selects this efficient Leave-One-Out cross-validation.

An alternate approach with ridge regression and cross validation:

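from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge

# features and target are the arrays from the question's example
clf = Ridge(alpha=1.0)  # pick the regularization strength to evaluate
# cv must be at least 2 for k-fold; n_jobs=1 keeps the folds on one process
scores = cross_val_score(clf, features, target, cv=5, n_jobs=1)
print(scores)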

See also the docs of Ridge and cross_val_score.



Source: https://stackoverflow.com/questions/65377950/how-do-i-restrict-the-number-of-processors-used-by-the-ridge-regression-model-in
