问题
I want to make a fair comparison between different machine learning models. However, I find that the ridge regression model will automatically use multiple processors and there is no parameter that I can restrict the number of used processors (such as n_jobs). Is there any possible way to solve this problem?
A minimal example:
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
features, target = make_regression(n_samples=10000, n_features=1000)
r = RidgeCV()
r.fit(features, target)
print(r.score(features, target))
回答1:
If you set the environmental variable OMP_NUM_THREADS to n, you will get the expected behaviour. E.g. on linux, do export OMP_NUM_THREADS=1 in the terminal to restrict the use to 1 cpu.
Depending on your system, you can also set it directly in python. See e.g. How to set environment variables in Python?
回答2:
Here it is try to take a look here sklearn.utils.parallel_backend i think you can set up the number of cores for calculation using the njobs parameter.
回答3:
Trying to expand further on @PV8 answer, what happens whenever you instantiate an instance of RidgeCV() without explicitly setting cv parameter (as in your case) is that an Efficient Leave One Out cross-validation is run (according to the algorithms referenced here, implementation here).
On the other side, when explicitly passing cv parameter to RidgeCV() this happens:
model = Ridge()
parameters = {'alpha': [0.1, 1.0, 10.0]}
gs = GridSearchCV(model, param_grid=parameters)
gs.fit(features, target)
print(gs.best_score_)
(as you can see here), namely that you'll use GridSearchCV with default n_jobs=None.
Most importantly, as pointed out by one of sklearn core-dev here, the issue you are experimenting might be not dependent on sklearn, but rather on
[...] your numpy setup performing vectorized operations with parallelism.
(where vectorized operations are performed within the computationally efficient LOO cross-validation procedure that you are implicitly calling by not passing cv to RidgeCV()).
回答4:
Based on the docs for RidgeCV:
Ridge regression with built-in cross-validation.
By default, it performs Leave-One-Out Cross-Validation, which is a form of efficient Leave-One-Out cross-validation.
And by default you use None - to use the efficient Leave-One-Out cross-validation.
An alternate approach with ridge regression and cross validation:
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge
clf = Ridge(a)
scores = cross_val_score(clf, features, target, cv=1, n_jobs=1)
print(scores)
See also the docs of Ridge and cross_val_score.
来源:https://stackoverflow.com/questions/65377950/how-do-i-restrict-the-number-of-processors-used-by-the-ridge-regression-model-in