Scikit-learn: How do we define a distance metric's parameter for grid search

Submitted by ╄→гoц情女王★ on 2021-02-07 09:45:48

Question


I have the following code snippet, which attempts a grid search in which one of the grid parameters is the distance metric used by the KNN algorithm. The example below fails if I use the "wminkowski", "seuclidean" or "mahalanobis" distance metrics.

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Define the parameter values that should be searched
k_range    = range(1, 31)
weights    = ['uniform', 'distance']
algos      = ['auto', 'ball_tree', 'kd_tree', 'brute']
leaf_sizes = range(10, 60, 10)
metrics    = ["euclidean", "manhattan", "chebyshev", "minkowski", "mahalanobis"]

param_grid = dict(n_neighbors=list(k_range), weights=weights, algorithm=algos,
                  leaf_size=list(leaf_sizes), metric=metrics)
param_grid  # displays the grid when run in a notebook

# Instantiate the algorithm
knn = KNeighborsClassifier(n_neighbors=10)

# Instantiate the grid
grid = GridSearchCV(knn, param_grid=param_grid, cv=10, scoring='accuracy', n_jobs=-1)

# Fit the models using the grid parameters
grid.fit(X,y)

I assume this is because I have to set or define ranges for the various distance parameters (for example p and w for "wminkowski", see WMinkowskiDistance). The "minkowski" distance may work because its "p" parameter defaults to 2.
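A quick way to check that assumption outside the grid search, as a minimal sketch that uses scikit-learn's bundled iris data as a stand-in for X and y: "seuclidean" takes a per-feature variance vector V and "mahalanobis" takes an inverse covariance matrix VI, and both can be handed to the metric through the metric_params argument of KNeighborsClassifier. Computing V and VI from the whole data set is an assumption made for simplicity.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Assumption: iris stands in for the asker's X, y
X, y = load_iris(return_X_y=True)

V = X.var(axis=0)                            # per-feature variances for 'seuclidean'
VI = np.linalg.inv(np.cov(X, rowvar=False))  # inverse covariance for 'mahalanobis'

models = {
    "seuclidean": KNeighborsClassifier(
        n_neighbors=5, algorithm="ball_tree",
        metric="seuclidean", metric_params={"V": V}),
    "mahalanobis": KNeighborsClassifier(
        n_neighbors=5, algorithm="ball_tree",
        metric="mahalanobis", metric_params={"VI": VI}),
}

# Training-set accuracy, only to show that both metrics fit and predict
scores = {name: model.fit(X, y).score(X, y) for name, model in models.items()}
print(scores)
```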

So my questions are:

  1. Can we set a range of values for a distance metric's parameters in the grid search, and if so, how?
  2. Can we set a fixed value for a distance metric's parameter in the grid search, and if so, how?

Hope the question is clear. TIA


Answer 1:


I finally got the answer with help from the scikit-learn user and developer mailing lists. I am posting what I learned here in the hope that it will help others too.

The answer to both questions above is: yes. This is the example code I got from the mailing list (it tunes the kernel of an SVM rather than a KNN metric, but the idea is the same):

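# p is assumed to be the number of features, so 1/p mirrors the old SVC default gamma
p = X.shape[1]
params = [{'kernel': ['poly'], 'degree': [1, 2, 3], 'gamma': [1/p, 1, 2], 'coef0': [-1, 0, 1]},
          {'kernel': ['rbf'], 'gamma': [1/p, 1, 2]},
          {'kernel': ['sigmoid'], 'gamma': [1/p, 1, 2], 'coef0': [-1, 0, 1]}]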

Two things to note:

  1. You can pass a list of parameter dictionaries, and each dictionary only needs the keys relevant to that group. This means we can pair each metric with its own parameters. The parameters are named by the dictionary keys.

  2. Each key takes a list of values. Within each dictionary, the grid search tries every combination of these values and passes it on to the estimator (the dictionaries are searched separately, not combined with one another).
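Both points can be checked with ParameterGrid, the scikit-learn class that GridSearchCV uses to enumerate its candidates. This sketch expands the mailing-list grid; p = 4 is an assumed feature count standing in for the real one.

```python
from sklearn.model_selection import ParameterGrid

p = 4  # assumption: number of features, so 1/p mirrors the old default gamma

params = [{'kernel': ['poly'], 'degree': [1, 2, 3], 'gamma': [1/p, 1, 2], 'coef0': [-1, 0, 1]},
          {'kernel': ['rbf'], 'gamma': [1/p, 1, 2]},
          {'kernel': ['sigmoid'], 'gamma': [1/p, 1, 2], 'coef0': [-1, 0, 1]}]

candidates = list(ParameterGrid(params))

# Each dict is expanded on its own: 3*3*3 poly + 3 rbf + 3*3 sigmoid
print(len(candidates))  # 39

# rbf candidates never receive 'degree' or 'coef0'
print([c for c in candidates if c['kernel'] == 'rbf'])
```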

This still leaves one issue: how do we pass a combination of parameters to the metric itself? That is what the metric_params argument of KNeighborsClassifier is for. Note that not every metric can be used with every algorithm, so you have to restrict the algorithms manually.
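Putting the pieces together, here is a hedged sketch of a single GridSearchCV run where each metric family gets its own dictionary, its own metric_params and only the algorithms that support it. It assumes iris as stand-in data and computes V and VI once from the full data set, which leaks fold statistics into the metric; that is acceptable for a sketch but not for a careful evaluation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Assumption: iris stands in for the asker's X, y
X, y = load_iris(return_X_y=True)

V = X.var(axis=0)                            # for 'seuclidean'
VI = np.linalg.inv(np.cov(X, rowvar=False))  # for 'mahalanobis'

k_range = [3, 5, 7]

# One dict per metric family: the metric arguments only appear next to
# the metric that needs them, and kd_tree is left out where unsupported.
param_grid = [
    {"metric": ["euclidean", "manhattan", "chebyshev"],
     "algorithm": ["auto", "ball_tree", "kd_tree", "brute"],
     "n_neighbors": k_range},
    {"metric": ["seuclidean"],
     "metric_params": [{"V": V}],
     "algorithm": ["ball_tree", "brute"],
     "n_neighbors": k_range},
    {"metric": ["mahalanobis"],
     "metric_params": [{"VI": VI}],
     "algorithm": ["ball_tree", "brute"],
     "n_neighbors": k_range},
]

grid = GridSearchCV(KNeighborsClassifier(), param_grid=param_grid,
                    cv=5, scoring="accuracy")
grid.fit(X, y)
print(grid.best_params_["metric"], round(grid.best_score_, 3))
```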

Here is the example I asked about above:

{'metric': ['wminkowski'],
 'metric_params': [
     {'w': np.array([2.0] * len(X.columns)), 'p': 1.0},   # L1
     {'w': np.array([2.0] * len(X.columns)), 'p': 1.5},
     {'w': np.array([2.0] * len(X.columns)), 'p': 2.0},   # L2
     {'w': np.array([2.0] * len(X.columns)), 'p': 2.5},
     {'w': np.array([2.0] * len(X.columns)), 'p': 3.5},
     {'w': np.array([2.0] * len(X.columns)), 'p': 3.0}
 ],
 'algorithm': ['brute', 'ball_tree'],
 'n_neighbors': list(k_range),
 'weights': weights,
 'leaf_size': list(leaf_sizes)}

Note the following:

  1. 'wminkowski' only works with the ['brute', 'ball_tree'] algorithms.
  2. We must use a list of dictionaries in 'metric_params' in order to enumerate all the possible combinations of parameters (I have not found a way to automate this).
  3. In the case above I had to pass 'w' as a NumPy array, because a plain list was not converted implicitly (otherwise an exception is raised).
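One possible way to automate note 2, sketched under assumptions: ParameterGrid can build the list of metric_params dictionaries from per-argument value lists, so every combination does not have to be typed by hand (a list comprehension would also work). n_features is a hypothetical stand-in for len(X.columns), and the snippet only builds the grid without fitting it. Recent SciPy and scikit-learn releases have deprecated "wminkowski" in favour of "minkowski" with a w weight vector, so check your installed version before reusing the metric name.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

n_features = 4  # hypothetical stand-in for len(X.columns)
k_range = range(1, 31)
weights = ['uniform', 'distance']
leaf_sizes = range(10, 60, 10)

# Every combination of the metric's own arguments becomes one dict
metric_params = list(ParameterGrid({
    'w': [np.array([2.0] * n_features)],
    'p': [1.0, 1.5, 2.0, 2.5, 3.0, 3.5],
}))

param_grid = {
    'metric': ['wminkowski'],
    'metric_params': metric_params,
    'algorithm': ['brute', 'ball_tree'],
    'n_neighbors': list(k_range),
    'weights': weights,
    'leaf_size': list(leaf_sizes),
}

print(len(metric_params))  # 6
print([d['p'] for d in metric_params])
```

Adding another metric argument is then a matter of adding one more key with its list of values.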

If anyone knows of a better way to do this, please comment.



Source: https://stackoverflow.com/questions/37924606/scikit-learn-how-do-we-define-a-distance-metrics-parameter-for-grid-search
