scikit-learn

Output the 50 samples closest to each cluster center using scikit-learn's k-means

Submitted by 徘徊边缘 on 2021-02-07 06:28:17
Question: I have fitted a k-means model on 5000+ samples using the Python scikit-learn library. I want to obtain the 50 samples closest to a cluster center as output. How do I perform this task?

Answer 1: If km is the fitted k-means model, the distance to the j'th centroid for each point in an array X is

d = km.transform(X)[:, j]

This gives an array of len(X) distances. Since np.argsort sorts in ascending order, the indices of the 50 points closest to centroid j are

ind = np.argsort(d)[:50]

so the 50 points closest to the centroid are X[ind] (or use
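
A runnable sketch of the answer above (the toy data, cluster count, and variable names are illustrative, not from the question):

import numpy as np
from sklearn.cluster import KMeans

# Toy data standing in for the asker's 5000+ samples
rng = np.random.RandomState(0)
X = rng.rand(5000, 4)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# km.transform(X) has shape (n_samples, n_clusters); entry [i, j] is the
# Euclidean distance from sample i to centroid j.
distances = km.transform(X)
for j in range(km.n_clusters):
    # argsort is ascending, so the first 50 indices are the closest samples
    ind = np.argsort(distances[:, j])[:50]
    closest_50 = X[ind]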

ValueError: illegal value in 4-th argument of internal None when running sklearn LinearRegression().fit()

Submitted by 风格不统一 on 2021-02-07 06:25:06
Question: For some reason I cannot get this block of code to run properly anymore:

import numpy as np
from sklearn.linear_model import LinearRegression

# Create linear data with some noise
x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))

# Fit linear data with sklearn LinearRegression
lm = LinearRegression()
lm.fit(x.reshape(-1, 1), y)

Traceback (most recent call last):
  File "<input>", line 2, in <module>
  File "C:\Python37\lib\site-packages\sklearn\linear_model\
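
For dense inputs, LinearRegression solves the least-squares problem via scipy.linalg.lstsq, and the "illegal value in ... argument" message comes from the underlying LAPACK routine, so the fault usually lies in the installed numpy/scipy wheels rather than in scikit-learn itself (the commonly reported remedy is reinstalling or upgrading numpy and scipy). A diagnostic sketch, assuming that delegation, to confirm where the error originates:

import numpy as np
from scipy import linalg

x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))

# The design matrix [x, 1] for ordinary least squares with an intercept
A = np.column_stack([x, np.ones_like(x)])

# If a broken LAPACK routine is the culprit, this call raises the
# same ValueError without scikit-learn being involved at all.
coef, residues, rank, singular_values = linalg.lstsq(A, y)
print(coef)  # expected roughly [2.0, 3.0]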

How to use scikit's preprocessing/normalization along with cross validation?

Submitted by 不羁岁月 on 2021-02-07 05:12:10
Question: As an example of cross-validation without any preprocessing, I can do something like this:

tuned_params = [{"penalty": ["l2", "l1"]}]

from sklearn.linear_model import SGDClassifier
SGD = SGDClassifier()

from sklearn.model_selection import GridSearchCV
clf = GridSearchCV(SGD, tuned_params, verbose=5)
clf.fit(x_train, y_train)

I would like to preprocess my data using something like

from sklearn import preprocessing
x_scaled = preprocessing.scale(x_train)

But it would not be a good idea to do this on the whole training set before cross-validation, since the preprocessing would then leak information from the validation folds into each training fold.
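
A common way to achieve this in scikit-learn (a sketch, not part of the original question; the toy data stands in for x_train / y_train) is to wrap the scaler and the classifier in a Pipeline, so the scaler is re-fit on the training portion of every cross-validation split:

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-ins for the asker's x_train and y_train
x_train, y_train = make_classification(n_samples=200, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),  # fit on each training fold only
    ("sgd", SGDClassifier()),
])

# Parameters of a pipeline step are addressed as <step>__<parameter>
tuned_params = [{"sgd__penalty": ["l2", "l1"]}]

clf = GridSearchCV(pipe, tuned_params, verbose=5)
clf.fit(x_train, y_train)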


How to speed up sklearn SVR?

Submitted by 淺唱寂寞╮ on 2021-02-07 03:21:35
Question: I am implementing SVR using sklearn's SVR class in Python. My sparse matrix is of size 146860 x 10202. I have divided it into various sub-matrices of size 2500 x 10202. For each sub-matrix, SVR fitting takes about 10 minutes. What are some ways to speed up the process? Please suggest a different approach or a different Python package for the same task. Thanks!

Answer 1: You can average the SVR sub-models' predictions. Alternatively, you can try to fit a linear regression model on the output of
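
The averaging idea from the answer, as a sketch (the chunk size matches the question; the helper names are illustrative):

import numpy as np
from sklearn.svm import SVR

def fit_svr_chunks(X, y, chunk_size=2500):
    # Fit one SVR per row-chunk of the training data
    models = []
    for start in range(0, X.shape[0], chunk_size):
        sub = SVR()
        sub.fit(X[start:start + chunk_size], y[start:start + chunk_size])
        models.append(sub)
    return models

def predict_averaged(models, X_new):
    # Average the sub-models' predictions, as the answer suggests
    return np.mean([m.predict(X_new) for m in models], axis=0)

Because kernel SVR training scales worse than linearly in the number of samples, each 2500-row sub-model fits far faster than a single model on all 146860 rows, at some cost in accuracy.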

How To Calculate F1-Score For Multilabel Classification?

Submitted by 我怕爱的太早我们不能终老 on 2021-02-07 03:16:02
Question: I am trying to calculate the f1_score, but in some cases I get warnings when I use the sklearn f1_score method. I have a multilabel prediction problem with 5 classes.

import numpy as np
from sklearn.metrics import f1_score

y_true = np.zeros((1, 5))
y_true[0, 0] = 1  # => label = [[1, 0, 0, 0, 0]]

y_pred = np.zeros((1, 5))
y_pred[:] = 1  # => prediction = [[1, 1, 1, 1, 1]]

result_1 = f1_score(y_true=y_true, y_pred=y_pred, labels=None, average="weighted")
print(result_1)  # prints 1.0

result_2 = f1
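
Why does the weighted score print 1.0? With average="weighted", each label's F1 is weighted by its support (its number of true instances). Here only label 0 has any true instances and it is predicted correctly (F1 = 1.0); labels 1-4 have zero support, so they contribute nothing to the weighted average but do trigger undefined-metric warnings. A sketch comparing averaging modes (the zero_division argument exists in scikit-learn 0.22 and later):

import numpy as np
from sklearn.metrics import f1_score

y_true = np.zeros((1, 5))
y_true[0, 0] = 1
y_pred = np.ones((1, 5))

# "weighted": only label 0 has support, so the result is label 0's F1
print(f1_score(y_true, y_pred, average="weighted"))  # 1.0

# "macro": all 5 labels count equally; labels 1-4 get F1 = 0
# (zero_division=0 silences the warning and pins undefined scores to 0)
print(f1_score(y_true, y_pred, average="macro", zero_division=0))  # 0.2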


Calculate weighted pairwise distance matrix in Python

Submitted by 白昼怎懂夜的黑 on 2021-02-06 20:01:48
Question: I am trying to find the fastest way to perform the following pairwise distance calculation in Python. I want to use the distances to rank a list_of_objects by their similarity. Each item in the list_of_objects is characterised by four measurements a, b, c, d, which are made on very different scales, e.g.:

object_1 = [0.2, 4.5, 198, 0.003]
object_2 = [0.3, 2.0, 999, 0.001]
object_3 = [0.1, 9.2, 321, 0.023]
list_of_objects = [object_1, object_2, object_3]

The aim is to get a pairwise distance matrix.
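
One plausible approach, given that the four measurements live on very different scales (a sketch; the standardization step and the per-feature weights are assumptions, since the question is truncated here): standardize each column, then compute a weighted pairwise distance matrix with SciPy:

import numpy as np
from scipy.spatial.distance import pdist, squareform

list_of_objects = [
    [0.2, 4.5, 198, 0.003],
    [0.3, 2.0, 999, 0.001],
    [0.1, 9.2, 321, 0.023],
]
X = np.asarray(list_of_objects)

# Standardize columns so the large-magnitude measurement (c) cannot
# dominate the distance on its own.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical per-feature weights; equal weights reduce to plain Euclidean
w = np.array([1.0, 1.0, 1.0, 1.0])

# Weighted Euclidean distance: recent SciPy versions accept a weight
# vector w for the minkowski metric (p=2 gives Euclidean).
D = squareform(pdist(X_scaled, metric="minkowski", p=2, w=w))
print(D)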