scikit-learn

Output the 50 samples closest to each cluster center using scikit-learn's k-means

Submitted by 徘徊边缘 on 2021-02-07 06:28:17
Question: I have fitted a k-means model on 5000+ samples using the Python scikit-learn library. I want to obtain the 50 samples closest to a cluster center as output. How do I perform this task?

Answer 1: If km is the fitted k-means model, the distance to the j'th centroid for each point in an array X is

d = km.transform(X)[:, j]

This gives an array of len(X) distances. Since np.argsort sorts in ascending order, the indices of the 50 points closest to centroid j are

ind = np.argsort(d)[:50]

so the 50 points closest to the centroid are X[ind] (or use
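
A runnable sketch of the answer above (the toy data, cluster count, and variable names are illustrative, not from the question):

import numpy as np
from sklearn.cluster import KMeans

# Toy data standing in for the asker's 5000+ samples
rng = np.random.RandomState(0)
X = rng.rand(5000, 4)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# km.transform(X) has shape (n_samples, n_clusters); entry [i, j] is the
# Euclidean distance from sample i to centroid j.
distances = km.transform(X)
for j in range(km.n_clusters):
    # argsort is ascending, so the first 50 indices are the closest samples
    ind = np.argsort(distances[:, j])[:50]
    closest_50 = X[ind]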

ValueError: illegal value in 4-th argument of internal None when running sklearn LinearRegression().fit()

Submitted by 风格不统一 on 2021-02-07 06:25:06
Question: For some reason I cannot get this block of code to run properly anymore:

import numpy as np
from sklearn.linear_model import LinearRegression

# Create linear data with some noise
x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))

# Fit linear data with sklearn LinearRegression
lm = LinearRegression()
lm.fit(x.reshape(-1, 1), y)

Traceback (most recent call last):
  File "<input>", line 2, in <module>
  File "C:\Python37\lib\site-packages\sklearn\linear_model\
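
For dense inputs, LinearRegression solves the least-squares problem via scipy.linalg.lstsq, and the "illegal value in ... argument" message comes from the underlying LAPACK routine, so the fault usually lies in the installed numpy/scipy wheels rather than in scikit-learn itself (the commonly reported remedy is reinstalling or upgrading numpy and scipy). A diagnostic sketch, assuming that delegation, to confirm where the error originates:

import numpy as np
from scipy import linalg

x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))

# The design matrix [x, 1] for ordinary least squares with an intercept
A = np.column_stack([x, np.ones_like(x)])

# If a broken LAPACK routine is the culprit, this call raises the
# same ValueError without scikit-learn being involved at all.
coef, residues, rank, singular_values = linalg.lstsq(A, y)
print(coef)  # expected roughly [2.0, 3.0]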

How to use scikit's preprocessing/normalization along with cross validation?

Submitted by 不羁岁月 on 2021-02-07 05:12:10
Question: As an example of cross-validation without any preprocessing, I can do something like this:

tuned_params = [{"penalty": ["l2", "l1"]}]

from sklearn.linear_model import SGDClassifier
SGD = SGDClassifier()

from sklearn.model_selection import GridSearchCV
clf = GridSearchCV(SGD, tuned_params, verbose=5)
clf.fit(x_train, y_train)

I would like to preprocess my data using something like

from sklearn import preprocessing
x_scaled = preprocessing.scale(x_train)

But it would not be a good idea to do this on the whole training set before cross-validation, since the preprocessing would then leak information from the validation folds into each training fold.
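
A common way to achieve this in scikit-learn (a sketch, not part of the original question; the toy data stands in for x_train / y_train) is to wrap the scaler and the classifier in a Pipeline, so the scaler is re-fit on the training portion of every cross-validation split:

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-ins for the asker's x_train and y_train
x_train, y_train = make_classification(n_samples=200, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),  # fit on each training fold only
    ("sgd", SGDClassifier()),
])

# Parameters of a pipeline step are addressed as <step>__<parameter>
tuned_params = [{"sgd__penalty": ["l2", "l1"]}]

clf = GridSearchCV(pipe, tuned_params, verbose=5)
clf.fit(x_train, y_train)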


How to speed up sklearn SVR?

Submitted by 淺唱寂寞╮ on 2021-02-07 03:21:35
Question: I am implementing SVR using sklearn's SVR class in Python. My sparse matrix is of size 146860 x 10202. I have divided it into various sub-matrices of size 2500 x 10202. For each sub-matrix, SVR fitting takes about 10 minutes. What are some ways to speed up the process? Please suggest a different approach or a different Python package for the same task. Thanks!

Answer 1: You can average the SVR sub-models' predictions. Alternatively, you can try to fit a linear regression model on the output of
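
The averaging idea from the answer, as a sketch (the chunk size matches the question; the helper names are illustrative):

import numpy as np
from sklearn.svm import SVR

def fit_svr_chunks(X, y, chunk_size=2500):
    # Fit one SVR per row-chunk of the training data
    models = []
    for start in range(0, X.shape[0], chunk_size):
        sub = SVR()
        sub.fit(X[start:start + chunk_size], y[start:start + chunk_size])
        models.append(sub)
    return models

def predict_averaged(models, X_new):
    # Average the sub-models' predictions, as the answer suggests
    return np.mean([m.predict(X_new) for m in models], axis=0)

Because kernel SVR training scales worse than linearly in the number of samples, each 2500-row sub-model fits far faster than a single model on all 146860 rows, at some cost in accuracy.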

How To Calculate F1-Score For Multilabel Classification?

Submitted by 我怕爱的太早我们不能终老 on 2021-02-07 03:16:02
Question: I am trying to calculate the f1_score, but in some cases I get warnings when I use the sklearn f1_score method. I have a multilabel prediction problem with 5 classes.

import numpy as np
from sklearn.metrics import f1_score

y_true = np.zeros((1, 5))
y_true[0, 0] = 1  # => label = [[1, 0, 0, 0, 0]]

y_pred = np.zeros((1, 5))
y_pred[:] = 1  # => prediction = [[1, 1, 1, 1, 1]]

result_1 = f1_score(y_true=y_true, y_pred=y_pred, labels=None, average="weighted")
print(result_1)  # prints 1.0

result_2 = f1
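
Why does the weighted score print 1.0? With average="weighted", each label's F1 is weighted by its support (its number of true instances). Here only label 0 has any true instances and it is predicted correctly (F1 = 1.0); labels 1-4 have zero support, so they contribute nothing to the weighted average but do trigger undefined-metric warnings. A sketch comparing averaging modes (the zero_division argument exists in scikit-learn 0.22 and later):

import numpy as np
from sklearn.metrics import f1_score

y_true = np.zeros((1, 5))
y_true[0, 0] = 1
y_pred = np.ones((1, 5))

# "weighted": only label 0 has support, so the result is label 0's F1
print(f1_score(y_true, y_pred, average="weighted"))  # 1.0

# "macro": all 5 labels count equally; labels 1-4 get F1 = 0
# (zero_division=0 silences the warning and pins undefined scores to 0)
print(f1_score(y_true, y_pred, average="macro", zero_division=0))  # 0.2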


Calculate weighted pairwise distance matrix in Python

Submitted by 白昼怎懂夜的黑 on 2021-02-06 20:01:48
Question: I am trying to find the fastest way to perform the following pairwise distance calculation in Python. I want to use the distances to rank a list_of_objects by their similarity. Each item in the list_of_objects is characterised by four measurements a, b, c, d, which are made on very different scales, e.g.:

object_1 = [0.2, 4.5, 198, 0.003]
object_2 = [0.3, 2.0, 999, 0.001]
object_3 = [0.1, 9.2, 321, 0.023]
list_of_objects = [object_1, object_2, object_3]

The aim is to get a pairwise distance matrix.
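
One plausible approach, given that the four measurements live on very different scales (a sketch; the standardization step and the per-feature weights are assumptions, since the question is truncated here): standardize each column, then compute a weighted pairwise distance matrix with SciPy:

import numpy as np
from scipy.spatial.distance import pdist, squareform

list_of_objects = [
    [0.2, 4.5, 198, 0.003],
    [0.3, 2.0, 999, 0.001],
    [0.1, 9.2, 321, 0.023],
]
X = np.asarray(list_of_objects)

# Standardize columns so the large-magnitude measurement (c) cannot
# dominate the distance on its own.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical per-feature weights; equal weights reduce to plain Euclidean
w = np.array([1.0, 1.0, 1.0, 1.0])

# Weighted Euclidean distance: recent SciPy versions accept a weight
# vector w for the minkowski metric (p=2 gives Euclidean).
D = squareform(pdist(X_scaled, metric="minkowski", p=2, w=w))
print(D)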