scikit-learn

SKLearn Kernel PCA “Precomputed” argument

Submitted by 泄露秘密 on 2021-02-18 12:49:05
Question: I am trying to perform Kernel PCA with scikit-learn, using a kernel that is not in their implementation (and a custom input format that is recognized by this kernel). It would probably be easiest if I could just compute the kernel ahead of time, save it, and then use it in Kernel PCA. The precomputed argument to KernelPCA suggests that I can do what I want; however, it isn't explained in the documentation, and I can't find any examples of it being used. Even in the unit test source…
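A minimal sketch of how the precomputed option is typically used (the RBF kernel below is only a stand-in for the asker's custom kernel): compute the Gram matrix yourself and pass it to fit_transform in place of the data; to project new points, pass the kernel between the new and training samples.

import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.metrics.pairwise import rbf_kernel

X = np.random.RandomState(0).rand(50, 4)

# Any kernel function works here; rbf_kernel stands in for the custom one.
gram = rbf_kernel(X, X)  # shape (n_samples, n_samples)

kpca = KernelPCA(n_components=2, kernel="precomputed")
X_kpca = kpca.fit_transform(gram)  # fit on the Gram matrix, not on X

# New points: kernel between the new samples and the training samples.
X_new = np.random.RandomState(1).rand(5, 4)
X_new_kpca = kpca.transform(rbf_kernel(X_new, X))  # shape (5, 2)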

Using multiple custom classes with Pipeline sklearn (Python)

Submitted by 元气小坏坏 on 2021-02-18 11:43:27
Question: I'm trying to put together a tutorial on Pipeline for students, but I'm stuck. I'm not an expert, but I'm trying to improve, so thank you for your indulgence. In a pipeline, I want to execute several steps to prepare a dataframe for a classifier: Step 1: describe the dataframe; Step 2: fill NaN values; Step 3: transform categorical values into numbers. Here is my code: class Descr_df(object): def transform (self, X): print ("Structure of the data: \n {}".format(X.head(5))) print ("Features names:…
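A minimal sketch of the pattern sklearn expects (class names and fill values here are assumptions, not the asker's full code): every intermediate pipeline step needs a fit method returning self and a transform method; inheriting from BaseEstimator and TransformerMixin supplies get_params and fit_transform for free.

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

class DescrDF(BaseEstimator, TransformerMixin):
    # Step 1: describe the dataframe, then pass it through unchanged.
    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        print("Structure of the data:\n{}".format(X.head(5)))
        return X

class FillNaN(BaseEstimator, TransformerMixin):
    # Step 2: fill missing values (filling with 0 is an assumption).
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X.fillna(0)

class Categorize(BaseEstimator, TransformerMixin):
    # Step 3: turn categorical columns into numeric dummy columns.
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return pd.get_dummies(X)

pipe = Pipeline([
    ("describe", DescrDF()),
    ("fillna", FillNaN()),
    ("categorize", Categorize()),
    ("clf", LogisticRegression()),
])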

Plot k-Nearest-Neighbor graph with 8 features?

Submitted by 回眸只為那壹抹淺笑 on 2021-02-18 10:28:11
Question: I'm new to machine learning and would like to set up a small example using the k-nearest-neighbors method with the Python library scikit-learn. Transforming and fitting the data works fine, but I can't figure out how to plot a graph showing the data points surrounded by their "neighborhood". The dataset I'm using looks like this: So there are 8 features, plus one "outcome" column. From my understanding, I get an array showing the Euclidean distances of all data points, using kneighbors_graph from…
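With 8 features there is no direct 2D scatter plot; one common workaround (a sketch, not the only option) is to compute the neighbor graph in the full 8-dimensional space, project the points to 2D with PCA for display only, and draw the graph edges in that projection. The synthetic dataset below stands in for the one in the question.

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.neighbors import kneighbors_graph

# Stand-in for the 8-feature dataset with an "outcome" column.
X, y = make_classification(n_samples=100, n_features=8, random_state=0)

# Sparse matrix: entry (i, j) is 1 if j is among i's 3 nearest neighbors.
A = kneighbors_graph(X, n_neighbors=3, mode="connectivity")

X2 = PCA(n_components=2).fit_transform(X)  # 2D projection for plotting only

plt.scatter(X2[:, 0], X2[:, 1], c=y, s=20)
for i, j in zip(*A.nonzero()):
    plt.plot([X2[i, 0], X2[j, 0]], [X2[i, 1], X2[j, 1]], "k-", linewidth=0.3)
plt.show()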

sklearn multiclass svm function

Submitted by 久未见 on 2021-02-18 08:30:48
Question: I have multiclass labels and want to compute the accuracy of my model. I am confused about which sklearn function I need to use. As far as I understand, the code below is only used for binary classification. # dividing X, y into train and test data X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0) # training a linear SVM classifier from sklearn.svm import SVC svm_model_linear = SVC(kernel = 'linear', C = 1).fit(X_train, y_train) svm…
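SVC handles multiclass labels out of the box (it trains one-vs-one internally), and accuracy_score does not care how many classes there are, so the binary snippet carries over essentially unchanged. A short sketch on the three-class iris data:

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # three classes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

svm_model_linear = SVC(kernel="linear", C=1).fit(X_train, y_train)
y_pred = svm_model_linear.predict(X_test)

print(accuracy_score(y_test, y_pred))    # works for any number of classes
print(confusion_matrix(y_test, y_pred))  # per-class breakdown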

Using trained Scikit-learn svm classifiers in Android [closed]

Submitted by 与世无争的帅哥 on 2021-02-18 08:05:54
Question: Closed: this question needs to be more focused and is not currently accepting answers (closed 5 years ago). I am developing an Android app that uses sensor data from the phone to classify activities. I also really prefer scikit-learn to any of the Java machine learning libraries, so I created a very minimal REST API using Django and scikit-learn to train sensor…
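One common shape for the setup described (a hedged sketch, not the asker's actual Django code; Flask stands in for Django to keep it short, and the model filename is hypothetical): persist the trained classifier with joblib and expose a small prediction endpoint that the Android app calls over HTTP.

import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
clf = joblib.load("activity_model.joblib")  # hypothetical saved classifier

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]  # e.g. a list of sensor values
    label = clf.predict([features])[0]
    return jsonify({"activity": str(label)})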

scikits confusion matrix with cross validation

Submitted by 假装没事ソ on 2021-02-18 06:42:42
Question: I am training an SVM classifier with cross-validation (StratifiedKFold) using the scikit-learn interfaces. For each test set (of the k folds), I get a classification result. I want a single confusion matrix covering all the results. scikit-learn has a confusion-matrix interface: sklearn.metrics.confusion_matrix(y_true, y_pred) My question is how I should accumulate the y_true and y_pred values. They are numpy arrays. Should I define the size of the arrays based on my k-fold parameter? And for each result I should…
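There is no need to pre-size anything: collect the per-fold y_true and y_pred arrays in lists, concatenate once at the end, and compute a single confusion matrix over all folds (cross_val_predict can also do the accumulation for you). A sketch using the current model_selection API, with iris standing in for the asker's data:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear")

y_true_all, y_pred_all = [], []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    clf.fit(X[train_idx], y[train_idx])
    y_true_all.append(y[test_idx])
    y_pred_all.append(clf.predict(X[test_idx]))

# One confusion matrix over every fold's test predictions.
print(confusion_matrix(np.concatenate(y_true_all), np.concatenate(y_pred_all)))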

How to print estimated coefficients after a GridSearchCV fit of a model? (SGDRegressor)

Submitted by 一笑奈何 on 2021-02-17 19:32:32
Question: I am new to scikit-learn, but it did what I was hoping for. Now, maddeningly, the only remaining issue is that I can't find how to print (or, even better, write to a small text file) all the coefficients it estimated and all the features it selected. How can I do this? The same goes for SGDClassifier, but I think it is the same for all base objects that can be fitted, with or without cross-validation. Full script below. import scipy as sp import numpy as np import pandas as pd import…
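With refit=True (the default), GridSearchCV refits the best configuration on the full data and stores it in best_estimator_; for a linear model such as SGDRegressor, the estimated coefficients then live in coef_ and intercept_. A sketch (the output filename is hypothetical):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=10, random_state=0)

grid = GridSearchCV(SGDRegressor(max_iter=1000, tol=1e-3),
                    param_grid={"alpha": [1e-4, 1e-3, 1e-2]}, cv=5)
grid.fit(X, y)

best = grid.best_estimator_         # the refit winning model
print(best.coef_, best.intercept_)  # estimated coefficients

np.savetxt("coefs.txt", best.coef_)  # write them to a small text file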

Grid Search with Recursive Feature Elimination in scikit-learn pipeline returns an error

Submitted by 耗尽温柔 on 2021-02-17 19:09:46
Question: I am trying to chain grid search and recursive feature elimination in a Pipeline using scikit-learn. GridSearchCV and RFE with a "bare" classifier work fine: from sklearn.datasets import make_friedman1 from sklearn import feature_selection from sklearn.grid_search import GridSearchCV from sklearn.svm import SVR X, y = make_friedman1(n_samples=50, n_features=10, random_state=0) est = SVR(kernel="linear") selector = feature_selection.RFE(est) param_grid = dict(estimator__C=[0.1, 1, 10]) clf =…
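The excerpt is cut off at clf =; a hedged completion of the working "bare" pattern (written against the current sklearn.model_selection module, since the sklearn.grid_search module in the original has long been removed): wrap the RFE selector in GridSearchCV, with the estimator__ prefix routing C to the SVR inside RFE.

from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

est = SVR(kernel="linear")
selector = RFE(est)

# estimator__C is forwarded through RFE to the inner SVR.
param_grid = dict(estimator__C=[0.1, 1, 10])
clf = GridSearchCV(selector, param_grid=param_grid, cv=5)
clf.fit(X, y)
print(clf.best_params_)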