scikit-learn | 易学教程

How to save a custom transformer in sklearn?

Question: I am not able to load an instance of a custom transformer saved with either sklearn.externals.joblib.dump or pickle.dump, because the original definition of the custom transformer is missing from the current Python session. Suppose that in one Python session I define, create, and save a custom transformer; it can also be loaded in the same session: from sklearn.base import TransformerMixin from sklearn.base import BaseEstimator from sklearn.externals import joblib class CustomTransformer …
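
pickle (and joblib, which builds on it) records an instance of a custom class by the class's module path and name, not by its source code, so loading only works where that class can be imported under the same qualified name. The sketch below shows the round trip; `CustomTransformer` here is a hypothetical scaling transformer. In a fresh session the class would normally live in an importable module (for example a hypothetical `my_transformers.py`) that is imported before loading. Note also that `sklearn.externals.joblib` was deprecated in scikit-learn 0.21 and removed in 0.23; current code imports the standalone `joblib` package instead.

```python
import pickle

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class CustomTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that multiplies every value by a constant factor."""

    def __init__(self, factor=2.0):
        self.factor = factor

    def fit(self, X, y=None):
        # Nothing to learn; return self so the transformer works in pipelines
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) * self.factor


# pickle stores a reference such as "__main__.CustomTransformer", not the class body,
# so loads() only succeeds where that name can be resolved.
payload = pickle.dumps(CustomTransformer(factor=3.0))
restored = pickle.loads(payload)
print(restored.transform([[1, 2]]))
```

The same rule applies to joblib.dump(obj, path) and joblib.load(path): the loading process must be able to import CustomTransformer from the module it was defined in.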

Python scikit learn Linear Model Parameter Standard Error

Question: I am working with sklearn, specifically the linear_model module. After fitting a simple linear model as in import pandas as pd import numpy as np from sklearn import linear_model randn = np.random.randn X = pd.DataFrame(randn(10,3), columns=['X1','X2','X3']) y = pd.DataFrame(randn(10,1), columns=['Y']) model = linear_model.LinearRegression() model.fit(X=X, y=y) I see how I can access the coefficients and intercept via coef_ and intercept_; prediction is straightforward as well. I would like to …
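
LinearRegression does not report standard errors itself. A minimal sketch, assuming the classical OLS setting (independent, homoscedastic errors), computes them from the fitted model as the square roots of the diagonal of sigma² · (XᵀX)⁻¹, where X gets a leading column of ones for the intercept and sigma² is estimated as RSS / (n − k). For simplicity it uses a 1-D y and random data in the spirit of the question; statsmodels' OLS reports these values directly if an extra dependency is acceptable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))
y = rng.standard_normal(10)

model = LinearRegression().fit(X, y)

# Design matrix with a leading column of ones so the intercept gets a standard error too
X_design = np.column_stack([np.ones(len(X)), X])
params = np.concatenate([[model.intercept_], model.coef_])

residuals = y - X_design @ params
n, k = X_design.shape  # k includes the intercept column
sigma2 = residuals @ residuals / (n - k)  # unbiased estimate of the error variance
cov = sigma2 * np.linalg.inv(X_design.T @ X_design)
std_err = np.sqrt(np.diag(cov))  # order: intercept, X1, X2, X3
print(std_err)
```

Dividing each entry of params by the matching std_err gives the usual t statistics.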

Getting the maximum accuracy for a binary probabilistic classifier in scikit-learn

Question: Is there any built-in function to get the maximum accuracy for a binary probabilistic classifier in scikit-learn? For example, to get the maximum F1-score I do: # AUCPR precision, recall, thresholds = sklearn.metrics.precision_recall_curve(y_true, y_score) auprc = sklearn.metrics.auc(recall, precision) max_f1 = 0 for r, p, t in zip(recall, precision, thresholds): if p + r == 0: continue if (2*p*r)/(p + r) > max_f1: max_f1 = (2*p*r)/(p + r) max_f1_threshold = t I could compute the maximum accuracy in …
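
In the same spirit as the F1 loop, accuracy at every threshold can be derived from roc_curve, since accuracy = (TP + TN) / N with TP = TPR · P and TN = (1 − FPR) · Nneg. The sketch below is one way to do this, not a dedicated scikit-learn function; `max_accuracy` is a hypothetical helper and the four scores are made up. It predicts the positive class when score >= threshold, which matches how roc_curve counts.

```python
import numpy as np
from sklearn.metrics import roc_curve


def max_accuracy(y_true, y_score):
    """Return (best accuracy, threshold), predicting positive when score >= threshold."""
    y_true = np.asarray(y_true)
    # drop_intermediate=False keeps every distinct score as a candidate threshold
    fpr, tpr, thresholds = roc_curve(y_true, y_score, drop_intermediate=False)
    n_pos = np.sum(y_true == 1)
    n_neg = len(y_true) - n_pos
    # accuracy = (TP + TN) / N, with TP = TPR * P and TN = (1 - FPR) * Nneg
    accuracy = (tpr * n_pos + (1 - fpr) * n_neg) / len(y_true)
    best = np.argmax(accuracy)  # first threshold reaching the maximum
    return accuracy[best], thresholds[best]


best_acc, best_threshold = max_accuracy([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(best_acc, best_threshold)
```

Unlike the F1 loop, no zip truncation is involved: roc_curve returns fpr, tpr and thresholds of equal length.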

Convert decision tree directly to png [duplicate]

Question: This question already has answers here: graph.write_pdf("iris.pdf") AttributeError: 'list' object has no attribute 'write_pdf' (10 answers). Closed 4 years ago. I am trying to generate a decision tree which I want to visualize using dot. The resulting dot file should be converted to PNG. While I can do the last conversion step in DOS using something like export_graphviz(dectree, out_file="graph.dot") followed by a DOS command dot -Tps graph.dot -o outfile.ps doing all this directly in Python …
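
The linked duplicate's error comes from newer pydot releases, where pydot.graph_from_dot_data returns a list of graphs, so the graph has to be taken out of that list (for example `(graph,) = pydot.graph_from_dot_data(dot_data)`) before calling write_png. The sketch below swaps in a different technique that avoids the external dot binary: sklearn.tree.plot_tree (scikit-learn 0.21+) draws the tree with matplotlib, and the figure is saved straight to PNG. The iris data, tree depth, and output path are illustrative assumptions.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz, plot_tree

iris = load_iris()
dectree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# out_file=None returns the DOT source as a string instead of writing graph.dot
dot_source = export_graphviz(dectree, out_file=None, feature_names=iris.feature_names)

# Draw the tree with matplotlib and save it directly as PNG; no Graphviz needed
fig, ax = plt.subplots(figsize=(12, 8))
plot_tree(
    dectree,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
    ax=ax,
)
out_path = os.path.join(tempfile.gettempdir(), "tree.png")
fig.savefig(out_path)
plt.close(fig)
print(out_path)
```

If Graphviz styling is preferred, dot_source is what would be handed to pydot.graph_from_dot_data.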

Sci-kit learn how to print labels for confusion matrix?

Question: So I'm using scikit-learn to classify some data. I have 13 different class values/categories to classify the data into. I have been able to use cross-validation and print the confusion matrix. However, it only shows the TP, FP, etc. without the class labels, so I don't know which class is which. Below are my code and my output: def classify_data(df, feature_cols, file): nbr_folds = 5 RANDOM_STATE = 0 attributes = df.loc[:, feature_cols] # Also known as x class_label = df['task'] # Class …
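
confusion_matrix orders its rows (true classes) and columns (predicted classes) by the labels argument, or by the sorted unique labels when that argument is omitted. Passing labels explicitly and wrapping the result in a pandas DataFrame therefore puts the class names on both axes. The toy labels below are made up; with cross-validation, the predictions from cross_val_predict can be passed in the same way.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Made-up true and predicted labels
y_true = ["cat", "dog", "bird", "dog", "cat"]
y_pred = ["cat", "cat", "bird", "dog", "cat"]

labels = sorted(set(y_true) | set(y_pred))  # fixes the row/column order
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Rows are true classes, columns are predicted classes
cm_df = pd.DataFrame(
    cm,
    index=[f"true:{name}" for name in labels],
    columns=[f"pred:{name}" for name in labels],
)
print(cm_df)
```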

Adaboost in Pipeline with Gridsearch SKLEARN

Question: I would like to use AdaBoostClassifier with LinearSVC as the base estimator. I want to do a grid search over some of the parameters of LinearSVC. I also have to scale my features. p_grid = {'base_estimator__C': np.logspace(-5, 3, 10)} n_splits = 5 inner_cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=5) SVC_Kernel=LinearSVC(multi_class ='crammer_singer',tol=10e-3,max_iter=10000,class_weight='balanced') ABC = AdaBoostClassifier(base_estimator=SVC_Kernel,n_estimators=600 …
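
Once the scaler and AdaBoost sit in a Pipeline, every grid key needs the step name as a prefix, so the LinearSVC C becomes step name + "__" + AdaBoost parameter + "__C". The sketch below assumes step names "scale" and "ada" and uses synthetic binary data, with n_estimators and the C grid shrunk to keep it fast (the question's crammer_singer option is left out). It also hedges against two version differences: AdaBoostClassifier's base_estimator parameter was renamed estimator in scikit-learn 1.2, and because LinearSVC has no predict_proba, the discrete "SAMME" algorithm is requested wherever that option still exists.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=8, random_state=5)

# "estimator" in scikit-learn >= 1.2, "base_estimator" in older releases
ada_defaults = AdaBoostClassifier().get_params()
est_key = "estimator" if "estimator" in ada_defaults else "base_estimator"

ada_kwargs = {
    est_key: LinearSVC(max_iter=10000, class_weight="balanced"),
    "n_estimators": 10,
}
if "algorithm" in ada_defaults:
    # LinearSVC has no predict_proba, so use discrete SAMME boosting
    ada_kwargs["algorithm"] = "SAMME"

pipe = Pipeline([
    ("scale", StandardScaler()),  # refit on each training fold only
    ("ada", AdaBoostClassifier(**ada_kwargs)),
])

# Key = pipeline step name + "__" + AdaBoost parameter + "__" + LinearSVC parameter
p_grid = {f"ada__{est_key}__C": np.logspace(-2, 1, 3)}
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=5)
search = GridSearchCV(pipe, p_grid, cv=inner_cv)
search.fit(X, y)
print(search.best_params_)
```

Keeping the scaler inside the pipeline means GridSearchCV fits it on each training fold only, so the validation folds never leak into the scaling statistics.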

How to create a subclass with class attributes based on constructor function arguments for use in an estimator for GridSearchCV?

Question: I want to subclass sklearn.svm.LinearSVC and use it as an estimator for sklearn.model_selection.GridSearchCV. I had some issues with subclassing earlier, and I thought I had fixed them based on my previous post and the selected answer. However, my objective now is to create an sklearn.kernel_approximation.RBFSampler object as an attribute of my new class. This is just an example; my broader question is: With the final expectation of using my new estimator class with …
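
scikit-learn's get_params, set_params and clone read parameter names from the signature of __init__. A subclass should therefore list every tunable argument explicitly (no *args or **kwargs), store each one unchanged as an attribute of the same name, and build derived objects such as an RBFSampler in fit rather than in the constructor. The sketch below follows those conventions; RBFLinearSVC, its parameters, and its defaults are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import RBFSampler
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC


class RBFLinearSVC(LinearSVC):
    """Hypothetical LinearSVC subclass that fits on random Fourier (RBF) features."""

    def __init__(self, gamma=1.0, n_components=100, C=1.0, max_iter=10000, random_state=None):
        # Every constructor argument is stored unchanged under its own name,
        # so get_params()/set_params()/clone() behave as GridSearchCV expects.
        super().__init__(C=C, max_iter=max_iter, random_state=random_state)
        self.gamma = gamma
        self.n_components = n_components

    def fit(self, X, y, sample_weight=None):
        # Derived objects are built here, not in __init__, so they always
        # reflect the parameter values GridSearchCV has just set.
        self.rbf_ = RBFSampler(
            gamma=self.gamma, n_components=self.n_components, random_state=self.random_state
        )
        return super().fit(self.rbf_.fit_transform(X), y, sample_weight=sample_weight)

    def decision_function(self, X):
        # predict() and score() both go through decision_function,
        # so new data is mapped into the same feature space exactly once.
        return super().decision_function(self.rbf_.transform(X))


X, y = make_classification(n_samples=120, n_features=5, random_state=0)
search = GridSearchCV(
    RBFLinearSVC(n_components=50, random_state=0),
    {"gamma": [0.1, 1.0], "C": [0.1, 1.0]},
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```

Composition often avoids subclassing altogether: Pipeline([("rbf", RBFSampler()), ("svc", LinearSVC())]) searched over "rbf__gamma" and "svc__C" expresses the same model.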

Reshape a data for Sklearn

Question: I have a list of colors: initialColors = [u'black' u'black' u'black' u'white' u'white' u'white' u'powderblue' u'whitesmoke' u'black' u'cornflowerblue' u'powderblue' u'powderblue' u'goldenrod' u'white' u'lavender' u'white' u'powderblue' u'powderblue' u'powderblue' u'powderblue' u'powderblue' u'powderblue' u'powderblue' u'powderblue' u'white' u'white' u'powderblue' u'white' u'white'] And I have labels for these colors like this: labels_train = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 …
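
scikit-learn estimators expect X as a numeric 2-D array of shape (n_samples, n_features), so a 1-D list of color names needs two steps: a reshape to a single column and an encoding of the strings into numbers. The sketch below applies OneHotEncoder to a shortened, illustrative sample of the question's colors; labels_train can stay 1-D.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# A few colors taken from the question (shortened sample)
initial_colors = np.array(["black", "white", "powderblue", "black", "goldenrod"])

# scikit-learn expects X with shape (n_samples, n_features); here one feature column
X = initial_colors.reshape(-1, 1)

# One binary column per distinct color; colors unseen during fit are ignored later
encoder = OneHotEncoder(handle_unknown="ignore")
X_encoded = encoder.fit_transform(X)  # SciPy sparse matrix by default
print(encoder.categories_[0])
print(X_encoded.toarray())
```

X_encoded and a label array of the same length can then be passed to any classifier's fit.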

ValueError: continuous is not supported

Question: I am using GridSearchCV for cross-validation of a linear regression (not a classifier nor a logistic regression). I also use StandardScaler to normalize X. My dataframe has 17 features (X) and 5 targets (y), with around 1150 rows (observations). I keep getting the error message ValueError: continuous is not supported and have run out of options. Here is some code (assume all imports are done properly): soilM = pd.read_csv('C:/training.csv', index_col=0) soilM = getDummiedSoilDepth(soilM) #transform …
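
This message is raised by scikit-learn's classification metrics, such as accuracy_score, when the targets are continuous numbers. It commonly appears when a regressor is scored with scoring="accuracy" or another classification scorer (StratifiedKFold also rejects continuous targets, with a different message). Since the question's code is cut off, the exact cause is an assumption. The sketch below reproduces the error on synthetic data shaped like the question's (17 features, 5 targets, fewer rows) and then runs the search with a regression scorer and a plain KFold; the step names "scale" and "reg" are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's data: 17 features, 5 continuous targets
X, y = make_regression(n_samples=200, n_features=17, n_targets=5, noise=0.1, random_state=0)

# A classification metric applied to continuous values reproduces the error
error_message = ""
try:
    accuracy_score(y[:, 0], y[:, 0])
except ValueError as err:
    error_message = str(err)
print(error_message)

pipe = Pipeline([("scale", StandardScaler()), ("reg", LinearRegression())])
search = GridSearchCV(
    pipe,
    {"reg__fit_intercept": [True, False]},
    scoring="neg_mean_squared_error",  # a regression scorer instead of accuracy
    cv=KFold(n_splits=5, shuffle=True, random_state=0),  # plain KFold, not StratifiedKFold
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Regression scorers in scikit-learn follow a "greater is better" convention, which is why the mean squared error is reported negated.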