scikit-learn

How to save a custom transformer in sklearn?

柔情痞子 submitted on 2020-12-02 05:55:46
Question: I am not able to load an instance of a custom transformer saved with either sklearn.externals.joblib.dump or pickle.dump, because the original definition of the custom transformer is missing from the current Python session. Suppose that in one Python session I define, create, and save a custom transformer; it can also be loaded in the same session:

    from sklearn.base import TransformerMixin
    from sklearn.base import BaseEstimator
    from sklearn.externals import joblib
    class CustomTransformer
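
The behaviour described above follows from how pickle works: it stores a reference to the class (module name plus class name), not the class source, so the class must be importable when the file is loaded. A minimal sketch of one common workaround, keeping the class in its own module, is shown below. The module name `my_transformers` and the `factor` parameter are hypothetical, and the class is a plain-Python stand-in rather than a real `BaseEstimator`/`TransformerMixin` subclass, so the example runs without scikit-learn.

```python
import importlib
import pickle
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical module holding the transformer definition (stand-in class,
# not a real scikit-learn estimator).
MODULE_SOURCE = textwrap.dedent('''
    class CustomTransformer:
        """Multiplies every value by a constant factor."""

        def __init__(self, factor=1.0):
            self.factor = factor

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return [[value * self.factor for value in row] for row in X]
''')

# Write the module to a temporary directory and make it importable.
module_dir = tempfile.mkdtemp()
Path(module_dir, "my_transformers.py").write_text(MODULE_SOURCE)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
my_transformers = importlib.import_module("my_transformers")

# "Session 1": create and serialize an instance.
payload = pickle.dumps(my_transformers.CustomTransformer(factor=3.0))

# "Session 2" (simulated): forget the module, then load. pickle re-imports
# my_transformers by name, which works because the file is on sys.path.
del sys.modules["my_transformers"]
restored = pickle.loads(payload)
print(restored.transform([[1, 2]]))
```

If the class were defined only in an interactive session, the second step would fail, because there would be no module to re-import it from.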

Python scikit learn Linear Model Parameter Standard Error

跟風遠走 submitted on 2020-12-02 05:37:49
Question: I am working with sklearn, specifically the linear_model module. After fitting a simple linear model as in

    import pandas as pd
    import numpy as np
    from sklearn import linear_model
    randn = np.random.randn
    X = pd.DataFrame(randn(10,3), columns=['X1','X2','X3'])
    y = pd.DataFrame(randn(10,1), columns=['Y'])
    model = linear_model.LinearRegression()
    model.fit(X=X, y=y)

I see how I can access the coefficients and intercept via coef_ and intercept_; prediction is straightforward as well. I would like to
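
For reference, the quantity named in the title is usually defined as below for ordinary least squares. This is the textbook formula, not something taken from the question: it assumes an intercept is fitted and the errors have constant variance. The symbols are introduced here for illustration: n is the number of rows (10 in the snippet), p the number of features (3), and X-tilde is X with a leading column of ones. LinearRegression does not expose these values as attributes, so they have to be computed from the residuals.

```latex
\hat{\sigma}^2 = \frac{1}{n - p - 1} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad
\operatorname{SE}(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 \left[ \left( \tilde{X}^{\top} \tilde{X} \right)^{-1} \right]_{jj}}
```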

Getting the maximum accuracy for a binary probabilistic classifier in scikit-learn

≡放荡痞女 submitted on 2020-12-01 11:17:05
Question: Is there any built-in function to get the maximum accuracy for a binary probabilistic classifier in scikit-learn? E.g. to get the maximum F1-score I do:

    import sklearn.metrics
    # AUCPR
    precision, recall, thresholds = sklearn.metrics.precision_recall_curve(y_true, y_score)
    auprc = sklearn.metrics.auc(recall, precision)
    max_f1 = 0
    for r, p, t in zip(recall, precision, thresholds):
        if p + r == 0: continue
        if (2*p*r)/(p + r) > max_f1:
            max_f1 = (2*p*r)/(p + r)
            max_f1_threshold = t

I could compute the maximum accuracy in
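
One possible way to do the analogous sweep for accuracy is sketched below in plain Python. It is an illustration, not a scikit-learn function: the helper name `max_accuracy` and the toy data are made up, and it assumes a sample is predicted positive when its score is at or above the threshold.

```python
def max_accuracy(y_true, y_score):
    """Return (best_accuracy, best_threshold) over all distinct score cut-offs.

    A sample is predicted positive when score >= threshold. The extra
    threshold of +inf covers the "predict everything negative" case.
    """
    candidates = sorted(set(y_score)) + [float("inf")]
    best_accuracy, best_threshold = -1.0, None
    for threshold in candidates:
        correct = sum(
            (score >= threshold) == bool(label)
            for label, score in zip(y_true, y_score)
        )
        accuracy = correct / len(y_true)
        if accuracy > best_accuracy:  # keep the lowest threshold on ties
            best_accuracy, best_threshold = accuracy, threshold
    return best_accuracy, best_threshold


# Toy data, invented for the example.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(max_accuracy(y_true, y_score))
```

The loop is quadratic in the number of samples, which is fine for a sketch; sorting the scores once and updating the count incrementally would scale better.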

Convert decision tree directly to png [duplicate]

夙愿已清 submitted on 2020-12-01 10:50:35
Question: This question already has answers here: graph.write_pdf("iris.pdf") AttributeError: 'list' object has no attribute 'write_pdf' (10 answers). Closed 4 years ago. I am trying to generate a decision tree which I want to visualize using dot. The resulting dot file should be converted to PNG. While I can do the last conversion step in DOS using something like export_graphviz(dectree, out_file="graph.dot") followed by a DOS command dot -Tps graph.dot -o outfile.ps, doing all this directly in Python
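
A minimal sketch of running that conversion step from Python instead of the command prompt follows. It assumes the Graphviz `dot` executable is installed and on PATH, and that `graph.dot` has already been written by `export_graphviz`; the helper names `dot_command` and `render_png` are hypothetical.

```python
import shutil
import subprocess


def dot_command(dot_path, png_path):
    """Build the Graphviz command line that renders a .dot file to PNG."""
    return ["dot", "-Tpng", dot_path, "-o", png_path]


def render_png(dot_path, png_path):
    """Run Graphviz from Python; raises if `dot` is missing or exits non-zero."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' executable not found on PATH")
    subprocess.run(dot_command(dot_path, png_path), check=True)
```

After `export_graphviz(dectree, out_file="graph.dot")`, calling `render_png("graph.dot", "graph.png")` would produce the image without leaving the Python session.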

Sci-kit learn how to print labels for confusion matrix?

99封情书 submitted on 2020-12-01 09:21:37
Question: So I'm using scikit-learn to classify some data. I have 13 different class values/categories to classify the data into. I have been able to use cross-validation and print the confusion matrix. However, it only shows the TP, FP, etc. without the class labels, so I don't know which class is which. Below is my code and my output:

    def classify_data(df, feature_cols, file):
        nbr_folds = 5
        RANDOM_STATE = 0
        attributes = df.loc[:, feature_cols] # Also known as x
        class_label = df['task'] # Class
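
The idea of fixing the label order and printing it alongside the counts can be sketched in plain Python as below. The helper names and the sample labels are made up for illustration; the real values of the 'task' column are not shown in the question. In scikit-learn itself, `confusion_matrix` accepts a `labels` argument that fixes the row and column order in the same way.

```python
from collections import Counter


def labelled_confusion_matrix(y_true, y_pred, labels):
    """Row i counts samples whose true class is labels[i];
    column j counts samples predicted as labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def format_matrix(matrix, labels):
    """Render the matrix as text with a header row and a leading label column."""
    width = max(len(str(label)) for label in labels) + 2
    header = " " * width + "".join(str(label).rjust(width) for label in labels)
    rows = [
        str(label).ljust(width) + "".join(str(n).rjust(width) for n in row)
        for label, row in zip(labels, matrix)
    ]
    return "\n".join([header] + rows)


# Invented sample data with three classes instead of thirteen.
y_true = ["walk", "run", "walk", "sit", "run"]
y_pred = ["walk", "walk", "walk", "sit", "run"]
labels = ["run", "sit", "walk"]

matrix = labelled_confusion_matrix(y_true, y_pred, labels)
print(format_matrix(matrix, labels))
```

Rows are true classes and columns are predicted classes, so an off-diagonal count shows directly which class was mistaken for which.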

Adaboost in Pipeline with Gridsearch SKLEARN

倾然丶 夕夏残阳落幕 submitted on 2020-11-29 21:10:47
Question: I would like to use the AdaBoostClassifier with LinearSVC as the base estimator. I want to do a grid search on some of the parameters in LinearSVC. I also have to scale my features.

    p_grid = {'base_estimator__C': np.logspace(-5, 3, 10)}
    n_splits = 5
    inner_cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=5)
    SVC_Kernel=LinearSVC(multi_class ='crammer_singer',tol=10e-3,max_iter=10000,class_weight='balanced')
    ABC = AdaBoostClassifier(base_estimator=SVC_Kernel,n_estimators=600

Adaboost in Pipeline with Gridsearch SKLEARN

天涯浪子 submitted on 2020-11-29 21:07:04
Question: I would like to use the AdaBoostClassifier with LinearSVC as the base estimator. I want to do a grid search on some of the parameters in LinearSVC. I also have to scale my features.

    p_grid = {'base_estimator__C': np.logspace(-5, 3, 10)}
    n_splits = 5
    inner_cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=5)
    SVC_Kernel=LinearSVC(multi_class ='crammer_singer',tol=10e-3,max_iter=10000,class_weight='balanced')
    ABC = AdaBoostClassifier(base_estimator=SVC_Kernel,n_estimators=600

How to create a subclass with class attributes based on constructor function arguments for use in an estimator for GridSearchCV?

非 Y 不嫁゛ submitted on 2020-11-29 09:23:28
Question: I want to subclass sklearn.svm.LinearSVC and use it as an estimator for sklearn.model_selection.GridSearchCV. I had some issues with subclassing earlier, and I thought I had fixed them based on my previous post and the selected answer. However, my objective now is to create an sklearn.kernel_approximation.RBFSampler object as an attribute of my new class. This is just an example; my broader question is: with the final expectation of using my new estimator class with

Reshape a data for Sklearn

走远了吗. submitted on 2020-11-29 03:51:12
Question: I have a list of colors:

    initialColors = [u'black' u'black' u'black' u'white' u'white' u'white' u'powderblue' u'whitesmoke' u'black' u'cornflowerblue' u'powderblue' u'powderblue' u'goldenrod' u'white' u'lavender' u'white' u'powderblue' u'powderblue' u'powderblue' u'powderblue' u'powderblue' u'powderblue' u'powderblue' u'powderblue' u'white' u'white' u'powderblue' u'white' u'white']

And I have labels for these colors like this:

    labels_train = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
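
The question is cut off here, so what follows is only a guess at the shape problem involved. scikit-learn estimators expect the feature matrix as a 2-D structure of shape (n_samples, n_features), so a flat list holding one feature has to become a column. The sketch below uses plain Python and an invented four-item list; with NumPy the equivalent reshape is `np.array(codes).reshape(-1, 1)`.

```python
# Shortened, invented sample of colour names.
initial_colors = ["black", "white", "powderblue", "black"]

# Map each distinct colour to an integer code (alphabetical order).
categories = sorted(set(initial_colors))
code_of = {color: index for index, color in enumerate(categories)}

# One row per sample, one column per feature: shape (n_samples, 1).
X = [[code_of[color]] for color in initial_colors]
print(X)
```

Integer codes imply an ordering between colours that may not be meaningful, so a one-hot encoding could be a better fit, depending on the model.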

ValueError: continuous is not supported

余生颓废 submitted on 2020-11-29 03:37:05
Question: I am using GridSearchCV for cross-validation of a linear regression (not a classifier or a logistic regression). I also use StandardScaler to normalize X. My dataframe has 17 features (X) and 5 targets (y) (observations), and around 1150 rows. I keep getting a "ValueError: continuous is not supported" error message and have run out of options. Here is some code (assume all imports are done properly):

    soilM = pd.read_csv('C:/training.csv', index_col=0)
    soilM = getDummiedSoilDepth(soilM) #transform