scikit-learn

Custom transformer mixin with FeatureUnion in scikit-learn

蹲街弑〆低调 submitted on 2021-01-29 10:46:47
Question: I am writing custom transformers in scikit-learn in order to perform specific operations on an array. For that I inherit from TransformerMixin. It works fine when I deal with only one transformer. However, when I try to chain them using FeatureUnion (or make_union), the array is replicated n times. What can I do to avoid that? Am I using scikit-learn as intended?

import numpy as np
from sklearn.base import TransformerMixin
from sklearn.pipeline import FeatureUnion
#
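A likely cause: FeatureUnion concatenates each transformer's output side by side, so any transformer that returns the full input array contributes a complete copy, and n transformers yield n copies. One way around it is to have each transformer return only the columns it owns. A minimal sketch under that assumption (ColumnScaler and the toy data are illustrative, not from the question):

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion

class ColumnScaler(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: scales only the selected columns and
    returns just those columns, not the whole array."""
    def __init__(self, cols, factor=1.0):
        self.cols = cols
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Return only the slice this transformer owns; FeatureUnion
        # stacks the outputs horizontally, so returning the full array
        # from every transformer is what duplicates it n times.
        return X[:, self.cols] * self.factor

X = np.arange(12.0).reshape(4, 3)
union = FeatureUnion([
    ("first", ColumnScaler(cols=[0], factor=2.0)),
    ("rest", ColumnScaler(cols=[1, 2], factor=0.5)),
])
print(union.fit_transform(X).shape)  # (4, 3), not (4, 6)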

LinearSVC Feature Selection returns different coef_ in Python

自闭症网瘾萝莉.ら submitted on 2021-01-29 10:19:01
Question: I'm using SelectFromModel with a LinearSVC on a training data set. The training and testing sets have already been split and are saved in separate files. When I fit the LinearSVC on the training set, I get a set of coef_[0] values from which I try to find the most important features. When I rerun the script, I get different coef_[0] values, even though it runs on the same training data. Why is this the case? See below for a snippet of the code (maybe there's a bug I don't see):

fig = plt.figure()
#SelectFromModel
lsvc =
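The usual explanation is that LinearSVC's liblinear solver processes samples in a randomized order, so coef_ varies between runs unless the seed is pinned. A minimal sketch of a reproducible fit, using synthetic data in place of the asker's files:

from sklearn.svm import LinearSVC
from sklearn.feature_selection import SelectFromModel
from sklearn.datasets import make_classification

X_train, y_train = make_classification(n_samples=200, n_features=10, random_state=0)

# Fixing random_state (and giving the solver enough iterations) makes
# the fit deterministic, so coef_ is the same on every rerun.
lsvc = LinearSVC(C=0.1, penalty="l1", dual=False, random_state=42, max_iter=10000)
lsvc.fit(X_train, y_train)
print(lsvc.coef_[0])

selector = SelectFromModel(lsvc, prefit=True)
print(selector.get_support(indices=True))  # indices of the kept features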

Understanding score of 1 in scikit-learn Gaussian process regressor

£可爱£侵袭症+ submitted on 2021-01-29 10:15:18
Question: I'm new to Gaussian processes and struggling to validate the output of my scikit-learn GPR. I'm particularly concerned that my GPR returns a score of 1, which doesn't make sense to me because the coefficient of determination of this data should not equal 1. Is there a particular problem with the GPR or the data implied by a score of 1? I've included my code; my X and Y are each arrays of length 15. I have additionally tried both the Matern and RBF kernels on their own
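A score of 1 on the training data is actually expected for a near-noise-free GP: with alpha close to zero the posterior mean interpolates the training points exactly, so R² on those same points is 1 by construction. A short illustration with synthetic data (not the asker's arrays); scoring held-out points or adding a WhiteKernel gives a more honest number:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 15).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=15)

# With alpha ~ 0 the GP passes through every training point, so the
# training-set score is 1 regardless of how noisy the data really is.
gpr = GaussianProcessRegressor(kernel=RBF(), alpha=1e-10).fit(X, y)
print(gpr.score(X, y))  # ~1.0

# A learned noise term plus held-out evaluation gives a meaningful R^2.
gpr_noisy = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X, y)
X_test = np.linspace(0.5, 9.5, 10).reshape(-1, 1)
print(gpr_noisy.score(X_test, np.sin(X_test).ravel()))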

Inverse Transform Predicted Results

这一生的挚爱 submitted on 2021-01-29 09:32:38
Question: I have a training-data CSV with three columns (two for data and a third for targets), and I successfully predicted the target column for my test CSV. The problem is that I need to inverse-transform the results back to strings for further analysis. Below are the code and the error.

from sklearn import datasets
from sklearn import svm
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import train_test_split
from sklearn
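The standard pattern is to keep the LabelEncoder used on the string targets and call inverse_transform on the integer predictions. A minimal sketch with toy data, since the asker's CSVs and encoder are not shown:

from sklearn.preprocessing import LabelEncoder
from sklearn.neighbors import KNeighborsClassifier

X_train = [[0, 1], [1, 1], [2, 0], [3, 0]]
y_train_str = ["cat", "cat", "dog", "dog"]

# Encode string targets to integers for training, then map the integer
# predictions back to the original strings with the same encoder.
le = LabelEncoder()
y_train = le.fit_transform(y_train_str)

clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
y_pred = clf.predict([[0, 1], [3, 0]])
print(le.inverse_transform(y_pred))  # ['cat' 'dog']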

Problem with alpha and lambda regularization parameters in Python

时间秒杀一切 submitted on 2021-01-29 08:34:18
Question: Logistic Regression. Train logistic regression models with L1 regularization and L2 regularization using alpha = 0.1 and lambda = 0.1. Report accuracy, precision, recall, and F1-score, and print the confusion matrix. My code is:

_lambda = 0.1
c = 1/_lambda
classifier = LogisticRegression(penalty='l1', C=c)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

I don't know where alpha and lambda actually go. Did I do this right?

Answer 1: your example alpha=0, lambda=10
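For context: scikit-learn's LogisticRegression has no alpha parameter; its regularization strength is expressed as C = 1/lambda (alpha is the name used by SGDClassifier and Ridge). A minimal sketch covering both penalties, with synthetic data standing in for the asker's split:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

_lambda = 0.1
# C = 1 / lambda; liblinear supports both the l1 and l2 penalties.
for penalty in ("l1", "l2"):
    clf = LogisticRegression(penalty=penalty, C=1 / _lambda, solver="liblinear")
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(penalty)
    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred))  # accuracy, precision, recall, F1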

ValueError: Penalty term must be positive

一笑奈何 submitted on 2021-01-29 07:22:18
Question: When I fit my model using logistic regression, I get a value error: ValueError: Penalty term must be positive.

C=[1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 1e3, 1e4]
for i in C[-9:]:
    logisticl2 = LogisticRegression(penalty='l2',C=C)
    logisticl2.fit(X_train,Y_train)
    probs = logisticl2.predict_proba(X_test)

I'm getting the error: ValueError: Penalty term must be positive; got (C=[0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0])

Answer 1: Looking more closely, you'll realize that you are
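The error message itself shows the bug: the whole list C is passed to the constructor (C=C) instead of the scalar loop variable i. A corrected sketch, with synthetic data standing in for the asker's train/test arrays:

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, random_state=0)

C = [1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 1e3, 1e4]
for i in C:
    # Pass the scalar i, not the list C -- the C argument must be a
    # single positive float.
    logisticl2 = LogisticRegression(penalty="l2", C=i)
    logisticl2.fit(X_train, Y_train)
    probs = logisticl2.predict_proba(X_test)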

I'm trying to impute in sklearn but I get an error

别等时光非礼了梦想. submitted on 2021-01-29 06:46:21
Question: I tried the code below but I get an error.

imp = SimpleImputer(missing_values='NaN', strategy="mean")
col = veriler.iloc[:,1:4].values
type(col)  # numpy.ndarray
imp = imp.fit(col)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Answer 1: You need to convert the infinity values to a bounded value before applying imputation. np.nan_to_num clips nan, inf and -inf to workable values. For example:

import numpy as np
from sklearn.impute import SimpleImputer
imp_mean = SimpleImputer
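Putting the answer together end to end: convert the infinities to large finite values first, then impute the NaNs. A minimal sketch with a toy array in place of the asker's veriler DataFrame; note that current scikit-learn expects missing_values=np.nan rather than the string 'NaN':

import numpy as np
from sklearn.impute import SimpleImputer

col = np.array([[1.0, np.nan], [np.inf, 4.0], [5.0, -np.inf]])

# Clip +/-inf to large finite values but leave the NaNs in place
# (SimpleImputer only handles the missing_values marker, not inf).
col = np.nan_to_num(col, nan=np.nan, posinf=1e10, neginf=-1e10)

imp = SimpleImputer(missing_values=np.nan, strategy="mean")
print(imp.fit_transform(col))  # NaNs replaced by the column means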

How can I make the FunctionTransformer along with GridSearchCV into a pipeline?

£可爱£侵袭症+ submitted on 2021-01-29 05:56:35
Question: Basically, I want to treat the column index as a hyperparameter and tune it along with the other model hyperparameters in the pipeline. In my example below, col_idx is my hyperparameter. I defined a function called log_columns that performs a log transformation on certain columns, and the function can be passed into FunctionTransformer. Then I put the FunctionTransformer and the model into the pipeline.

from sklearn.svm import SVC
from sklearn.decomposition import PCA
from
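One workable route: FunctionTransformer forwards kw_args to the wrapped function, and kw_args is itself a constructor parameter, so GridSearchCV can search over it like any other hyperparameter. A minimal sketch assuming a log_columns along the lines the question describes (its exact signature here is hypothetical):

import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris

def log_columns(X, col_idx=None):
    # Log-transform only the selected columns; leave the rest as-is.
    X = X.copy()
    if col_idx is not None:
        X[:, col_idx] = np.log1p(X[:, col_idx])
    return X

pipe = Pipeline([
    ("log", FunctionTransformer(log_columns, kw_args={"col_idx": [0]})),
    ("svc", SVC()),
])

# kw_args is a real parameter of the "log" step, so the column index
# is tuned alongside the model hyperparameters.
param_grid = {
    "log__kw_args": [{"col_idx": [0]}, {"col_idx": [0, 1]}, {"col_idx": None}],
    "svc__C": [0.1, 1, 10],
}
X, y = load_iris(return_X_y=True)
search = GridSearchCV(pipe, param_grid, cv=3).fit(X, y)
print(search.best_params_)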

Why does roc_auc produce weird results in sklearn?

半世苍凉 submitted on 2021-01-29 05:44:56
Question: I have a binary classification problem where I use the following code to get my weighted average precision, weighted average recall, weighted average F-measure and roc_auc.

df = pd.read_csv(input_path+input_file)
X = df[features]
y = df[["gold_standard"]]
clf = RandomForestClassifier(random_state = 42, class_weight="balanced")
k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(clf, X, y, cv=k_fold, scoring = ('accuracy', 'precision_weighted',
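One thing worth knowing when comparing these numbers: 'roc_auc' is scored from probability estimates while the weighted precision/recall/F1 come from hard predict() labels, so the two families of scores can legitimately disagree. A runnable sketch of the setup with synthetic data in place of the CSV (and y flattened to 1-D, since df[["gold_standard"]] yields a column DataFrame):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, weights=[0.8], random_state=0)

clf = RandomForestClassifier(random_state=42, class_weight="balanced")
k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# roc_auc uses predict_proba scores; the weighted metrics use predict()
# labels. y must be a 1-D array (use y.ravel() on a column DataFrame).
scores = cross_validate(
    clf, X, y, cv=k_fold,
    scoring=("precision_weighted", "recall_weighted", "f1_weighted", "roc_auc"),
)
print(scores["test_roc_auc"].mean(), scores["test_f1_weighted"].mean())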

Sklearn logistic regression - adjust cutoff point

人盡茶涼 submitted on 2021-01-29 05:36:46
Question: I have a logistic regression model trying to predict one of two classes: A or B. My model's accuracy when predicting A is ~85%, and its accuracy when predicting B is ~50%. Prediction of B is not important; however, prediction of A is very important. My goal is to maximize the accuracy when predicting A. Is there any way to adjust the default decision threshold when determining the class?

classifier = LogisticRegression(penalty = 'l2', solver = 'saga', multi_class = 'ovr')
classifier.fit(np
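There is no threshold argument on predict() itself; the common workaround is to take predict_proba and apply a hand-picked cutoff. A minimal sketch, assuming class A is encoded as 1 and the threshold would be tuned on a validation set:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, random_state=0)  # y: 1 = A, 0 = B
classifier = LogisticRegression(penalty="l2", solver="saga", max_iter=5000)
classifier.fit(X, y)

# predict() uses a fixed 0.5 cutoff; lowering the cutoff for class A
# trades accuracy on B for more correct A predictions.
threshold = 0.3  # assumption: pick this value on held-out data
proba_A = classifier.predict_proba(X)[:, 1]
y_pred = np.where(proba_A >= threshold, 1, 0)
print((y_pred == 1).mean())  # fraction of samples now labeled A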