logistic-regression

classification: PCA and logistic regression using sklearn

六眼飞鱼酱① submitted on 2019-12-05 05:39:01
Step 0: Problem description. I have a classification problem, i.e. I want to predict a binary target from a collection of numerical features using logistic regression, after first running a Principal Components Analysis (PCA). I have 2 datasets, df_train and df_valid (training set and validation set respectively), as pandas data frames containing the features and the target. As a first step, I used the pandas get_dummies function to transform all the categorical variables into booleans. For example, I would have:

    n_train = 10
    np.random.seed(0)
    df_train = pd.DataFrame({"f1":np.random.random(n
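A minimal sketch of the PCA-then-logistic-regression workflow the question describes, using an sklearn Pipeline so that scaling and PCA are fitted on the training set only. The synthetic frames and the choice of 5 components are assumptions standing in for the question's real data:

    import numpy as np
    import pandas as pd
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression

    # Synthetic stand-ins for df_train / df_valid (the real frames come from get_dummies)
    rng = np.random.RandomState(0)
    df_train = pd.DataFrame(rng.rand(100, 10), columns=[f"f{i}" for i in range(10)])
    y_train = rng.randint(0, 2, 100)
    df_valid = pd.DataFrame(rng.rand(40, 10), columns=[f"f{i}" for i in range(10)])

    # Scale, reduce with PCA, then classify; fit() learns every step from training data only
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=5)),
        ("clf", LogisticRegression()),
    ])
    pipe.fit(df_train, y_train)
    preds = pipe.predict(df_valid)   # the pipeline transforms then predicts on the validation set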

Scikit F-score metric error

故事扮演 submitted on 2019-12-05 05:32:40
I am trying to predict a set of labels using Logistic Regression from SciKit. My data is really imbalanced (there are many more '0' than '1' labels), so I have to use the F1 score metric during the cross-validation step to "balance" the result.

[Input]

    X_training, y_training, X_test, y_test = generate_datasets(df_X, df_y, 0.6)
    logistic = LogisticRegressionCV(
        Cs=50,
        cv=4,
        penalty='l2',
        fit_intercept=True,
        scoring='f1'
    )
    logistic.fit(X_training, y_training)
    print('Predicted: %s' % str(logistic.predict(X_test)))
    print('F1-score: %f' % f1_score(y_test, logistic.predict(X_test)))
    print('Accuracy
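A self-contained sketch of the same pattern on synthetic imbalanced data. The question's generate_datasets helper is not shown, so make_classification and train_test_split stand in for it here:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegressionCV
    from sklearn.metrics import f1_score

    # Imbalanced toy data: roughly 90% zeros, 10% ones
    X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.6, random_state=0)

    # Tune C by 4-fold cross-validation, selecting on F1 rather than accuracy
    logistic = LogisticRegressionCV(Cs=50, cv=4, penalty='l2', scoring='f1')
    logistic.fit(X_tr, y_tr)
    print('F1-score: %f' % f1_score(y_te, logistic.predict(X_te)))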

Python: How to use Multinomial Logistic Regression using SKlearn

余生颓废 submitted on 2019-12-05 04:50:01
I have a test dataset and a train dataset as below. I have provided sample data with a minimal number of records, but my data has more than 1000 records. Here E is my target variable, which I need to predict using an algorithm. It has only four categories (1, 2, 3, 4) and can take only one of these values.

Training Dataset:

    A   B   C   D   E
    1   20  30  1   1
    2   22  12  33  2
    3   45  65  77  3
    12  43  55  65  4
    11  25  30  1   1
    22  23  19  31  2
    31  41  11  70  3
    1   48  23  60  4

Test Dataset:

    A   B   C   D
    11  21  12  11
    1   2   3   4
    5   6   7   8
    99  87  65  34
    11  21  24  12

Since E has only 4 categories, I thought of predicting this using Multinomial Logistic Regression
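A sketch of fitting a multinomial model to exactly this data with sklearn. multi_class='multinomial' with the lbfgs solver is one standard choice; note that eight training rows are only enough to demonstrate the API, not to learn anything reliable:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X_train = np.array([[1, 20, 30, 1], [2, 22, 12, 33], [3, 45, 65, 77],
                        [12, 43, 55, 65], [11, 25, 30, 1], [22, 23, 19, 31],
                        [31, 41, 11, 70], [1, 48, 23, 60]])
    y_train = np.array([1, 2, 3, 4, 1, 2, 3, 4])
    X_test = np.array([[11, 21, 12, 11], [1, 2, 3, 4], [5, 6, 7, 8],
                       [99, 87, 65, 34], [11, 21, 24, 12]])

    # Multinomial (softmax) logistic regression over the four classes of E
    clf = LogisticRegression(multi_class='multinomial', solver='lbfgs', max_iter=1000)
    clf.fit(X_train, y_train)
    print(clf.predict(X_test))         # predicted class for each test row
    print(clf.predict_proba(X_test))   # per-class probabilities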

Load and predict new data sklearn

风格不统一 submitted on 2019-12-05 03:56:17
Question: I trained a logistic model, cross-validated it, and saved it to a file using the joblib module. Now I want to load this model and predict new data with it. Is this the correct way to do it? In particular, the standardization: should I call scaler.fit() on my new data too? In the tutorials I followed, scaler.fit was only used on the training set, so I'm a bit lost here. Here is my code:

    # Loading the saved model with joblib
    model = joblib.load('model.pkl')

    # New data to predict
    pr = pd.read_csv('set_to
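The usual answer is no: the scaler is fitted on the training data only and then reused, unchanged, to transform new data. A minimal sketch of that save/load pattern (the synthetic arrays and file names are arbitrary assumptions):

    import joblib
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    X_train, y_train = rng.rand(50, 3), rng.randint(0, 2, 50)
    X_new = rng.rand(5, 3)

    # --- At training time ---
    scaler = StandardScaler().fit(X_train)        # fit on training data ONLY
    model = LogisticRegression().fit(scaler.transform(X_train), y_train)
    joblib.dump(scaler, 'scaler.pkl')             # persist both objects, not just the model
    joblib.dump(model, 'model.pkl')

    # --- Later, on new data ---
    scaler = joblib.load('scaler.pkl')
    model = joblib.load('model.pkl')
    predictions = model.predict(scaler.transform(X_new))   # transform, never re-fit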

which coefficients go to which class in multiclass logistic regression in scikit learn?

走远了吗. submitted on 2019-12-04 22:14:20
Question: I'm using scikit-learn's Logistic Regression for a multiclass problem.

    logit = LogisticRegression(penalty='l1')
    logit = logit.fit(X, y)

I'm interested in which features are driving this decision.

    logit.coef_

The above gives me a beautiful array in (n_classes, n_features) format, but all the class and feature names are gone. With features, that's okay, because making the assumption that they're indexed the same way as I passed them in seems safe... But with classes, it's a problem, since
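The row order of coef_ follows logit.classes_, which sklearn derives from the sorted unique labels seen in y. A quick sketch of pairing them up, with iris as a stand-in dataset and the liblinear solver assumed (older sklearn requires it for penalty='l1'):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    logit = LogisticRegression(penalty='l1', solver='liblinear').fit(X, y)

    # Row i of coef_ corresponds to classes_[i]
    for cls, coefs in zip(logit.classes_, logit.coef_):
        print(cls, coefs)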

scikit learn: how to check coefficients significance

我的未来我决定 submitted on 2019-12-04 19:27:15
Question: I tried to fit a logistic regression with SKLearn for a rather large dataset with ~600 dummy variables and only a few interval variables (and 300K rows in my dataset), and the resulting confusion matrix looks suspicious. I wanted to check the significance of the returned coefficients (ANOVA), but I cannot find how to access it. Is it possible at all? And what is the best strategy for data that contains lots of dummy variables? Thanks a lot!

Answer 1: Scikit-learn deliberately does not support statistical inference. If you
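The answer is truncated here, but the standard route for coefficient significance is statsmodels, which reports standard errors and p-values. A sketch on synthetic data (the data-generating choices below are arbitrary assumptions):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.RandomState(0)
    X = rng.rand(300, 3)
    y = (X @ np.array([2.0, -1.0, 0.0]) + rng.normal(0, 0.5, 300) > 0.5).astype(int)

    # statsmodels reports std errors, z-scores and p-values for each coefficient
    result = sm.Logit(y, sm.add_constant(X)).fit()
    print(result.summary())   # includes a significance column (P>|z|)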

Is it reasonable for l1/l2 regularization to cause all feature weights to be zero in vowpal wabbit?

江枫思渺然 submitted on 2019-12-04 17:17:38
I got a weird result from vw, which uses an online learning scheme for logistic regression: when I add --l1 or --l2 regularization, I get all predictions at 0.5 (which means all feature weights are 0). Here's my command:

    vw -d training_data.txt --loss_function logistic -f model_l1 --invert_hash model_readable_l1 --l1 0.05 --link logistic

...and here's the learning process info:

    using l1 regularization = 0.05
    final_regressor = model_l1
    Num weight bits = 18
    learning rate = 0.5
    initial_t = 0
    power_t = 0.5
    using no cache
    Reading datafile = training_data.txt
    num sources = 1
    average since example example
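An --l1 of 0.05 is a fairly strong penalty for VW's per-example truncated-gradient updates, so zeroing every weight is plausible rather than a bug. The same effect is easy to reproduce in sklearn (a swapped-in library, used here only to illustrate the phenomenon): shrink C, the inverse regularization strength, and watch the nonzero coefficient count fall toward zero.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # Smaller C = stronger L1 penalty; at some point every weight is truncated to 0
    for C in [1.0, 0.1, 0.01, 0.001]:
        clf = LogisticRegression(penalty='l1', C=C, solver='liblinear').fit(X, y)
        print(C, np.count_nonzero(clf.coef_))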

plot multiple ROC curves for logistic regression model in R

陌路散爱 submitted on 2019-12-04 15:54:27
I have a logistic regression model (using R):

    fit6 <- glm(formula = survived ~ ascore + gini + failed, data = records, family = binomial)
    summary(fit6)

I'm using the pROC package to draw ROC curves and figure out the AUC for 6 models, fit1 through fit6. This is how I have approached plotting one ROC:

    prob6 = predict(fit6, type = c("response"))
    records$prob6 = prob6
    g6 <- roc(survived ~ prob6, data = records)
    plot(g6)

But is there a way I can combine the ROCs for all 6 curves in one plot, display the AUCs for all of them, and, if possible, the confidence intervals too? You can use the add = TRUE argument to the plot
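The truncated answer points at pROC's plot(..., add = TRUE), which overlays each new curve on the existing axes. For reference, the same overlay is straightforward in Python with sklearn and matplotlib (a swapped-in stack, sketched here because the rest of this page uses Python; the synthetic labels and scores are assumptions standing in for the six fitted models):

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_curve, auc

    rng = np.random.RandomState(0)
    y_true = rng.randint(0, 2, 200)

    plt.figure()
    for i in range(1, 7):
        # Stand-in for predict(fit_i, type="response"): noisy scores per model
        scores = np.clip(y_true * 0.5 + rng.rand(200) * (0.5 + 0.05 * i), 0, 1)
        fpr, tpr, _ = roc_curve(y_true, scores)
        plt.plot(fpr, tpr, label='fit%d (AUC = %.3f)' % (i, auc(fpr, tpr)))

    plt.plot([0, 1], [0, 1], linestyle='--')   # chance line
    plt.xlabel('False positive rate')
    plt.ylabel('True positive rate')
    plt.legend()
    plt.show()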

categorical variable in logistic regression in r

ぐ巨炮叔叔 submitted on 2019-12-04 14:37:24
Question: How do I implement a categorical variable in a binary logistic regression in R? I want to test the influence of professional field (student, worker, teacher, self-employed) on the probability of purchasing a product. In my example, y is a binary variable (1 for buying the product, 0 for not buying):

- x1: gender (0 male, 1 female)
- x2: age (between 20 and 80)
- x3: the categorical variable (1 = student, 2 = worker, 3 = teacher, 4 = self-employed)

    set.seed(123)
    y <- round
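The usual R answer is to wrap the variable as factor(x3) in the glm formula so it is dummy-coded rather than treated as an ordered number. The Python equivalent (a swapped-in stack, shown for consistency with the rest of this page; all data below is synthetic) uses pd.get_dummies for the same encoding:

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(123)
    df = pd.DataFrame({
        'x1': rng.randint(0, 2, 200),      # gender
        'x2': rng.randint(20, 81, 200),    # age
        'x3': rng.randint(1, 5, 200),      # professional field, 4 categories
    })
    y = rng.randint(0, 2, 200)

    # Dummy-code x3 so the category codes are not treated as an ordered number;
    # drop_first makes one category the reference level, as R's factor() does
    X = pd.get_dummies(df, columns=['x3'], drop_first=True)
    model = LogisticRegression().fit(X, y)
    print(dict(zip(X.columns, model.coef_[0])))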

Why is logistic regression called regression? [closed]

一笑奈何 submitted on 2019-12-04 13:52:08
Question (closed as off-topic; not currently accepting answers): According to what I have understood, linear regression predicts an outcome which can take continuous values, whereas logistic regression predicts an outcome which is discrete. It seems to me that logistic regression is similar to a classification problem. So why is it called regression? There is also a related
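One short way to see why the name fits: logistic regression is a regression on the log-odds, which is a continuous quantity; classification only happens when the predicted probability is thresholded. In symbols (the standard textbook form, not from the truncated post itself):

    \log\frac{p(y=1 \mid x)}{1 - p(y=1 \mid x)} = \beta_0 + \beta^\top x,
    \qquad
    p(y=1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta^\top x)}}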