logistic-regression

classification: PCA and logistic regression using sklearn

六眼飞鱼酱① submitted on 2019-12-05 05:39:01
Step 0: Problem description. I have a classification problem, i.e. I want to predict a binary target from a collection of numerical features using logistic regression, after first running a Principal Components Analysis (PCA). I have 2 datasets, df_train and df_valid (training set and validation set respectively), as pandas data frames containing the features and the target. As a first step, I used the pandas get_dummies function to transform all the categorical variables into booleans. For example, I would have:

    n_train = 10
    np.random.seed(0)
    df_train = pd.DataFrame({"f1":np.random.random(n
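A minimal sketch of the PCA-then-logistic-regression workflow the question describes, using an sklearn Pipeline so that scaling and PCA are fitted on the training set only. The synthetic frames and the choice of 5 components are assumptions standing in for the question's real data:

    import numpy as np
    import pandas as pd
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression

    # Synthetic stand-ins for df_train / df_valid (the real frames come from get_dummies)
    rng = np.random.RandomState(0)
    df_train = pd.DataFrame(rng.rand(100, 10), columns=[f"f{i}" for i in range(10)])
    y_train = rng.randint(0, 2, 100)
    df_valid = pd.DataFrame(rng.rand(40, 10), columns=[f"f{i}" for i in range(10)])

    # Scale, reduce with PCA, then classify; fit() learns every step from training data only
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=5)),
        ("clf", LogisticRegression()),
    ])
    pipe.fit(df_train, y_train)
    preds = pipe.predict(df_valid)   # the pipeline transforms then predicts on the validation set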

Scikit F-score metric error

故事扮演 submitted on 2019-12-05 05:32:40
I am trying to predict a set of labels using Logistic Regression from SciKit. My data is really imbalanced (there are many more '0' than '1' labels), so I have to use the F1 score metric during the cross-validation step to "balance" the result.

[Input]

    X_training, y_training, X_test, y_test = generate_datasets(df_X, df_y, 0.6)
    logistic = LogisticRegressionCV(
        Cs=50,
        cv=4,
        penalty='l2',
        fit_intercept=True,
        scoring='f1'
    )
    logistic.fit(X_training, y_training)
    print('Predicted: %s' % str(logistic.predict(X_test)))
    print('F1-score: %f' % f1_score(y_test, logistic.predict(X_test)))
    print('Accuracy
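A self-contained sketch of the same pattern on synthetic imbalanced data. The question's generate_datasets helper is not shown, so make_classification and train_test_split stand in for it here:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegressionCV
    from sklearn.metrics import f1_score

    # Imbalanced toy data: roughly 90% zeros, 10% ones
    X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.6, random_state=0)

    # Tune C by 4-fold cross-validation, selecting on F1 rather than accuracy
    logistic = LogisticRegressionCV(Cs=50, cv=4, penalty='l2', scoring='f1')
    logistic.fit(X_tr, y_tr)
    print('F1-score: %f' % f1_score(y_te, logistic.predict(X_te)))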

Python: How to use Multinomial Logistic Regression using SKlearn

余生颓废 submitted on 2019-12-05 04:50:01
I have a test dataset and a train dataset as below. I have provided sample data with a minimal number of records, but my data has more than 1000 records. Here E is my target variable, which I need to predict using an algorithm. It has only four categories (1, 2, 3, 4) and can take only one of these values.

Training Dataset:

    A   B   C   D   E
    1   20  30  1   1
    2   22  12  33  2
    3   45  65  77  3
    12  43  55  65  4
    11  25  30  1   1
    22  23  19  31  2
    31  41  11  70  3
    1   48  23  60  4

Test Dataset:

    A   B   C   D
    11  21  12  11
    1   2   3   4
    5   6   7   8
    99  87  65  34
    11  21  24  12

Since E has only 4 categories, I thought of predicting this using Multinomial Logistic Regression
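A sketch of fitting a multinomial model to exactly this data with sklearn. multi_class='multinomial' with the lbfgs solver is one standard choice; note that eight training rows are only enough to demonstrate the API, not to learn anything reliable:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X_train = np.array([[1, 20, 30, 1], [2, 22, 12, 33], [3, 45, 65, 77],
                        [12, 43, 55, 65], [11, 25, 30, 1], [22, 23, 19, 31],
                        [31, 41, 11, 70], [1, 48, 23, 60]])
    y_train = np.array([1, 2, 3, 4, 1, 2, 3, 4])
    X_test = np.array([[11, 21, 12, 11], [1, 2, 3, 4], [5, 6, 7, 8],
                       [99, 87, 65, 34], [11, 21, 24, 12]])

    # Multinomial (softmax) logistic regression over the four classes of E
    clf = LogisticRegression(multi_class='multinomial', solver='lbfgs', max_iter=1000)
    clf.fit(X_train, y_train)
    print(clf.predict(X_test))         # predicted class for each test row
    print(clf.predict_proba(X_test))   # per-class probabilities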

Load and predict new data sklearn

风格不统一 submitted on 2019-12-05 03:56:17
Question: I trained a logistic model, cross-validated it, and saved it to a file using the joblib module. Now I want to load this model and predict new data with it. Is this the correct way to do it? In particular, the standardization: should I call scaler.fit() on my new data too? In the tutorials I followed, scaler.fit was only used on the training set, so I'm a bit lost here. Here is my code:

    # Loading the saved model with joblib
    model = joblib.load('model.pkl')

    # New data to predict
    pr = pd.read_csv('set_to
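The usual answer is no: the scaler is fitted on the training data only and then reused, unchanged, to transform new data. A minimal sketch of that save/load pattern (the synthetic arrays and file names are arbitrary assumptions):

    import joblib
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    X_train, y_train = rng.rand(50, 3), rng.randint(0, 2, 50)
    X_new = rng.rand(5, 3)

    # --- At training time ---
    scaler = StandardScaler().fit(X_train)        # fit on training data ONLY
    model = LogisticRegression().fit(scaler.transform(X_train), y_train)
    joblib.dump(scaler, 'scaler.pkl')             # persist both objects, not just the model
    joblib.dump(model, 'model.pkl')

    # --- Later, on new data ---
    scaler = joblib.load('scaler.pkl')
    model = joblib.load('model.pkl')
    predictions = model.predict(scaler.transform(X_new))   # transform, never re-fit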

which coefficients go to which class in multiclass logistic regression in scikit learn?

走远了吗. submitted on 2019-12-04 22:14:20
Question: I'm using scikit-learn's Logistic Regression for a multiclass problem.

    logit = LogisticRegression(penalty='l1')
    logit = logit.fit(X, y)

I'm interested in which features are driving this decision.

    logit.coef_

The above gives me a beautiful array in (n_classes, n_features) format, but all the class and feature names are gone. With features, that's okay, because making the assumption that they're indexed the same way as I passed them in seems safe... But with classes, it's a problem, since
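The row order of coef_ follows logit.classes_, which sklearn derives from the sorted unique labels seen in y. A quick sketch of pairing them up, with iris as a stand-in dataset and the liblinear solver assumed (older sklearn requires it for penalty='l1'):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    logit = LogisticRegression(penalty='l1', solver='liblinear').fit(X, y)

    # Row i of coef_ corresponds to classes_[i]
    for cls, coefs in zip(logit.classes_, logit.coef_):
        print(cls, coefs)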

scikit learn: how to check coefficients significance

我的未来我决定 submitted on 2019-12-04 19:27:15
Question: I tried to fit a logistic regression with SKLearn for a rather large dataset with ~600 dummy variables and only a few interval variables (and 300K rows in my dataset), and the resulting confusion matrix looks suspicious. I wanted to check the significance of the returned coefficients (ANOVA), but I cannot find how to access it. Is it possible at all? And what is the best strategy for data that contains lots of dummy variables? Thanks a lot!

Answer 1: Scikit-learn deliberately does not support statistical inference. If you
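The answer is truncated here, but the standard route for coefficient significance is statsmodels, which reports standard errors and p-values. A sketch on synthetic data (the data-generating choices below are arbitrary assumptions):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.RandomState(0)
    X = rng.rand(300, 3)
    y = (X @ np.array([2.0, -1.0, 0.0]) + rng.normal(0, 0.5, 300) > 0.5).astype(int)

    # statsmodels reports std errors, z-scores and p-values for each coefficient
    result = sm.Logit(y, sm.add_constant(X)).fit()
    print(result.summary())   # includes a significance column (P>|z|)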

Is it reasonable for l1/l2 regularization to cause all feature weights to be zero in vowpal wabbit?

江枫思渺然 submitted on 2019-12-04 17:17:38
I got a weird result from vw, which uses an online learning scheme for logistic regression: when I add --l1 or --l2 regularization, I get all predictions at 0.5 (which means all feature weights are 0). Here's my command:

    vw -d training_data.txt --loss_function logistic -f model_l1 --invert_hash model_readable_l1 --l1 0.05 --link logistic

...and here's the learning process info:

    using l1 regularization = 0.05
    final_regressor = model_l1
    Num weight bits = 18
    learning rate = 0.5
    initial_t = 0
    power_t = 0.5
    using no cache
    Reading datafile = training_data.txt
    num sources = 1
    average since example example
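An --l1 of 0.05 is a fairly strong penalty for VW's per-example truncated-gradient updates, so zeroing every weight is plausible rather than a bug. The same effect is easy to reproduce in sklearn (a swapped-in library, used here only to illustrate the phenomenon): shrink C, the inverse regularization strength, and watch the nonzero coefficient count fall toward zero.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # Smaller C = stronger L1 penalty; at some point every weight is truncated to 0
    for C in [1.0, 0.1, 0.01, 0.001]:
        clf = LogisticRegression(penalty='l1', C=C, solver='liblinear').fit(X, y)
        print(C, np.count_nonzero(clf.coef_))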

plot multiple ROC curves for logistic regression model in R

陌路散爱 submitted on 2019-12-04 15:54:27
I have a logistic regression model (using R):

    fit6 <- glm(formula = survived ~ ascore + gini + failed, data = records, family = binomial)
    summary(fit6)

I'm using the pROC package to draw ROC curves and figure out the AUC for 6 models, fit1 through fit6. This is how I have approached plotting one ROC:

    prob6 = predict(fit6, type = c("response"))
    records$prob6 = prob6
    g6 <- roc(survived ~ prob6, data = records)
    plot(g6)

But is there a way I can combine the ROCs for all 6 curves in one plot, display the AUCs for all of them, and, if possible, the confidence intervals too? You can use the add = TRUE argument to the plot
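The truncated answer points at pROC's plot(..., add = TRUE), which overlays each new curve on the existing axes. For reference, the same overlay is straightforward in Python with sklearn and matplotlib (a swapped-in stack, sketched here because the rest of this page uses Python; the synthetic labels and scores are assumptions standing in for the six fitted models):

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_curve, auc

    rng = np.random.RandomState(0)
    y_true = rng.randint(0, 2, 200)

    plt.figure()
    for i in range(1, 7):
        # Stand-in for predict(fit_i, type="response"): noisy scores per model
        scores = np.clip(y_true * 0.5 + rng.rand(200) * (0.5 + 0.05 * i), 0, 1)
        fpr, tpr, _ = roc_curve(y_true, scores)
        plt.plot(fpr, tpr, label='fit%d (AUC = %.3f)' % (i, auc(fpr, tpr)))

    plt.plot([0, 1], [0, 1], linestyle='--')   # chance line
    plt.xlabel('False positive rate')
    plt.ylabel('True positive rate')
    plt.legend()
    plt.show()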

categorical variable in logistic regression in r

ぐ巨炮叔叔 submitted on 2019-12-04 14:37:24
Question: How do I implement a categorical variable in a binary logistic regression in R? I want to test the influence of professional field (student, worker, teacher, self-employed) on the probability of purchasing a product. In my example, y is a binary variable (1 for buying the product, 0 for not buying):

- x1: gender (0 male, 1 female)
- x2: age (between 20 and 80)
- x3: the categorical variable (1 = student, 2 = worker, 3 = teacher, 4 = self-employed)

    set.seed(123)
    y <- round
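The usual R answer is to wrap the variable as factor(x3) in the glm formula so it is dummy-coded rather than treated as an ordered number. The Python equivalent (a swapped-in stack, shown for consistency with the rest of this page; all data below is synthetic) uses pd.get_dummies for the same encoding:

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(123)
    df = pd.DataFrame({
        'x1': rng.randint(0, 2, 200),      # gender
        'x2': rng.randint(20, 81, 200),    # age
        'x3': rng.randint(1, 5, 200),      # professional field, 4 categories
    })
    y = rng.randint(0, 2, 200)

    # Dummy-code x3 so the category codes are not treated as an ordered number;
    # drop_first makes one category the reference level, as R's factor() does
    X = pd.get_dummies(df, columns=['x3'], drop_first=True)
    model = LogisticRegression().fit(X, y)
    print(dict(zip(X.columns, model.coef_[0])))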

Why is logistic regression called regression? [closed]

一笑奈何 submitted on 2019-12-04 13:52:08
Question (closed as off-topic; not currently accepting answers): According to what I have understood, linear regression predicts an outcome which can take continuous values, whereas logistic regression predicts an outcome which is discrete. It seems to me that logistic regression is similar to a classification problem. So why is it called regression? There is also a related
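One short way to see why the name fits: logistic regression is a regression on the log-odds, which is a continuous quantity; classification only happens when the predicted probability is thresholded. In symbols (the standard textbook form, not from the truncated post itself):

    \log\frac{p(y=1 \mid x)}{1 - p(y=1 \mid x)} = \beta_0 + \beta^\top x,
    \qquad
    p(y=1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta^\top x)}}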