logistic-regression

logistic regression python solvers' definitions

这一生的挚爱 submitted on 2019-11-28 15:18:15
Question: I am using the logistic regression function from sklearn, and was wondering what each of the solvers is actually doing behind the scenes to solve the optimization problem. Can someone briefly describe what "newton-cg", "sag", "lbfgs" and "liblinear" are doing? If not, any related links or reading materials are much appreciated too. Thanks a lot in advance. Answer 1: Well, I hope I'm not too late to the party! Let me first try to establish some intuition before digging into loads of information (…
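For intuition: all of these solvers minimize the same regularized negative log-likelihood and differ only in how they step toward its minimum — liblinear uses coordinate descent / trust-region Newton, newton-cg takes Newton steps whose linear systems are solved by conjugate gradient, lbfgs keeps a low-memory quasi-Newton approximation of the Hessian, and sag averages stochastic gradients. A toy numpy sketch of exact Newton iterations on that objective (an illustration of the idea on synthetic data, not sklearn's actual code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(w, X, y):
    # negative log-likelihood of the logistic model -- the objective
    # every one of the solvers minimizes (here without regularization)
    p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (sigmoid(X @ np.array([1.5, -2.0])) > rng.uniform(size=200)).astype(float)

w = np.zeros(2)
for _ in range(5):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y)                    # gradient of the NLL
    H = X.T @ (X * (p * (1 - p))[:, None])  # Hessian of the NLL
    w -= np.linalg.solve(H, grad)           # exact Newton step
```

newton-cg avoids forming and solving `H` exactly (it uses conjugate gradient), and sag never touches `H` at all, but both are descending the same `nll` surface as this loop.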

How to get the probability per instance in classifications models in spark.mllib

你说的曾经没有我的故事 submitted on 2019-11-28 14:30:57
I'm using spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithSGD} and spark.mllib.tree.RandomForest for classification. Using these packages I produce classification models. However, these models only predict a specific class per instance. In Weka, we can get the exact probability for each instance to be of each class. How can we do it using these packages? In LogisticRegressionModel we can set the threshold, so I've created a function that checks the results for each point at a different threshold. But this cannot be done for RandomForest (see How to set cutoff while training…
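For the mllib logistic model specifically, calling model.clearThreshold() makes predict() return the class-1 probability instead of a hard label; equivalently, the probability can be recomputed by hand from model.weights and model.intercept. A minimal sketch of that computation (the parameter values below are made-up stand-ins, not the output of a trained model):

```python
import math

def class1_probability(weights, intercept, features):
    """Probability of class 1 from a logistic model's raw parameters.

    In spark.mllib these parameters are model.weights and
    model.intercept; calling model.clearThreshold() makes predict()
    return this same value directly.
    """
    margin = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-margin))

p = class1_probability([0.5, -1.0], 0.2, [2.0, 1.0])  # stand-in values
```

RandomForest in spark.mllib has no equivalent switch; a common workaround is averaging the individual trees' votes yourself.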

Dictvectorizer for list as one feature in Python Pandas and Scikit-learn

谁说我不能喝 submitted on 2019-11-28 09:51:31
Question: I have been trying to solve this for days, and although I have found a similar problem here (How can i vectorize list using sklearn DictVectorizer), the solution is overly simplified. I would like to fit some features into a logistic regression model to predict 'chinese' or 'non-chinese'. I have a raw_name from which I will extract two features: 1) the last name, and 2) a list of substrings of the last name; for example, 'Chan' will give ['ch', 'ha', 'an']. But it seems…
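One way to make a list feature digestible by DictVectorizer is to flatten it: emit one boolean key per substring alongside the last-name key, so every dict value is scalar. A stdlib-only sketch (the key names like 'bigram=ch' are invented for illustration):

```python
def name_features(raw_name):
    """Build a flat feature dict from a last name.

    The list-of-bigrams feature is flattened into one boolean key per
    bigram -- the shape DictVectorizer expects, since it cannot handle
    a list as a single value.
    """
    last = raw_name.strip().lower()
    feats = {"last_name=" + last: 1}
    for a, b in zip(last, last[1:]):
        feats["bigram=" + a + b] = 1
    return feats

features = name_features("Chan")
```

Feeding a list of such dicts to sklearn's DictVectorizer then yields one binary column per distinct key, ready for logistic regression.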

plotting decision boundary of logistic regression

前提是你 submitted on 2019-11-28 06:00:34
I'm implementing logistic regression. I managed to get probabilities out of it, and am able to predict a 2-class classification task. My question is: for my final model, I have the weights and the training data. There are 2 features, so my weight is a vector with 2 rows. How do I plot this? I saw this post, but I don't quite understand the answer. Do I need a contour plot? An advantage of the logistic regression classifier is that once you fit it, you can get probabilities for any sample vector. That may be more interesting to plot. Here's an example using scikit-learn: import numpy as np from …
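With two features, no contour plot is strictly needed: the decision boundary of a logistic model is the straight line where the logit is zero, i.e. where the predicted probability equals 0.5. A sketch of computing that line from the weights (the weight and bias values here are placeholders, not a fitted model):

```python
import numpy as np

def boundary_x2(w, x1, bias=0.0):
    """x2 coordinate of the decision boundary at each x1.

    For a 2-feature logistic model the boundary is the line where
    bias + w[0]*x1 + w[1]*x2 == 0, i.e. predicted probability == 0.5.
    """
    return -(bias + w[0] * np.asarray(x1)) / w[1]

w = np.array([1.0, -2.0])        # placeholder weights
x1 = np.linspace(-3, 3, 50)
x2 = boundary_x2(w, x1, bias=0.5)
```

Plotting x2 against x1 with matplotlib, over a scatter of the training points, draws the boundary; a contour plot only becomes useful if you also want to show the probability gradient around it.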

How to perform logistic regression using vowpal wabbit on very imbalanced dataset

倖福魔咒の submitted on 2019-11-28 03:09:40
I am trying to use vowpal wabbit for logistic regression, and I am not sure if this is the right syntax. For training, I do ./vw -d ~/Desktop/new_data.txt --passes 20 --binary --cache_file cache.txt -f lr.vw --loss_function logistic --l1 0.05 and for testing I do ./vw -d ~/libsvm-3.18_test/matlab/new_data_test.txt --binary -t -i lr.vw -p predictions.txt -r raw_score.txt Here is a snippet from my train data: -1:1.00038 | 110:0.30103 262:0.90309 689:1.20412 1103:0.477121 1286:1.5563 2663:0.30103 2667:0.30103 2715:4.63112 3012:0.30103 3113:8.38411 3119:4.62325 3382:1.07918 3666:1.20412 3728:5…
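For the imbalance itself, vowpal wabbit accepts a per-example importance weight directly after the label ('label weight | features'), which upweights the minority class without resampling. A sketch that rewrites training lines this way (the 10x weight is a placeholder — a common choice is the inverse class ratio; the label is assumed to be the first token before '|'):

```python
def add_importance(line, minority_label="-1", weight=10.0):
    """Prepend a VW importance weight to minority-class examples.

    VW's label format is 'label [importance] | features'; a weight of
    10 makes one minority example count like ten during training. The
    10x factor is a placeholder to tune, e.g. to the class ratio.
    """
    label, rest = line.split("|", 1)
    label = label.strip()
    if label == minority_label:
        return "{} {} |{}".format(label, weight, rest)
    return line

reweighted = add_importance("-1 | 110:0.30103 262:0.90309")
```

Note that with --loss_function logistic, VW expects labels of -1/+1 (which the snippet already uses), and the raw scores in raw_score.txt can be mapped to probabilities with the sigmoid.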

Using glmer for logistic regression, how to verify response reference

拜拜、爱过 submitted on 2019-11-28 02:40:28
Question: My question is quite simple, but I've been unable to find a clear answer in either the R manuals or online searching. Is there a good way to verify what your reference is for the response variable when doing a logistic regression with glmer? I am getting results that consistently run the exact opposite of theory, and I think my response variable must be reversed from my intention, but I have been unable to verify this. My response variable is coded in 0's and 1's. Thanks! Answer 1: You could simulate some…
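A language-agnostic way to verify the coding is to refit after flipping the 0/1 response: if the model is predicting P(y = 1), every coefficient sign must reverse. Since glmer is R-specific, here is the same check sketched with a tiny hand-rolled logistic fit in numpy (synthetic data, plain gradient descent — an illustration of the verification idea, not glmer's estimator):

```python
import numpy as np

def fit_logistic_slope(x, y, lr=0.1, steps=2000):
    """Slope of a one-feature logistic fit, by plain gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        w -= lr * np.mean((p - y) * x)
        b -= lr * np.mean(p - y)
    return w

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = (x + 0.3 * rng.normal(size=500) > 0).astype(float)

slope = fit_logistic_slope(x, y)        # 1 treated as the event: positive slope
flipped = fit_logistic_slope(x, 1 - y)  # recoding the response reverses the sign
```

If your real fit behaves like the flipped run — coefficients opposite to theory — the response reference is likely reversed from your intention.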

Changing accuracy value and no change in loss value in binary classification using Tensorflow

情到浓时终转凉″ submitted on 2019-11-27 22:49:48
Question: I am trying to use a deep neural network architecture to classify a binary label taking the values 0 and +1. Here is my code to do it in tensorflow. (This question also carries forward from the discussion in a previous question.) import tensorflow as tf import numpy as np from preprocess import create_feature_sets_and_labels train_x,train_y,test_x,test_y = create_feature_sets_and_labels() x = tf.placeholder('float', [None, 5]) y = tf.placeholder('float') n_nodes_hl1 = 500 n_nodes_hl2 = 500 # n_nodes…
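One frequent cause of a frozen loss with a binary head — impossible to confirm from the truncated snippet alone — is applying softmax to a single output node: the softmax of one logit is identically 1, so the cross-entropy never moves even while the reported accuracy jumps around. A numpy demonstration of the arithmetic:

```python
import numpy as np

logit = np.array([[-2.0], [0.5], [3.0]])  # one output node per example

# softmax over a single node is always exactly 1, so a cross-entropy
# loss built on it is constant -- the classic frozen-loss symptom
softmax_p = np.exp(logit) / np.exp(logit).sum(axis=1, keepdims=True)

# a sigmoid head keeps the probability informative for binary labels
sigmoid_p = 1.0 / (1.0 + np.exp(-logit))
```

With one output node, tf.nn.sigmoid_cross_entropy_with_logits is the appropriate loss; alternatively, use two output nodes with one-hot labels if you want to keep softmax.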

Different Robust Standard Errors of Logit Regression in Stata and R

元气小坏坏 submitted on 2019-11-27 19:30:38
I am trying to replicate a logit regression from Stata in R. In Stata I use the option "robust" to get robust standard errors (heteroscedasticity-consistent standard errors). I am able to replicate exactly the same coefficients from Stata, but I cannot get the same robust standard errors with the package "sandwich". I have tried some OLS linear regression examples; the sandwich estimators of R and Stata give me the same robust standard errors for OLS. Does anybody know how Stata calculates the sandwich estimator for non-linear regression, in my case the logit regression?
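The usual source of the discrepancy is a finite-sample factor: for maximum-likelihood models, Stata's robust variance multiplies the sandwich A⁻¹BA⁻¹ by n/(n−1), which R's sandwich package does not apply by default. A numpy sketch of the estimator for a logistic model (the toy data and coefficients below are made up; this illustrates the formula, not the sandwich package's code):

```python
import numpy as np

def robust_cov(X, y, beta, stata_correction=True):
    """Sandwich (robust) covariance for a fitted logistic regression.

    Computes inv(A) @ B @ inv(A), where A is the negative Hessian of
    the log-likelihood and B sums outer products of per-observation
    score vectors. Stata's `robust` additionally scales by n/(n-1)
    for ML models -- a common reason R and Stata differ slightly.
    """
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    A = X.T @ (X * (p * (1 - p))[:, None])   # observed information
    scores = X * (y - p)[:, None]            # per-observation scores
    B = scores.T @ scores
    V = np.linalg.inv(A) @ B @ np.linalg.inv(A)
    if stata_correction:
        V *= len(y) / (len(y) - 1.0)
    return V

# made-up example inputs, standing in for a fitted model
X = np.column_stack([np.ones(50), np.linspace(-1, 1, 50)])
y = (np.linspace(-1, 1, 50) > 0).astype(float)
beta = np.array([0.0, 1.0])
V = robust_cov(X, y, beta)
```

In R, multiplying the vcovHC(fit, type = "HC0") matrix by n/(n−1) should correspondingly reproduce Stata's numbers for a glm logit fit.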

ROC curve and cut-off point in Python

不打扰是莪最后的温柔 submitted on 2019-11-27 17:20:54
I ran a logistic regression model and made predictions of the logit values. I used this to get the points on the ROC curve: from sklearn import metrics fpr, tpr, thresholds = metrics.roc_curve(Y_test, p) I know metrics.roc_auc_score gives the area under the ROC curve. Can anyone tell me what command will find the optimal cut-off point (threshold value)? Manohar Swamynathan: Though it's late to answer, I thought this might be helpful. You can do this using the epi package in R; however, I could not find a similar package or example in Python. The optimal cut-off point would be where the true positive…
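There is no single sklearn command for this, but a common definition of the optimal cut-off is the threshold maximizing Youden's J = TPR − FPR, the ROC point farthest above the chance diagonal. A numpy sketch (assumes distinct scores; with ties, use the deduplicated thresholds returned by roc_curve instead):

```python
import numpy as np

def youden_cutoff(y_true, scores):
    """Threshold maximizing Youden's J = TPR - FPR.

    Sorts by score descending, sweeps the 'predict positive at
    score >= threshold' rule, and returns the score where the gap
    between TPR and FPR is largest.
    """
    order = np.argsort(scores)[::-1]
    y = np.asarray(y_true, dtype=float)[order]
    s = np.asarray(scores, dtype=float)[order]
    tpr = np.cumsum(y) / y.sum()
    fpr = np.cumsum(1 - y) / (1 - y).sum()
    return s[np.argmax(tpr - fpr)]

y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
cut = youden_cutoff(y_true, scores)  # -> 0.6 for this toy data
```

Equivalently, with the arrays already returned by metrics.roc_curve you can take thresholds[np.argmax(tpr - fpr)].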
