logistic-regression

Interaction effects in patsy with patsy.dmatrices giving duplicate columns for ":" as with "+" or "*"

Submitted by 时光怂恿深爱的人放手 on 2019-11-30 18:25:02
Question: I have a dataframe with two columns, both of which I intend to treat as categorical variables. The first column is country, with values such as SGP, AUS, MYS, etc. The second column is time of day, with values in 24-hour format such as 00, 11, 14, 15, etc. event is a binary variable with 1/0 flags. I understand that to categorize them I need to use patsy before running the logistic regression; this I build using dmatrices. Use case: consider only interaction effects of country & …
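A minimal pandas-only sketch of the interaction-only encoding the question is after (data and column names are made up; in a patsy formula this is what `C(country):C(hour)` expands to — one dummy column per observed country/hour combination):

```python
import pandas as pd

# Hypothetical toy data mirroring the question's setup.
df = pd.DataFrame({
    "country": ["SGP", "AUS", "MYS", "SGP", "AUS", "MYS"],
    "hour":    ["00", "11", "14", "00", "11", "14"],
    "event":   [1, 0, 1, 0, 1, 0],
})

# Interaction-only design: one indicator per (country, hour) combination.
inter = pd.get_dummies(df["country"] + ":" + df["hour"], prefix="country:hour")
X = inter.astype(float)
y = df["event"]
print(X.columns.tolist())
```

If the same columns appear twice in the design matrix, it is usually because the formula also includes the main effects (`+` or `*`) alongside the interaction, making some indicators redundant.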

ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 1.0

Submitted by 心已入冬 on 2019-11-30 14:11:35
I have a training dataset of 8670 trials, each with a length of 125 time samples, while my test set consists of 578 trials. When I apply the SVM algorithm from scikit-learn, I get pretty good results. However, when I apply logistic regression, this error occurs: "ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 1.0". My question is: why is SVM able to give predictions while logistic regression gives this error? Could it be that something is wrong in the dataset, or just that logistic regression was not able to classify …
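The error means the label array passed to `fit` contained only one distinct class, so the problem is almost certainly in how the labels were built or split, not in the classifier. A pre-fit sanity check (toy labels standing in for the poster's 8670 trials):

```python
import numpy as np

# Degenerate labels: every trial ended up in class 1.0, which is exactly
# the situation scikit-learn's logistic solver refuses with the ValueError.
y_train = np.ones(8670)

classes, counts = np.unique(y_train, return_counts=True)
single_class = classes.size < 2
print(classes, counts, single_class)
```

If `single_class` is True, inspect the label-loading and train/test-splitting code (a stratified split guarantees both classes appear in the training set).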

Spark Java Error: Size exceeds Integer.MAX_VALUE

Submitted by ☆樱花仙子☆ on 2019-11-30 06:40:59
I am trying to use Spark for a simple machine learning task. I used PySpark and Spark 1.2.0 to do a simple logistic regression problem. I have 1.2 million records for training, and I hashed the features of the records. When I set the number of hashed features to 1024, the program works fine, but when I set it to 16384, the program fails several times with the following error: Py4JJavaError: An error occurred while calling o84.trainLogisticRegressionModelWithSGD. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 4.0 failed 4 times …
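"Size exceeds Integer.MAX_VALUE" is Spark's 2 GB cap on a single shuffle/cache block, and 16384 features pushes the data past it under a default partitioning. A back-of-envelope check, assuming the worst case of dense 8-byte double features (hashed features may well be sparser):

```python
# Rough block-size estimate under the dense-doubles assumption.
n_records, n_features = 1_200_000, 16384
total_bytes = n_records * n_features * 8       # ~157 GB stored densely
max_block = 2**31 - 1                          # Integer.MAX_VALUE
min_partitions = -(-total_bytes // max_block)  # ceil division
print(total_bytes, min_partitions)
```

Repartitioning the training RDD to comfortably more than `min_partitions` partitions before training, or switching to SparseVector features, keeps each block under the cap.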

Is my implementation of stochastic gradient descent correct?

Submitted by 笑着哭i on 2019-11-30 04:13:19
I am trying to develop stochastic gradient descent, but I don't know if it is 100% correct. The cost generated by my stochastic gradient descent algorithm is sometimes very far from the one generated by fminunc or batch gradient descent. While batch gradient descent's cost converges when I set a learning rate alpha of 0.2, I am forced to set a learning rate alpha of 0.0001 for my stochastic implementation for it not to diverge. Is this normal? Here are some results I obtained with a training set of 10,000 elements and num_iter = 100 or 500: FMINUNC: Iteration #100 | Cost: 5.147056e-001 BATCH …
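Needing a much smaller step size is normal: each stochastic update follows a noisy single-example gradient, so a step that is stable for the averaged batch gradient can diverge. A minimal runnable sketch on synthetic data (all names and sizes here are my own, not the poster's):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic logistic-regression data (stand-in for the 10,000-element set).
rng = np.random.default_rng(0)
n, d = 2000, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, d - 1))])
theta_true = np.array([0.5, -1.0, 2.0])
y = (rng.random(n) < sigmoid(X @ theta_true)).astype(float)

theta = np.zeros(d)
alpha = 0.01                      # per-example steps: far smaller than batch GD's
for epoch in range(20):
    for i in rng.permutation(n):  # shuffle each epoch, one example per update
        grad_i = (sigmoid(X[i] @ theta) - y[i]) * X[i]
        theta -= alpha * grad_i

# Full-dataset log-loss of the final iterate (clipped to avoid log(0)).
p = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)
cost = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

A decaying schedule (e.g. alpha_t = alpha0 / (1 + t)) is the standard way to get both early progress and late-stage stability.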

Logistic regression - defining reference level in R

Submitted by 让人想犯罪 __ on 2019-11-30 04:09:47
I am going nuts trying to figure this out. How can I define, in R, the reference level to use in a binary logistic regression? What about multinomial logistic regression? Right now my code is: logistic.train.model3 <- glm(class ~ x+y+z, family=binomial(link=logit), data=auth, na.action=na.exclude) My response variable is "YES" and "NO". I want to predict the probability of someone responding with "YES". I DO NOT want to recode the variable to 0/1. Is there a way I can tell the model to predict "YES"? Thank you for your help. smrt1119: Assuming you have class saved as a factor, use the …
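In R, `glm` with a binomial family models the probability of the factor's second level, so `relevel(class, ref = "NO")` makes "YES" the modelled event with no 0/1 recoding. Since this page's other snippets are Python, here is the same reference-level idea sketched in pandas terms (toy data, my own names):

```python
import pandas as pd

# Putting "NO" first in the category order makes it the baseline that
# dummy coding drops, so the remaining column represents "YES".
cls = pd.Categorical(["YES", "NO", "YES", "NO"], categories=["NO", "YES"])
dummies = pd.get_dummies(pd.Series(cls), drop_first=True)
print(dummies)
```

The choice of reference level changes the sign and interpretation of the coefficients, not the fitted probabilities.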

Regression (logistic) in R: Finding x value (predictor) for a particular y value (outcome)

Submitted by 僤鯓⒐⒋嵵緔 on 2019-11-30 03:32:37
Question: I've fitted a logistic regression model that predicts the binary outcome vs from mpg (mtcars dataset). The plot is shown below. How can I determine the mpg value for any particular vs value? For example, I'm interested in finding out what the mpg value is when the predicted probability of vs is 0.50. I appreciate any help anyone can provide! model <- glm(vs ~ mpg, data = mtcars, family = binomial) ggplot(mtcars, aes(mpg, vs)) + geom_point() + stat_smooth(method = "glm", method.args = list(family = …
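The fitted model says logit(p) = b0 + b1·mpg, so the predictor value at any probability p is mpg = (logit(p) − b0) / b1; at p = 0.50 the logit is 0 and this reduces to −b0/b1. A scikit-learn sketch of the inversion (the question uses R's glm; the synthetic data below plants the 0.50 crossing at mpg = 20):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in for mtcars' (mpg, vs) with a known crossing point.
rng = np.random.default_rng(1)
mpg = rng.uniform(10, 34, 300)
p_true = 1.0 / (1.0 + np.exp(-0.5 * (mpg - 20.0)))
vs = (rng.random(300) < p_true).astype(int)

model = LogisticRegression().fit(mpg.reshape(-1, 1), vs)
b0, b1 = model.intercept_[0], model.coef_[0, 0]

p = 0.50
mpg_at_p = (np.log(p / (1.0 - p)) - b0) / b1   # invert logit(p) = b0 + b1*mpg
```

The same two-line inversion works for any target probability, e.g. p = 0.25 or 0.75.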

Evaluating Logistic regression with cross validation

Submitted by 好久不见. on 2019-11-30 02:26:40
I would like to use cross validation to train/test my dataset and evaluate the performance of the logistic regression model on the entire dataset, not only on the test set (e.g. 25%). These concepts are totally new to me and I am not very sure whether I am doing it right. I would be grateful if anyone could advise me on the right steps to take where I have gone wrong. Part of my code is shown below. Also, how can I plot the ROCs for "y2" and "y3" on the same graph as the current one? Thank you. import pandas as pd Data=pd.read_csv('C:\\Dataset.csv',index_col='SNo') feature_cols=['A','B','C','D','E'] X …
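scikit-learn's `cross_val_predict` is built for exactly this: it returns one out-of-fold prediction per sample, so the model is scored on the entire dataset rather than on a single 25% hold-out. A sketch on synthetic data standing in for the poster's CSV:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for the CSV's features and label.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Each sample is predicted by a model that never saw it during fitting.
pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=5)
acc = accuracy_score(y, pred)
```

For several ROC curves on one graph, compute `roc_curve` for each outcome (using `method="predict_proba"` scores) and call `plt.plot` once per curve before a single `plt.show()`.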

Issues with Logistic Regression for multiclass classification using PySpark

Submitted by 為{幸葍}努か on 2019-11-29 22:29:42
Question: I am trying to use logistic regression to classify a dataset whose feature vectors contain SparseVectors. For the full code base and error log, please check my GitHub repo. Case 1: I tried using the ML pipeline as follows: # imported library from ML from pyspark.ml.feature import HashingTF from pyspark.ml import Pipeline from pyspark.ml.classification import LogisticRegression print(type(trainingData)) # for checking only print(trainingData.take(2)) # for checking the data type lr = LogisticRegression …
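The pyspark.ml code above needs a running SparkSession, so here is a scikit-learn analogue of the same task — multiclass logistic regression on sparse feature vectors — to show that the combination itself is routine (all data and names below are made up):

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.linear_model import LogisticRegression

# Mostly-zero features, stored sparsely, with three classes.
rng = np.random.default_rng(0)
dense = rng.random((60, 20))
dense[dense < 0.8] = 0.0          # ~80% zeros
X = csr_matrix(dense)
y = np.repeat([0, 1, 2], 20)

clf = LogisticRegression(max_iter=500).fit(X, y)
n_classes = clf.coef_.shape[0]    # one coefficient row per class
```

In pyspark.ml the usual failure mode with this setup is a schema mismatch (the estimator expects `features`/`label` columns of VectorUDT and numeric type), which is worth checking against the repo's error log.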

Scikit Learn: Logistic Regression model coefficients: Clarification

Submitted by 点点圈 on 2019-11-29 21:47:26
I need to know how to return the logistic regression coefficients in such a manner that I can generate the predicted probabilities myself. My code looks like this: lr = LogisticRegression() lr.fit(training_data, binary_labels) # Generate probabilities automatically predicted_probs = lr.predict_proba(binary_labels) I had assumed the lr.coef_ values would follow the typical logistic regression formulation, so that I could return the predicted probabilities like this: sigmoid( dot([val1, val2, offset], lr.coef_.T) ) But this is not the appropriate formulation. Does anyone have the proper format for generating …
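Two things are off in the snippet: `predict_proba` must be called on feature data, not on the labels, and the intercept lives in `lr.intercept_` rather than as an extra column of `lr.coef_`. A sketch verifying that the manual sigmoid reproduces `predict_proba` exactly (synthetic data, my own names):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
lr = LogisticRegression().fit(X, y)

# Add the separately stored intercept explicitly, then apply the sigmoid.
z = X @ lr.coef_.T + lr.intercept_            # shape (100, 1)
manual = (1.0 / (1.0 + np.exp(-z))).ravel()
auto = lr.predict_proba(X)[:, 1]              # P(class 1) per sample
match = np.allclose(manual, auto)
```

For binary problems `lr.coef_` has shape (1, n_features) and `lr.intercept_` shape (1,), which is why the transpose and the broadcast above line up.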