logistic-regression

Why does multinom() predict many rows of probabilities for each level of the outcome?

柔情痞子 submitted on 2019-12-11 18:20:59
Question: I have a multinomial logistic regression whose outcome variable has 6 levels: 10, 20, 60, 70, 80, 90.

test <- multinom(y ~ x1 + x2 + as.factor(x3), data = data1)

I want to predict the probabilities associated with each level of y for a given set of input values, so I run this:

dfin <- data.frame(ses = c(10,20,60,70,80,90), x1 = 2.1, x2 = 4, x3 = 40)
predict(test, todaydata = dfin, type = "probs")

But instead of getting 6 probabilities (one for each level of the outcome), I got many, many rows of…
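The truncated question matches a common pitfall: R's predict() takes new data through the newdata argument, so a misspelled name like todaydata is silently absorbed by ... and predict falls back to the fitted training rows, one probability row each. Whether that is the cause here is an assumption; for reference, a minimal scikit-learn sketch (synthetic data) of the expected behaviour, one probability row per query row:

```python
# Sketch: predict_proba returns exactly one row of class probabilities per
# input row, so six query rows should yield a 6 x (number of classes) array.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = np.repeat([10, 20, 60, 70, 80, 90], 50)  # six outcome levels

model = LogisticRegression(max_iter=1000).fit(X, y)
X_new = rng.normal(size=(6, 2))        # six query rows
print(model.predict_proba(X_new).shape)  # (6, 6): one row per query
```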

How to deal with collinearity of dummy variables for linear regression?

不问归期 submitted on 2019-12-11 16:58:07
Question: I am using scikit-learn's LogisticRegression on a dataset of household characteristics and am trying to understand how to prepare the independent variables. I have created binary dummy variables in place of categorical variables. For example, the variable DWELLING_TYPE, which had 3 possible values (DetachedHouse, SemiDetached, and Apartment), has been replaced with 3 binary variables, DWELLING_TYPE_DetachedHouse, DWELLING_TYPE_SemiDetached, and DWELLING_TYPE_Apartment, each of which has the value 1 or 0. Clearly…
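The question is cut off, but the setup above is the classic "dummy variable trap": the three indicators always sum to 1, so they are perfectly collinear with the intercept. A minimal sketch of the standard remedy, dropping one level per categorical variable (the OWNS_CAR target column below is hypothetical, for illustration only):

```python
# Drop one dummy per categorical variable so the remaining columns are not
# perfectly collinear with the intercept.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "DWELLING_TYPE": ["DetachedHouse", "SemiDetached", "Apartment", "Apartment"],
    "OWNS_CAR": [1, 0, 1, 0],  # hypothetical target for illustration
})
X = pd.get_dummies(df[["DWELLING_TYPE"]], drop_first=True)  # 2 columns, not 3
model = LogisticRegression().fit(X, df["OWNS_CAR"])
```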

Convert class probabilities of a multiclass model to scores in the range 0-100

筅森魡賤 submitted on 2019-12-11 15:54:30
Question: What I want to do is generate a score of 0-100 based on the predictions of a three-class classification model. For example, predict_proba of a 3-class logistic regression model gives me three probabilities per row:

  0  1  2
  x  y  z

Now, I want to generate a score of 0-100 based on these probabilities, where 0 is closer to class 0 and 100 is closer to class 2.

Answer 1: Try this:

prob['P'] = (prob['1']*1 + prob['2']*2) / 2

prob['0'] is multiplied by 0, so you don't need it. Examples: prob['0'] = 0…
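A sketch of the answer's idea with the rescaling to 0-100 made explicit (the final ×100 step is an assumption, since the quoted answer leaves the score in the 0-1 range): treat the class index as a value in {0, 1, 2}, take its expectation under the predicted distribution, and rescale.

```python
import numpy as np

proba = np.array([[0.7, 0.2, 0.1],   # mostly class 0 -> low score
                  [0.1, 0.2, 0.7]])  # mostly class 2 -> high score
# Expected class index, normalized by the maximum index (2), times 100.
score = 100 * (proba[:, 1] * 1 + proba[:, 2] * 2) / 2
print(score)  # [20. 80.]
```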

Dummy variables for Logistic regression in R

霸气de小男生 submitted on 2019-12-11 13:50:56
Question: I am running a logistic regression on three factors that are all binary. My data:

table1 <- expand.grid(Crime = factor(c("Shoplifting", "Other Theft Acts")),
                      Gender = factor(c("Men", "Women")),
                      Priorconv = factor(c("N", "P")))
table1 <- data.frame(table1, Yes = c(24,52,48,22,17,60,15,4), No = c(1,9,3,2,6,34,6,3))

and the model:

fit4 <- glm(cbind(Yes, No) ~ Priorconv + Crime + Priorconv:Crime, data = table1, family = binomial)
summary(fit4)

R seems to take 1 for prior conviction P and 1 for crime Shoplifting. As a result the…
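What the questioner observes is R's default treatment coding: the alphabetically first level of each factor becomes the reference (coded 0), so "Other Theft Acts" and "N" are the baselines and Shoplifting/P get the 0/1 indicators. For comparison, a minimal pandas sketch of the same coding (Python is used here for consistency with the other examples):

```python
# Mirror R's default treatment contrasts: drop the alphabetically first
# level of each factor so it becomes the implicit reference category.
import pandas as pd

df = pd.DataFrame({
    "Crime": ["Shoplifting", "Other Theft Acts"],
    "Priorconv": ["N", "P"],
})
dummies = pd.get_dummies(df, drop_first=True)
print(dummies.columns.tolist())  # ['Crime_Shoplifting', 'Priorconv_P']
print(dummies.astype(int))       # 1 where the non-reference level occurs
```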

Logistic regression with spark ml (data frames)

牧云@^-^@ submitted on 2019-12-11 13:07:33
Question: I wrote the following code for logistic regression; I want to use the pipeline API provided by spark.ml. However, it gives me an error when I try to print the coefficients and intercept. I am also having trouble computing the confusion matrix and other metrics such as precision and recall.

# Logistic Regression:
from pyspark.mllib.linalg import Vectors
from pyspark.ml.classification import LogisticRegression
from pyspark.sql import SQLContext
from pyspark import SparkContext
from pyspark.sql.types…
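One likely culprit (an assumption, since the error message itself is truncated): the snippet imports Vectors from pyspark.mllib.linalg, while spark.ml estimators expect pyspark.ml.linalg vectors, and coefficients/intercept live on the fitted model rather than on the LogisticRegression estimator. A minimal spark.ml sketch with toy data:

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors  # note: ml, not mllib

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(0.0, Vectors.dense(0.0, 1.1)), (1.0, Vectors.dense(2.0, 1.0))],
    ["label", "features"],
)
model = LogisticRegression(maxIter=10).fit(df)
print(model.coefficients, model.intercept)  # attributes of the fitted model

# Confusion-matrix-style counts from the predictions DataFrame:
pred = model.transform(df)
pred.groupBy("label", "prediction").count().show()
```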

Making Random Forest outputs like Logistic Regression

拥有回忆 submitted on 2019-12-11 10:13:26
Question: My question is mainly about dimensions. I am trying to implement this excellent piece of work with a random forest: https://www.kaggle.com/allunia/how-to-attack-a-machine-learning-model/notebook. Both the logistic regression and the random forest come from sklearn, but when I get the weights from the random forest model their shape is (784,), while logistic regression returns (10, 784). Most of my problems are dimension errors and "NaN, infinity or a value too large for dtype" errors with the attack methods. The weights using logistic regression are…
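A sketch of the shape mismatch described above (assuming MNIST-like data with 784 features and 10 classes): LogisticRegression exposes one weight vector per class, while a random forest only exposes a single importance vector over features, so the two are not interchangeable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 784))
y = rng.integers(0, 10, size=200)

lr = LogisticRegression(max_iter=200).fit(X, y)
rf = RandomForestClassifier(n_estimators=10).fit(X, y)
print(lr.coef_.shape)                 # (10, 784): one weight row per class
print(rf.feature_importances_.shape)  # (784,): one importance per feature
```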

scikit-learn - multinomial logistic regression with probabilities as a target variable

百般思念 submitted on 2019-12-11 07:59:52
Question: I'm implementing a multinomial logistic regression model in Python using scikit-learn. The thing is, however, that I'd like to use a probability distribution over the classes of my target variable. As an example, let's say that this is a 3-class variable which looks as follows:

   class_1  class_2  class_3
0      0.0      0.0      1.0
1      1.0      0.0      0.0
2      0.0      0.5      0.5
3      0.2      0.3      0.5
4      0.5      0.1      0.4

so that the sum of the values in every row equals 1. How could I fit a model like this? When I try:

model = LogisticRegression…
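LogisticRegression.fit expects hard labels, so soft targets cannot be passed directly. One common workaround (a sketch of one option, not the only one) is to repeat each row once per class and pass the target probabilities as sample weights, which minimizes the same cross-entropy against the soft distribution:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.1], [0.9], [0.5]])
P = np.array([[0.8, 0.1, 0.1],   # soft targets, rows sum to 1
              [0.1, 0.1, 0.8],
              [0.2, 0.5, 0.3]])

n, k = P.shape
X_rep = np.repeat(X, k, axis=0)   # each row repeated once per class
y_rep = np.tile(np.arange(k), n)  # hard labels 0,1,2,0,1,2,...
w = P.ravel()                     # target probability as sample weight

model = LogisticRegression(max_iter=1000)
model.fit(X_rep, y_rep, sample_weight=w)
print(model.predict_proba(X))
```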

Need help understanding the Caffe code for SigmoidCrossEntropyLossLayer for multi-label loss

偶尔善良 submitted on 2019-12-11 07:54:46
Question: I need help understanding the Caffe function SigmoidCrossEntropyLossLayer, which is the cross-entropy error with logistic activation. Basically, the cross-entropy error for a single example with N independent targets is:

-sum_i( t[i]*log(x[i]) + (1 - t[i])*log(1 - x[i]) )

where t[i] is the target, 0 or 1, and x[i] is the output, indexed by i. x, of course, goes through a logistic activation. An algebraic trick for quicker cross-entropy calculation reduces the computation…
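The "algebraic trick" the question refers to is most likely the standard numerically stable rewrite: substituting x = sigmoid(z) into the cross-entropy collapses it, per element, to max(z, 0) - z*t + log(1 + exp(-|z|)). That this is exactly what Caffe's layer computes is an inference; the formula itself is standard, and a small numpy check of the equivalence:

```python
import numpy as np

def stable_sigmoid_ce(z, t):
    # max(z,0) - z*t + log(1 + exp(-|z|)): no overflow for large |z|
    return np.maximum(z, 0) - z * t + np.log1p(np.exp(-np.abs(z)))

def naive_sigmoid_ce(z, t):
    # Direct form from the question: -( t*log(x) + (1-t)*log(1-x) )
    x = 1.0 / (1.0 + np.exp(-z))
    return -(t * np.log(x) + (1 - t) * np.log(1 - x))

z = np.array([-3.0, 0.5, 4.0])
t = np.array([0.0, 1.0, 1.0])
print(stable_sigmoid_ce(z, t))  # matches the naive form...
print(naive_sigmoid_ce(z, t))   # ...but stays finite for extreme logits
```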

sklearn Logistic Regression with n_jobs=-1 doesn't actually parallelize

[亡魂溺海] submitted on 2019-12-11 06:51:51
Question: I'm trying to train on a huge dataset with sklearn's logistic regression. I've set the parameter n_jobs=-1 (and have also tried n_jobs = 5, 10, ...), but when I open htop, I can see that it still uses only one core. Does this mean that logistic regression simply ignores the n_jobs parameter? How can I fix this? I really need this process to become parallelized. P.S. I am using sklearn 0.17.1.

Answer 1: The parallel-processing backend also depends on the solver method. If you want to utilize multiple cores, the…
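A hedged workaround (an assumption, since the right fix depends on the sklearn version and solver): n_jobs in LogisticRegression only parallelizes over classes in one-vs-rest mode, so wrapping the estimator in OneVsRestClassifier makes that parallelism explicit.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = make_classification(n_samples=5000, n_features=20,
                           n_informative=10, n_classes=5)
clf = OneVsRestClassifier(LogisticRegression(max_iter=500), n_jobs=-1)
clf.fit(X, y)  # one worker per binary subproblem, up to the class count
```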

Is there a way of getting the degree of positiveness or negativeness when using Logistic Regression for sentiment analysis?

你说的曾经没有我的故事 submitted on 2019-12-11 06:17:22
Question: I have been following an example of sentiment analysis using logistic regression, in which the prediction result only gives a 1 or 0 for positive or negative sentiment, respectively. My challenge is that I want to classify a given user input into one of four classes (very good, good, average, poor), but my prediction result is 1 or 0 every time. Below is my code sample so far:

from sklearn.feature_extraction.text import CountVectorizer
from vaderSentiment.vaderSentiment import…
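A minimal sketch of one way (an assumption, not the original tutorial's code) to get a degree rather than a hard label: use predict_proba's positive-class probability and bucket it. The thresholds and label names below are hypothetical.

```python
import numpy as np

def bucket(p_positive):
    """Map P(positive) in [0, 1] to four hypothetical sentiment labels."""
    bins = [0.25, 0.5, 0.75]
    labels = ["poor", "average", "good", "very good"]
    return labels[np.searchsorted(bins, p_positive)]

# With a fitted sklearn classifier:  p = model.predict_proba(X)[:, 1]
for p in (0.1, 0.4, 0.6, 0.9):
    print(p, bucket(p))  # poor, average, good, very good
```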