logistic-regression

Python scikit-learn to JSON

前提是你 · submitted 2019-12-01 10:41:52

I have a model built with Python scikit-learn. I understand that models can be saved in Pickle or Joblib formats. Are there any existing methods to save the model in JSON format instead? Please see the model-building code below for reference:

import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
import pickle

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv(url, names=names)
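There is no built-in JSON exporter in scikit-learn, but a fitted LogisticRegression is fully described by its `coef_`, `intercept_`, and `classes_` arrays, so a hand-rolled round trip is possible. The sketch below (an illustrative approach, not an official scikit-learn feature) uses a synthetic dataset in place of the Pima data:

```python
# Hedged sketch: serialise a fitted LogisticRegression to JSON by hand
# and rebuild an equivalent estimator from the payload.
import json
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

payload = json.dumps({
    "coef": model.coef_.tolist(),
    "intercept": model.intercept_.tolist(),
    "classes": model.classes_.tolist(),
})

# Restore: setting the fitted attributes back makes predict() usable again
state = json.loads(payload)
restored = LogisticRegression()
restored.coef_ = np.array(state["coef"])
restored.intercept_ = np.array(state["intercept"])
restored.classes_ = np.array(state["classes"])

print((restored.predict(X) == model.predict(X)).all())  # restored model agrees
```

Note that this only captures the learned parameters, not hyperparameters or preprocessing; Pickle/Joblib remain the general-purpose options.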

feature selection using logistic regression

会有一股神秘感。 · submitted 2019-12-01 09:48:51

Question: I am performing feature selection (on a dataset with 1,930,388 rows and 88 features) using logistic regression. When I test the model on held-out data, the accuracy is just above 60%. The response variable is equally distributed between the classes. My question is: if the model's performance is not good, can I still consider the features it ranks as genuinely important? Or should I try to improve the model's accuracy first, even though my end goal is not accuracy but only identifying the important features?

Answer 1:
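For reference, the usual way to read feature importance out of a logistic regression is via the absolute coefficient magnitudes, after standardizing so the coefficients are on a comparable scale. A minimal sketch on synthetic data (names and dataset are illustrative, not from the question):

```python
# Hedged sketch: rank features by |coefficient| of a logistic regression.
# Standardizing first makes coefficients comparable across features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           random_state=0)
X_std = StandardScaler().fit_transform(X)

clf = LogisticRegression().fit(X_std, y)
importance = np.abs(clf.coef_[0])          # one weight per feature
ranking = np.argsort(importance)[::-1]     # feature indices, most important first
print(ranking[:3])  # indices of the three largest-|coefficient| features
```

Keep in mind that coefficients of a poorly fitting model are a weak importance signal; they reflect what the model learned, not ground truth.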

Applying Cost Functions in R

China☆狼群 · submitted 2019-12-01 09:43:21

I am in the beginning stages of machine learning in R, and I find it hard to believe that there are no packages for computing the cost function of different types of regression algorithms. For example, to compute the cost function for logistic regression, the manual way would be as below (from https://www.r-bloggers.com/logistic-regression-with-r-step-by-step-implementation-part-2/):

# Implement the sigmoid function
sigmoid <- function(z) {
  g <- 1 / (1 + exp(-z))
  return(g)
}

# Cost function
cost <- function(theta) {
  m <- nrow(X)
  g <- sigmoid(X %*% theta)
  J <- (1 / m) * sum((-Y * log(g)) - ((1 - Y) * log(1 - g)))
  return(J)
}
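For comparison, the same mean binary cross-entropy cost can be written in a few lines of NumPy. This is an illustrative analogue of the R code above, not part of the original post:

```python
# Hedged sketch: NumPy version of the logistic-regression cost above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, Y):
    # mean binary cross-entropy over m examples
    m = X.shape[0]
    g = sigmoid(X @ theta)
    return (1.0 / m) * np.sum(-Y * np.log(g) - (1 - Y) * np.log(1 - g))

X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
Y = np.array([1.0, 0.0, 1.0])
theta = np.zeros(2)
print(cost(theta, X, Y))  # log(2) ≈ 0.6931 when theta is all zeros
```

With theta at zero every prediction is 0.5, so the cost is exactly log(2), a handy sanity check for either implementation.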

Speeding up matrix-vector multiplication and exponentiation in Python, possibly by calling C/C++

十年热恋 · submitted 2019-12-01 05:21:24

I am currently working on a machine learning project where, given a data matrix Z and a vector rho, I have to compute the value and slope of the logistic loss function at rho. The computation involves basic matrix-vector multiplication and log/exp operations, with a trick to avoid numerical overflow (described in this previous post). I am currently doing this in Python using NumPy as shown below (as a reference, this code runs in 0.2 s). Although this works well, I would like to speed it up, since I call the function multiple times in my code and it represents over 90% of the computation time.
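The question's code is not shown, but the kind of computation it describes can be sketched as follows (Z and rho as named in the question; the exact loss in the original post may differ). `np.logaddexp` evaluates log(1 + exp(-s)) stably for both signs of s, and being a compiled ufunc it is already close to the speed of a hand-written C loop:

```python
# Hedged sketch: numerically stable value and slope of the mean logistic loss
#   L(rho) = mean_i log(1 + exp(-z_i . rho))
import numpy as np

def log_logistic_loss(Z, rho):
    scores = Z @ rho
    # log(1 + exp(-s)) computed without overflow via logaddexp(0, -s)
    value = np.logaddexp(0.0, -scores).mean()
    # dL/drho = -(1/n) * Z^T sigma(-scores), where sigma is the sigmoid
    sigma_neg = 1.0 / (1.0 + np.exp(np.clip(scores, -500, 500)))
    slope = -(Z.T @ sigma_neg) / Z.shape[0]
    return value, slope

Z = np.array([[1.0, 2.0], [3.0, 4.0]])
value, slope = log_logistic_loss(Z, np.zeros(2))
print(value, slope)  # log(2) ≈ 0.6931, and [-1.0, -1.5]
```

Beyond this, the usual speedups are using a BLAS-backed NumPy build, float32 where precision allows, or Numba/Cython for the whole function.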

ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0.0

牧云@^-^@ · submitted 2019-12-01 02:35:05

Question: I applied logistic regression on the training set after splitting the dataset into test and train sets, but I got the error above. I tried to work it out: when I print my response vector y_train in the console it shows integer values like 0 or 1, but when I write it to a file I find the values are floats like 0.0 and 1.0. If that is the problem, how can I overcome it?

lenreg = LogisticRegression()
print y_train[0:10]
y_train.to_csv(path='ytard.csv')
lenreg.fit(X_train, y_train)
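This error is raised when y_train contains only one class after the split, which can happen with an unlucky or ordered split. Stratifying the split guarantees both classes appear in the training set, and casting the target to int removes the 0.0/1.0 float labels mentioned above. A minimal sketch on synthetic data (not the asker's dataset):

```python
# Hedged sketch: stratified split keeps both classes in y_train.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = np.random.RandomState(0).randn(100, 4)
y = np.array([0.0, 1.0] * 50)  # float labels, as in the question

X_train, X_test, y_train, y_test = train_test_split(
    X, y.astype(int),            # cast float labels to int
    test_size=0.25,
    stratify=y,                  # preserve the class ratio in both splits
    random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
print(set(y_train))  # both classes present
```

Note the float labels themselves are not fatal (scikit-learn accepts 0.0/1.0); the single-class split is the actual cause of the error.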

Why did PCA reduced the performance of Logistic Regression?

佐手、 · submitted 2019-11-30 23:45:10

I performed logistic regression on a binary classification problem with data of 50,000 × 370 dimensions and got an accuracy of about 90%. But when I applied PCA before the logistic regression, my accuracy dropped to 10%. I was very shocked by this result. Can anybody explain what could have gone wrong?

There is no guarantee that PCA will help, or that it will not harm, the learning process. In particular, if you use PCA to reduce the number of dimensions, you are removing information from your data, so anything can happen: if the removed data was redundant, you will probably get better scores; if it was an important
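Two common pitfalls when combining PCA with logistic regression are skipping standardization (so high-variance features dominate the components) and keeping too few components. A safer baseline, sketched here on synthetic data (illustrative, not the asker's setup), scales first and keeps 95% of the variance:

```python
# Hedged sketch: scale -> PCA (95% variance) -> logistic regression pipeline.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2000, n_features=50, n_informative=10,
                           random_state=0)

pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),           # keep enough components for 95% variance
    LogisticRegression(max_iter=1000),
)
score = pipe.fit(X, y).score(X, y)
print(score)
```

An accuracy of 10% on a binary problem is also below chance, which often points to a bug such as fitting PCA on the training set but transforming the test set inconsistently; a Pipeline avoids that class of mistake.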

Is it possible to train a sklearn model (eg SVM) incrementally? [duplicate]

时间秒杀一切 · submitted 2019-11-30 20:46:43

Question: This question already has answers here: Does the SVM in sklearn support incremental (online) learning? (6 answers). Closed 9 months ago.

I'm trying to perform sentiment analysis on the Twitter dataset "Sentiment140", which consists of 1.6 million labelled tweets. I'm constructing my feature vector using a bag-of-words (unigram) model, so each tweet is represented by about 20,000 features. Now, to train my sklearn model (SVM, logistic regression, naive Bayes) on this dataset, I have to
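scikit-learn does support incremental training through `partial_fit` on estimators such as SGDClassifier (the default hinge loss gives a linear SVM; the log-loss variant gives logistic regression, though its name differs across versions). A minimal sketch with synthetic batches in place of the tweet features:

```python
# Hedged sketch: incremental training with partial_fit on batches.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
clf = SGDClassifier(random_state=0)      # default loss='hinge' (linear SVM)
classes = np.array([0, 1])               # all classes must be declared up front

for batch in range(5):
    X_batch = rng.randn(200, 20)
    y_batch = (X_batch[:, 0] > 0).astype(int)   # toy labels for the sketch
    clf.partial_fit(X_batch, y_batch, classes=classes)

X_test = rng.randn(100, 20)
print(clf.score(X_test, (X_test[:, 0] > 0).astype(int)))
```

For 1.6 million tweets, this lets each batch be vectorized and discarded, so the full feature matrix never has to fit in memory. MultinomialNB and BernoulliNB also expose `partial_fit`; SVC/LinearSVC do not.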

Regression (logistic) in R: Finding x value (predictor) for a particular y value (outcome)

杀马特。学长 韩版系。学妹 · submitted 2019-11-30 19:39:16

I've fitted a logistic regression model that predicts the binary outcome vs from mpg (mtcars dataset). The plot is shown below. How can I determine the mpg value for any particular vs value? For example, I'm interested in finding out what the mpg value is when the predicted probability of vs is 0.50. I appreciate any help anyone can provide!

model <- glm(vs ~ mpg, data = mtcars, family = binomial)
ggplot(mtcars, aes(mpg, vs)) +
  geom_point() +
  stat_smooth(method = "glm", method.args = list(family = "binomial"), se = FALSE)

The easiest way to calculate predicted values from your model is with the predict() function.
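The inversion itself is simple algebra: the model gives p = sigmoid(b0 + b1·x), so the x at a chosen probability p is x = (log(p/(1−p)) − b0) / b1, which at p = 0.5 reduces to x = −b0/b1. A Python sketch on synthetic data standing in for mpg (illustrative; the R equivalent uses `coef(model)` the same way):

```python
# Hedged sketch: invert a fitted logistic curve to find x at probability p.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
x = rng.uniform(10, 35, size=200)             # stand-in for mpg
p_true = 1 / (1 + np.exp(-(0.5 * x - 10)))    # true 50% point at x = 20
y = (rng.rand(200) < p_true).astype(int)

clf = LogisticRegression().fit(x.reshape(-1, 1), y)
b0, b1 = clf.intercept_[0], clf.coef_[0, 0]

p = 0.5
x_at_p = (np.log(p / (1 - p)) - b0) / b1      # reduces to -b0/b1 at p = 0.5
print(x_at_p)  # close to the true crossing near 20
```

The same formula works for any p between 0 and 1, not just 0.5.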

How can I use multi cores processing to run glm function faster

爱⌒轻易说出口 · submitted 2019-11-30 18:44:04

I'm a bit new to R and I would like to use a package that allows multi-core processing in order to run the glm function faster. I wonder if there is a syntax I can use for this. Here is an example glm model that I wrote; can I add a parameter that will make it use multiple cores?

g <- glm(IsChurn ~ ., data = dat, family = 'binomial')

Thanks.

Other useful packages are http://cran.r-project.org/web/packages/gputools/gputools.pdf with gpuGlm, and http://cran.r-project.org/web/packages/mgcv/mgcv.pdf (see the section on parallelism in mgcv) with gam(..., control = list(nthreads = nc)) or bam(..., cluster = makeCluster(nc)), where nc is the number of cores.
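As an aside (not from the original answer): in Python's scikit-learn the analogous knob is `n_jobs`, which parallelises the per-class fits when logistic regression falls back to one-vs-rest on a multiclass problem:

```python
# Hedged sketch: n_jobs=-1 asks scikit-learn to use all available cores
# (it only takes effect for one-vs-rest multiclass fitting).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
clf = LogisticRegression(n_jobs=-1, max_iter=1000).fit(X, y)
score = clf.score(X, y)
print(score)
```

For a single binary glm-style fit, the heavy lifting is linear algebra, so a multithreaded BLAS usually matters more than any per-fit parallelism flag.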