logistic-regression

Creating a sklearn.linear_model.LogisticRegression instance from existing coefficients

纵然是瞬间 submitted on 2019-12-04 06:36:50
Can one create such an instance from existing coefficients that were calculated elsewhere, say in a different implementation (e.g. Java)? I tried creating an instance and then setting coef_ and intercept_ directly, and it seems to work, but I'm not sure whether there's a downside or whether I might be breaking something. Yes, it works okay:

    import numpy as np
    from scipy.stats import norm
    from sklearn.linear_model import LogisticRegression
    import json

    x = np.arange(10)[:, np.newaxis]
    y = np.array([0, 0, 0, 1, 0, 0, 1, 1, 1, 1])

    # training one logistic regression
    model1 = LogisticRegression(C=10, penalty='l1').fit(x, y)
    …
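The snippet above is cut off; the pattern it tests can be sketched as follows. This is a hedged sketch with hypothetical coefficient values standing in for numbers computed elsewhere (e.g. by a Java implementation); note that classes_ must be set as well, or predict() will fail.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical coefficients computed in another implementation.
    coef = np.array([[1.2]])
    intercept = np.array([-4.0])

    model2 = LogisticRegression()
    model2.coef_ = coef
    model2.intercept_ = intercept
    model2.classes_ = np.array([0, 1])  # predict() maps scores to these labels

    x_new = np.arange(10)[:, np.newaxis]
    print(model2.predict(x_new))        # hard class labels
    print(model2.predict_proba(x_new))  # class probabilities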

Model runs with glm but not bigglm

北城以北 submitted on 2019-12-04 05:31:52
I was trying to run a logistic regression on 320,000 rows of data (6 variables). Stepwise model selection on a 10,000-row sample of the data gives a rather complex model with five interaction terms: Y ~ X1 + X2*X3 + X2*X4 + X2*X5 + X3*X6 + X4*X5. The glm() function could fit this model with 10,000 rows of data, but not with the whole dataset (320,000). Using bigglm to read the data chunk by chunk from a SQL server resulted in an error, and I couldn't make sense of the output of traceback():

    fit <- bigglm(Y ~ X1 + X2*X3 + X2*X4 + X2*X5 + X3*X6 + X4*X5,
                  data = sqlQuery(myconn, train_dat),
                  family = binomial(link = "logit"),
    …
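For comparison only: in Python, scikit-learn's SGDClassifier can fit a logistic regression out of core by consuming data chunk by chunk through partial_fit. This is a hedged sketch, not the asker's R/bigglm setup; the file name and column names are hypothetical, and the interaction terms would have to be built as explicit product columns.

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import SGDClassifier

    # loss='log_loss' gives logistic regression (spelled 'log' in older
    # scikit-learn versions).
    clf = SGDClassifier(loss='log_loss')
    classes = np.array([0, 1])  # must be declared on the first partial_fit call

    # Hypothetical chunked reader standing in for the SQL query above.
    for chunk in pd.read_csv('train.csv', chunksize=10000):
        X = chunk[['X1', 'X2', 'X3', 'X4', 'X5', 'X6']].to_numpy()
        y = chunk['Y'].to_numpy()
        clf.partial_fit(X, y, classes=classes)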

Using categorical data as features in sklearn LogisticRegression

大兔子大兔子 submitted on 2019-12-04 03:05:12
I'm trying to understand how to use categorical data as features in sklearn.linear_model's LogisticRegression. I understand, of course, that I need to encode it. What I don't understand is how to pass the encoded feature to the logistic regression so that it is processed as a categorical feature, rather than having the int value it got during encoding interpreted as a standard quantifiable feature. (Less important) Can somebody explain the difference between using preprocessing.LabelEncoder(), DictVectorizer.vocabulary, or just encoding the categorical data yourself with a simple dict? Alex A.'s comment here touches …
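The usual approach is one-hot encoding: each category becomes its own 0/1 column, so the model fits one coefficient per category instead of treating the integer code as an ordered quantity. A minimal sketch with hypothetical data:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    # Hypothetical data: 'color' is categorical, 'y' is the target.
    df = pd.DataFrame({'color': ['red', 'blue', 'green', 'red', 'blue', 'green'],
                       'y':     [0, 1, 1, 0, 1, 0]})

    # One-hot encode: no ordering is implied between the resulting columns.
    X = pd.get_dummies(df[['color']])
    clf = LogisticRegression().fit(X, df['y'])

sklearn.preprocessing.OneHotEncoder does the same job inside a pipeline, whereas LabelEncoder only maps categories to integers and is meant for targets, not features.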

Multi-Class Logistic Regression in scikit-learn

耗尽温柔 submitted on 2019-12-03 21:22:34
I am having trouble with the proper call of scikit-learn's logistic regression for the multi-class case. I am using the lbfgs solver, and I do have the multi_class parameter set to multinomial. It is unclear to me how to pass the true class labels when fitting the model. I had assumed it was similar to the multi-class random forest classifier, where you pass a DataFrame of shape [n_samples, n_classes]. However, doing this raises an error that the data has a bad shape: ValueError: bad input shape (20, 5) (in this tiny example there were 5 classes and 20 samples). On inspection, the documentation …
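For LogisticRegression, y is passed as a single 1-D vector of class labels with shape (n_samples,), not as an (n_samples, n_classes) indicator matrix; the (20, 5) shape in the error is exactly such a matrix. A minimal sketch with hypothetical data (multi_class is omitted here, since recent scikit-learn releases deprecate it and use multinomial by default with lbfgs):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical data: 20 samples, 3 features, 5 classes.
    rng = np.random.RandomState(0)
    X = rng.randn(20, 3)
    y = rng.randint(0, 5, size=20)  # 1-D label vector, shape (20,)

    clf = LogisticRegression(solver='lbfgs').fit(X, y)
    print(clf.predict(X[:3]))

If the labels are one-hot encoded, np.argmax(Y, axis=1) recovers the 1-D form.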

Load a saved model and predict new data with sklearn

一世执手 submitted on 2019-12-03 21:15:23
I trained a logistic regression model, cross-validated it, and saved it to file using the joblib module. Now I want to load this model and predict new data with it. Is this the correct way to do it? In particular, the standardization: should I call scaler.fit() on my new data too? In the tutorials I followed, scaler.fit was only used on the training set, so I'm a bit lost here. Here is my code:

    # Loading the saved model with joblib
    model = joblib.load('model.pkl')

    # New data to predict
    pr = pd.read_csv('set_to_predict.csv')
    pred_cols = list(pr.columns.values)[:-1]

    # Standardize new data
    scaler = StandardScaler()
    X …
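The standard practice is to fit the scaler on the training data only, persist it together with the model, and apply only transform (never fit) to new data; refitting would standardize the new data against its own statistics rather than those the model was trained with. A hedged sketch with hypothetical file names:

    import joblib
    import pandas as pd

    # At training time (for reference):
    #   scaler = StandardScaler().fit(X_train)
    #   joblib.dump(scaler, 'scaler.pkl')

    # At prediction time: load both artifacts and only transform.
    model = joblib.load('model.pkl')
    scaler = joblib.load('scaler.pkl')

    pr = pd.read_csv('set_to_predict.csv')
    X_new = scaler.transform(pr[pr.columns[:-1]])
    predictions = model.predict(X_new)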

ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 1.0

佐手、 submitted on 2019-12-03 20:07:21
Question: I have a training dataset of 8,670 trials, each 125 time samples long, while my test set consists of 578 trials. When I apply the SVM algorithm from scikit-learn, I get pretty good results. However, when I apply logistic regression, this error occurs: "ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 1.0". My question is why SVM is able to give predictions while logistic regression gives this error. Could it be …
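The error means that the y vector handed to fit() contained a single label, typically because of how the data was split, reshaped, or thresholded upstream; it is worth printing the labels right before fitting. A minimal reproduction and diagnostic, with hypothetical shapes:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X_train = np.random.randn(100, 5)
    y_train = np.ones(100)      # only class 1.0 is present

    print(np.unique(y_train))   # [1.] -> fitting below would fail

    clf = LogisticRegression()
    # clf.fit(X_train, y_train)  # raises: "This solver needs samples of at
    #                            #  least 2 classes in the data ..."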

Binary classification in TensorFlow, unexpected large values for loss and accuracy

眉间皱痕 submitted on 2019-12-03 18:00:18
Question: I am trying to use a deep neural network architecture to classify against a binary label taking the values -1 and +1. Here is my code to do it in TensorFlow:

    import tensorflow as tf
    import numpy as np
    from preprocess import create_feature_sets_and_labels

    train_x, train_y, test_x, test_y = create_feature_sets_and_labels()

    x = tf.placeholder('float', [None, 5])
    y = tf.placeholder('float')

    n_nodes_hl1 = 500
    n_nodes_hl2 = 500
    n_nodes_hl3 = 500
    n_classes = 1
    batch_size = 100

    def neural_network_model(data):
    …
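One common cause of odd loss and accuracy values in this setup is applying a softmax loss to a single output unit (softmax over one class is always 1). With one unit and labels in {-1, +1}, the usual choice is to remap the labels to {0, 1} and use a sigmoid cross-entropy loss. A hedged TF1-style sketch, matching the question's placeholder style; `logits` is a stand-in for the output of neural_network_model:

    import tensorflow as tf

    y = tf.placeholder('float', [None, 1])       # labels in {-1, +1}
    logits = tf.placeholder('float', [None, 1])  # stand-in for the network output

    y01 = (y + 1.0) / 2.0  # remap {-1, +1} -> {0, 1}
    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=y01, logits=logits))

    # Accuracy: threshold the sigmoid output at 0.5.
    predicted = tf.cast(tf.nn.sigmoid(logits) > 0.5, tf.float32)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted, y01), tf.float32))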

LC50 / LD50 confidence intervals from multiple regression glm with interaction

混江龙づ霸主 submitted on 2019-12-03 16:34:46
I have a quasibinomial GLM with two continuous explanatory variables (say "LogPesticide" and "LogFood") and an interaction. I would like to calculate the LC50 of the pesticide, with confidence intervals, at different amounts of food (e.g. the minimum and maximum food values). How can this be achieved? Example: first I generate a data set.

    mydata <- data.frame(
      LogPesticide = rep(log(c(0, 0.1, 0.2, 0.4, 0.8, 1.6) + 0.05), 4),
      LogFood = rep(log(c(1, 2, 4, 8)), each = 6)
    )
    set.seed(seed = 16)
    growth <- function(x, a = 1, K = 1, r = 1) {
      # Logistic growth function. a = position of turning point
    …
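With the model logit(p) = β0 + β1·LogPesticide + β2·LogFood + β3·LogPesticide·LogFood, the LC50 at a fixed food level f is the pesticide value where the linear predictor crosses zero:

    $\log \mathrm{LC}_{50}(f) = -\frac{\beta_0 + \beta_2 f}{\beta_1 + \beta_3 f}$

A confidence interval then follows from the delta method, using the gradient of this ratio and the coefficient covariance matrix. A hedged sketch, written in Python with statsmodels rather than the asker's R setup; the synthetic data and column order are assumptions:

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import norm

    def lc50_ci(params, cov, f, alpha=0.05):
        # Delta-method CI for log LC50 at food level f.
        b0, b1, b2, b3 = params
        denom = b1 + b3 * f
        est = -(b0 + b2 * f) / denom
        grad = np.array([              # d(est)/d(b0..b3)
            -1.0 / denom,
            (b0 + b2 * f) / denom**2,
            -f / denom,
            f * (b0 + b2 * f) / denom**2,
        ])
        se = np.sqrt(grad @ cov @ grad)
        z = norm.ppf(1 - alpha / 2)
        return est, est - z * se, est + z * se

    # Synthetic fit so the function can be exercised end to end.
    rng = np.random.RandomState(16)
    pest = rng.uniform(-3, 1, 200)
    food = rng.choice(np.log([1, 2, 4, 8]), 200)
    eta = 1.0 - 2.0 * pest + 0.5 * food + 0.3 * pest * food
    yb = rng.binomial(1, 1 / (1 + np.exp(-eta)))
    X = np.column_stack([np.ones_like(pest), pest, food, pest * food])
    res = sm.GLM(yb, X, family=sm.families.Binomial()).fit()
    print(lc50_ci(res.params, res.cov_params(), f=np.log(8)))

In R the same point estimate and covariance come from coef(fit) and vcov(fit); for a quasibinomial fit, vcov already includes the dispersion.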

ggplot2: How to combine histogram, rug plot, and logistic regression prediction in a single graph

旧城冷巷雨未停 submitted on 2019-12-03 16:09:10
I am trying to plot combined graphs for logistic regressions, as the logi.hist.plot function does, but I would like to do it with ggplot2 (for aesthetic reasons). The problem is that only one of the histograms should have scale_y_reverse(). Is there any way to specify this within a single plot (see the code below), or to overlap the two histograms using coordinates that can be passed to the previous plot?

    ggplot(dat) +
      geom_point(aes(x = ind, y = dep)) +
      stat_smooth(aes(x = ind, y = dep), method = glm,
                  method.args = list(family = "binomial"), se = FALSE) +
      geom_histogram(data = dat[dat$dep == 0, ], aes(x = ind)) +
      geom…
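For reference, the same composite (logistic curve, a dep == 0 histogram growing up from 0, a dep == 1 histogram hanging down from 1) can be drawn with explicit bar coordinates, which sidesteps the single-axis scale_y_reverse problem entirely. A hedged matplotlib sketch with synthetic data standing in for dat:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    ind = rng.normal(0, 1, 300)
    dep = (rng.uniform(size=300) < 1 / (1 + np.exp(-2 * ind))).astype(int)

    fig, ax = plt.subplots()

    # Logistic fit curve on the probability scale.
    clf = LogisticRegression().fit(ind[:, None], dep)
    grid = np.linspace(ind.min(), ind.max(), 200)
    ax.plot(grid, clf.predict_proba(grid[:, None])[:, 1])

    # dep == 0 bars grow upward from y = 0; dep == 1 bars hang downward
    # from y = 1, which is what scale_y_reverse() would have achieved.
    for value, base, sign in [(0, 0.0, 1), (1, 1.0, -1)]:
        counts, edges = np.histogram(ind[dep == value], bins=20)
        heights = sign * 0.15 * counts / counts.max()  # scaled into the panel
        ax.bar(edges[:-1], heights, width=np.diff(edges),
               bottom=base, align='edge', alpha=0.5)

    ax.set_ylim(-0.05, 1.05)
    plt.show()

The analogous ggplot2 trick is to map the dep == 1 histogram's y aesthetic to a transformed count (e.g. an after_stat expression of the form 1 minus the scaled count) so it hangs from the top without reversing the shared axis.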

Why is logistic regression called regression? [closed]

淺唱寂寞╮ submitted on 2019-12-03 15:36:24
From what I understand, linear regression predicts an outcome that can take continuous values, whereas logistic regression predicts an outcome that is discrete. Logistic regression therefore looks like a classification method, so why is it called regression? There is also a related question: what is the difference between linear regression and logistic regression? There is a strict link between linear regression and logistic regression. With linear regression you're looking for the parameters $k_i$ in

    $h = k_0 + \sum_i k_i x_i = K^T x$

With logistic regression you …
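The excerpt cuts off, but the standard completion of that argument (stated here as background, not as the original answer) is that logistic regression fits the same linear form on the log-odds scale:

    $h = \frac{1}{1 + e^{-K^T x}} \quad\Longleftrightarrow\quad \log\frac{h}{1 - h} = K^T x$

so one is still regressing a continuous quantity (the log-odds); the classification step is just thresholding $h$, which is why the method keeps the name regression.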