logistic-regression

R - Getting Column of Dataframe from String [duplicate]

橙三吉。 submitted on 2019-12-09 04:02:41
Question: This question already has answers here: Dynamically select data frame columns using $ and a vector of column names (8 answers). Closed 3 years ago. I am trying to create a function that converts selected columns of a data frame to the categorical data type (factor) before running a regression analysis. The question is: how do I slice a particular column from a data frame using a string (character)? Example: strColumnNames <- "Admit,Rank" strDelimiter <- "," strSplittedColumnNames <-
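The usual answer is that [[ ]] (or subsetting with a character vector) accepts a string where $ does not. A minimal R sketch, assuming a hypothetical data frame df containing the Admit and Rank columns from the example:

strColumnNames <- "Admit,Rank"
strDelimiter <- ","
colNames <- strsplit(strColumnNames, strDelimiter)[[1]]  # "Admit" "Rank"
for (nm in colNames) {
  df[[nm]] <- as.factor(df[[nm]])  # [[ ]] takes a character name; $ does not
}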

Why are most of the predicted results 0 when I use a Caffe BP regression model?

做~自己de王妃 submitted on 2019-12-08 08:50:31
Question: I converted my input data into HDF5 format. Each input sample has 309 dimensions and a label; part of the input data looks like this. My net structure is as follows: name: "RegressionNet" layer { name: "framert" type: "HDF5Data" top: "data" top: "label" include { phase: TRAIN } hdf5_data_param { source: "train_data_list.txt" batch_size: 100 } } layer { name: "framert" type: "HDF5Data" top: "data" top: "label" include { phase: TEST } hdf5_data_param { source: "test
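The question is truncated, but a common first check when a Caffe regression net's predictions collapse toward a constant is the scaling of the inputs and labels. A hedged Python sketch of writing standardized HDF5 data for the layers above; the arrays are illustrative stand-ins for the question's 309-dim features:

import h5py
import numpy as np

X = np.random.randn(1000, 309).astype(np.float32)  # stand-in features
y = np.random.randn(1000, 1).astype(np.float32)    # stand-in labels

# Standardize features so no single dimension dominates the loss
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

with h5py.File("train.h5", "w") as f:
    f.create_dataset("data", data=X)   # names must match the "data"/"label"
    f.create_dataset("label", data=y)  # top blobs of the HDF5Data layers
# train_data_list.txt then lists the path to train.h5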

How to print the probability of prediction in LogisticRegressionWithLBFGS for pyspark

眉间皱痕 submitted on 2019-12-08 08:01:57
Question: I am using Spark 1.5.1. In pyspark, after I fit the model using model = LogisticRegressionWithLBFGS.train(parsedData), I can print the prediction using model.predict(p.features). Is there a function to print the probability score along with the prediction? Answer 1: You have to clear the threshold first, and this works only for binary classification: from pyspark.mllib.classification import LogisticRegressionWithLBFGS, LogisticRegressionModel from pyspark.mllib.regression import
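A minimal sketch of the answer's approach, reusing parsedData and p from the question: after clearThreshold(), predict returns the raw probability instead of the 0/1 label.

from pyspark.mllib.classification import LogisticRegressionWithLBFGS

model = LogisticRegressionWithLBFGS.train(parsedData)
model.clearThreshold()            # predictions become probabilities
prob = model.predict(p.features)  # e.g. 0.83 rather than 1

Comparing prob against the original threshold (0.5 by default) recovers the hard prediction, so both values are available.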

How to get the log-likelihood for a logistic regression model in sklearn?

旧时模样 submitted on 2019-12-08 07:51:01
Question: I'm using a logistic regression model in sklearn and I am interested in retrieving the log-likelihood of such a model, in order to perform an ordinary likelihood ratio test as suggested here. The model uses log loss as the scoring rule. In the documentation, log loss is defined "as the negative log-likelihood of the true labels given a probabilistic classifier's predictions". However, the value is always positive, whereas the log-likelihood should be negative. As an example: from sklearn
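The resolution is that sklearn.metrics.log_loss reports the average negative log-likelihood, which is positive by construction; multiplying by -n recovers the (negative) total log-likelihood needed for a likelihood ratio test. A self-contained sketch with illustrative synthetic data:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=200, random_state=0)  # illustrative data
clf = LogisticRegression().fit(X, y)

avg_nll = log_loss(y, clf.predict_proba(X))  # positive: mean negative log-likelihood
log_lik = -len(y) * avg_nll                  # total log-likelihood, negative
print(avg_nll, log_lik)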

Cost Function and Gradient Seem to be Working, but scipy.optimize functions are not

不打扰是莪最后的温柔 submitted on 2019-12-08 02:17:18
Question: I'm working through my MATLAB code from the Andrew Ng Coursera course and porting it to Python. I am working on non-regularized logistic regression, and after writing my gradient and cost functions I needed something similar to fminunc; after some googling, I found a couple of options. They both return the same results, but those results do not match Andrew Ng's expected output. Others seem to get this working correctly, but I'm wondering why my specific code does not
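For comparison, a minimal sketch of the fminunc-style workflow with scipy.optimize.minimize on illustrative data. A frequent cause of mismatches with the course's expected results is array shape: theta must be 1-D, and X needs the leading column of ones.

import numpy as np
from scipy import optimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def grad(theta, X, y):
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])  # intercept column
y = (rng.random(100) > 0.5).astype(float)

res = optimize.minimize(cost, np.zeros(X.shape[1]), args=(X, y),
                        jac=grad, method="TNC")
print(res.x)  # fitted theta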

Stata's xtlogit (fe, re) equivalent in R?

假如想象 submitted on 2019-12-07 19:05:06
Question: Stata allows fixed-effects and random-effects specifications of logistic regression through the xtlogit, fe and xtlogit, re commands respectively. I was wondering what the equivalent commands for these specifications are in R. The only similar specification I am aware of is the mixed-effects logistic regression mymixedlogit <- glmer(y ~ x1 + x2 + x3 + (1 | x4), data = d, family = binomial), but I am not sure whether this maps to either of the aforementioned commands. Answer 1: The glmer command is
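The answer is cut off, but the mapping usually suggested is sketched below, reusing y, x1–x4, and d from the question: survival::clogit with strata() fits the conditional (fixed-effects) logit, the counterpart of xtlogit, fe, while the glmer call from the question is the random-effects counterpart.

library(survival)
library(lme4)

# ~ xtlogit, fe: conditional fixed-effects logit, grouped by x4
fe_fit <- clogit(y ~ x1 + x2 + x3 + strata(x4), data = d)

# ~ xtlogit, re: random-intercept logit
re_fit <- glmer(y ~ x1 + x2 + x3 + (1 | x4), data = d, family = binomial)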

R Error in solve.default(V) : 'a' is 0-diml in regTermTest function

这一生的挚爱 submitted on 2019-12-07 16:07:03
Question: I'm trying to use the regTermTest function in the R survey package to test the significance of each variable in a logistic regression. However, I get a solver error for one of my variables, "fun". The error is: Error in solve.default(V) : 'a' is 0-diml. My code for the logistic regression is model2 <- glm(decision~samerace+race_o+field+goal+attr+sinc+intel+fun+amb+shar+like+prob, data=trg2, family=binomial) regTermTest(model2, "fun") I also encountered a p = NA result for another variable, "amb".
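A 0-dimensional V usually means the tested term contributed no estimable coefficients, for example a predictor that is constant or perfectly collinear, which would also explain the p = NA. A hedged R diagnostic sketch, using model2 and trg2 from the question:

coef(model2)                      # NA estimates flag inestimable terms ("fun", "amb")
alias(model2)                     # lists perfectly collinear predictors
table(trg2$fun, useNA = "ifany")  # confirm the variable actually varies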

Logistic regression results different in scikit-learn (Python) and R?

淺唱寂寞╮ submitted on 2019-12-07 13:04:40
Question: I was running logistic regression on the iris dataset in both R and Python, but the two give different results (coefficients, intercept, and scores). # Python code In[23]: iris_df.head(5) Out[23]: Sepal.Length Sepal.Width Petal.Length Petal.Width Species 0 5.1 3.5 1.4 0.2 0 1 4.9 3.0 1.4 0.2 0 2 4.7 3.2 1.3 0.2 0 3 4.6 3.1 1.5 0.2 0 In[35]: iris_df.shape Out[35]: (100, 5) # looking at the levels of the Species dependent variable In[25]: iris_df['Species'].unique() Out[25]: array([0, 1], dtype
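The question is cut off, but the usual explanation is regularization: scikit-learn's LogisticRegression applies an L2 penalty by default (C=1.0), while R's glm(..., family=binomial) is unpenalized. A hedged sketch, assuming iris_df as shown above; a very large C approximately disables the penalty:

from sklearn.linear_model import LogisticRegression

X = iris_df.drop(columns="Species")
y = iris_df["Species"]

clf = LogisticRegression(C=1e9, solver="lbfgs", max_iter=1000).fit(X, y)
print(clf.intercept_, clf.coef_)  # should now approach R's glm coefficients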

Reproducing drc::plot.drc with ggplot2

ⅰ亾dé卋堺 submitted on 2019-12-07 09:19:43
Question: I want to reproduce the following drc::plot.drc graphs with ggplot2. df1 <- structure(list(TempV = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 4L,
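The data dump is truncated, so the dose and response column names below (Dose, Response) are hypothetical; the general recipe is to fit with drc::drm, predict over a grid, and draw the fitted curves with geom_line:

library(drc)
library(ggplot2)

fit <- drm(Response ~ Dose, curveid = TempV, data = df1, fct = LL.4())

grid <- expand.grid(Dose = seq(min(df1$Dose), max(df1$Dose), length.out = 100),
                    TempV = unique(df1$TempV))
grid$Pred <- predict(fit, newdata = grid)

ggplot(df1, aes(Dose, Response, colour = TempV)) +
  geom_point() +
  geom_line(data = grid, aes(y = Pred))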

`warm_start` Parameter And Its Impact On Computational Time

梦想的初衷 submitted on 2019-12-07 08:23:48
Question: I have a logistic regression model with a defined set of parameters (warm_start=True). As usual, I call LogisticRegression.fit(X_train, y_train) and afterwards use the model to predict new outcomes. Suppose I alter some parameters, say C=100, and call the .fit method again using the same training data. Theoretically, I think the second .fit should take less computational time than a model with warm_start=False. However, empirically this is not actually true. Please help me
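A self-contained timing sketch on synthetic data (all names and values illustrative). Note that changing C changes the optimization problem, so the warm-started coefficients are only a starting point; when C=100 moves the optimum far away, the second fit may be barely faster.

import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_train, y_train = make_classification(n_samples=20000, n_features=50,
                                       random_state=0)
clf = LogisticRegression(warm_start=True, solver="lbfgs", max_iter=1000)

t0 = time.time(); clf.fit(X_train, y_train)
print("first fit :", time.time() - t0)

clf.set_params(C=100)  # new problem; old coef_ is just the initial guess
t0 = time.time(); clf.fit(X_train, y_train)
print("second fit:", time.time() - t0)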