logistic-regression

Correctness of logistic regression in Vowpal Wabbit?

妖精的绣舞 submitted on 2019-11-27 10:02:10
Question: I have started using Vowpal Wabbit for logistic regression, but I am unable to reproduce the results it gives. Perhaps there is some undocumented "magic" it does, but has anyone been able to replicate / verify / check the calculations for logistic regression? For example, with the simple data below, we aim to model the way age predicts label. There is clearly a strong relationship: as age increases, the probability of observing 1 increases. As a simple unit test, I used the 12

How to find the importance of the features for a logistic regression model?

╄→гoц情女王★ submitted on 2019-11-27 09:35:46
Question: I have a binary prediction model trained with the logistic regression algorithm. I want to know which features (predictors) are more important for the decision between the positive and negative class. I know there is the coef_ attribute from the scikit-learn package, but I don't know whether it is enough to judge importance. Another question is how I can evaluate the coef_ values in terms of importance for the negative and positive classes. I also read about standardized regression coefficients and I don't know

sklearn Logistic Regression “ValueError: Found array with dim 3. Estimator expected <= 2.”

梦想与她 submitted on 2019-11-27 08:15:21
I am attempting to solve problem 6 in this notebook. The task is to train a simple model on this data using 50, 100, 1000 and 5000 training samples with the LogisticRegression model from sklearn.linear_model. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/1_notmnist.ipynb lr = LogisticRegression() lr.fit(train_dataset,train_labels) This is the code I am trying to run, and it gives me the error: ValueError: Found array with dim 3. Estimator expected <= 2. Any idea? Kristian K.: scikit-learn expects 2d num arrays for the training dataset for a fit function. The
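The error message indicates that the training set has three dimensions, presumably (samples, rows, columns) since the notebook works with images, while fit expects one flat feature row per sample. The following is a minimal sketch of the flattening step using plain Python lists so that it runs without dependencies. The helper name flatten_images and the toy 2x2 images are assumptions made for illustration and do not come from the notebook. With a NumPy array, the equivalent is train_dataset.reshape(len(train_dataset), -1).

```python
def flatten_images(images):
    """Turn a list of 2-D images (rows of pixels) into a list of flat feature rows."""
    return [[pixel for row in image for pixel in row] for image in images]


# Toy stand-in for the training set: 3 samples, each a 2x2 "image" (assumed data).
train_dataset = [
    [[0.0, 0.1], [0.2, 0.3]],
    [[0.4, 0.5], [0.6, 0.7]],
    [[0.8, 0.9], [1.0, 1.1]],
]

flat = flatten_images(train_dataset)

# Each sample is now a single row of 4 features, i.e. a 2-D dataset overall.
print(len(flat), len(flat[0]))
```

After flattening, the data has the (n_samples, n_features) layout that the error message asks for.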

Getting a low ROC AUC score but a high accuracy

被刻印的时光 ゝ submitted on 2019-11-27 04:34:29
I am using the LogisticRegression class in scikit-learn on a version of the flight delay dataset. I use pandas to select some columns: df = df[["MONTH", "DAY_OF_MONTH", "DAY_OF_WEEK", "ORIGIN", "DEST", "CRS_DEP_TIME", "ARR_DEL15"]] I fill in NaN values with 0: df = df.fillna({'ARR_DEL15': 0}) Make sure the categorical columns are marked with the 'category' data type: df["ORIGIN"] = df["ORIGIN"].astype('category') df["DEST"] = df["DEST"].astype('category') Then call get_dummies() from pandas: df = pd.get_dummies(df) Now I train and test my data set: from sklearn.linear_model import
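One common way to end up with high accuracy and a low ROC AUC is class imbalance: a model that scores every sample as the majority class is right most of the time yet cannot rank positives above negatives. The sketch below illustrates that effect with made-up data (90 on-time flights, 10 delayed); it is not a diagnosis of the question's dataset, and the helper names accuracy and roc_auc are hypothetical, written in plain Python rather than taken from scikit-learn.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Assumed toy data: 90 negatives (on time) and 10 positives (delayed).
y_true = [0] * 90 + [1] * 10
scores = [0.1] * 100                       # the same low score for every sample
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)             # high, because negatives dominate
auc = roc_auc(y_true, scores)              # chance level, because nothing is ranked
print(acc, auc)
```

Comparing the class balance of the target column with the reported accuracy is a quick first check in this situation.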

predict_proba for a cross-validated model

依然范特西╮ submitted on 2019-11-27 02:13:19
Question: I would like to predict probabilities from a logistic regression model with cross-validation. I know you can get the cross-validation scores, but is it possible to return the values from predict_proba instead of the scores? # imports from sklearn.linear_model import LogisticRegression from sklearn.cross_validation import (StratifiedKFold, cross_val_score, train_test_split) from sklearn import datasets # setup data iris = datasets.load_iris() X = iris.data y = iris.target # setup model cv =

sklearn Logistic Regression “ValueError: Found array with dim 3. Estimator expected <= 2.”

百般思念 submitted on 2019-11-26 13:49:37
Question: I am attempting to solve problem 6 in this notebook. The task is to train a simple model on this data using 50, 100, 1000 and 5000 training samples with the LogisticRegression model from sklearn.linear_model. https://github.com/tensorflow/examples/blob/master/courses/udacity_deep_learning/1_notmnist.ipynb lr = LogisticRegression() lr.fit(train_dataset,train_labels) This is the code I am trying to run, and it gives me the error: ValueError: Found array with dim 3. Estimator expected <= 2. Any

Getting a low ROC AUC score but a high accuracy

五迷三道 submitted on 2019-11-26 11:15:24
Question: I am using the LogisticRegression class in scikit-learn on a version of the flight delay dataset. I use pandas to select some columns: df = df[["MONTH", "DAY_OF_MONTH", "DAY_OF_WEEK", "ORIGIN", "DEST", "CRS_DEP_TIME", "ARR_DEL15"]] I fill in NaN values with 0: df = df.fillna({'ARR_DEL15': 0}) Make sure the categorical columns are marked with the 'category' data type: df["ORIGIN"] = df["ORIGIN"].astype('category') df["DEST"] = df["DEST"].astype('category') Then call get

How to implement the Softmax function in Python

泪湿孤枕 submitted on 2019-11-26 06:54:15
Question: From Udacity's deep learning class, the softmax of y_i is simply the exponential divided by the sum of exponentials of the whole Y vector: S(y_i) = e^(y_i) / Σ_j e^(y_j), where S(y_i) is the softmax function of y_i, e is the exponential and j is the no. of columns in the input vector Y. I've tried the following: import numpy as np def softmax(x): """Compute softmax values for each sets of scores in x.""" e_x = np.exp(x - np.max(x)) return e_x / e_x.sum() scores = [3.0, 1.0, 0.2] print(softmax(scores)) which
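For comparison, here is a dependency-free sketch of the same computation for a single vector of scores, using only the standard math module. It keeps the max-subtraction step from the NumPy version in the question, which leaves the result unchanged but avoids overflow for large scores. It only handles a flat list; extending it to a matrix with one sample per column is not attempted here.

```python
import math


def softmax(scores):
    """Softmax of a flat list of scores, shifted by the maximum for stability."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([3.0, 1.0, 0.2])
print(probs)             # three probabilities, largest first
print(sum(probs))        # sums to 1 up to floating-point rounding

# Without the shift, math.exp(1000.0) would raise OverflowError.
print(softmax([1000.0, 1000.0]))
```

Subtracting the maximum multiplies the numerator and the denominator by the same constant, so the ratios are unaffected.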

How to choose cross-entropy loss in tensorflow?

余生长醉 submitted on 2019-11-26 02:48:33
Classification problems, such as logistic regression or multinomial logistic regression, optimize a cross-entropy loss. Normally, the cross-entropy layer follows the softmax layer, which produces a probability distribution. In tensorflow, there are at least a dozen different cross-entropy loss functions: tf.losses.softmax_cross_entropy tf.losses.sparse_softmax_cross_entropy tf.losses.sigmoid_cross_entropy tf.contrib.losses.softmax_cross_entropy tf.contrib.losses.sigmoid_cross_entropy tf.nn.softmax_cross_entropy_with_logits tf.nn.sigmoid_cross_entropy_with_logits ... Which work only for
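The main split among the functions listed above is between the softmax family and the sigmoid family. As a rough illustration of the underlying math only, the sketch below computes both losses from raw logits in plain Python. The function names are hypothetical and are not TensorFlow APIs. It assumes the usual reading: softmax cross-entropy treats the classes as mutually exclusive and returns one loss per sample, while sigmoid cross-entropy treats every class as an independent yes/no label and returns one loss per class.

```python
import math


def softmax_cross_entropy(logits, onehot):
    """Loss for one sample whose classes are mutually exclusive."""
    shift = max(logits)
    # log of the softmax denominator, computed stably (log-sum-exp).
    log_sum = shift + math.log(sum(math.exp(x - shift) for x in logits))
    return -sum(t * (x - log_sum) for x, t in zip(logits, onehot))


def sigmoid_cross_entropy(logits, labels):
    """Per-class losses for one sample with independent binary labels."""
    # Stable form of -z*log(sigmoid(x)) - (1-z)*log(1-sigmoid(x)).
    return [max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
            for x, z in zip(logits, labels)]


logits = [2.0, -1.0, 0.5]
single_label = softmax_cross_entropy(logits, [1.0, 0.0, 0.0])  # exactly one true class
multi_label = sigmoid_cross_entropy(logits, [1.0, 0.0, 1.0])   # several may be true
print(single_label)
print(multi_label)
```

Both functions take unnormalized logits, not probabilities, so no softmax or sigmoid should be applied to the inputs beforehand.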

How to choose cross-entropy loss in tensorflow?

大城市里の小女人 submitted on 2019-11-26 01:11:07
Question: Classification problems, such as logistic regression or multinomial logistic regression, optimize a cross-entropy loss. Normally, the cross-entropy layer follows the softmax layer, which produces a probability distribution. In tensorflow, there are at least a dozen different cross-entropy loss functions: tf.losses.softmax_cross_entropy tf.losses.sparse_softmax_cross_entropy tf.losses.sigmoid_cross_entropy tf.contrib.losses.softmax_cross_entropy tf.contrib.losses.sigmoid_cross_entropy tf.nn