statsmodels

Logistic Regression in statsmodels “LinAlgError: Singular matrix”

本小妞迷上赌 submitted on 2019-12-06 13:45:39
Question: Not sure why, but I'm getting a "numpy.linalg.linalg.LinAlgError: Singular matrix" error when fitting a logistic regression model.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import statsmodels.api as sm
data = load_breast_cancer()
y = data.target
X = data.data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2)
X_train = sm.add_constant(X_train)
X_test = sm.add_constant(X_test)
model =
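The excerpt stops before the fit call, so the actual cause isn't visible. One common trigger for this error (an assumption here, not a diagnosis of the poster's data) is a design matrix whose columns are exactly or nearly linearly dependent, which makes the X'X-type matrices the solver inverts singular. A minimal numpy check of rank and condition number before fitting, on made-up data; `diagnose_design` is a hypothetical helper:

```python
import numpy as np

def diagnose_design(X):
    """Return (rank, number of columns, condition number) of a design matrix."""
    X = np.asarray(X, dtype=float)
    rank = np.linalg.matrix_rank(X)
    cond = np.linalg.cond(X)  # ratio of largest to smallest singular value
    return rank, X.shape[1], cond

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 3))
const = np.ones((100, 1))

# Full-rank design: constant plus three independent columns.
X_ok = np.hstack([const, base])

# Rank-deficient design: the last column duplicates the first feature exactly.
X_bad = np.hstack([const, base, base[:, :1]])

for name, X in [("ok", X_ok), ("bad", X_bad)]:
    rank, ncols, cond = diagnose_design(X)
    print(f"{name}: rank {rank} of {ncols} columns, condition number {cond:.3g}")
```

If the rank is below the column count, or the condition number is huge, dropping or combining the offending columns is usually needed before any Newton-type fit can succeed.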

Python Statsmodels x13_arima_analysis : AttributeError: 'dict' object has no attribute 'iteritems'

半腔热情 submitted on 2019-12-06 12:04:13
Question: Step 1: My sample data
import pandas as pd
from pandas import Timestamp
s = pd.Series(
    {Timestamp('2013-03-01 00:00:00'): 838.2,
     Timestamp('2013-04-01 00:00:00'): 865.17,
     Timestamp('2013-05-01 00:00:00'): 763.0,
     Timestamp('2013-06-01 00:00:00'): 802.99,
     Timestamp('2013-07-01 00:00:00'): 875.56,
     Timestamp('2013-08-01 00:00:00'): 754.4,
     Timestamp('2013-09-01 00:00:00'): 617.48,
     Timestamp('2013-10-01 00:00:00'): 994.75,
     Timestamp('2013-11-01 00:00:00'): 860.86,
     Timestamp('2013-12-01 00:00:00'):
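In Python 3, `dict.iteritems()` no longer exists; `dict.items()` returns a view that serves the same purpose. The error in the title suggests that Python 2 era code is calling `.iteritems()` on a dict (the traceback is cut off, so this is an inference), in which case upgrading statsmodels is likely the real fix. A minimal illustration of the Python 3 replacement; `to_spec_lines` and the option names are hypothetical:

```python
def to_spec_lines(options):
    """Format a dict of options as 'key = value' lines (hypothetical helper).

    Python 2 code often wrote options.iteritems(); in Python 3 the
    equivalent is options.items(), a lazy view of (key, value) pairs.
    """
    return [f"{key} = {value}" for key, value in options.items()]

opts = {"start": "2013.03", "period": 12}
assert not hasattr(opts, "iteritems")  # removed in Python 3
print(to_spec_lines(opts))
```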

Python multinomial logit with statsmodels module: Change base value of mlogit regression

一个人想着一个人 submitted on 2019-12-06 09:17:32
Question: I have a small problem I'm stuck on. I am building a multinomial logit model with Python statsmodels and want to reproduce an example from a textbook. So far so good, but I am struggling to set a different target value as the base value for the regression. Can somebody help?
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
# import data
df = pd.read_excel('C:/.../diabetes.xlsx')
# split the data in dependent and independent
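statsmodels' `MNLogit` reports coefficients relative to the first (lowest-coded) category of the dependent variable, so one way to change the base is to recode y so the desired base category comes first. A small numpy sketch; the helper name `recode_with_base` and the example labels are hypothetical:

```python
import numpy as np

def recode_with_base(y, base):
    """Map labels to integer codes 0..K-1 with `base` coded as 0.

    The remaining categories keep their sorted order after the base.
    """
    y = np.asarray(y)
    levels = [base] + [lvl for lvl in np.unique(y).tolist() if lvl != base]
    lookup = {lvl: code for code, lvl in enumerate(levels)}
    codes = np.array([lookup[v] for v in y.tolist()])
    return codes, levels

codes, levels = recode_with_base(["low", "mid", "high", "mid"], base="mid")
print(levels)  # ['mid', 'high', 'low']
print(codes)   # [2 0 1 0]
```

Passing `codes` as the endogenous variable to `sm.MNLogit` should then give coefficients relative to `levels[0]`; keep `levels` to translate the codes back to names.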

ECDF in python without step function?

こ雲淡風輕ζ submitted on 2019-12-06 08:47:30
I have been using ECDF (empirical cumulative distribution function) from statsmodels.distributions to plot the CDF of some data. However, ECDF is a step function, so my plots look jagged. So my question is: do scipy or statsmodels have an ECDF built in that isn't a step function? By the way, I know I can do this:
hist, bin_edges = histogram(b_oz, normed=True)
plot(np.cumsum(hist))
but I don't get the right scales. Thanks!
Answer: If you just want to change the plot, you could let matplotlib interpolate between the observed values.
>>> xx = np.random.randn(nobs)
>>> ecdf = sm
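The interpolation idea in the answer can be sketched in plain numpy: compute the ECDF heights i/n at the sorted observations and join them with straight lines (`np.interp`, or simply `plot(x, F)` in matplotlib) instead of steps. The histogram route gets the right scale once the density is multiplied by the bin widths; `density=True` is the replacement for the deprecated `normed` argument. The helper names are hypothetical, and the interpolation assumes continuous data without ties:

```python
import numpy as np

def ecdf_points(data):
    """Sorted sample and ECDF heights i/n at each observation."""
    x = np.sort(np.asarray(data, dtype=float))
    F = np.arange(1, x.size + 1) / x.size
    return x, F

def interpolated_ecdf(data, grid):
    """Piecewise-linear ECDF: joins the ECDF points instead of stepping."""
    x, F = ecdf_points(data)
    return np.interp(grid, x, F, left=0.0, right=1.0)

def cdf_from_histogram(data, bins=10):
    """Histogram-based CDF at the right bin edges, scaled to end at 1."""
    hist, edges = np.histogram(data, bins=bins, density=True)
    return edges[1:], np.cumsum(hist * np.diff(edges))

sample = np.random.default_rng(0).normal(size=500)
grid = np.linspace(-3, 3, 7)
print(interpolated_ecdf(sample, grid))
```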

Product of two beta distributions

牧云@^-^@ submitted on 2019-12-06 06:27:16
Say I have two random variables: X ~ Beta(α1, β1) and Y ~ Beta(α2, β2). I would like to compute the distribution of Z = XY (the product of the random variables). With scipy, I can get the pdf of a single Beta with:
from scipy.stats import beta
rv = beta(a, b)
x = np.linspace(start=0, stop=1, num=200)
my_pdf = rv.pdf(x)
But what about the product of two Betas? Can I do this analytically? (Python/Julia/R solutions are fine.)
Answer (Sven Hohenstein): For an analytical solution, have a look at this paper and this answer. A numerical approach in R:
set.seed(1) # for reproducibility
n <- 100000 # number of random
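Since Python solutions are welcome, here is a Monte Carlo sketch in the same spirit as the numerical R approach (it is not a port of that truncated answer). It assumes X and Y are independent, which the question implies but does not state; under that assumption E[Z] = E[X]E[Y] gives a quick sanity check. The parameter values are arbitrary:

```python
import numpy as np

def sample_beta_product(a1, b1, a2, b2, n=200_000, seed=1):
    """Monte Carlo draws of Z = X * Y for independent X ~ Beta(a1, b1), Y ~ Beta(a2, b2)."""
    rng = np.random.default_rng(seed)
    return rng.beta(a1, b1, size=n) * rng.beta(a2, b2, size=n)

z = sample_beta_product(2.0, 3.0, 4.0, 2.0)
# Independence check: E[Z] = a1/(a1+b1) * a2/(a2+b2) = 0.4 * 2/3.
print(z.mean(), (2 / 5) * (4 / 6))
# Empirical density of Z on [0, 1], e.g. for plotting against bin centres.
density, edges = np.histogram(z, bins=50, range=(0.0, 1.0), density=True)
```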

Weighted Non-negative Least Square Linear Regression in python [closed]

霸气de小男生 submitted on 2019-12-06 05:53:36
Closed as off-topic.
I know there is a weighted OLS solver and a constrained OLS solver. Is there a routine that combines the two?
Answer: You can simulate OLS weighting by modifying the X and y inputs. In OLS, you solve for β in $X^\top X \beta = X^\top y$. In weighted OLS, you solve $X^\top W X \beta = X^\top W y$, where $W$ is a diagonal matrix with nonnegative entries. It follows that $W^{1/2}$ exists, and you can formulate this as $(W^{1/2} X)^\top (W^{1/2} X)\,\beta = (W^{1/2} X)^\top (W^{1/2} y)$
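Following that reformulation, a weighted non-negative fit can be sketched by scaling the rows of X and the entries of y by the square roots of the weights and handing the result to `scipy.optimize.nnls`. The helper name `weighted_nnls` is hypothetical, and weights are assumed nonnegative:

```python
import numpy as np
from scipy.optimize import nnls

def weighted_nnls(X, y, w):
    """Minimise sum_i w_i * (y_i - x_i . beta)^2 subject to beta >= 0."""
    sw = np.sqrt(np.asarray(w, dtype=float))
    X_w = np.asarray(X, dtype=float) * sw[:, None]  # W^{1/2} X (row scaling)
    y_w = np.asarray(y, dtype=float) * sw           # W^{1/2} y
    beta, _residual_norm = nnls(X_w, y_w)
    return beta

rng = np.random.default_rng(0)
X = rng.random((50, 3))
w = rng.uniform(0.5, 2.0, size=50)
beta_true = np.array([1.0, 0.5, 2.0])
print(weighted_nnls(X, X @ beta_true, w))
```

Row scaling avoids ever forming the n×n matrix W, which matters for large n.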

Multinomial/conditional Logit Regression: why does statsmodels fail on the mlogit package example?

不打扰是莪最后的温柔 submitted on 2019-12-06 05:51:46
Question: I am trying to reproduce an example of a multinomial logit regression from the mlogit package in R.
data("Fishing", package = "mlogit")
Fish <- mlogit.data(Fishing, varying = c(2:9), shape = "wide", choice = "mode")
# a pure "conditional" model
summary(mlogit(mode ~ price + catch, data = Fish))
To reproduce this example with the statsmodels function MNLogit, I export the Fishing data set as a CSV file and do the following:
import pandas
import statsmodels.api as st
# load data
df = pandas.read_csv(
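The `mode ~ price + catch` model is a conditional logit: price and catch vary across the alternatives (the fishing modes), and one coefficient per attribute is shared by all alternatives. `MNLogit` instead takes covariates that are constant across alternatives for each individual and estimates a separate coefficient per non-base alternative, so the two are different models. A numpy/scipy sketch of the conditional-logit likelihood fitted by BFGS on simulated data; the array layout and helper names are assumptions, not a statsmodels API, and the sketch leaves out any alternative-specific intercepts:

```python
import numpy as np
from scipy.optimize import minimize

def choice_probs(beta, X):
    """Conditional-logit probabilities.

    X has shape (n_obs, n_alts, n_features): one row of attributes
    (e.g. price, catch) per alternative; beta is shared across alternatives.
    """
    util = X @ beta                                  # (n_obs, n_alts)
    util = util - util.max(axis=1, keepdims=True)    # stabilise exp
    expu = np.exp(util)
    return expu / expu.sum(axis=1, keepdims=True)

def neg_loglik(beta, X, choice):
    p = choice_probs(beta, X)
    return -np.log(p[np.arange(choice.size), choice]).sum()

def fit_conditional_logit(X, choice):
    beta0 = np.zeros(X.shape[2])
    result = minimize(neg_loglik, beta0, args=(X, choice), method="BFGS")
    return result.x

# Simulated data (not the Fishing data set): 4 alternatives, 2 attributes.
rng = np.random.default_rng(0)
n_obs, n_alts = 5000, 4
X = rng.normal(size=(n_obs, n_alts, 2))
beta_true = np.array([-1.0, 0.5])
p = choice_probs(beta_true, X)
u = rng.random(n_obs)[:, None]
choice = np.minimum((u > p.cumsum(axis=1)).sum(axis=1), n_alts - 1)
print(fit_conditional_logit(X, choice))
```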

Statsmodels logistic regression convergence problems

谁都会走 submitted on 2019-12-06 05:21:55
I'm trying to run a logistic regression in statsmodels on a large design matrix (~200 columns). The features include a number of interactions, categorical features and semi-sparse (70%) integer features. Although my design matrix is not actually ill-conditioned, it seems to be somewhat close (according to numpy.linalg.matrix_rank, it is full-rank with tol=1e-3 but not with tol=1e-2). As a result, I'm struggling to get logistic regression to converge with any of the methods in statsmodels. Here's what I've tried so far:
method='newton': Did not converge after 1000 iterations; raised a
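One workaround for a nearly singular design (a swapped-in technique, not one of the methods the poster lists) is an L2 (ridge) penalty: it adds alpha·I to X'WX, so every Newton step solves a well-conditioned system. statsmodels offers penalized fits through its `fit_regularized` methods; the sketch below is plain numpy so the mechanics are visible. `fit_l2_logistic` is a hypothetical helper, and it penalizes the intercept too, which a production version would usually avoid:

```python
import numpy as np

def fit_l2_logistic(X, y, alpha=1.0, max_iter=100, tol=1e-10):
    """Newton's method for L2-penalised logistic regression.

    Maximises sum(y*eta - log(1 + exp(eta))) - alpha/2 * ||beta||^2.
    The alpha * I term keeps the (negative) Hessian positive definite
    even when X'X is (nearly) singular. The intercept is penalised too.
    """
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - mu) - alpha * beta
        weights = mu * (1.0 - mu)
        neg_hess = X.T @ (X * weights[:, None]) + alpha * np.eye(p)
        step = np.linalg.solve(neg_hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Nearly collinear design: x2 differs from x1 only by tiny noise.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 1e-6 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + x1)))).astype(float)

beta = fit_l2_logistic(X, y, alpha=1.0)
print(beta)  # the x1 effect is split almost equally between x1 and x2
```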

Why can't statsmodels reproduce my R logistic regression results?

笑着哭i submitted on 2019-12-06 03:52:19
Question: I'm confused about why my logistic regression models in R and statsmodels do not agree. If I prepare some data in R with
# From https://courses.edx.org/c4x/MITx/15.071x/asset/census.csv
library(caTools) # for sample.split
census = read.csv("census.csv")
set.seed(2000)
split = sample.split(census$over50k, SplitRatio = 0.6)
censusTrain = subset(census, split==TRUE)
censusTest = subset(census, split==FALSE)
and then run a logistic regression with
CensusLog1 = glm(over50k ~., data=censusTrain,
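The question is cut off before the statsmodels code, so the cause isn't shown. One frequent source of mismatches (an assumption here) is categorical encoding: R's `glm` expands factors with treatment contrasts, dropping the first level, whereas a hand-built dummy matrix that keeps every level plus a constant is rank deficient. A small pandas sketch of R-style encoding; `r_style_design` and the column names are hypothetical:

```python
import numpy as np
import pandas as pd

def r_style_design(df, categorical):
    """Dummy-encode factors like R's default treatment contrasts.

    drop_first=True keeps k-1 indicator columns per factor, so the first
    (sorted) level becomes the reference level, as in R.
    """
    design = pd.get_dummies(df, columns=categorical, drop_first=True, dtype=float)
    design.insert(0, "const", 1.0)
    return design

df = pd.DataFrame({"age": [25.0, 40.0, 33.0, 51.0],
                   "sex": ["Male", "Female", "Female", "Male"]})
X_r = r_style_design(df, ["sex"])

# Every level kept plus a constant: the dummy columns sum to the constant.
X_all = pd.get_dummies(df, columns=["sex"], dtype=float)
X_all.insert(0, "const", 1.0)

print(list(X_r.columns))
print(np.linalg.matrix_rank(X_r.to_numpy()), X_r.shape[1])
print(np.linalg.matrix_rank(X_all.to_numpy()), X_all.shape[1])
```

Using the statsmodels formula interface (for example `smf.logit` with the same formula) applies treatment coding automatically, which is another way to line up with R.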

Plotting Pandas OLS linear regression results

别说谁变了你拦得住时间么 submitted on 2019-12-06 02:42:16
Question: How would I plot the results of this linear regression I did with pandas?
import pandas as pd
from pandas.stats.api import ols
df = pd.read_csv('Samples.csv', index_col=0)
control = ols(y=df['Control'], x=df['Day'])
one = ols(y=df['Sample1'], x=df['Day'])
two = ols(y=df['Sample2'], x=df['Day'])
I tried plot() but it did not work. I want to plot all three samples on one plot. Is there any pandas or matplotlib code to handle data in the format of these summaries? Anyways
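`pandas.stats.api.ols` was removed in pandas 0.20, so this code no longer runs on current pandas. A sketch that refits each column with `np.polyfit` and draws the points and fitted lines on a single matplotlib axes; the column names come from the question, while `fit_lines` and `plot_fits` are hypothetical helpers:

```python
import numpy as np

def fit_lines(day, samples):
    """Least-squares line per series: {name: (slope, intercept)}."""
    day = np.asarray(day, dtype=float)
    return {name: tuple(np.polyfit(day, np.asarray(values, dtype=float), deg=1))
            for name, values in samples.items()}

def plot_fits(day, samples, ax=None):
    """Scatter each series and overlay its fitted line on one axes."""
    import matplotlib.pyplot as plt  # imported here so fit_lines works without matplotlib
    if ax is None:
        _, ax = plt.subplots()
    day = np.asarray(day, dtype=float)
    for name, (slope, intercept) in fit_lines(day, samples).items():
        ax.scatter(day, samples[name], label=name)
        ax.plot(day, slope * day + intercept)
    ax.set_xlabel("Day")
    ax.legend()
    return ax
```

Usage with the question's DataFrame would be `plot_fits(df["Day"], {c: df[c] for c in ["Control", "Sample1", "Sample2"]})`.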