statsmodels

GLM in statsmodel returning error

Submitted by 一世执手 on 2019-12-11 03:43:47
Question: Now that I have figured out how to use OLS (Pandas/Statsmodel OLS predicting future values), I am trying to fit a nicer curve to my data... GLM should work similarly, I assumed.
import statsmodels.api as sma
df1['intercept'] = 1
y = df1[['intercept', 'date_delta']]
X = df1['monthly_data']
smaresults_normal = sma.GLM(X, y, family=sma.families.Binomial()).fit()
This returns ValueError: The first guess on the deviance function returned a nan. This could be a boundary problem and should be reported.
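A common cause of this error is that the response handed to the Binomial family falls outside [0, 1]; note also that statsmodels expects GLM(endog, exog), i.e. the response first. A minimal sketch with made-up df1 columns (the data here is purely illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data standing in for df1: a 0/1 response and one predictor.
rng = np.random.default_rng(0)
df1 = pd.DataFrame({"date_delta": np.arange(50, dtype=float)})
df1["monthly_data"] = (rng.random(50) < 0.5).astype(float)  # Binomial endog must lie in [0, 1]

# GLM expects GLM(endog, exog): response first, then the design matrix with an intercept.
exog = sm.add_constant(df1[["date_delta"]])
endog = df1["monthly_data"]

result = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()
print(result.summary())
```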

Statsmodels formula API (patsy): How to exclude a subset of interaction components?

Submitted by 六月ゝ 毕业季﹏ on 2019-12-11 02:59:36
Question: I'm building a WLS (statsmodels.formula.api.wls) model using the statsmodels formula API (from patsy), and I'm using interactions between factors. Some of these are predictive whereas others are not. Is there a way to include only a subset of the interactions in the model without resorting to building a design matrix by hand? Alternatively, is there a way to constrain the estimated coefficients of a subset of the model variables to be equal to zero?
Answer 1: I'm not sure I understand exactly
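For the first part of the question, patsy itself lets you name individual interaction terms: a*b expands to a + b + a:b, so you can list only the : terms you want, or subtract unwanted ones with -. A minimal sketch with hypothetical columns x1, x2, x3 and weights w:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: three predictors x1..x3 and a column of WLS weights w.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
df["y"] = 1 + df["x1"] + 0.5 * df["x1"] * df["x2"] + rng.normal(size=200)
df["w"] = rng.uniform(0.5, 1.5, size=200)

# Keep the main effects plus only the x1:x2 interaction; x1:x3 and x2:x3 are simply not listed.
model = smf.wls("y ~ x1 + x2 + x3 + x1:x2", data=df, weights=df["w"]).fit()
print(model.params)

# Removing terms from an expansion also works:
# "y ~ (x1 + x2 + x3)**2 - x1:x3 - x2:x3" yields the same design matrix as above.
```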

statsmodel.api.Logit: valueerror array must not contain infs or nans

Submitted by ≯℡__Kan透↙ on 2019-12-11 01:48:52
Question: I am trying to apply logistic regression in Python using statsmodels.api.Logit. I am running into the error ValueError: array must not contain infs or NaNs when executing:
data['intercept'] = 1.0
train_cols = data.columns[1:]
logit = sm.Logit(data['admit'], data[train_cols])
result = logit.fit(start_params=None, method='bfgs', maxiter=20, full_output=1, disp=1, callback=None)
The data contains more than 15000 columns and 2000 rows, where data['admit'] is the target value and data
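This error usually means the design matrix itself contains NaN or inf values (or that the optimizer overflows, which is easy with 15000 columns). A hedged pre-flight check, assuming data has an 'admit' column as described:

```python
import numpy as np
import statsmodels.api as sm

def check_design(data):
    """Report columns that would make Logit raise 'array must not contain infs or NaNs'."""
    X = data.drop(columns=["admit"]).astype(float)
    bad_nan = X.columns[X.isna().any()].tolist()   # columns with missing values
    bad_inf = X.columns[np.isinf(X).any()].tolist()  # columns with +/- inf
    return bad_nan, bad_inf

# nan_cols, inf_cols = check_design(data)
# Once the design matrix is clean:
# data["intercept"] = 1.0
# logit = sm.Logit(data["admit"], data.drop(columns=["admit"]))
# result = logit.fit(method="bfgs", maxiter=20)
```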

Namespace issues when calling patsy within a function

Submitted by 强颜欢笑 on 2019-12-10 21:42:23
Question: I am attempting to write a wrapper for the statsmodels formula API (this is a simplified version; the function does more than this):
import statsmodels.formula.api as smf
def wrapper(formula, data, **kwargs):
    return smf.logit(formula, data).fit(**kwargs)
If I give this function to a user, who then attempts to define his/her own function:
def square(x):
    return x**2
model = wrapper('y ~ x + square(x)', data=df)
they will receive a NameError because the patsy module is looking in the namespace
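One possible workaround is to build the design matrices with patsy directly and point eval_env at the caller's stack frame so that user-defined functions such as square are visible; the wrapper below is a sketch of that idea, not the only fix:

```python
import patsy
import statsmodels.api as sm

def wrapper(formula, data, **kwargs):
    # eval_env=1 tells patsy to resolve names (e.g. a user-defined square())
    # one frame up the stack, i.e. in the caller's namespace rather than here.
    y, X = patsy.dmatrices(formula, data, eval_env=1, return_type="dataframe")
    return sm.Logit(y, X).fit(**kwargs)

# Caller-side usage:
# def square(x):
#     return x ** 2
# model = wrapper("y ~ x + square(x)", data=df)
```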

predict statsmodel argument Error

Submitted by 末鹿安然 on 2019-12-10 19:56:40
Question: I am trying to predict out-of-sample values for an array. Python code:
import pandas as pd
import numpy as np
from statsmodels.tsa.arima_model import ARIMA
dates = pd.date_range('2012-07-09', '2012-07-30')
series = [43., 32., 63., 98., 65., 78., 23., 35., 78., 56., 45., 45., 56., 6., 63., 45., 64., 34., 76., 34., 14., 54.]
res = pd.Series(series, index=dates)
r = ARIMA(res, (1, 2, 0))
pred = r.predict(start='2012-07-31', end='2012-08-31')
I am getting this error. I see I have given two arguments but the compiler returns I
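With the legacy statsmodels.tsa.arima_model.ARIMA used here, predict belongs to the fitted results object rather than the unfitted model, and with d=2 the typ='levels' option returns forecasts on the original scale. A sketch assuming that older API:

```python
import pandas as pd
from statsmodels.tsa.arima_model import ARIMA  # legacy API, as used in the question

dates = pd.date_range("2012-07-09", "2012-07-30")
series = [43., 32., 63., 98., 65., 78., 23., 35., 78., 56., 45.,
          45., 56., 6., 63., 45., 64., 34., 76., 34., 14., 54.]
res = pd.Series(series, index=dates)

model = ARIMA(res, order=(1, 2, 0))
fit = model.fit()                                # fit first, then predict on the results object
pred = fit.predict(start="2012-07-31", end="2012-08-31", typ="levels")
print(pred.head())
```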

How to get constant term in AR Model with statsmodels and Python?

Submitted by 試著忘記壹切 on 2019-12-10 19:24:22
Question: I'm trying to model my time series data using the AR model. This is the code that I'm using:
# Compute AR model (data is a Python list of numbers)
model = AR(data)
result = model.fit()
plt.plot(data, 'b-', label='data')
plt.plot(range(result.k_ar, len(data)), result.fittedvalues, 'r-')
plt.show()
I've successfully gotten the p value using result.k_ar, the parameters with result.params, and the epsilon term with result.sigma2. The problem is that I can't find a way to get the c (constant) term. Here is the
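With the default trend='c', the constant is simply the first entry of result.params, followed by the lag coefficients. A sketch assuming the legacy statsmodels.tsa.ar_model.AR class used in the question (newer releases expose AutoReg instead):

```python
import numpy as np
from statsmodels.tsa.ar_model import AR  # legacy class matching the question

# Illustrative data: a noisy sine wave standing in for the question's list.
data = np.sin(np.linspace(0, 20, 200)) + np.random.normal(scale=0.1, size=200)

model = AR(data)
result = model.fit()

c = result.params[0]       # constant term (first parameter when trend='c', the default)
phi = result.params[1:]    # AR lag coefficients, length result.k_ar
print("constant:", c, "number of lags:", result.k_ar)
```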

Different coefficients: scikit-learn vs statsmodels (logistic regression)

Submitted by 坚强是说给别人听的谎言 on 2019-12-10 17:47:14
Question: When running a logistic regression, the coefficients I get using statsmodels are correct (I verified them against some course material). However, I am unable to get the same coefficients with sklearn. I've tried preprocessing the data to no avail. This is my code. Statsmodels:
import statsmodels.api as sm
X_const = sm.add_constant(X)
model = sm.Logit(y, X_const)
results = model.fit()
print(results.summary())
The relevant output is:
coef std err z P>|z| [0.025 0.975] -------------------------------
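The usual explanation is that sklearn's LogisticRegression applies L2 regularization by default (C=1.0), whereas sm.Logit fits an unpenalized model; weakening the penalty should bring the estimates close. A sketch on synthetic data standing in for the question's X and y:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

# Hypothetical data standing in for the question's X and y.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) + rng.logistic(size=500) > 0).astype(int)

# Unpenalized fit in statsmodels (the constant must be added explicitly).
sm_res = sm.Logit(y, sm.add_constant(X)).fit(disp=0)

# Effectively unpenalized fit in sklearn: a huge C weakens the default L2 penalty.
skl = LogisticRegression(C=1e9, max_iter=1000).fit(X, y)

print(sm_res.params)              # [const, x1, x2, x3]
print(skl.intercept_, skl.coef_)  # should now agree closely with the statsmodels estimates
```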

statsmodel predict start and end indices

Submitted by 爷，独闯天下 on 2019-12-10 17:27:43
Question: I am trying to implement the prediction function from the statsmodels package:
prediction = results.predict(start=1, end=len(test), exog=test)
The dates of the input, test, and the output prediction are inconsistent: I get 1/4/2012 to 7/25/2012 for the former and 4/26/2013 to 11/13/2013 for the latter. Part of the difficulty is that I don't have a completely regular frequency; I have daily values excluding weekends and holidays. What is the appropriate way to set the indices?
x = psql.frame_query
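When the index has business-day gaps and no fixed frequency, statsmodels may fall back to positional indexing, so the returned dates need not match the calendar. One hedged workaround is to predict by integer position and reattach your own index afterwards; the results/train/test names below are assumptions:

```python
import numpy as np
import pandas as pd

def predict_out_of_sample(results, n_train, test_index, exog=None):
    """Predict by integer position and reattach the caller's irregular date index.

    `results` is assumed to be a fitted statsmodels time-series results object
    trained on n_train observations; `test_index` holds the true out-of-sample dates.
    """
    pred = results.predict(start=n_train,
                           end=n_train + len(test_index) - 1,
                           exog=exog)
    return pd.Series(np.asarray(pred), index=test_index)

# Usage (names assumed):
# forecast = predict_out_of_sample(results, len(train), test.index, exog=test)
# Alternatively, reindex the training data to an explicit business-day frequency
# (pd.bdate_range / .asfreq("B")) before fitting so that date-based start/end work.
```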

Why are the logistic regression results different between statsmodels and R?

Submitted by 断了今生、忘了曾经 on 2019-12-10 15:10:27
Question: I am trying to compare the logistic regression implementations in Python's statsmodels and R. Python version:
import statsmodels.api as sm
import pandas as pd
import pylab as pl
import numpy as np
df = pd.read_csv("http://www.ats.ucla.edu/stat/data/binary.csv")
df.columns = list(df.columns)[:3] + ["prestige"]
# df.hist()
# pl.show()
dummy_ranks = pd.get_dummies(df["prestige"], prefix="prestige")
cols_to_keep = ["admit", "gre", "gpa"]
data = df[cols_to_keep].join(dummy_ranks.ix[:, "prestige_2"
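A frequent source of disagreement here is the intercept and the dummy coding: R's glm(..., family = binomial) adds an intercept and drops a reference level automatically, while sm.Logit does neither. A sketch on synthetic stand-in data (the UCLA URL in the question no longer resolves, so the frame below is illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic stand-in for the admissions data: admit (0/1), gre, gpa, prestige (1-4).
rng = np.random.default_rng(3)
n = 400
df = pd.DataFrame({
    "admit": rng.integers(0, 2, n),
    "gre": rng.integers(300, 800, n),
    "gpa": rng.uniform(2.0, 4.0, n),
    "prestige": rng.integers(1, 5, n),
})

dummy_ranks = pd.get_dummies(df["prestige"], prefix="prestige").astype(float)
data = df[["admit", "gre", "gpa"]].join(dummy_ranks.iloc[:, 1:])  # drop prestige_1 as the baseline
data = sm.add_constant(data, prepend=False)  # R's glm() includes an intercept; sm.Logit does not

result = sm.Logit(data["admit"], data.drop(columns=["admit"])).fit(disp=0)
print(result.params)
# Comparable R call: glm(admit ~ gre + gpa + factor(prestige), data = df, family = binomial)
```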

Can pandas groupby transform a DataFrame into a Series?

Submitted by 梦想的初衷 on 2019-12-10 14:58:37
Question: I would like to use pandas and statsmodels to fit a linear model on subsets of a dataframe and return the predicted values. However, I am having trouble figuring out the right pandas idiom to use. Here is what I am trying to do:
import pandas as pd
import statsmodels.formula.api as sm
import seaborn as sns
tips = sns.load_dataset("tips")
def fit_predict(df):
    m = sm.ols("tip ~ total_bill", df).fit()
    return pd.Series(m.predict(df), index=df.index)
tips["predicted_tip"] = tips.groupby("day")
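One way to complete this pattern is to let apply return a Series per group and pass group_keys=False so the combined result stays aligned with the original index; a sketch along those lines (the behavior of groupby().apply on grouping columns varies a bit across pandas versions):

```python
import pandas as pd
import seaborn as sns
import statsmodels.formula.api as smf

tips = sns.load_dataset("tips")

def fit_predict(df):
    # Fit tip ~ total_bill within one group and return predictions on that group's index.
    m = smf.ols("tip ~ total_bill", data=df).fit()
    return pd.Series(m.predict(df), index=df.index)

# group_keys=False keeps the result indexed like `tips`, so it assigns straight back.
tips["predicted_tip"] = tips.groupby("day", group_keys=False).apply(fit_predict)
print(tips.head())
```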