statsmodels

GLM in statsmodel returning error

Submitted by 一世执手 on 2019-12-11 03:43:47
Question: Now that I have figured out how to use OLS (Pandas/Statsmodel OLS predicting future values), I am trying to fit a nicer curve to my data... GLM should work similarly, I assumed.
import statsmodels.api as sma
df1['intercept'] = 1
y = df1[['intercept', 'date_delta']]
X = df1['monthly_data']
smaresults_normal = sma.GLM(X, y, family=sma.families.Binomial()).fit()
This returns ValueError: The first guess on the deviance function returned a nan. This could be a boundary problem and should be reported.
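A common cause of this error is that the response handed to the Binomial family falls outside [0, 1]; note also that statsmodels expects GLM(endog, exog), i.e. the response first. A minimal sketch with made-up df1 columns (the data here is purely illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data standing in for df1: a 0/1 response and one predictor.
rng = np.random.default_rng(0)
df1 = pd.DataFrame({"date_delta": np.arange(50, dtype=float)})
df1["monthly_data"] = (rng.random(50) < 0.5).astype(float)  # Binomial endog must lie in [0, 1]

# GLM expects GLM(endog, exog): response first, then the design matrix with an intercept.
exog = sm.add_constant(df1[["date_delta"]])
endog = df1["monthly_data"]

result = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()
print(result.summary())
```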

Statsmodels formula API (patsy): How to exclude a subset of interaction components?

Submitted by 六月ゝ 毕业季﹏ on 2019-12-11 02:59:36
Question: I'm building a WLS (statsmodels.formula.api.wls) model using the statsmodels formula API (from patsy), and I'm using interactions between factors. Some of these are predictive whereas others are not. Is there a way to include only a subset of the interactions in the model without resorting to building a design matrix by hand? Alternatively, is there a way to constrain the estimated coefficients of a subset of the model variables to be equal to zero?
Answer 1: I'm not sure I understand exactly
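For the first part of the question, patsy itself lets you name individual interaction terms: a*b expands to a + b + a:b, so you can list only the : terms you want, or subtract unwanted ones with -. A minimal sketch with hypothetical columns x1, x2, x3 and weights w:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: three predictors x1..x3 and a column of WLS weights w.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
df["y"] = 1 + df["x1"] + 0.5 * df["x1"] * df["x2"] + rng.normal(size=200)
df["w"] = rng.uniform(0.5, 1.5, size=200)

# Keep the main effects plus only the x1:x2 interaction; x1:x3 and x2:x3 are simply not listed.
model = smf.wls("y ~ x1 + x2 + x3 + x1:x2", data=df, weights=df["w"]).fit()
print(model.params)

# Removing terms from an expansion also works:
# "y ~ (x1 + x2 + x3)**2 - x1:x3 - x2:x3" yields the same design matrix as above.
```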

statsmodel.api.Logit: valueerror array must not contain infs or nans

Submitted by ≯℡__Kan透↙ on 2019-12-11 01:48:52
Question: I am trying to apply logistic regression in Python using statsmodels.api.Logit. I am running into the error ValueError: array must not contain infs or NaNs when executing:
data['intercept'] = 1.0
train_cols = data.columns[1:]
logit = sm.Logit(data['admit'], data[train_cols])
result = logit.fit(start_params=None, method='bfgs', maxiter=20, full_output=1, disp=1, callback=None)
The data contains more than 15000 columns and 2000 rows, where data['admit'] is the target value and data
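This error usually means the design matrix itself contains NaN or inf values (or that the optimizer overflows, which is easy with 15000 columns). A hedged pre-flight check, assuming data has an 'admit' column as described:

```python
import numpy as np
import statsmodels.api as sm

def check_design(data):
    """Report columns that would make Logit raise 'array must not contain infs or NaNs'."""
    X = data.drop(columns=["admit"]).astype(float)
    bad_nan = X.columns[X.isna().any()].tolist()   # columns with missing values
    bad_inf = X.columns[np.isinf(X).any()].tolist()  # columns with +/- inf
    return bad_nan, bad_inf

# nan_cols, inf_cols = check_design(data)
# Once the design matrix is clean:
# data["intercept"] = 1.0
# logit = sm.Logit(data["admit"], data.drop(columns=["admit"]))
# result = logit.fit(method="bfgs", maxiter=20)
```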

Namespace issues when calling patsy within a function

Submitted by 强颜欢笑 on 2019-12-10 21:42:23
Question: I am attempting to write a wrapper for the statsmodels formula API (this is a simplified version; the function does more than this):
import statsmodels.formula.api as smf
def wrapper(formula, data, **kwargs):
    return smf.logit(formula, data).fit(**kwargs)
If I give this function to a user, who then attempts to define his/her own function:
def square(x):
    return x**2
model = wrapper('y ~ x + square(x)', data=df)
they will receive a NameError because the patsy module is looking in the namespace
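One possible workaround is to build the design matrices with patsy directly and point eval_env at the caller's stack frame so that user-defined functions such as square are visible; the wrapper below is a sketch of that idea, not the only fix:

```python
import patsy
import statsmodels.api as sm

def wrapper(formula, data, **kwargs):
    # eval_env=1 tells patsy to resolve names (e.g. a user-defined square())
    # one frame up the stack, i.e. in the caller's namespace rather than here.
    y, X = patsy.dmatrices(formula, data, eval_env=1, return_type="dataframe")
    return sm.Logit(y, X).fit(**kwargs)

# Caller-side usage:
# def square(x):
#     return x ** 2
# model = wrapper("y ~ x + square(x)", data=df)
```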

predict statsmodel argument Error

Submitted by 末鹿安然 on 2019-12-10 19:56:40
Question: I am trying to predict out-of-sample values for an array. Python code:
import pandas as pd
import numpy as np
from statsmodels.tsa.arima_model import ARIMA
dates = pd.date_range('2012-07-09', '2012-07-30')
series = [43., 32., 63., 98., 65., 78., 23., 35., 78., 56., 45., 45., 56., 6., 63., 45., 64., 34., 76., 34., 14., 54.]
res = pd.Series(series, index=dates)
r = ARIMA(res, (1, 2, 0))
pred = r.predict(start='2012-07-31', end='2012-08-31')
I am getting this error. I see I have given two arguments but the compiler returns I
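With the legacy statsmodels.tsa.arima_model.ARIMA used here, predict belongs to the fitted results object rather than the unfitted model, and with d=2 the typ='levels' option returns forecasts on the original scale. A sketch assuming that older API:

```python
import pandas as pd
from statsmodels.tsa.arima_model import ARIMA  # legacy API, as used in the question

dates = pd.date_range("2012-07-09", "2012-07-30")
series = [43., 32., 63., 98., 65., 78., 23., 35., 78., 56., 45.,
          45., 56., 6., 63., 45., 64., 34., 76., 34., 14., 54.]
res = pd.Series(series, index=dates)

model = ARIMA(res, order=(1, 2, 0))
fit = model.fit()                                # fit first, then predict on the results object
pred = fit.predict(start="2012-07-31", end="2012-08-31", typ="levels")
print(pred.head())
```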

How to get constant term in AR Model with statsmodels and Python?

Submitted by 試著忘記壹切 on 2019-12-10 19:24:22
Question: I'm trying to model my time series data using the AR model. This is the code that I'm using:
# Compute AR model (data is a Python list of numbers)
model = AR(data)
result = model.fit()
plt.plot(data, 'b-', label='data')
plt.plot(range(result.k_ar, len(data)), result.fittedvalues, 'r-')
plt.show()
I've successfully gotten the p value using result.k_ar, the parameters with result.params, and the epsilon term with result.sigma2. The problem is that I can't find a way to get the c (constant) term. Here is the
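With the default trend='c', the constant is simply the first entry of result.params, followed by the lag coefficients. A sketch assuming the legacy statsmodels.tsa.ar_model.AR class used in the question (newer releases expose AutoReg instead):

```python
import numpy as np
from statsmodels.tsa.ar_model import AR  # legacy class matching the question

# Illustrative data: a noisy sine wave standing in for the question's list.
data = np.sin(np.linspace(0, 20, 200)) + np.random.normal(scale=0.1, size=200)

model = AR(data)
result = model.fit()

c = result.params[0]       # constant term (first parameter when trend='c', the default)
phi = result.params[1:]    # AR lag coefficients, length result.k_ar
print("constant:", c, "number of lags:", result.k_ar)
```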

Different coefficients: scikit-learn vs statsmodels (logistic regression)

Submitted by 坚强是说给别人听的谎言 on 2019-12-10 17:47:14
Question: When running a logistic regression, the coefficients I get using statsmodels are correct (I verified them against some course material). However, I am unable to get the same coefficients with sklearn. I've tried preprocessing the data to no avail. This is my code. Statsmodels:
import statsmodels.api as sm
X_const = sm.add_constant(X)
model = sm.Logit(y, X_const)
results = model.fit()
print(results.summary())
The relevant output is:
coef std err z P>|z| [0.025 0.975] -------------------------------
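The usual explanation is that sklearn's LogisticRegression applies L2 regularization by default (C=1.0), whereas sm.Logit fits an unpenalized model; weakening the penalty should bring the estimates close. A sketch on synthetic data standing in for the question's X and y:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

# Hypothetical data standing in for the question's X and y.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) + rng.logistic(size=500) > 0).astype(int)

# Unpenalized fit in statsmodels (the constant must be added explicitly).
sm_res = sm.Logit(y, sm.add_constant(X)).fit(disp=0)

# Effectively unpenalized fit in sklearn: a huge C weakens the default L2 penalty.
skl = LogisticRegression(C=1e9, max_iter=1000).fit(X, y)

print(sm_res.params)              # [const, x1, x2, x3]
print(skl.intercept_, skl.coef_)  # should now agree closely with the statsmodels estimates
```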

statsmodel predict start and end indices

Submitted by 爷，独闯天下 on 2019-12-10 17:27:43
Question: I am trying to implement the prediction function from the statsmodels package:
prediction = results.predict(start=1, end=len(test), exog=test)
The dates of the input, test, and the output prediction are inconsistent: I get 1/4/2012 to 7/25/2012 for the former and 4/26/2013 to 11/13/2013 for the latter. Part of the difficulty is that I don't have a completely regular frequency; I have daily values excluding weekends and holidays. What is the appropriate way to set the indices?
x = psql.frame_query
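When the index has business-day gaps and no fixed frequency, statsmodels may fall back to positional indexing, so the returned dates need not match the calendar. One hedged workaround is to predict by integer position and reattach your own index afterwards; the results/train/test names below are assumptions:

```python
import numpy as np
import pandas as pd

def predict_out_of_sample(results, n_train, test_index, exog=None):
    """Predict by integer position and reattach the caller's irregular date index.

    `results` is assumed to be a fitted statsmodels time-series results object
    trained on n_train observations; `test_index` holds the true out-of-sample dates.
    """
    pred = results.predict(start=n_train,
                           end=n_train + len(test_index) - 1,
                           exog=exog)
    return pd.Series(np.asarray(pred), index=test_index)

# Usage (names assumed):
# forecast = predict_out_of_sample(results, len(train), test.index, exog=test)
# Alternatively, reindex the training data to an explicit business-day frequency
# (pd.bdate_range / .asfreq("B")) before fitting so that date-based start/end work.
```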

Why are the logistic regression results different between statsmodels and R?

Submitted by 断了今生、忘了曾经 on 2019-12-10 15:10:27
Question: I am trying to compare the logistic regression implementations in Python's statsmodels and R. Python version:
import statsmodels.api as sm
import pandas as pd
import pylab as pl
import numpy as np
df = pd.read_csv("http://www.ats.ucla.edu/stat/data/binary.csv")
df.columns = list(df.columns)[:3] + ["prestige"]
# df.hist()
# pl.show()
dummy_ranks = pd.get_dummies(df["prestige"], prefix="prestige")
cols_to_keep = ["admit", "gre", "gpa"]
data = df[cols_to_keep].join(dummy_ranks.ix[:, "prestige_2"
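A frequent source of disagreement here is the intercept and the dummy coding: R's glm(..., family = binomial) adds an intercept and drops a reference level automatically, while sm.Logit does neither. A sketch on synthetic stand-in data (the UCLA URL in the question no longer resolves, so the frame below is illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic stand-in for the admissions data: admit (0/1), gre, gpa, prestige (1-4).
rng = np.random.default_rng(3)
n = 400
df = pd.DataFrame({
    "admit": rng.integers(0, 2, n),
    "gre": rng.integers(300, 800, n),
    "gpa": rng.uniform(2.0, 4.0, n),
    "prestige": rng.integers(1, 5, n),
})

dummy_ranks = pd.get_dummies(df["prestige"], prefix="prestige").astype(float)
data = df[["admit", "gre", "gpa"]].join(dummy_ranks.iloc[:, 1:])  # drop prestige_1 as the baseline
data = sm.add_constant(data, prepend=False)  # R's glm() includes an intercept; sm.Logit does not

result = sm.Logit(data["admit"], data.drop(columns=["admit"])).fit(disp=0)
print(result.params)
# Comparable R call: glm(admit ~ gre + gpa + factor(prestige), data = df, family = binomial)
```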

Can pandas groupby transform a DataFrame into a Series?

Submitted by 梦想的初衷 on 2019-12-10 14:58:37
Question: I would like to use pandas and statsmodels to fit a linear model on subsets of a dataframe and return the predicted values. However, I am having trouble figuring out the right pandas idiom to use. Here is what I am trying to do:
import pandas as pd
import statsmodels.formula.api as sm
import seaborn as sns
tips = sns.load_dataset("tips")
def fit_predict(df):
    m = sm.ols("tip ~ total_bill", df).fit()
    return pd.Series(m.predict(df), index=df.index)
tips["predicted_tip"] = tips.groupby("day")
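One way to complete this pattern is to let apply return a Series per group and pass group_keys=False so the combined result stays aligned with the original index; a sketch along those lines (the behavior of groupby().apply on grouping columns varies a bit across pandas versions):

```python
import pandas as pd
import seaborn as sns
import statsmodels.formula.api as smf

tips = sns.load_dataset("tips")

def fit_predict(df):
    # Fit tip ~ total_bill within one group and return predictions on that group's index.
    m = smf.ols("tip ~ total_bill", data=df).fit()
    return pd.Series(m.predict(df), index=df.index)

# group_keys=False keeps the result indexed like `tips`, so it assigns straight back.
tips["predicted_tip"] = tips.groupby("day", group_keys=False).apply(fit_predict)
print(tips.head())
```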