statsmodels

Python statsmodels robust cov_type='hac-panel' issue

情到浓时终转凉″ submitted on 2021-01-27 21:12:15
Question: I hope this is the right place for my question. I would like to understand how to use the 'hac-panel' cov_type when running sm.OLS. I have struggled with it the whole day but still cannot figure it out. Here is an example of my code (with data):

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    from pandas.tseries.offsets import *

    # Just grabbing some random data here
    dat = sm.datasets.macrodata.load_pandas().data
    dat['time'] = dat['year'].apply(lambda x:

How to forecast time series using AutoReg

一个人想着一个人 submitted on 2021-01-27 18:54:47
Question: I'm trying to build an old-school model using only an autoregression algorithm. I found that there is an implementation of it in the statsmodels package. I've read the documentation and, as I understand it, it should work like ARIMA. So, here's my code:

    import statsmodels.api as sm
    model = sm.tsa.AutoReg(df_train.beer, 12).fit()

And when I want to predict new values, I'm trying to follow the documentation:

    y_pred = model.predict(start=df_test.index.min(), end=df_test.index.max())
    # or
    y_pred = model

Clustered standard errors in statsmodels with categorical variables (Python)

让人想犯罪 __ submitted on 2021-01-24 07:25:23
Question: I want to run a regression in statsmodels that uses categorical variables and clustered standard errors. I have a dataset with columns institution, treatment, year, and enrollment. Treatment is a dummy, institution is a string, and the others are numbers. I've made sure to drop any null values.

    df.dropna()
    reg_model = smf.ols("enroll ~ treatment + C(year) + C(institution)", df).fit(cov_type='cluster', cov_kwds={'groups': df['institution']})

I'm getting the following: ValueError: The weights
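The error text is cut off in the excerpt, so its cause cannot be confirmed here. One thing worth checking in code like the above is that `df.dropna()` returns a new frame rather than modifying `df`. If rows with missing values are still present, the formula interface drops them while `groups` keeps its original length. The sketch below, on made-up data, assigns the cleaned frame and builds integer cluster labels from that same frame. The column values and the use of `pd.factorize` are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data: 6 institutions observed over 5 years.
rng = np.random.default_rng(0)
institutions = ["A", "B", "C", "D", "E", "F"]
years = [2015, 2016, 2017, 2018, 2019]
df = pd.DataFrame(
    [(inst, yr) for inst in institutions for yr in years],
    columns=["institution", "year"],
)
df["treatment"] = rng.integers(0, 2, size=len(df))
df["enrollment"] = 100 + 5 * df["treatment"] + rng.normal(size=len(df))
df.loc[[3, 17], "enrollment"] = np.nan  # simulate missing values

# dropna() returns a new frame, so the result has to be assigned.
clean = df.dropna(subset=["enrollment", "treatment", "year", "institution"])

# Integer cluster labels taken from the same rows the model is fitted on.
groups, _ = pd.factorize(clean["institution"])

res = smf.ols(
    "enrollment ~ treatment + C(year) + C(institution)", data=clean
).fit(cov_type="cluster", cov_kwds={"groups": groups})
print(res.bse["treatment"])
```

The key property is that `groups` has exactly one entry per row used in the fit.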

Anova test for GLM in python

浪子不回头ぞ submitted on 2021-01-19 04:16:31
Question: I am trying to get the F-statistic and p-value for each of the covariates in a GLM. In Python I am using statsmodels.formula.api to fit the GLM.

    formula = 'PropNo_Pred ~ Geography + log10BMI + Cat_OpCavity + CatLes_neles + CatRural_urban + \
    CatPred_Control + CatNative_Intro + Midpoint_of_study'
    mod1 = smf.glm(formula=formula, data=A2, family=sm.families.Binomial()).fit()
    mod1.summary()

After that I am trying to do the ANOVA test for this model using the anova in statsmodels.stats table1

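Time series analysis in Python

蹲街弑〆低调 submitted on 2021-01-13 19:05:30
1. pandas usually generates time ranges with date_range. An earlier blog post covered this in detail, so it is not repeated here.

2. Resampling data in pandas

What is resampling? Suppose a set of statistics was recorded daily for a year. Can we look at the overall change month by month instead? That is a resampling problem. In this example, going from days to months is downsampling, and where there is downsampling there is also upsampling. How do we downsample and upsample with pandas?

    rng = pd.date_range('1/1/2011', periods=90, freq='D')
    ts = pd.Series(np.random.randn(len(rng)), index=rng)
    ts.head()
    # Downsampling
    ts.resample('M').sum()   # pandas resamples with the resample method; the aggregate here is sum, but mean works too
    ts.resample('3D').sum()  # a custom period can also be specified
    # Upsampling
    day3d.resample('D').asfreq()
    # Some values are now NaN, and NaN would distort our statistics, so:
    # ffill: fill a missing value with the previous value
    # bfill: fill a missing value with the next value
    # interpolate: fill by linear interpolation
    day3d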
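The downsampling and upsampling steps described above can be sketched end to end as follows. Two choices are assumptions of this sketch rather than part of the original post: `day3d` is taken to be the 3-day aggregate, and the month-start alias `'MS'` replaces `'M'` because the spelling of the month-end alias differs across pandas versions. The values are deterministic so the result is easy to check.

```python
import numpy as np
import pandas as pd

# 90 daily observations starting 2011-01-01 (deterministic values 0..89).
rng = pd.date_range("2011-01-01", periods=90, freq="D")
ts = pd.Series(np.arange(90, dtype=float), index=rng)

# Downsampling: aggregate days into month-start bins and into 3-day bins.
monthly = ts.resample("MS").sum()
day3d = ts.resample("3D").sum()

# Upsampling back to daily: asfreq() leaves the newly created rows as NaN.
upsampled = day3d.resample("D").asfreq()

# Three ways to fill the gaps.
filled_forward = upsampled.ffill()       # carry the previous value forward
filled_backward = upsampled.bfill()      # use the next value
filled_linear = upsampled.interpolate()  # linear interpolation

print(monthly)
```

Which fill method is appropriate depends on the data: forward fill suits values that persist until the next observation, while interpolation suits quantities that change smoothly.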

How to update to the developer version of statsmodels using Conda?

泪湿孤枕 submitted on 2021-01-03 03:43:14
Question: I am currently trying to update my statsmodels package in Conda to the developer version, statsmodels v0.11.0dev0. As I am relatively new to Python, I am struggling to follow the various threads on how to update to the developer version. https://www.statsmodels.org/dev/install.html gives a short hint on how to install the developer version, but I cannot follow it. I have tried pip install -e and python setup.py develop. In order to specifically update the statsmodel

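Python time series analysis

大城市里の小女人 submitted on 2020-12-31 04:37:42
Reposted from 最小森林: python时间序列分析 (Python time series analysis)

1. What is a time series

Put simply, a time series is the sequence of values observed at successive points in time, and time series analysis predicts future values by studying the historical data. One point needs emphasis: time series analysis is not a regression on time. It mainly studies the regularities in the series' own behavior (time series with exogenous variables are not considered here).

Environment setup

Python is a powerful tool for scientific computing and naturally has a package for this kind of analysis: the tsa module in statsmodels. This package admittedly cannot match SAS or R, but Python has another powerful tool: pandas. Using pandas for time series simplifies much of the work. Both packages can be installed with pip.

Data preparation

Like many time series analyses, this article uses the airline passenger data (AirPassengers.csv) as its example. Download link.

Working with time series in pandas

    # -*- coding:utf-8 -*-
    import numpy as np
    import pandas as pd
    from datetime import datetime
    import matplotlib.pylab as plt

    # Read the data; pd.read_csv returns a DataFrame by default, which needs to be converted to a Series
    df = pd.read_csv('AirPassengers.csv', encoding='utf-8', index_col='Month'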
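The load-and-convert step mentioned in the last comment can be sketched without the CSV file. The version below reads a few illustrative rows from an in-memory string; the value column name `Passengers` is an assumption, since the excerpt only shows the `Month` index column. It then parses the index as dates and selects the single column to obtain a Series.

```python
import io

import pandas as pd

# Tiny inline stand-in for AirPassengers.csv; the "Passengers" column name
# is an assumption made for this sketch.
csv_text = """Month,Passengers
1949-01,112
1949-02,118
1949-03,132
1949-04,129
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="Month")

# Parse the "YYYY-MM" labels into timestamps so the index is a DatetimeIndex.
df.index = pd.to_datetime(df.index, format="%Y-%m")

# Selecting the single value column turns the DataFrame into a Series.
ts = df["Passengers"]
print(type(ts).__name__)
print(ts.index[0])
```

A Series with a DatetimeIndex is the form that the pandas resampling methods and the statsmodels tsa models work with most directly.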