statsmodels | 易学教程

Python statsmodels robust cov_type='hac-panel' issue

Question: I hope this is the right place for my question. I would like to understand how to use the 'hac-panel' cov_type when running sm.OLS. I have struggled with it the whole day but still cannot figure it out. Here is an example of my code (with data):

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    from pandas.tseries.offsets import *

    # Just grabbing some random data here
    dat = sm.datasets.macrodata.load_pandas().data
    dat['time'] = dat['year'].apply(lambda x:

How to forecast time series using AutoReg

Question: I'm trying to build an old-school model using only an autoregression algorithm. I found out that there is an implementation of it in the statsmodels package. I've read the documentation, and as I understand it, it should work like ARIMA. So, here's my code:

    import statsmodels.api as sm
    model = sm.tsa.AutoReg(df_train.beer, 12).fit()

And when I want to predict new values, I'm trying to follow the documentation:

    y_pred = model.predict(start=df_test.index.min(), end=df_test.index.max())
    # or
    y_pred = model

Clustered standard errors in statsmodels with categorical variables (Python)

Question: I want to run a regression in statsmodels that uses categorical variables and clustered standard errors. I have a dataset with columns institution, treatment, year, and enrollment. Treatment is a dummy, institution is a string, and the others are numbers. I've made sure to drop any null values.

    df.dropna()
    reg_model = smf.ols("enroll ~ treatment + C(year) + C(institution)", df).fit(cov_type='cluster', cov_kwds={'groups': df['institution']})

I'm getting the following: ValueError: The weights
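The error message above is truncated, so its cause cannot be confirmed from this excerpt. The sketch below only shows one setup that is known to run: the result of `dropna()` is assigned back so the model and the group labels cover the same rows, and the string labels are converted to integer codes with `pd.factorize`. The data frame is synthetic and its values are made up.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical panel: 8 institutions observed in 4 years.
institutions = [f"inst_{i}" for i in range(8)]
years = [2015, 2016, 2017, 2018]
df = pd.DataFrame(
    [(inst, yr) for inst in institutions for yr in years],
    columns=["institution", "year"],
)
df["treatment"] = rng.integers(0, 2, size=len(df))
df["enroll"] = 100 + 5 * df["treatment"] + rng.normal(size=len(df))
df.loc[3, "enroll"] = np.nan  # one missing value to show the dropna step

# dropna() returns a new frame; assign it so later steps see the cleaned rows.
df = df.dropna().reset_index(drop=True)

# One integer code per institution, aligned row by row with df.
group_codes = pd.factorize(df["institution"])[0]

res = smf.ols("enroll ~ treatment + C(year) + C(institution)", data=df).fit(
    cov_type="cluster",
    cov_kwds={"groups": group_codes},
)
print(res.bse["treatment"])
```

Building the codes after cleaning the frame keeps the number of group labels equal to the number of rows the model actually uses.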

Anova test for GLM in python

Question: I am trying to get the F-statistic and p-value for each of the covariates in a GLM. In Python I am using statsmodels.formula.api to fit the GLM.

    formula = 'PropNo_Pred ~ Geography + log10BMI + Cat_OpCavity + CatLes_neles + CatRural_urban + \
               CatPred_Control + CatNative_Intro + Midpoint_of_study'
    mod1 = smf.glm(formula=formula, data=A2, family=sm.families.Binomial()).fit()
    mod1.summary()

After that I am trying to do the ANOVA test for this model using the anova in statsmodels.stats table1

Time series analysis in Python

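1. pandas usually generates time sequences with date_range. An earlier blog post covered this in detail, so it is not repeated here.

2. Resampling data in pandas. What is resampling? Suppose a set of statistics was collected daily for a whole year. Can we look at how the data changes month by month instead? That is a resampling problem. In this example, going from days to months is downsampling, and where there is downsampling there is also upsampling. So how do we downsample and upsample with pandas?

    import numpy as np
    import pandas as pd

    rng = pd.date_range('1/1/2011', periods=90, freq='D')
    ts = pd.Series(np.random.randn(len(rng)), index=rng)
    ts.head()

    # Downsampling
    ts.resample('M').sum()           # pandas resamples with the resample method; the statistic here is sum, but mean works too
    day3d = ts.resample('3D').sum()  # a custom period can also be specified

    # Upsampling
    day3d.resample('D').asfreq()
    # Some values are now NaN, and NaN would distort our statistics, so:
    # ffill        fill a gap with the preceding value
    # bfill        fill a gap with the following value
    # interpolate  fill a gap by linear interpolation
    day3d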
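The three fill strategies named in the comments can be compared side by side. The following is a small sketch on random data; the variable names `upsampled`, `filled_forward`, `filled_backward`, and `filled_linear` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# 90 days of random daily data (illustrative only).
rng = pd.date_range("2011-01-01", periods=90, freq="D")
ts = pd.Series(np.random.default_rng(0).normal(size=len(rng)), index=rng)

# Downsample to 3-day sums, then upsample back to a daily grid.
day3d = ts.resample("3D").sum()
upsampled = day3d.resample("D").asfreq()  # days between the 3-day marks become NaN

# Three ways to fill the gaps created by upsampling.
filled_forward = upsampled.ffill()        # copy the preceding value forward
filled_backward = upsampled.bfill()       # copy the following value backward
filled_linear = upsampled.interpolate()   # linear interpolation between neighbours

print(upsampled.head(4))
print(filled_linear.head(4))
```

Which fill is appropriate depends on what the values mean: forward fill suits a level that holds until the next reading, while interpolation suits a quantity that changes gradually.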

How to update to the developer version of statsmodels using Conda?

Question: I am currently trying to update my statsmodels package in Conda to the developer version, statsmodels v0.11.0dev0. As I am relatively new to Python, I am struggling to follow the various threads on how to update to the developer version. A short hint on how to install the developer version is given at https://www.statsmodels.org/dev/install.html, but I cannot follow it. I have tried pip install -e and python setup.py develop. In order to specifically update the statsmodel

Python time series analysis

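Reposted from 最小森林: Python time series analysis

1. What is a time series?

Put simply, a time series is the sequence of values observed at successive points in time, and time series analysis predicts future values by studying historical data. One point needs emphasis: time series analysis is not a regression on time. It mainly studies the series' own patterns of change (time series with exogenous variables are not considered here).

Environment setup

Python is a powerful tool for scientific computing and naturally has packages for this kind of analysis, namely the tsa module in statsmodels. This package cannot match SAS or R, but Python has another powerful tool: pandas! Using pandas for time series simplifies much of the work. Both packages can be installed with pip.

Data preparation

Like many time series analyses, this article uses the airline passenger data (AirPassengers.csv) as its example. The original post links to the download.

Working with time series in pandas

    # -*- coding:utf-8 -*-
    import numpy as np
    import pandas as pd
    from datetime import datetime
    import matplotlib.pylab as plt

    # Read the data; pd.read_csv returns a DataFrame by default, which must be converted to a Series
    df = pd.read_csv('AirPassengers.csv', encoding='utf-8', index_col='Month'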
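The excerpt stops partway through the `read_csv` call, so the rest of the original code is unknown. As a rough sketch of the DataFrame-to-Series step described in the comment, the example below reads a few inline rows instead of the real file. The column name `Passengers` and the date format are assumptions about the file layout, and the rows are illustrative only.

```python
from io import StringIO

import pandas as pd

# Illustrative rows only; the real AirPassengers.csv may use different
# column names and date formats (assumed layout: Month, Passengers).
csv_text = """Month,Passengers
1949-01-01,112
1949-02-01,118
1949-03-01,132
1949-04-01,129
"""

# index_col makes 'Month' the index; parse_dates turns it into a DatetimeIndex.
df = pd.read_csv(StringIO(csv_text), index_col="Month", parse_dates=True)

# Selecting the single data column converts the DataFrame to a Series.
ts = df["Passengers"]

# With a DatetimeIndex, a partial date string selects a whole month.
print(ts.loc["1949-02"])
```

A datetime index is what makes label-based slicing and resampling available on the series.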