statsmodels | 易学教程

ANOVA in python using pandas dataframe with statsmodels or scipy?

Question: I want to use a pandas DataFrame to break down the variance in one variable. For example, if I have a column called 'Degrees', indexed by date, city, and night vs. day, I want to find out what fraction of the variation in this series comes from cross-sectional city variation, how much comes from time-series variation, and how much comes from night vs. day. In Stata I would use fixed effects and look at the R^2. Hopefully my question makes sense.

How to perform a chi-squared goodness of fit test using scientific libraries in Python?

Question: Let's assume I have some data I obtained empirically:

    import numpy as np
    from scipy import stats

    size = 10000
    x = 10 * stats.expon.rvs(size=size) + 0.2 * np.random.uniform(size=size)

It is exponentially distributed (with some noise), and I want to verify this using a chi-squared goodness-of-fit (GoF) test. What is the simplest way of doing this using the standard scientific libraries in Python (e.g. scipy or statsmodels), with the fewest manual steps and assumptions? I can fit a model with: param = stats ...
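A minimal sketch of one way to run such a test with scipy follows. The choices here are assumptions of the sketch rather than anything stated in the question: the location is fixed at 0 during fitting, the data are grouped into 20 equal-probability bins, and one degree of freedom is subtracted for the estimated scale.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
size = 10000
x = 10 * stats.expon.rvs(size=size, random_state=rng) + 0.2 * rng.uniform(size=size)

# Fit an exponential distribution with the location fixed at 0 (assumption).
loc, scale = stats.expon.fit(x, floc=0)
fitted = stats.expon(loc=loc, scale=scale)

# Build k bins that are equally probable under the fitted distribution.
k = 20
interior_edges = fitted.ppf(np.arange(1, k) / k)

# Count observations per bin; the first and last bins are open-ended.
observed = np.bincount(np.searchsorted(interior_edges, x), minlength=k)
expected = np.full(k, size / k)

# ddof=1 because one parameter (the scale) was estimated from the data.
chi2, p_value = stats.chisquare(observed, expected, ddof=1)
print(chi2, p_value)
```

Because the scale is estimated from the raw rather than the binned data, the chi-squared reference distribution is only approximate, so the p-value is best treated as a rough guide.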

Print 'std err' value from statsmodels OLS results

(Sorry to ask, but http://statsmodels.sourceforge.net/ is currently down and I can't access the docs.) I'm doing a linear regression using statsmodels, basically:

    import statsmodels.api as sm
    model = sm.OLS(y, x)
    results = model.fit()

I know that I can print out the full set of results with:

    print(results.summary())

which outputs something like:

    OLS Regression Results
    ==============================================================================
    Dep. Variable:    y                   R-squared:            0.952
    Model:            OLS                 Adj. R-squared:       0.951
    Method:           Least Squares       F-statistic:          972.9
    Date:             Mon, 20 Jul 2015    Prob (F-statistic):   5 ...

How to have multiple groups in Python statsmodels linear mixed effects model?

I am trying to use the Python statsmodels linear mixed effects model to fit a model that has two random intercepts, e.g. two groups. I cannot figure out how to initialize the model so that I can do this. Here's the example. I have data that looks like the following (taken from here):

    subject  gender  scenario  attitude  frequency
    F1       F       1         pol       213.3
    F1       F       1         inf       204.5
    F1       F       2         pol       285.1
    F1       F       2         inf       259.7
    F1       F       3         pol       203.9
    F1       F       3         inf       286.9
    F1       F       4         pol       250.8
    F1       F       4         inf       276.8

I want to make a linear mixed effects model with two random effects -- one for the subject group and one for the scenario group. I am ...

Calculate logistic regression in python

I tried to calculate a logistic regression. I have the data as a CSV file. It looks like this:

    node_id,second_major,gender,major_index,year,dorm,high_school,student_fac
    0,0,2,257,2007,111,2849,1
    1,0,2,271,2005,0,51195,2
    2,0,2,269,2007,0,21462,1
    3,269,1,245,2008,111,2597,1
    ..........................

This is my code:

    import pandas as pd
    import statsmodels.api as sm
    import pylab as pl
    import numpy as np

    df = pd.read_csv("Reed98.csv")
    print(df.describe())
    dummy_ranks = pd.get_dummies(df['second_major'], prefix='second_major')
    cols_to_keep = ['second_major', 'dorm', 'high_school']
    data = df[cols_to_keep]
    ...

Plotting Historical Cointegration Values between two pairs

Here is a sample ADF test in Python to check for cointegration between two pairs. However, the final result gives only a single numeric value for the cointegration. How can I get the historical cointegration results? Taken from http://www.leinenbock.com/adf-test-in-python/

    import numpy as np
    import statsmodels.api as stat
    import statsmodels.tsa.stattools as ts

    x = np.random.normal(0, 1, 1000)
    y = np.random.normal(0, 1, 1000)

    def cointegration_test(y, x):
        result = stat.OLS(y, x).fit()
        return ts.adfuller(result.resid)

I assume you want to test for expanding cointegration? Note that you should use sm.tsa ...

What to use to do multiple correlation?

I am trying to use Python to compute multiple linear regression and multiple correlation between a response array and a set of predictor arrays. I saw a very simple example of computing multiple linear regression, which is easy. But how do I compute multiple correlation with statsmodels, or with anything else as an alternative? I guess I could use rpy and R, but I'd prefer to stay in Python if possible. Edit [clarification]: Considering a situation like the one described here: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704-EP713_MultivariableMethods/ I would like to compute also ...

Holt-Winters time series forecasting with statsmodels

I tried forecasting with a Holt-Winters model as shown below, but I keep getting a prediction that is not consistent with what I expect. I have also included the code for a plot of the result:

    Train = Airline[:130]
    Test = Airline[129:]
    from statsmodels.tsa.holtwinters import Holt
    y_hat_avg = Test.copy()
    fit1 = Holt(np.asarray(Train['Passengers'])).fit()
    y_hat_avg['Holt_Winter'] = fit1.predict(start=1,end=15)
    plt.figure(figsize=(16,8))
    plt.plot(Train.index, Train['Passengers'], label='Train')
    plt.plot(Test.index,Test['Passengers'], label='Test')
    plt.plot(y_hat_avg.index,y_hat_avg['Holt_Winter'], label='Holt
    ...

Correct way to use ARMAResult.predict() function

Following up on the question "How to get constant term in AR Model with statsmodels and Python?", I'm now trying to use the ARMA model to fit the data, but again I couldn't find a way to interpret the model's result. Here is what I have done, based on "ARMA out-of-sample prediction with statsmodels" and the ARMAResults.predict API documentation:

    # Parameter
    INPUT_DATA_POINT = 200
    P = 5
    Q = 0

    # Read Data
    data = []
    f = open('stock_all.csv', 'r')
    for line in f:
        data.append(float(line.split(',')[5]))
    f.close()

    # Fit ARMA-model using the first piece of data
    result = arma_model(data[:INPUT_DATA_POINT], P, Q)
    # ...

How to get the regression intercept using Statsmodels.api

I am trying to calculate a regression output using a Python library, but I am unable to get the intercept value when I use:

    import statsmodels.api as sm

It prints all of the regression analysis except the intercept. But when I use:

    from pandas.stats.api import ols

My code for pandas:

    Regression = ols(y=Sorted_Data3['net_realization_rate'], x=Sorted_Data3[['Cohort_2', 'Cohort_3']])
    print(Regression)

I get the intercept, along with a warning that this library will be deprecated in the future, so I am trying to use statsmodels instead. The warning that I get while using pandas.stats.api: Warning (from ...