statsmodels | 易学教程

Python 2.7 - statsmodels - formatting and writing summary output

I'm doing logistic regression using pandas 0.11.0 (data handling) and statsmodels 0.4.3 to do the actual regression, on Mac OS X Lion. I'm going to be running ~2,900 different logistic regression models and need the results written to a CSV file and formatted in a particular way. Currently, I'm only aware of print result.summary(), which prints the results (as follows) to the shell:

    Logit Regression Results
    ==============================================================================
    Dep. Variable:   death_death   No. Observations:   9752
    Model:           Logit         Df Residuals:       9747
    Method:          MLE           Df Model:           4

ANOVA in python using pandas dataframe with statsmodels or scipy?

I want to use a pandas DataFrame to break down the variance in one variable. For example, if I have a column called 'Degrees', indexed by date, city, and night vs. day, I want to find out what fraction of the variation in this series comes from cross-sectional city variation, how much from time-series variation, and how much from night vs. day. In Stata I would use fixed effects and look at the R^2. Hopefully my question makes sense. Basically, I want to find the ANOVA breakdown of "Degrees" by three other columns.

cphlewis: I set

Highest Posterior Density Region and Central Credible Region

Given a posterior p(Θ|D) over some parameters Θ, one can define the following:

Highest Posterior Density Region: The Highest Posterior Density Region is the set of most probable values of Θ that, in total, constitute 100(1-α)% of the posterior mass. In other words, for a given α, we look for a p* that satisfies:

$$1 - \alpha = \int_{\Theta \,:\, p(\Theta|D) \ge p^*} p(\Theta|D)\, d\Theta$$

and then obtain the Highest Posterior Density Region as the set:

$$C_\alpha(p^*) = \{\Theta : p(\Theta|D) \ge p^*\}$$

Central Credible Region: Using the same notation as above, a Credible Region (or interval) is defined as:

$$C_\alpha(l, u) = (l, u) : P(l \le \Theta \le u \mid D) = 1 - \alpha$$

Depending on the distribution, there could be many such intervals. The central credible interval is defined as a
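The two definitions can be compared numerically on posterior draws. The following is a rough sketch, assuming a one-dimensional, unimodal posterior represented by samples: the function names `central_interval` and `hpd_interval` are hypothetical, the central interval is taken as the equal-tailed one (α/2 cut from each side), and the HPD interval is approximated by the shortest window that holds the required share of sorted draws. For a multimodal posterior the HPD region may be a union of intervals, which this sketch does not handle.

```python
import numpy as np


def central_interval(samples, alpha=0.05):
    """Equal-tailed interval: alpha/2 of the mass is cut from each side."""
    lower, upper = np.quantile(samples, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


def hpd_interval(samples, alpha=0.05):
    """Shortest window holding ceil((1 - alpha) * n) of the sorted draws.

    This matches the HPD region only when the posterior is unimodal.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil((1 - alpha) * n))   # draws each candidate window must contain
    widths = x[k - 1:] - x[:n - k + 1]  # width of every window of k consecutive draws
    start = int(np.argmin(widths))
    return float(x[start]), float(x[start + k - 1])


# Skewed example: draws from an exponential distribution stand in for a posterior.
rng = np.random.default_rng(0)
draws = rng.exponential(scale=1.0, size=20_000)
central = central_interval(draws)
hpd = hpd_interval(draws)
```

On a skewed distribution such as this one, the HPD interval starts closer to the mode and is shorter than the central interval; on a symmetric unimodal posterior the two roughly coincide.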

What's the difference between pandas ACF and statsmodel ACF?

I'm calculating the autocorrelation function for a stock's returns. To do so I tested two functions, the autocorr function built into pandas and the acf function supplied by statsmodels.tsa. This is done in the following MWE:

    import pandas as pd
    from pandas_datareader import data
    import matplotlib.pyplot as plt
    import datetime
    from dateutil.relativedelta import relativedelta
    from statsmodels.tsa.stattools import acf, pacf

    ticker = 'AAPL'
    time_ago = datetime.datetime.today().date() - relativedelta(months = 6)
    ticker_data = data.get_data_yahoo(ticker, time_ago)['Adj Close'].pct_change().dropna
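One likely source of a discrepancy between the two functions is the estimator itself. As far as I know, `Series.autocorr(lag)` is the Pearson correlation between the series and its lagged copy, so it uses the means and variances of the two overlapping slices, whereas `acf` in statsmodels by default subtracts the full-sample mean and divides by the lag-0 autocovariance. The sketch below illustrates the difference with pandas and NumPy only; `acf_full_sample` is a hypothetical helper written to mimic the second estimator, not the statsmodels function, and the data are synthetic.

```python
import numpy as np
import pandas as pd


def acf_full_sample(x, lag):
    """Lag-k autocorrelation using the full-sample mean and variance (lag >= 1)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[lag:] * d[:-lag]) / np.sum(d * d))


# Synthetic, mildly autocorrelated series standing in for the returns.
rng = np.random.default_rng(2)
returns = pd.Series(rng.normal(size=120)).rolling(3).mean().dropna()

lag = 5
pandas_value = returns.autocorr(lag=lag)            # Pearson correlation of the overlapping slices
full_sample_value = acf_full_sample(returns, lag)   # full-sample mean and variance
pearson_by_hand = np.corrcoef(returns.values[lag:], returns.values[:-lag])[0, 1]
```

The two estimates are usually close for long series and small lags, and drift apart as the lag grows relative to the sample length, since fewer overlapping points remain.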

Why do I get only one parameter from a statsmodels OLS fit

Here is what I am doing:

    $ python
    Python 2.7.6 (v2.7.6:3a1db0d2747e, Nov 10 2013, 00:42:54)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    >>> import statsmodels.api as sm
    >>> statsmodels.__version__
    '0.5.0'
    >>> import numpy
    >>> y = numpy.array([1,2,3,4,5,6,7,8,9])
    >>> X = numpy.array([1,1,2,2,3,3,4,4,5])
    >>> res_ols = sm.OLS(y, X).fit()
    >>> res_ols.params
    array([ 1.82352941])

I had expected an array with two elements: the intercept and the slope coefficient?

behzad.nouri: Try this:

    X = sm.add_constant(X)
    sm.OLS(y, X)

as in the documentation: An intercept is not included by default and

How to iterate over columns of pandas dataframe to run regression

I'm sure this is simple, but as a complete newbie to Python, I'm having trouble figuring out how to iterate over variables in a pandas DataFrame and run a regression with each. Here's what I'm doing:

    all_data = {}
    for ticker in ['FIUIX', 'FSAIX', 'FSAVX', 'FSTMX']:
        all_data[ticker] = web.get_data_yahoo(ticker, '1/1/2010', '1/1/2015')

    prices = DataFrame({tic: data['Adj Close'] for tic, data in all_data.iteritems()})
    returns = prices.pct_change()

I know I can run a regression like this:

    regs = sm.OLS(returns.FIUIX,returns.FSTMX).fit()

but suppose I want to do this for each column in the

How to get the prediction of test from 2D parameters of WLS regression in statsmodels

I'm incrementally updating the parameters of WLS regression functions using statsmodels. I have a 10x3 dataset X that I declared like this:

    X = np.array([[1,2,3],[1,2,3],[4,5,6],[1,2,3],[4,5,6],[1,2,3],[1,2,3],[4,5,6],[4,5,6],[1,2,3]])

This is my dataset, and I have a 10x2 endog vector that looks like this:

    z = [[ 3.90311860e-322  2.00000000e+000]
         [ 0.00000000e+000  2.00000000e+000]
         [ 0.00000000e+000 -2.00000000e+000]
         [ 0.00000000e+000  2.00000000e+000]
         [ 0.00000000e+000 -2.00000000e+000]
         [ 0.00000000e+000  2.00000000e+000]
         [ 0.00000000e+000  2.00000000e+000]
         [ 0.00000000e+000 -2.00000000e+000]
         [ 0

ImportError: DLL load failed: when importing statsmodels [duplicate]

This question already has answers here: Installing scipy in Python 3.5 on 32-bit Windows 7 Machine (4 answers). Closed 4 years ago.

My Python version is 3.5 on win32. I successfully installed Numpy+MKL, Scipy and Statsmodels from here: http://www.lfd.uci.edu/~gohlke/pythonlibs/

However, when I run import statsmodels as sm I get the following error:

    Traceback (most recent call last):
      File "D:\Python\Innovation\try\Try_Reg.py", line 6, in <module>
        import statsmodels as sm
      File "C:\Python35\lib