statsmodels

Print OLS regression summary to text file

╄→гoц情女王★ submitted on 2021-02-18 05:56:41
Question: I am running an OLS regression per group with pandas.stats.api.ols and a groupby, using the following code:

    from pandas.stats.api import ols
    df = pd.read_csv(r'F:\file.csv')
    result = df.groupby(['FID']).apply(lambda d: ols(y=d.loc[:, 'MEAN'], x=d.loc[:, ['Accum_Prcp', 'Accum_HDD']]))
    for i in result:
        x = pd.DataFrame({'FID': i.index, 'delete': i.values})
        frame = pd.concat([x, DataFrame(x['delete'].tolist())], axis=1, join='outer')
        del frame['delete']
        print frame

but this returns the error: AttributeError: 'OLS'
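
pandas.stats.api.ols has long since been removed from pandas; a minimal sketch of the same per-group regression with statsmodels, writing each group's summary to a text file (the output file name is an assumption, the column names are taken from the question):

    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv(r'F:\file.csv')

    with open('ols_summaries.txt', 'w') as fh:        # hypothetical output file
        for fid, d in df.groupby('FID'):
            X = sm.add_constant(d[['Accum_Prcp', 'Accum_HDD']])
            res = sm.OLS(d['MEAN'], X).fit()
            fh.write('FID = {}\n'.format(fid))
            fh.write(res.summary().as_text())         # plain-text form of the summary table
            fh.write('\n\n')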

Compute a confidence interval from sample data assuming unknown distribution

让人想犯罪 __ submitted on 2021-02-12 11:32:07
Question: I have sample data for which I would like to compute a confidence interval, assuming the distribution is not normal and is unknown. Basically, it looks like the distribution is Pareto, but I don't know for sure. The answers for the normal distribution: "Compute a confidence interval from sample data" and "Correct way to obtain confidence interval with scipy". Answer 1: If you don't know the underlying distribution, then my first thought would be to use bootstrapping: https://en.wikipedia.org/wiki/Bootstrapping_
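
A minimal percentile-bootstrap sketch of a 95% confidence interval for the mean, making no distributional assumption (the sample here is simulated as a stand-in for the real data):

    import numpy as np

    rng = np.random.default_rng(0)
    sample = rng.pareto(3.0, size=500)      # stand-in for the real sample

    # Resample with replacement many times and record the statistic of interest.
    boot_means = np.array([rng.choice(sample, size=sample.size, replace=True).mean()
                           for _ in range(10_000)])

    # Percentile bootstrap interval.
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    print('95% CI for the mean: ({:.3f}, {:.3f})'.format(lo, hi))

Recent SciPy versions also provide scipy.stats.bootstrap, which implements the same idea directly.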

Statsmodels with partly identified model

两盒软妹~` submitted on 2021-02-11 13:19:42
Question: I am trying to run a regression where only some of the coefficients can be identified:

    data = np.array([[2, 1, 1, 1], [1, 1, 1, 0]])
    df = pd.DataFrame(data, columns=['y', 'x1', 'x2', 'x3'])
    z = df.pop('y')
    mod = sm.OLS(z, sm.add_constant(df))

Now I have two observations, and the only variable that changes between them is x3. So I would expect that (since I added a constant) the model would be unable to identify x1 or x2 and would omit those. It should however give me a 1
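
sm.OLS solves the least-squares problem with a Moore-Penrose pseudoinverse by default, so a rank-deficient design does not raise an error and the collinear columns are not dropped; the affected coefficients are simply not separately identified. A minimal sketch, reusing the question's data, of checking the rank deficiency yourself:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    data = np.array([[2, 1, 1, 1], [1, 1, 1, 0]])
    df = pd.DataFrame(data, columns=['y', 'x1', 'x2', 'x3'])
    z = df.pop('y')
    X = sm.add_constant(df)

    res = sm.OLS(z, X).fit()                      # fits via pinv despite the collinearity
    print(res.params)
    print(np.linalg.matrix_rank(X), X.shape[1])   # rank < number of columns: only partly identified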

statsmodels: printing the summary of more than one regression model together

怎甘沉沦 submitted on 2021-02-07 04:16:24
Question: In the Python library statsmodels you can print the regression results with print(results.summary()). How can I print the summaries of more than one regression in a single table, for better comparison? A linear regression, code taken from the statsmodels documentation:

    nsample = 100
    x = np.linspace(0, 10, 100)
    X = np.column_stack((x, x**2))
    beta = np.array([0.1, 10])
    e = np.random.normal(size=nsample)
    y = np.dot(X, beta) + e
    model = sm.OLS(y, X)
    results_noconstant = model.fit()

Then I add a
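
statsmodels ships a helper for exactly this: summary_col in statsmodels.iolib.summary2 stacks several fitted results side by side in one table. A minimal sketch continuing the question's example (the second, with-constant model is an assumption based on the truncated question text):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.iolib.summary2 import summary_col

    nsample = 100
    x = np.linspace(0, 10, nsample)
    X = np.column_stack((x, x**2))
    y = np.dot(X, np.array([0.1, 10])) + np.random.normal(size=nsample)

    results_noconstant = sm.OLS(y, X).fit()
    results_constant = sm.OLS(y, sm.add_constant(X)).fit()

    # One table, one column per model.
    print(summary_col([results_noconstant, results_constant],
                      model_names=['no constant', 'with constant'],
                      stars=True))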

ARMA.predict for out-of-sample forecast does not work with floating points?

让人想犯罪 __ submitted on 2021-02-01 04:59:20
Question: After developing my little ARMAX forecasting model for in-sample analysis, I'd like to predict some data out of sample. The time series I use for the forecast starts at 2013-01-01 and ends at 2013-12-31. Here is the data I am working with:

    hr = np.loadtxt("Data_2013_17.txt")
    index = date_range(start='2013-1-1', end='2013-12-31', freq='D')
    df = pd.DataFrame(hr, index=index)
    holidays = ['2013-1-1', '2013-3-29', '2013-4-1', '2013-5-1', '2013-5-9', '2013-5-20', '2013-10-3', '2013-12-25'
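
The old statsmodels ARMA class this question used has since been removed; with the current ARIMA implementation, out-of-sample forecasts can be requested by date once the series has a date index, and future values of any exogenous regressor must be supplied for the forecast horizon. A minimal sketch under the question's daily 2013 index (the data and the holiday dummy are simulated stand-ins):

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    index = pd.date_range(start='2013-01-01', end='2013-12-31', freq='D')
    rng = np.random.default_rng(0)
    hr = pd.Series(rng.normal(size=len(index)), index=index)      # stand-in for the real series
    holiday = pd.Series(0.0, index=index)                         # simplified exogenous dummy
    holiday.loc[pd.to_datetime(['2013-01-01', '2013-12-25'])] = 1.0

    res = ARIMA(hr, exog=holiday, order=(2, 0, 1)).fit()

    # Out-of-sample prediction by date: exog must cover the forecast horizon.
    future_index = pd.date_range('2014-01-01', periods=7, freq='D')
    future_exog = pd.DataFrame({'holiday': [1.0, 0, 0, 0, 0, 0, 0]}, index=future_index)
    fcast = res.predict(start=future_index[0], end=future_index[-1], exog=future_exog)
    print(fcast)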

Plotly: How to find coefficient of trendline in plotly express?

a 夏天 submitted on 2021-01-29 15:49:25
Question: How do you find the coefficient of the trend line in Plotly Express? For example, I used the code below to chart the trend line, but now I want to know the coefficient.

    import plotly.express as px
    px.scatter(df, x='x_data', y='y_data', trendline="ols")

Answer 1: Here you need to have a look at the Plotly docs and the statsmodels ones. I think the example in the Plotly documentation should be fixed. Anyway:

    import plotly.express as px
    df = px.data.tips()
    fig = px.scatter(df, x="total_bill", y="tip",
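
Plotly Express keeps the statsmodels fit behind each OLS trendline and exposes it through px.get_trendline_results; a minimal sketch on the same tips dataset:

    import plotly.express as px

    df = px.data.tips()
    fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")

    # Each trendline's underlying statsmodels fit sits in the "px_fit_results" column.
    results = px.get_trendline_results(fig)
    ols_fit = results.px_fit_results.iloc[0]
    print(ols_fit.params)      # intercept and slope of the trend line
    print(ols_fit.summary())   # full regression summary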

Kolmogorov-Smirnov test for goodness of fit in Python

て烟熏妆下的殇ゞ submitted on 2021-01-29 05:35:27
Question: I am trying to fit distributions. The fitting is finished, but I need a measure to choose the best model. Many papers use the Kolmogorov-Smirnov (KS) test. I tried to implement it, and I am getting very low p-values. The implementation:

    # Histogram plot
    binwidth = np.arange(0, int(out_threshold1), 1)
    n1, bins1, patches = plt.hist(h1, bins=binwidth, normed=1, facecolor='#023d6b', alpha=0.5, histtype='bar')
    # Fitting
    gevfit4 = gev.fit(h1)
    pdf_gev4 = gev.pdf(lnspc, *gevfit4)
    plt
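
A minimal sketch of scoring one fitted candidate with scipy.stats.kstest (the data is simulated here; note that the standard KS p-value assumes the parameters were not estimated from the same sample):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    h1 = stats.genextreme.rvs(c=0.1, size=1000, random_state=rng)   # stand-in for the real data

    # Fit the GEV distribution, then compare the sample with the fitted CDF.
    params = stats.genextreme.fit(h1)
    statistic, pvalue = stats.kstest(h1, 'genextreme', args=params)
    print(statistic, pvalue)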

How does statsmodels encode endog variables entered as strings?

时光怂恿深爱的人放手 submitted on 2021-01-27 21:53:11
Question: I'm new to using statsmodels for statistical analyses. I'm getting the expected answers most of the time, but there are some things I don't quite understand about the way statsmodels defines endog (dependent) variables for logistic regression when they are entered as strings. An example pandas DataFrame to illustrate the issue can be defined as shown below. The yN, yA and yA2 columns represent different ways to define an endog variable: yN is a binary variable coded 0, 1; yA is a binary variable
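
The example frame is truncated above; as a hedged illustration (the column values and level names 'no'/'yes' are made up), one way to remove any ambiguity about how string levels get mapped is to encode the outcome explicitly before handing it to Logit:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        'x': rng.normal(size=200),
        'yA': rng.choice(['no', 'yes'], size=200),   # string-coded binary outcome (hypothetical levels)
    })

    # Encode the string outcome explicitly so the reference level is under your control.
    df['yN'] = (df['yA'] == 'yes').astype(int)

    res = sm.Logit(df['yN'], sm.add_constant(df[['x']])).fit(disp=0)
    print(res.params)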