regression

Difference between the interaction : and * term for formulas in StatsModels OLS regression

落花浮王杯 · submitted 2019-12-03 15:21:25
Hi, I'm learning statsmodels and can't figure out the difference between the : and * interaction terms in formulas for statsmodels OLS regression. Could you please give me a hint? Thank you! The documentation: http://statsmodels.sourceforge.net/devel/example_formulas.html

Answer (Yaron): ":" gives a regression without the terms themselves, just the interaction you specified, while "*" gives a regression with the terms themselves plus the interaction. For example, with GLMmodel = glm("y ~ a:b", data = df) you'll have only one independent variable, namely the interaction of a and b.
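For reference, a minimal sketch (not from the original thread) showing the two expansions with the statsmodels formula API; the data frame and the columns a, b, y are made up for illustration:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=100), "b": rng.normal(size=100)})
df["y"] = 1 + 2 * df["a"] - df["b"] + 0.5 * df["a"] * df["b"] + rng.normal(size=100)

m_colon = smf.ols("y ~ a:b", data=df).fit()   # interaction only
m_star = smf.ols("y ~ a*b", data=df).fit()    # main effects + interaction

print(m_colon.params.index.tolist())  # ['Intercept', 'a:b']
print(m_star.params.index.tolist())   # ['Intercept', 'a', 'b', 'a:b']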

B Spline confusion

女生的网名这么多〃 · submitted 2019-12-03 14:35:21
I realise that there are posts on the topic of B-splines on this board, but those have actually made me more confused, so I thought someone might be able to help. I have simulated data with x-values ranging from 0 to 1. I'd like to fit a cubic spline (degree = 3) to my data, with knots at 0, 0.1, 0.2, ..., 0.9, 1. I'd also like to use the B-spline basis and OLS for parameter estimation (I'm not looking for penalised splines). I think I need the bs function from the splines package, but I'm not quite sure, and I also don't know what exactly to feed it. I'd also like to plot the resulting spline.
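The question asks about R's splines::bs, but the same construction can be sketched in Python, since patsy's bs() mirrors it closely; a hedged sketch with simulated stand-in data:

import numpy as np
import statsmodels.api as sm
from patsy import dmatrix

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)

# Cubic B-spline basis: interior knots at 0.1, ..., 0.9,
# boundary knots at 0 and 1 via lower_bound/upper_bound.
knots = tuple(np.round(np.arange(0.1, 1.0, 0.1), 1))
basis = dmatrix("bs(x, knots=knots, degree=3, lower_bound=0, upper_bound=1)",
                {"x": x, "knots": knots}, return_type="dataframe")

fit = sm.OLS(y, basis).fit()   # plain OLS on the basis; no penalty
y_hat = fit.fittedvalues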

Performing lm() and segmented() on multiple columns in R

不羁岁月 · submitted 2019-12-03 14:15:01
Question: I am trying to perform lm() and segmented() in R using the same independent variable (x) and multiple dependent response variables (Curve1, Curve2, etc.), one at a time. I wish to extract the estimated breakpoint and the model coefficients for each response variable. An excerpt of my data is below.

    x          Curve1    Curve2    Curve3
1   -0.236422  98.8169   95.6828   101.7910
2   -0.198083  98.3260   95.4185   101.5170
3   -0.121406  97.3442   94.8899   100.9690
4    0.875399  84.5815   88.0176   93.8424
5    0.913738  84.1139   87.7533
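The question is about R's segmented package, but the looping pattern itself is easy to sketch; below is a hedged Python analogue on toy data, using a crude grid search for a single breakpoint via a hinge term (not the iterative algorithm segmented() actually uses):

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = np.linspace(-0.3, 1.0, 60)
df = pd.DataFrame({"x": x})
for name, bp in [("Curve1", 0.2), ("Curve2", 0.4), ("Curve3", 0.6)]:
    df[name] = 100 - 5 * x - 10 * np.clip(x - bp, 0, None) + rng.normal(scale=0.3, size=x.size)

def fit_segmented(x, y, candidates):
    # For each candidate breakpoint c, fit y ~ x + (x - c)_+ and keep
    # the c with the smallest sum of squared residuals.
    best = None
    for c in candidates:
        X = sm.add_constant(np.column_stack([x, np.clip(x - c, 0, None)]))
        res = sm.OLS(y, X).fit()
        if best is None or res.ssr < best[0]:
            best = (res.ssr, c, res.params)
    return best[1], best[2]

candidates = np.linspace(x.min() + 0.05, x.max() - 0.05, 100)
for col in ["Curve1", "Curve2", "Curve3"]:
    bp, coefs = fit_segmented(df["x"].values, df[col].values, candidates)
    print(col, "breakpoint:", round(bp, 3), "coefficients:", np.round(coefs, 3))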

Gaussian Process scikit-learn - Exception

孤街浪徒 · submitted 2019-12-03 13:34:16
I want to use Gaussian processes to solve a regression task. My data are as follows: each X vector has a length of 37, and each Y vector has a length of 8. I'm using the sklearn package in Python, but trying to use Gaussian processes raises an exception:

from sklearn import gaussian_process
print "x :", x__
print "y :", y__
gp = gaussian_process.GaussianProcess(theta0=1e-2, thetaL=1e-4, thetaU=1e-1)
gp.fit(x__, y__)

x : [[ 136. 137. 137. 132. 130. 130. 132. 133. 134. 135. 135. 134. 134. 1139. 1019. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 70. 24. 55. 0. 9. 0. 0.]
 [ 136. 137. 137. 132. 130
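Note that the GaussianProcess class used above is the old sklearn API (deprecated in 0.18 and later removed). A hedged sketch with the current GaussianProcessRegressor, which accepts a 2-D target; the random arrays stand in for the question's data and only reproduce its shapes:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 37))   # 50 samples, 37 features each, as in the question
Y = rng.normal(size=(50, 8))    # 8 target values per sample

kernel = ConstantKernel(1.0) * RBF(length_scale=np.ones(37))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X, Y)

Y_pred = gp.predict(X[:5])
print(Y_pred.shape)             # (5, 8)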

Calculate confidence band of least-square fit

ⅰ亾dé卋堺 · submitted 2019-12-03 13:29:43
Question: I have been fighting with this for days now: how do I calculate the (95%) confidence band of a fit? Fitting curves to data is the everyday job of every physicist, so I think this should be implemented somewhere, but I can't find an implementation, nor do I know how to do this mathematically. The only thing I found is seaborn, which does a nice job for linear least squares.

import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
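For the linear case, one standard route is the textbook pointwise band; a hedged numpy/scipy sketch (the variable names are mine, and for a nonlinear fit you would instead propagate the parameter covariance returned by scipy.optimize.curve_fit):

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 30)
y = 2 + 1.5 * x + rng.normal(scale=2.0, size=x.size)

b, a = np.polyfit(x, y, 1)                 # slope, intercept
n = x.size
resid = y - (a + b * x)
s = np.sqrt(np.sum(resid**2) / (n - 2))    # residual standard error
t = stats.t.ppf(0.975, n - 2)              # two-sided 95% t quantile

x_grid = np.linspace(x.min(), x.max(), 200)
# Pointwise 95% confidence band for the mean response:
# half-width = t * s * sqrt(1/n + (x0 - xbar)^2 / Sxx)
half = t * s * np.sqrt(1.0 / n + (x_grid - x.mean())**2 / np.sum((x - x.mean())**2))
band_lo = (a + b * x_grid) - half
band_hi = (a + b * x_grid) + half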

Panel data regression: Robust standard errors

安稳与你 · submitted 2019-12-03 13:27:45
Question: My problem is this: I get NA where I should get values when computing robust standard errors. I am trying to run a fixed-effects panel regression with cluster-robust standard errors. For this, I follow Arai (2011), who on p. 3 follows Stock and Watson (2006) (later published in Econometrica, for those who have access). I would like to correct the degrees of freedom by the factor (M/(M-1))*((N-1)/(N-K)) against downward bias, as my number of clusters is finite and my data are unbalanced. Similar
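The correction factor itself is just a scalar applied to the cluster-robust variance matrix; a small Python sketch of that arithmetic (the estimation in the question is done in R, and the values of M, N, K below are illustrative):

def small_sample_correction(M, N, K):
    # (M/(M-1)) * ((N-1)/(N-K)): M clusters, N observations, K regressors.
    return (M / (M - 1.0)) * ((N - 1.0) / (N - K))

c = small_sample_correction(50, 1000, 10)
print(c)                            # ~1.0297
# vcov_corrected = c * vcov_cluster  (scale the robust variance matrix)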

How does plot.lm() determine outliers for residual vs fitted plot?

浪尽此生 · submitted 2019-12-03 13:01:41
How does plot.lm() determine which points are outliers (that is, which points to label) in the residuals-vs-fitted plot? The only thing I found in the documentation is this:

Details: sub.caption (by default the function call) is shown as a subtitle (under the x-axis title) on each plot when plots are on separate pages, or as a subtitle in the outer margin (if any) when there are multiple plots per page. The 'Scale-Location' plot, also called 'Spread-Location' or 'S-L' plot, takes the square root of the absolute residuals in order to diminish skewness (sqrt(|E|) is much less skewed than |E| for Gaussian zero-mean E).
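On the question itself: plot.lm() labels the id.n most extreme points (3 by default), and in the residuals-vs-fitted panel "most extreme" means the largest absolute residuals, so the labels are not the result of any formal outlier test. A hedged matplotlib sketch of that labelling rule on toy data:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
x = np.linspace(0, 1, 50)
y = 2 * x + rng.normal(scale=0.2, size=x.size)
y[7] += 1.5                                  # plant one obvious outlier

b, a = np.polyfit(x, y, 1)
fitted = a + b * x
resid = y - fitted

id_n = 3                                     # plot.lm's default id.n
extreme = np.argsort(np.abs(resid))[-id_n:]  # indices of the largest |residuals|

plt.scatter(fitted, resid)
plt.axhline(0, linestyle="--")
for i in extreme:
    plt.annotate(str(i + 1), (fitted[i], resid[i]))  # 1-based labels, R style
plt.xlabel("Fitted values"); plt.ylabel("Residuals")
plt.show()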

Neural Network Ordinal Classification for Age

旧时模样 · submitted 2019-12-03 12:54:48
I have created a simple neural network (Python, Theano) to estimate a person's age based on their spending history at a selection of different stores. Unfortunately, it is not particularly accurate. The accuracy might be hurt by the fact that the network has no knowledge of ordinality: for the network, there is no relationship between the age classifications. It currently selects the age with the highest probability from the softmax output layer. I have considered changing the output classification to an average of the weighted probability for each age. E.g., given age probabilities: (Age
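The weighted-average idea described in the question is just the expected value of age under the softmax distribution; a minimal numpy sketch with hypothetical age bins and probabilities:

import numpy as np

ages = np.array([18, 25, 35, 45, 55, 65])                 # hypothetical class centres
probs = np.array([0.05, 0.30, 0.40, 0.15, 0.07, 0.03])    # softmax output, sums to 1

argmax_age = ages[np.argmax(probs)]        # current approach: most probable class (35)
expected_age = float(np.dot(ages, probs))  # proposed: probability-weighted mean (~34.95)
print(argmax_age, expected_age)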

loess predict with new x values

戏子无情 · submitted 2019-12-03 12:12:36
Question: I am attempting to understand how the predict.loess function is able to compute new predicted values (y_hat) at points x that do not exist in the original data. For example (this is a simple example, and I realise loess is obviously not needed here, but it illustrates the point):

x <- 1:10
y <- x^2
mdl <- loess(y ~ x)
predict(mdl, 1.5)
[1] 2.25

loess regression works by fitting polynomials locally at each x, and thus it creates a predicted y_hat at each x. However, because there
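Conceptually, predict.loess re-fits a weighted local polynomial around the new point; a simplified, hedged numpy illustration of that idea (real loess picks the neighbourhood from the span and handles several details this sketch ignores):

import numpy as np

def loess_predict_at(x, y, x0, frac=0.75, degree=2):
    # Take the frac*n nearest points to x0, weight them with the
    # tricube kernel, fit a weighted polynomial, evaluate it at x0.
    k = max(degree + 1, int(np.ceil(frac * len(x))))
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k]                  # k nearest neighbours of x0
    h = d[idx].max()                         # bandwidth = furthest neighbour
    w = (1 - (d[idx] / h) ** 3) ** 3         # tricube weights
    coefs = np.polyfit(x[idx], y[idx], degree, w=np.sqrt(w))
    return np.polyval(coefs, x0)

x = np.arange(1, 11, dtype=float)
y = x ** 2
print(loess_predict_at(x, y, 1.5))           # 2.25, matching predict(mdl, 1.5)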

Loss suddenly increases with Adam Optimizer in Tensorflow

 ̄綄美尐妖づ · submitted 2019-12-03 11:15:30
I am using a CNN for a regression task. I use TensorFlow, and the optimizer is Adam. The network seems to converge perfectly well until, at one point, the loss suddenly increases along with the validation error. Here are the loss plots for the labels and the weights, shown separately (the optimizer is run on their sum). I use L2 loss for weight regularization and also for the labels. I apply some randomness to the training data. I am currently trying RMSProp to see if the behaviour changes, but it takes at least 8 h to reproduce the error. I would like to understand how this can happen. I hope you can help.
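Not an answer from the thread, but one commonly discussed cause of such spikes is Adam's epsilon: when the second-moment estimate for some weights decays toward zero, the effective step size can blow up, and raising epsilon or clipping gradients is a frequently tried mitigation. A hedged TF1-era sketch, where the tiny linear model is only a placeholder for the questioner's CNN:

import tensorflow as tf

# Stand-in model; `loss` is whatever the real network minimises.
x = tf.placeholder(tf.float32, [None, 10])
w = tf.Variable(tf.zeros([10, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))

# Raise epsilon from its 1e-8 default to damp huge effective steps.
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4, epsilon=1e-4)

# Alternative mitigation: clip gradients before applying them.
grads_and_vars = optimizer.compute_gradients(loss)
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v)
           for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped)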