linear-regression

Get all models from leaps regsubsets

Submitted by ☆樱花仙子☆ on 2019-12-04 16:54:04
I used regsubsets to search for models. Is it possible to automatically create all the lm fits from the list of parameter selections?

library(leaps)
leaps <- regsubsets(y ~ x1 + x2 + x3, data, nbest=1, method="exhaustive")
summary(leaps)$which

  (Intercept)    x1    x2   x3
1        TRUE FALSE FALSE TRUE
2        TRUE FALSE  TRUE TRUE
3        TRUE  TRUE  TRUE TRUE

Now I would manually do model_1 <- lm(y ~ x3) and so on. How can this be automated to have them in a list? I don't know why you want a list of all models; the summary and coef methods should serve you well. But I will first answer your question from a pure programming aspect, then

Support vector machines - a simple explanation?

Submitted by 做~自己de王妃 on 2019-12-04 16:39:56
Question: So, I'm trying to understand how the SVM algorithm works, but I just cannot figure out how you transform some datasets into points in an n-dimensional space that have a mathematical meaning, so that the points can be separated by a hyperplane and classified. There's an example here: they are trying to classify pictures of tigers and elephants, and they say "We digitize them into 100x100 pixel images, so we have x in n-dimensional plane, where n=10,000", but my question is how do they transform
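One way to picture the transformation the question asks about: each pixel becomes one coordinate, so a 100x100 image is simply a length-10,000 vector, i.e. one point in R^10000. A minimal Python sketch (the image here is a made-up uniform grid, purely to show the shapes involved):

```python
# Each 100x100 grayscale image becomes one point in R^10000:
# the pixels are read off row by row into a single flat feature vector.
def flatten_image(img):
    """Turn a 2D grid of pixel intensities into one flat feature vector."""
    return [pixel for row in img for pixel in row]

# A hypothetical 100x100 "image" (all intensities 0.5, only the shape matters).
image = [[0.5] * 100 for _ in range(100)]
x = flatten_image(image)
print(len(x))  # 10000 -- one coordinate per pixel, so n = 10000
```

The SVM then looks for a hyperplane separating these 10,000-dimensional points; no further geometric interpretation of individual coordinates is needed.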

In the LinearRegression method in sklearn, what exactly is the fit_intercept parameter doing?

Submitted by 让人想犯罪 __ on 2019-12-04 16:05:11
Question: In the sklearn.linear_model.LinearRegression method, there is a parameter fit_intercept=True or fit_intercept=False. I am wondering: if we set it to True, does it add an additional intercept column of all 1's to the dataset? And if I already have a dataset with a column of 1's, does fit_intercept=False account for that, or does it force a zero-intercept model? Update: It seems people do not get my question. The question is basically: what if I already had a column of 1's
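As I understand sklearn's behavior (worth checking against the docs): fit_intercept=True makes the model estimate an intercept term internally, which is equivalent to appending a 1's column; fit_intercept=False fits y = Xb with no column added, so without your own 1's column the line is forced through the origin. A pure-Python sketch of the two fits on made-up data:

```python
# The two cases the question contrasts, done by hand with the normal equations.
# With an intercept (what fit_intercept=True estimates internally), the model
# is y = b0 + b1*x; with no intercept and no 1's column it is forced through
# the origin: y = b1*x.

def fit_with_intercept(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b0 = (sy - b1 * sx) / n
    return b0, b1

def fit_through_origin(x, y):
    return sum(a * b for a, b in zip(x, y)) / sum(v * v for v in x)

x, y = [0, 1, 2, 3], [1, 3, 5, 7]           # exactly y = 1 + 2x
print(fit_with_intercept(x, y))             # (1.0, 2.0)
print(fit_through_origin(x, y))             # ~2.43: biased, line must pass (0, 0)
```

If you keep your own 1's column and set fit_intercept=False, the coefficient estimated for that column plays the role of the intercept.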

Pandas with Fixed Effects

Submitted by 假如想象 on 2019-12-04 15:48:04
I'm using Pandas on Python 2.7. I have data with the following columns: State, Year, UnempRate, Wage. I'm teaching a course on how to use Python for research. As the culmination of our project, I want to run a regression of UnempRate on Wage, controlling for State and Year fixed effects. I can do this by creating dummies for state and year and then: ols(y=df['UnempRate'], x=df[FullDummyList]) Is there an easier way to do this? I was trying to use the PanelOLS method mentioned here: Fixed effect in Pandas or Statsmodels. But I can't seem to get the syntax right, or find more documentation on
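A note on why the dummy approach works, and what it is equivalent to: including a full set of state dummies (LSDV) gives the same slope as the "within" transformation, i.e. demeaning y and x inside each state and running plain OLS on the demeaned data. (In current statsmodels the formula API can build the dummies for you, e.g. smf.ols('UnempRate ~ Wage + C(State) + C(Year)', data=df); the old pandas ols interface was removed.) The within transformation, sketched in plain Python with made-up numbers:

```python
# State fixed effects via the "within" transformation: demean y and x inside
# each state, then run plain OLS on the demeaned data. The slope matches the
# one from including a full set of state dummies (LSDV).
from collections import defaultdict

rows = [  # (state, wage, unemp) -- made-up numbers with state-specific levels
    ("A", 1, 12), ("A", 2, 14), ("A", 3, 16),
    ("B", 1, 2),  ("B", 2, 4),  ("B", 3, 6),
]

groups = defaultdict(list)
for state, wage, unemp in rows:
    groups[state].append((wage, unemp))

xd, yd = [], []
for obs in groups.values():
    mx = sum(w for w, _ in obs) / len(obs)
    my = sum(u for _, u in obs) / len(obs)
    for w, u in obs:
        xd.append(w - mx)
        yd.append(u - my)

slope = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)
print(slope)  # 2.0 -- the within-state effect, free of state-level offsets
```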

fit to time series using Gnuplot

Submitted by 别说谁变了你拦得住时间么 on 2019-12-04 13:44:07
I am a big fan of Gnuplot, and now I would like to use the fit function for time series. My data set looks like:

1.000000 1.000000 0.999795 0.000000 0.000000 0.421927 0.654222 -25.127700 1.000000 1994-08-12
1.000000 2.000000 0.046723 -0.227587 -0.689491 0.328387 1.000000 0.000000 1.000000 1994-08-12
2.000000 1.000000 0.945762 0.000000 0.000000 0.400038 0.582360 -8.624480 1.000000 1995-04-19
2.000000 2.000000 0.060228 -0.056367 -0.680224 0.551019 1.000000 0.000000 1.000000 1995-04-19
3.000000 1.000000 1.016430 0.000000 0.000000 0.574478 0.489638 -3.286880 1.000000 1995-07-15

And my fitting script:

How do you remove an insignificant factor level from a regression using the lm() function in R?

Submitted by 為{幸葍}努か on 2019-12-04 13:35:35
When I perform a regression in R and use type factor, it saves me setting up the categorical variables in the data. But how do I remove a factor level that is not significant from the regression, so that only significant variables are shown? For example:

dependent <- c(1:10)
independent1 <- as.factor(c('d','a','a','a','a','a','a','b','b','c'))
independent2 <- c(-0.71,0.30,1.32,0.30,2.78,0.85,-0.25,-1.08,-0.94,1.33)
output <- lm(dependent ~ independent1 + independent2)
summary(output)

Which results in the following regression model:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4.6180     1.0398

Subsetting in dredge (MuMIn) - must include interaction if main effects are present

Submitted by 我的未来我决定 on 2019-12-04 13:14:19
I'm doing some exploratory work where I use dredge{MuMIn}. In this procedure there are two variables that I want to allow together ONLY when the interaction between them is present, i.e. they cannot appear together as main effects only. Using sample data: I want to dredge the model fm1 (disregarding that it probably doesn't make sense). If the variables GNP and Population appear together, they must also include the interaction between them.

require(stats); require(graphics)
## give the data set in the form it is used in S-PLUS:
longley.x <- data.matrix(longley[, 1:6])
longley

Getting a p-value for linear regression in C with the gsl_fit_linear() function from the GSL library

Submitted by 假如想象 on 2019-12-04 10:45:44
I'm trying to reproduce some code from R in C, so I'm trying to fit a linear regression using the gsl_fit_linear() function. In R I'd use the lm() function, which returns a p-value for the fit, with this code: lmAvgs <- lm(c(1.23, 11.432, 14.653, 21.6534) ~ c(1970, 1980, 1990, 2000)); summary(lmAvgs) I've no idea, though, how to go from the C output to a p-value. My code looks something like this so far:

int main(void) {
    int i, n = 4;
    double x[4] = { 1970, 1980, 1990, 2000 };
    double y[4] = { 1.23, 11.432, 14.653, 21.6534 };
    double c0, c1, cov00, cov01, cov11, sumsq;
    gsl_fit_linear (x, 1, y, 1, n,
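The missing step is the same one summary.lm performs for the slope: t = slope / se(slope) with n - 2 degrees of freedom, where in GSL's output se(slope) = sqrt(cov11) and, for general df, the two-sided p-value is 2 * gsl_cdf_tdist_Q(fabs(t), n - 2) from gsl_cdf.h. A pure-Python sketch of the arithmetic on the question's own data (for df = 2 the Student-t CDF has the closed form 1/2 + t / (2*sqrt(2 + t^2)), which avoids any library):

```python
import math

x = [1970, 1980, 1990, 2000]
y = [1.23, 11.432, 14.653, 21.6534]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
sxx = sum((v - mx) ** 2 for v in x)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
slope = sxy / sxx
intercept = my - slope * mx

# Residual sum of squares with n - 2 df, then the slope's standard error.
ssr = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
se_slope = math.sqrt(ssr / (n - 2) / sxx)

t = slope / se_slope
# Two-sided p-value; this closed-form Student-t CDF is valid only for df = 2
# (n = 4 here). For other df, use an incomplete-beta-based CDF (or GSL's
# gsl_cdf_tdist_Q in the C version).
p = 2 * (0.5 - 0.5 * t / math.sqrt(2 + t * t))
print(slope, t, p)  # slope ~0.645, t ~7.06, p ~0.0195 -- matches summary(lmAvgs)
```

In the C code the same quantities fall out of gsl_fit_linear directly: t = c1 / sqrt(cov11), since GSL estimates the residual variance as sumsq / (n - 2) when forming the covariance matrix.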

R-squared on test data

Submitted by 我怕爱的太早我们不能终老 on 2019-12-04 10:45:15
Question: I fit a linear regression model on 75% of my data set, which includes ~11000 observations and 143 variables: gl.fit <- lm(y[1:ceiling(length(y)*(3/4))] ~ ., data = x[1:ceiling(length(y)*(3/4)),]) # 3/4 for training and I got an R^2 of 0.43. I then tried predicting on my test data using the rest of the data: ytest = y[(ceiling(length(y)*(3/4))+1):length(y)]; x.test <- cbind(1, x[(ceiling(length(y)*(3/4))+1):length(y),]) # the rest for test; yhat <- as.matrix(x.test) %*% gl.fit$coefficients # Calculate
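The step the snippet is building toward can be written directly: out-of-sample R^2 is 1 - SS_res/SS_tot computed on the test set (whether SS_tot uses the test-set mean or the training mean is a modeling choice; the test-set mean is used here). A small Python sketch with made-up numbers:

```python
# Out-of-sample R^2: compare the model's squared prediction error on the test
# set against a baseline that always predicts the mean. Unlike training R^2,
# this can legitimately come out negative when the model predicts worse than
# the mean does.
def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical test targets and predictions, just to exercise the formula.
y_test = [3.0, 5.0, 7.0, 9.0]
y_hat  = [2.8, 5.1, 7.3, 8.9]
print(r_squared(y_test, y_hat))  # 0.9925
```

In the question's R code, the same quantity would be 1 - sum((ytest - yhat)^2) / sum((ytest - mean(ytest))^2).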

MM robust estimation in ggplot2 using stat_smooth with method = “rlm”

Submitted by 前提是你 on 2019-12-04 10:17:40
The function rlm (MASS) permits both M and MM estimation for robust regression. I would like to plot the smoother from MM robust regression in ggplot2; however, I think that when selecting method = "rlm" in stat_smooth, the estimation method chosen automatically is the M type. Is there any way of selecting the MM-type estimation for the rlm function through ggplot2? Here is my code:

df <- data.frame("x" = c(119,118,144,127,78.8,98.4,108,50,74,30.4,
                         50,72,99,155,113,144,102,131,105,127,120,85,153,40.6,133),
                 "y" = c(1.56,2.17,0.81,1.07,1.12,2.03,0.90,1.48,0.64,
                         0.91,0.85,0.41,0.55,2.18,1.49