regression

Get all models from leaps regsubsets

我的未来我决定 submitted on 2019-12-06 10:42:59
Question: I used regsubsets to search for models. Is it possible to automatically create all the lm models from the list of parameter selections?

    library(leaps)
    leaps <- regsubsets(y ~ x1 + x2 + x3, data, nbest = 1, method = "exhaustive")
    summary(leaps)$which
      (Intercept)    x1    x2   x3
    1        TRUE FALSE FALSE TRUE
    2        TRUE FALSE  TRUE TRUE
    3        TRUE  TRUE  TRUE TRUE

Now I would manually do model_1 <- lm(y ~ x3) and so on. How can this be automated so that the models end up in a list?

Answer 1: I don't know why you want a list of all models. summary and …
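A minimal sketch of one way to automate this (the toy data and object names here are hypothetical; the question's own data frame would be substituted for dat):

```r
library(leaps)

# toy data (hypothetical), just so the sketch runs end to end
set.seed(1)
dat <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))
dat$y <- 1 + 2 * dat$x3 + rnorm(20)

fit <- regsubsets(y ~ x1 + x2 + x3, data = dat, nbest = 1, method = "exhaustive")
sel <- summary(fit)$which          # logical matrix: one row per model size

# build one lm() per row of the selection matrix
models <- lapply(seq_len(nrow(sel)), function(i) {
  vars <- setdiff(colnames(sel)[sel[i, ]], "(Intercept)")
  lm(reformulate(vars, response = "y"), data = dat)
})
```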

Generating predicted values for levels of factor variable

怎甘沉沦 submitted on 2019-12-06 10:36:22
Question: I am regressing a number of factor variables on a continuous outcome variable using lm(). For example,

    fit <- lm(dv ~ factor(hour) + factor(weekday) + factor(month) + factor(year) + count, data = df)

I would like to generate predicted values (yhat) for different levels of a factor variable while holding the other variables at their median or modal value. For example, how would I generate the yhat for different weekdays while holding the other factors constant?

Answer 1: I may be able to assist based on @Roland's …
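A minimal sketch of the usual approach with predict() and a newdata grid, assuming the fitted fit and data frame df from the question; the helper mode_of is hypothetical:

```r
# most frequent value of a variable (hypothetical helper)
mode_of <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# one row per weekday, other predictors held at their modal / median values
newdata <- data.frame(
  weekday = sort(unique(df$weekday)),
  hour    = mode_of(df$hour),
  month   = mode_of(df$month),
  year    = mode_of(df$year),
  count   = median(df$count, na.rm = TRUE)
)

yhat <- predict(fit, newdata = newdata)
```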

Applying lm() and predict() to multiple columns in a data frame

倾然丶 夕夏残阳落幕 submitted on 2019-12-06 09:08:05
Question: I have an example dataset below.

    train <- data.frame(x1 = c(4,5,6,4,3,5),
                        x2 = c(4,2,4,0,5,4),
                        x3 = c(1,1,1,0,0,1),
                        x4 = c(1,0,1,1,0,0),
                        x5 = c(0,0,0,1,1,1))

Suppose I want to create separate models for columns x3, x4 and x5 based on columns x1 and x2, for example

    lm1 <- lm(x3 ~ x1 + x2)
    lm2 <- lm(x4 ~ x1 + x2)
    lm3 <- lm(x5 ~ x1 + x2)

I then want to take these models and apply them to a testing set using predict, and create a matrix that has each model's outcome as a column. test <- data.frame…
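A minimal sketch of one way to automate this, assuming train and test are data frames that both contain x1 and x2 (the test data frame is truncated in the excerpt above):

```r
responses <- c("x3", "x4", "x5")

# fit one lm() per response column, each predicted from x1 and x2
fits <- lapply(responses, function(resp) {
  lm(reformulate(c("x1", "x2"), response = resp), data = train)
})
names(fits) <- responses

# apply every model to the test set; one column per model
pred_mat <- sapply(fits, predict, newdata = test)
```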

Linear regression, finding slope in MySQL

青春壹個敷衍的年華 submitted on 2019-12-06 09:01:28
I'm trying to find the slope of a dataset that has DATETIME on the x axis and a number on the y axis. I've tried a number of approaches, and nothing matches the slope of the line when I plug the data into Excel; it's off by multiple orders of magnitude. This is what I have right now, but it gives me a slope of -1.13e-13 instead of -0.008:

    SELECT (SUM((x-xBar)*(y-yBar)))/(SUM((x-xBar))*SUM((x-xBar)))) as slope
    from (select unix_timestamp(date) as x,
                 (select avg(unix_timestamp(date)) from datatable) as xBar,
                 value as y,
                 (select avg(value) from datatable) as yBar
          from datatable) as d;
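For reference, the ordinary least-squares slope is the following; note that the denominator is the sum of squared deviations, not the square of the summed deviations, which is one common source of wildly wrong values:

$$\hat\beta_1 = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2}$$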

How do regression models deal with the factor variables?

丶灬走出姿态 submitted on 2019-12-06 08:52:08
Suppose I have data with a factor and a response variable. My questions: How do linear regression and mixed-effects models work with factor variables? If I have a separate model for each level of the factor variable (m3 and m4), how does that differ from models m1 and m2? Which one is the best model/approach? As an example I use the Orthodont data in the nlme package.

    library(nlme)
    data = Orthodont
    data2 <- subset(data, Sex == "Male")
    data3 <- subset(data, Sex == "Female")
    m1 <- lm(distance ~ age + Sex, data = Orthodont)
    m2 <- lme(distance ~ age, data = Orthodont, random = ~ 1|Sex)
    m3 <- lm(distance ~ …
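A sketch of the full set of models being compared, following the question's description; the right-hand sides of m3 and m4 are assumed here (separate per-level fits of distance on age), since the excerpt is truncated:

```r
library(nlme)

data2 <- subset(Orthodont, Sex == "Male")
data3 <- subset(Orthodont, Sex == "Female")

m1 <- lm(distance ~ age + Sex, data = Orthodont)                 # Sex as a fixed-effect shift
m2 <- lme(distance ~ age, data = Orthodont, random = ~ 1 | Sex)  # Sex as a random intercept
m3 <- lm(distance ~ age, data = data2)                           # separate fit per level (assumed)
m4 <- lm(distance ~ age, data = data3)                           # separate fit per level (assumed)
```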

Calculating a linear trend line for every row of a table in R

怎甘沉沦 submitted on 2019-12-06 08:33:02
Question: Is it somehow possible to run a linear regression for every single row of a data frame without using a loop? The output of the trend line (intercept and slope) should be added to the original data frame as new columns. To make my intention clearer, I have prepared a very small example:

    day1 <- c(1,3,1)
    day2 <- c(2,2,1)
    day3 <- c(3,1,5)
    output.intercept <- c(0,4,-1.66667)
    output.slope <- c(1,-1,2)
    data <- data.frame(day1,day2,day3,output.intercept,output.slope)

Input variables are …
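A minimal sketch of a loop-free way to do this, using the fact that lm() accepts a matrix response (toy data reproduced from the question; x = 1:3 stands for the three days):

```r
day1 <- c(1, 3, 1)
day2 <- c(2, 2, 1)
day3 <- c(3, 1, 5)
data <- data.frame(day1, day2, day3)

x <- 1:3                                               # the three days
Y <- t(as.matrix(data[, c("day1", "day2", "day3")]))   # one column per row of 'data'

fit   <- lm(Y ~ x)        # fits all responses at once
coefs <- coef(fit)        # 2 x nrow(data) matrix of intercepts and slopes

data$output.intercept <- coefs["(Intercept)", ]
data$output.slope     <- coefs["x", ]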

How to decimal-align regression coefficients in LaTeX table output in rmarkdown document

我只是一个虾纸丫 submitted on 2019-12-06 07:49:59
Question: In an rmarkdown document, I'm creating a LaTeX table of regression coefficients with standard errors in order to compare several regression models in a single table. I'd like to vertically align the coefficients for each model so that their decimal points line up down a column. I'm using texreg to create the table. The coefficients aren't decimal-aligned by default (instead, each string is centered within its column), and I'm looking for a way to get the coefficients decimal-aligned.
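A minimal sketch of one common approach, texreg's dcolumn option; this assumes \usepackage{dcolumn} (and, for the booktabs option, \usepackage{booktabs}) in the document's LaTeX preamble, and m1/m2 are placeholder models:

```r
library(texreg)

# placeholder models
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# dcolumn = TRUE emits dcolumn-style column types so the
# decimal points of the coefficients line up within each column
texreg(list(m1, m2), dcolumn = TRUE, booktabs = TRUE)
```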

Statsmodels OLS with rolling window problem

二次信任 submitted on 2019-12-06 07:38:15
Question: I would like to do a regression with a rolling window, but I got only one parameter back after the regression:

    rolling_beta = sm.OLS(X2, X1, window_type='rolling', window=30).fit()
    rolling_beta.params

The result:

    X1    5.715089
    dtype: float64

What could be the problem? Thanks in advance, Roland

Answer 1: I think the problem is that the parameters window_type='rolling' and window=30 simply do not do anything. First I'll show you why, and at the end I'll provide a setup I've got lying around for linear …
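For reference, a rolling regression re-estimates the coefficients on every window of the last $w$ observations, so it should return one estimate per window rather than a single number:

$$\hat\beta_t = \bigl(X_{t-w+1:t}^{\top} X_{t-w+1:t}\bigr)^{-1} X_{t-w+1:t}^{\top}\, y_{t-w+1:t}, \qquad t = w, w+1, \ldots, T$$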

Correlation coefficient on gnuplot

安稳与你 submitted on 2019-12-06 07:31:12
Question: I want to plot data using the fit function f(x) = a + b*x**2. After plotting I have this result:

    correlation matrix of the fit parameters:
                   m      n
    m          1.000
    n         -0.935  1.000

My question is: how can I find a correlation coefficient in gnuplot?

Answer 1: If you're looking for a way to calculate the correlation coefficient as defined on this page, you are out of luck using gnuplot, as explained in this Google Groups thread. There are lots of other tools for calculating correlation coefficients, e.g. …
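For reference, the quantity asked about here is Pearson's correlation coefficient of the data, which is not the same thing as the parameter-correlation matrix that gnuplot's fit prints:

$$r = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum_i (x_i - \bar x)^2}\,\sqrt{\sum_i (y_i - \bar y)^2}}$$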

negative value for “mean_squared_error”

时间秒杀一切 submitted on 2019-12-06 05:50:08
I am using scikit-learn, with mean_squared_error as the scoring function for model evaluation in cross_val_score:

    rms_score = cross_validation.cross_val_score(model, X, y, cv=20, scoring='mean_squared_error')

I am using mean_squared_error because it is a regression problem, and the estimators (model) used are lasso, ridge and elasticNet. For all these estimators, I am getting rms_score as negative values. How is this possible, given that the differences in y values are squared?

You get the mean_squared_error with its sign flipped returned by cross_validation.cross_val_score. There is an issue …
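For reference, the mean squared error itself is non-negative; as the answer notes, the scorer simply returns it with the sign flipped so that larger scores are always better:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat y_i\bigr)^2 \;\ge\; 0, \qquad \text{returned score} = -\,\mathrm{MSE}$$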