regression

How to do a GLM when “contrasts can be applied only to factors with 2 or more levels”?

Submitted by 心已入冬 on 2019-11-27 08:22:23
Question: I want to do a regression in R using glm, but is there a way around the contrasts error below?

mydf <- data.frame(
  Group = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12),
  WL = rep(c(1,0), 12),
  New.Runner = c("N","N","N","N","N","N","Y","N","N","N","N","N",
                 "N","Y","N","N","N","Y","N","N","N","N","N","Y"),
  Last.Run = c(1,5,2,6,5,4,NA,3,7,2,4,9,8,NA,3,5,1,NA,6,10,7,9,2,NA))
mod <- glm(formula = WL ~ New.Runner + Last.Run, family = binomial, data = mydf)
# Error in `contrasts<-`(`*tmp*`,
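The truncated error is the "contrasts can be applied only to factors with 2 or more levels" message from the title. A likely cause, visible in the data above: every "Y" in New.Runner sits in a row where Last.Run is NA, so when glm silently drops incomplete rows the factor is left with a single level. A minimal check (a sketch, using the mydf defined above):

complete <- mydf[complete.cases(mydf), ]
table(complete$New.Runner)   # only "N" survives, hence the contrasts error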

R - Calculate Test MSE given a trained model from a training set and a test set

Submitted by 谁说我不能喝 on 2019-11-27 07:55:23
Question: Given two simple sets of data:

head(training_set)
  x         y
1 1  2.167512
2 2  4.684017
3 3  3.702477
4 4  9.417312
5 5  9.424831
6 6 13.090983

head(test_set)
  x        y
1 1 2.068663
2 2 4.162103
3 3 5.080583
4 4 8.366680
5 5 8.344651

I want to fit a linear regression line on the training data, and use that line (or its coefficients) to calculate the "test MSE", i.e. the mean squared error of the residuals, on the test data once that line is applied there.

model = lm(y ~ x, data = training_set)
train_MSE = mean(model$residuals
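A straightforward way to complete this (a sketch, assuming the training_set and test_set shown above):

model <- lm(y ~ x, data = training_set)
train_MSE <- mean(model$residuals^2)                                   # training MSE
test_MSE <- mean((test_set$y - predict(model, newdata = test_set))^2)  # test MSE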

What does the capital letter “I” in R linear regression formula mean?

Submitted by 扶醉桌前 on 2019-11-27 07:28:23
I haven't been able to find an answer to this question, largely because googling anything with a standalone letter (like "I") causes issues. What does the "I" do in a model like this?

data(rock)
lm(area ~ I(peri - mean(peri)), data = rock)

Considering that the following does NOT do the same thing:

lm(area ~ (peri - mean(peri)), data = rock)

and that this works as expected outside a formula:

rock$peri - mean(rock$peri)

Any keywords on how to research this myself would also be very helpful.

Answer: I() isolates or insulates the contents of I(...) from the gaze of R's formula-parsing code. It allows the standard R operators to work as they would if you used them outside of a formula; without it, symbols such as - and + are interpreted as formula operators that remove or add model terms.
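A quick demonstration (a sketch built on the rock example above): centring inside I() gives exactly the same fit as centring the variable beforehand.

data(rock)
m1 <- lm(area ~ I(peri - mean(peri)), data = rock)  # arithmetic protected by I()
rock$peri.c <- rock$peri - mean(rock$peri)          # same arithmetic done up front
m2 <- lm(area ~ peri.c, data = rock)
coef(m1); coef(m2)                                  # identical coefficients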

Working with neuralnet in R for the first time: get “requires numeric/complex matrix/vector arguments”

Submitted by 北城余情 on 2019-11-27 07:26:50
I'm in the process of attempting to learn to work with neural networks in R. As a learning problem, I've been using the following problem over at Kaggle (don't worry, it is specifically designed for people to learn with; there's no reward tied to it). I started with a simple logistic regression, which was great for getting my feet wet. Now I'd like to learn to work with neural networks. My training data looks like this (column: value, for one example row):

- survived: 1
- pclass: 3
- sex: male
- age: 22.0
- sibsp: 1
- parch: 0
- ticket: PC 17601
- fare: 7.25
- cabin: C85
- embarked: S

My starting R code
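The error in the title ("requires numeric/complex matrix/vector arguments") typically means neuralnet was handed non-numeric columns such as sex or embarked. A sketch of one common fix (the data-frame name train and the dummy-column name sexmale are my assumptions, not from the question): expand the factors into numeric dummy variables first, then fit on the numeric frame.

library(neuralnet)
# model.matrix expands factors such as sex into numeric dummy columns;
# assumes the selected columns contain no missing values
train_num <- as.data.frame(model.matrix(
  ~ survived + pclass + sex + age + sibsp + parch + fare, data = train))
nn <- neuralnet(survived ~ pclass + sexmale + age + sibsp + parch + fare,
                data = train_num, hidden = 2, linear.output = FALSE)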

Highcharts - Get crossing point of crossing series

Submitted by 青春壹個敷衍的年華 on 2019-11-27 07:17:13
Question: I am currently trying to extract the points where each of several series (a, b, c, d) crosses a specific series (x). I can't seem to find any function that can aid me in this task. My best bet would be to measure the distance of every single point in x to every single point in a, b, c, d... and assume that when the distance falls below some threshold, the point must be a crossing point. I think this approach is far too computationally heavy and seems "dirty". I believe there must be easier or better ways,
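A cheaper alternative to pairwise distances: walk the two series in parallel, look for a sign change in their difference, and linearly interpolate between the two surrounding points. A language-agnostic sketch of the idea (written in R for brevity; t holds the shared x-positions of the samples):

crossings <- function(t, x, a) {
  d <- x - a                                    # difference between the series
  i <- which(d[-length(d)] * d[-1] < 0)         # sign change between i and i+1
  t[i] + d[i] * (t[i + 1] - t[i]) / (d[i] - d[i + 1])  # interpolated crossings
}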

ggplot2: Problem with x axis when adding regression line equation on each facet

Submitted by 左心房为你撑大大i on 2019-11-27 07:16:29
Question: Based on the example here, Adding Regression Line Equation and R2 on graph, I am struggling to include the regression-line equation for my model in each facet. However, I can't figure out why it changes the limits of my x axis.

library(ggplot2)
library(reshape2)
df <- data.frame(year = seq(1979, 2010),
                 M02 = runif(32, -4, 6),
                 M06 = runif(32, -2.4, 5.1),
                 M07 = runif(32, -2, 7.1))
df <- melt(df, id = c("year"))
ggplot(data = df, mapping = aes(x = year, y = value)) +
  geom_point() +
  scale_x_continuous()
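One way to add a per-facet equation without disturbing the axis limits (a sketch, not the linked answer verbatim): compute one label per facet and pin it to the panel corner with x = -Inf, y = Inf, so the text never extends the data range.

labs <- do.call(rbind, lapply(split(df, df$variable), function(d) {
  m <- lm(value ~ year, data = d)
  data.frame(variable = d$variable[1],
             lab = sprintf("y = %.2f + %.2f x, R^2 = %.2f",
                           coef(m)[1], coef(m)[2], summary(m)$r.squared))
}))
ggplot(df, aes(year, value)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  geom_text(data = labs, aes(label = lab), x = -Inf, y = Inf,
            hjust = -0.05, vjust = 1.5, inherit.aes = FALSE) +
  facet_wrap(~ variable)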

scikit-learn cross validation, negative values with mean squared error

Submitted by 半腔热情 on 2019-11-27 06:58:19
When I use the following code with a data matrix X of size (952, 144) and an output vector y of size (952), the mean_squared_error metric returns negative values, which is unexpected. Do you have any idea why?

from sklearn.svm import SVR
from sklearn import cross_validation as CV

reg = SVR(C=1., epsilon=0.1, kernel='rbf')
scores = CV.cross_val_score(reg, X, y, cv=10, scoring='mean_squared_error')

All values in scores are then negative.

Answer (AN6U5), closing this out with what David and larsmans eloquently described in the comments: yes, this is supposed to happen. The unified scoring API always maximizes the score, so metrics that should be minimized, such as MSE, are returned negated; negate (or take the absolute value of) the scores to recover the usual mean squared errors.

Adding a regression line on a ggplot

Submitted by 浪尽此生 on 2019-11-27 06:51:19
I'm trying hard to add a regression line to a ggplot. I first tried with abline, but I didn't manage to make it work. Then I tried this:

data <- data.frame(x.plot = rep(seq(1, 5), 10), y.plot = rnorm(50))
ggplot(data, aes(x.plot, y.plot)) +
  stat_summary(fun.data = mean_cl_normal) +
  geom_smooth(method = 'lm', formula = data$y.plot ~ data$x.plot)

But it is not working either.

Answer: In general, to provide your own formula you should use the variables x and y, which will correspond to the values you provided in ggplot(); in this case x will be interpreted as x.plot and y as y.plot. More information about smoothing methods and the formula argument can be found in the help page for stat_smooth.
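A minimal corrected version of the call above (a sketch following that advice):

ggplot(data, aes(x.plot, y.plot)) +
  stat_summary(fun.data = mean_cl_normal) +
  geom_smooth(method = 'lm', formula = y ~ x)  # x and y, not data$y.plot ~ data$x.plot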

Combinations of variables that produce the smallest quantities in an R function

Submitted by 一曲冷凌霜 on 2019-11-27 06:31:11
Question: I'm interested in finding out which combinations of the variables (binge, followup, sreport, age) in my model below produce the smallest I2 statistic, in rank order (smallest to largest). The I2 from each fitted model is obtained like so: I2 <- function(x) as.double(x$mod_info$I.2). Is there a way to automate this in R by looping over formulas? Example: first fitting effectsize ~ binge, then effectsize ~ binge + followup, then ... Note: suppose I have the names of all variables stored like so: var.names = c("binge"
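A sketch of one way to loop over every predictor subset; here fit_model is a placeholder for whatever fitting function actually produces the object carrying $mod_info$I.2 (it is not named in the truncated question):

I2 <- function(x) as.double(x$mod_info$I.2)
# build "effectsize ~ ..." formulas for all non-empty subsets of var.names
forms <- unlist(lapply(seq_along(var.names), function(k)
  combn(var.names, k,
        FUN = function(v) paste("effectsize ~", paste(v, collapse = " + ")))))
i2.values <- sapply(forms, function(f) I2(fit_model(as.formula(f))))
sort(i2.values)  # smallest I2 first, named by formula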

linear model with `lm`: how to get prediction variance of sum of predicted values

Submitted by 有些话、适合烂在心里 on 2019-11-27 06:20:39
Question: I'm summing the predicted values from a linear model with multiple predictors, as in the example below, and want to calculate the combined variance, standard error and possibly confidence intervals for this sum.

lm.tree <- lm(Volume ~ poly(Girth, 2), data = trees)

Suppose I have a set of Girths:

newdat <- list(Girth = c(10, 12, 14, 16))

for which I want to predict the total Volume:

pr <- predict(lm.tree, newdat, se.fit = TRUE)
total <- sum(pr$fit)
# [1] 111.512

How can I obtain the variance for this sum?
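A sketch of the standard linear-algebra route (using the lm.tree and newdat above): the predictions are Xp %*% coef(lm.tree) for the prediction design matrix Xp, so their covariance matrix is Xp %*% vcov(lm.tree) %*% t(Xp), and the variance of their sum is the sum of every entry of that matrix (1' V 1).

Xp <- model.matrix(delete.response(terms(lm.tree)), data.frame(newdat))
V <- Xp %*% vcov(lm.tree) %*% t(Xp)  # covariance matrix of the four predictions
var.total <- sum(V)                  # 1' V 1
se.total <- sqrt(var.total)
# 95% confidence interval for the sum:
total + c(-1, 1) * qt(0.975, df.residual(lm.tree)) * se.total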