regression

How to do a GLM when “contrasts can be applied only to factors with 2 or more levels”?

Submitted by 心已入冬 on 2019-11-27 08:22:23
Question: I want to do a regression in R using glm, but is there a way around the contrasts error below?

mydf <- data.frame(
  Group = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12),
  WL = rep(c(1,0), 12),
  New.Runner = c("N","N","N","N","N","N","Y","N","N","N","N","N",
                 "N","Y","N","N","N","Y","N","N","N","N","N","Y"),
  Last.Run = c(1,5,2,6,5,4,NA,3,7,2,4,9,8,NA,3,5,1,NA,6,10,7,9,2,NA))
mod <- glm(formula = WL ~ New.Runner + Last.Run, family = binomial, data = mydf)
# Error in `contrasts<-`(`*tmp*`,
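The truncated error is the "contrasts can be applied only to factors with 2 or more levels" message from the title. A likely cause, visible in the data above: every "Y" in New.Runner sits in a row where Last.Run is NA, so when glm silently drops incomplete rows the factor is left with a single level. A minimal check (a sketch, using the mydf defined above):

complete <- mydf[complete.cases(mydf), ]
table(complete$New.Runner)   # only "N" survives, hence the contrasts error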

R - Calculate Test MSE given a trained model from a training set and a test set

Submitted by 谁说我不能喝 on 2019-11-27 07:55:23
Question: Given two simple sets of data:

head(training_set)
  x         y
1 1  2.167512
2 2  4.684017
3 3  3.702477
4 4  9.417312
5 5  9.424831
6 6 13.090983

head(test_set)
  x        y
1 1 2.068663
2 2 4.162103
3 3 5.080583
4 4 8.366680
5 5 8.344651

I want to fit a linear regression line on the training data, and use that line (or its coefficients) to calculate the "test MSE", i.e. the mean squared error of the residuals, on the test data once that line is applied there.

model = lm(y ~ x, data = training_set)
train_MSE = mean(model$residuals
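A straightforward way to complete this (a sketch, assuming the training_set and test_set shown above):

model <- lm(y ~ x, data = training_set)
train_MSE <- mean(model$residuals^2)                                   # training MSE
test_MSE <- mean((test_set$y - predict(model, newdata = test_set))^2)  # test MSE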

What does the capital letter “I” in R linear regression formula mean?

Submitted by 扶醉桌前 on 2019-11-27 07:28:23
I haven't been able to find an answer to this question, largely because googling anything with a standalone letter (like "I") causes issues. What does the "I" do in a model like this?

data(rock)
lm(area ~ I(peri - mean(peri)), data = rock)

Considering that the following does NOT do the same thing:

lm(area ~ (peri - mean(peri)), data = rock)

and that this works as expected outside a formula:

rock$peri - mean(rock$peri)

Any keywords on how to research this myself would also be very helpful.

Answer: I() isolates or insulates the contents of I(...) from the gaze of R's formula-parsing code. It allows the standard R operators to work as they would if you used them outside of a formula; without it, symbols such as - and + are interpreted as formula operators that remove or add model terms.
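A quick demonstration (a sketch built on the rock example above): centring inside I() gives exactly the same fit as centring the variable beforehand.

data(rock)
m1 <- lm(area ~ I(peri - mean(peri)), data = rock)  # arithmetic protected by I()
rock$peri.c <- rock$peri - mean(rock$peri)          # same arithmetic done up front
m2 <- lm(area ~ peri.c, data = rock)
coef(m1); coef(m2)                                  # identical coefficients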

Working with neuralnet in R for the first time: get “requires numeric/complex matrix/vector arguments”

Submitted by 北城余情 on 2019-11-27 07:26:50
I'm in the process of attempting to learn to work with neural networks in R. As a learning problem, I've been using the following problem over at Kaggle (don't worry, it is specifically designed for people to learn with; there's no reward tied to it). I started with a simple logistic regression, which was great for getting my feet wet. Now I'd like to learn to work with neural networks. My training data looks like this (column: value, for one example row):

- survived: 1
- pclass: 3
- sex: male
- age: 22.0
- sibsp: 1
- parch: 0
- ticket: PC 17601
- fare: 7.25
- cabin: C85
- embarked: S

My starting R code
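The error in the title ("requires numeric/complex matrix/vector arguments") typically means neuralnet was handed non-numeric columns such as sex or embarked. A sketch of one common fix (the data-frame name train and the dummy-column name sexmale are my assumptions, not from the question): expand the factors into numeric dummy variables first, then fit on the numeric frame.

library(neuralnet)
# model.matrix expands factors such as sex into numeric dummy columns;
# assumes the selected columns contain no missing values
train_num <- as.data.frame(model.matrix(
  ~ survived + pclass + sex + age + sibsp + parch + fare, data = train))
nn <- neuralnet(survived ~ pclass + sexmale + age + sibsp + parch + fare,
                data = train_num, hidden = 2, linear.output = FALSE)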

Highcharts - Get crossing point of crossing series

Submitted by 青春壹個敷衍的年華 on 2019-11-27 07:17:13
Question: I am currently trying to extract the points where each of several series (a, b, c, d) crosses a specific series (x). I can't seem to find any function that can aid me in this task. My best bet would be to measure the distance of every single point in x to every single point in a, b, c, d... and assume that when the distance falls below some threshold, the point must be a crossing point. I think this approach is far too computationally heavy and seems "dirty". I believe there must be easier or better ways,
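A cheaper alternative to pairwise distances: walk the two series in parallel, look for a sign change in their difference, and linearly interpolate between the two surrounding points. A language-agnostic sketch of the idea (written in R for brevity; t holds the shared x-positions of the samples):

crossings <- function(t, x, a) {
  d <- x - a                                    # difference between the series
  i <- which(d[-length(d)] * d[-1] < 0)         # sign change between i and i+1
  t[i] + d[i] * (t[i + 1] - t[i]) / (d[i] - d[i + 1])  # interpolated crossings
}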

ggplot2: Problem with x axis when adding regression line equation on each facet

Submitted by 左心房为你撑大大i on 2019-11-27 07:16:29
Question: Based on the example here, Adding Regression Line Equation and R2 on graph, I am struggling to include the regression-line equation for my model in each facet. However, I can't figure out why it changes the limits of my x axis.

library(ggplot2)
library(reshape2)
df <- data.frame(year = seq(1979, 2010),
                 M02 = runif(32, -4, 6),
                 M06 = runif(32, -2.4, 5.1),
                 M07 = runif(32, -2, 7.1))
df <- melt(df, id = c("year"))
ggplot(data = df, mapping = aes(x = year, y = value)) +
  geom_point() +
  scale_x_continuous()
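One way to add a per-facet equation without disturbing the axis limits (a sketch, not the linked answer verbatim): compute one label per facet and pin it to the panel corner with x = -Inf, y = Inf, so the text never extends the data range.

labs <- do.call(rbind, lapply(split(df, df$variable), function(d) {
  m <- lm(value ~ year, data = d)
  data.frame(variable = d$variable[1],
             lab = sprintf("y = %.2f + %.2f x, R^2 = %.2f",
                           coef(m)[1], coef(m)[2], summary(m)$r.squared))
}))
ggplot(df, aes(year, value)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  geom_text(data = labs, aes(label = lab), x = -Inf, y = Inf,
            hjust = -0.05, vjust = 1.5, inherit.aes = FALSE) +
  facet_wrap(~ variable)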

scikit-learn cross validation, negative values with mean squared error

Submitted by 半腔热情 on 2019-11-27 06:58:19
When I use the following code with a data matrix X of size (952, 144) and an output vector y of size (952), the mean_squared_error metric returns negative values, which is unexpected. Do you have any idea why?

from sklearn.svm import SVR
from sklearn import cross_validation as CV

reg = SVR(C=1., epsilon=0.1, kernel='rbf')
scores = CV.cross_val_score(reg, X, y, cv=10, scoring='mean_squared_error')

All values in scores are then negative.

Answer (AN6U5), closing this out with what David and larsmans eloquently described in the comments: yes, this is supposed to happen. The unified scoring API always maximizes the score, so metrics that should be minimized, such as MSE, are returned negated; negate (or take the absolute value of) the scores to recover the usual mean squared errors.

Adding a regression line on a ggplot

Submitted by 浪尽此生 on 2019-11-27 06:51:19
I'm trying hard to add a regression line to a ggplot. I first tried with abline, but I didn't manage to make it work. Then I tried this:

data <- data.frame(x.plot = rep(seq(1, 5), 10), y.plot = rnorm(50))
ggplot(data, aes(x.plot, y.plot)) +
  stat_summary(fun.data = mean_cl_normal) +
  geom_smooth(method = 'lm', formula = data$y.plot ~ data$x.plot)

But it is not working either.

Answer: In general, to provide your own formula you should use the variables x and y, which will correspond to the values you provided in ggplot(); in this case x will be interpreted as x.plot and y as y.plot. More information about smoothing methods and the formula argument can be found in the help page for stat_smooth.
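A minimal corrected version of the call above (a sketch following that advice):

ggplot(data, aes(x.plot, y.plot)) +
  stat_summary(fun.data = mean_cl_normal) +
  geom_smooth(method = 'lm', formula = y ~ x)  # x and y, not data$y.plot ~ data$x.plot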

Combinations of variables that produce the smallest quantities in an R function

Submitted by 一曲冷凌霜 on 2019-11-27 06:31:11
Question: I'm interested in finding out which combinations of the variables (binge, followup, sreport, age) in my model below produce the smallest I2 statistic, in rank order (smallest to largest). The I2 from each fitted model is obtained like so: I2 <- function(x) as.double(x$mod_info$I.2). Is there a way to automate this in R by looping over formulas? Example: first fitting effectsize ~ binge, then effectsize ~ binge + followup, then ... Note: suppose I have the names of all variables stored like so: var.names = c("binge"
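A sketch of one way to loop over every predictor subset; here fit_model is a placeholder for whatever fitting function actually produces the object carrying $mod_info$I.2 (it is not named in the truncated question):

I2 <- function(x) as.double(x$mod_info$I.2)
# build "effectsize ~ ..." formulas for all non-empty subsets of var.names
forms <- unlist(lapply(seq_along(var.names), function(k)
  combn(var.names, k,
        FUN = function(v) paste("effectsize ~", paste(v, collapse = " + ")))))
i2.values <- sapply(forms, function(f) I2(fit_model(as.formula(f))))
sort(i2.values)  # smallest I2 first, named by formula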

linear model with `lm`: how to get prediction variance of sum of predicted values

Submitted by 有些话、适合烂在心里 on 2019-11-27 06:20:39
Question: I'm summing the predicted values from a linear model with multiple predictors, as in the example below, and want to calculate the combined variance, standard error and possibly confidence intervals for this sum.

lm.tree <- lm(Volume ~ poly(Girth, 2), data = trees)

Suppose I have a set of Girths:

newdat <- list(Girth = c(10, 12, 14, 16))

for which I want to predict the total Volume:

pr <- predict(lm.tree, newdat, se.fit = TRUE)
total <- sum(pr$fit)
# [1] 111.512

How can I obtain the variance for this sum?
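A sketch of the standard linear-algebra route (using the lm.tree and newdat above): the predictions are Xp %*% coef(lm.tree) for the prediction design matrix Xp, so their covariance matrix is Xp %*% vcov(lm.tree) %*% t(Xp), and the variance of their sum is the sum of every entry of that matrix (1' V 1).

Xp <- model.matrix(delete.response(terms(lm.tree)), data.frame(newdat))
V <- Xp %*% vcov(lm.tree) %*% t(Xp)  # covariance matrix of the four predictions
var.total <- sum(V)                  # 1' V 1
se.total <- sqrt(var.total)
# 95% confidence interval for the sum:
total + c(-1, 1) * qt(0.975, df.residual(lm.tree)) * se.total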