linear-regression

How to force R to use a specified factor level as reference in a regression?

Submitted by 徘徊边缘 on 2019-11-26 14:56:32

Question: How can I tell R to use a certain level as the reference when I use binary explanatory variables in a regression? It just uses some level by default. I have lm(x ~ y + as.factor(b)) with b in {0, 1, 2, 3, 4}. Let's say I want to use 3 instead of the 0 that R uses.

Answer: See the relevel() function. Here is an example:

    set.seed(123)
    x <- rnorm(100)
    DF <- data.frame(x = x,
                     y = 4 + (1.5 * x) + rnorm(100, sd = 2),
                     b = gl(5, 20))
    head(DF)
    str(DF)
    m1 <- lm(y ~ x + b, data = DF)
    summary(m1)

Now alter the factor b in DF using relevel():

    DF <- within(DF, b <- relevel(b, ref = 3))
    m2 <- lm(y ~ x +
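The effect of relevel() shows up in how the design matrix is built: treatment coding drops the dummy column for the reference level, so every remaining coefficient is a contrast against that reference. A minimal pure-Python sketch of the idea (the helper name dummy_code is made up for illustration):

```python
def dummy_code(values, levels, ref):
    """Treatment-code `values`: one 0/1 column per non-reference level.

    The reference level encodes as an all-zero row, so the intercept
    absorbs it and every other coefficient is a contrast against `ref`.
    """
    kept = [lv for lv in levels if lv != ref]
    return [[1 if v == lv else 0 for lv in kept] for v in values]

levels = [0, 1, 2, 3, 4]
rows = dummy_code([0, 3, 4], levels, ref=3)
print(rows)  # → [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]]
```

Changing ref simply changes which column is dropped, which is all relevel() alters before lm() refits.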

Error in Confusion Matrix : the data and reference factors must have the same number of levels

Submitted by 余生长醉 on 2019-11-26 14:28:24

Question: I've trained a linear regression model with R caret. I'm now trying to generate a confusion matrix and keep getting the following error:

    Error in confusionMatrix.default(pred, testing$Final) :
      the data and reference factors must have the same number of levels

    EnglishMarks <- read.csv("E:/Subject Wise Data/EnglishMarks.csv", header = TRUE)
    inTrain <- createDataPartition(y = EnglishMarks$Final, p = 0.7, list = FALSE)
    training <- EnglishMarks[inTrain, ]
    testing <- EnglishMarks[-inTrain, ]
    predictionsTree <-
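The error arises because a regression model's predictions are continuous, so pred and testing$Final cannot share a common set of factor levels. The general fix is to discretize the predictions into the reference's label set before tabulating. A pure-Python sketch of that idea (confusion_matrix here is our own toy helper, not caret's):

```python
from collections import Counter

def confusion_matrix(pred, truth, labels):
    """Count (truth, pred) pairs; both inputs must use the same label set."""
    counts = Counter(zip(truth, pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

# Regression output is continuous -- threshold it into the label set first,
# otherwise pred has far more "levels" than truth (the error in the question).
raw = [0.2, 0.7, 0.55, 0.1]
pred = [1 if r >= 0.5 else 0 for r in raw]
truth = [0, 1, 1, 0]
print(confusion_matrix(pred, truth, labels=[0, 1]))  # → [[2, 0], [0, 2]]
```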

Why is the built-in lm function so slow in R?

Submitted by …衆ロ難τιáo~ on 2019-11-26 14:22:39

Question: I always thought that the lm function was extremely fast in R, but as this example suggests, the closed-form solution computed with the solve function is much faster.

    data <- data.frame(y = rnorm(1000), x1 = rnorm(1000), x2 = rnorm(1000))
    X <- cbind(1, data$x1, data$x2)
    library(microbenchmark)
    microbenchmark(
      solve(t(X) %*% X, t(X) %*% data$y),
      lm(y ~ ., data = data))

Can someone explain whether this toy example is a bad benchmark, or whether lm is actually slow? EDIT: As suggested by Dirk
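Part of the answer is that lm() does far more than solve the normal equations: it parses the formula, builds a model frame, and fits via a QR decomposition for numerical stability, while the solve call does only the raw linear algebra. A toy pure-Python sketch of that bare normal-equations path, solving (XᵀX)β = Xᵀy (no pivoting; illustration only, not production code):

```python
def solve_normal_equations(X, y):
    """Solve (X'X) b = X'y by Gaussian elimination (no pivoting; toy data only)."""
    n, p = len(X), len(X[0])
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         for i in range(p)]                               # X'X
    b = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]  # X'y
    for i in range(p):                                    # forward elimination
        for r in range(i + 1, p):
            f = A[r][i] / A[i][i]
            A[r] = [a - f * c for a, c in zip(A[r], A[i])]
            b[r] -= f * b[i]
    beta = [0.0] * p
    for i in reversed(range(p)):                          # back substitution
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta

# y = 2 + 3x is recovered exactly
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [2.0, 5.0, 8.0]
print(solve_normal_equations(X, y))  # → [2.0, 3.0]
```

The speed gap in the benchmark is largely this overhead plus QR vs. normal equations; the normal-equations shortcut is faster but less robust when XᵀX is ill-conditioned.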

Adding a regression line on a ggplot

Submitted by 跟風遠走 on 2019-11-26 12:07:50

Question: I'm trying hard to add a regression line on a ggplot. I first tried with abline but I didn't manage to make it work. Then I tried this:

    data <- data.frame(x.plot = rep(seq(1, 5), 10), y.plot = rnorm(50))
    ggplot(data, aes(x.plot, y.plot)) +
      stat_summary(fun.data = mean_cl_normal) +
      geom_smooth(method = 'lm', formula = data$y.plot ~ data$x.plot)

But it is not working either.

Answer 1: In general, to provide your own formula you should use arguments x and y that correspond to the values you provided in ggplot() -
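Whichever layer finally draws it, the line geom_smooth(method = 'lm') adds is plain ordinary least squares on the mapped x and y aesthetics. A small sketch of the intercept and slope it would compute:

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

print(ols_line([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # → (0.0, 2.0)
```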

Linear regression with matplotlib / numpy

Submitted by 有些话、适合烂在心里 on 2019-11-26 11:52:30

Question: I'm trying to generate a linear regression on a scatter plot I have generated; however, my data is in list format, and all of the examples I can find of using polyfit require using arange. arange doesn't accept lists, though. I have searched high and low for how to convert a list to an array and nothing seems clear. Am I missing something? Following on, how best can I use my list of integers as inputs to polyfit? Here is the polyfit example I am following:

    from pylab import *
    x =
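No explicit conversion is actually required for the fit itself: NumPy converts list inputs to arrays internally, so np.polyfit accepts plain lists. And for degree 1, polyfit is just least squares, which a few lines of pure Python can reproduce to show what it computes:

```python
def polyfit_deg1(x, y):
    """Least-squares line fit on plain lists: returns (slope, intercept),
    matching numpy.polyfit's coefficient order (highest degree first)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return slope, (sy - slope * sx) / n

print(polyfit_deg1([0, 1, 2, 3], [1, 3, 5, 7]))  # → (2.0, 1.0)
```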

How does predict.lm() compute confidence interval and prediction interval?

Submitted by 微笑、不失礼 on 2019-11-26 11:22:40

I ran a regression:

    CopierDataRegression <- lm(V1 ~ V2, data = CopierData1)

My task was to obtain a 90% confidence interval for the mean response given V2 = 6, and a 90% prediction interval when V2 = 6. I used the following code:

    X6 <- data.frame(V2 = 6)
    predict(CopierDataRegression, X6, se.fit = TRUE, interval = "confidence", level = 0.90)
    predict(CopierDataRegression, X6, se.fit = TRUE, interval = "prediction", level = 0.90)

I got (87.3, 91.9) and (74.5, 104.8), which seems correct since the PI should be wider. The output for both also included se.fit = 1.39, which was the same. I don't understand what
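se.fit is the standard error of the fitted mean, which is why it is identical in both calls. The prediction interval is wider because its standard error also includes the residual variance: se.pred = sqrt(se.fit² + s²), where s is the residual standard error. A sketch with illustrative numbers (the t-quantile and s below are assumptions, not values from CopierData1):

```python
import math

def interval_halfwidths(se_fit, s, t):
    """Half-widths of the confidence and prediction intervals at x0.

    The CI uses se.fit alone; the PI adds the residual variance s^2,
    which is why predict() reports the same se.fit for both calls
    yet returns a wider prediction interval.
    """
    return t * se_fit, t * math.sqrt(se_fit ** 2 + s ** 2)

# illustrative inputs only: se.fit from the question, s and t assumed
ci, pi = interval_halfwidths(se_fit=1.39, s=9.0, t=1.68)
print(round(ci, 2), round(pi, 2))  # PI half-width far exceeds CI half-width
```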

lme4::lmer reports “fixed-effect model matrix is rank deficient”, do I need a fix and how to?

Submitted by 拜拜、爱过 on 2019-11-26 10:59:35

Question: I am trying to run a mixed-effects model that predicts F2_difference using the rest of the columns as predictors, but I get an error message that says

    fixed-effect model matrix is rank deficient so dropping 7 columns / coefficients

From this link, Fixed-effects model is rank deficient, I think I should use findLinearCombos in the R package caret. However, when I try findLinearCombos(data.df), it gives me the error message

    Error in qr.default(object) : NA/NaN/Inf in foreign function call
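"Rank deficient" means some columns of the fixed-effects design matrix are exact linear combinations of others, so their coefficients are not identifiable; the later qr.default error additionally suggests the data passed to findLinearCombos contain NA/NaN/Inf values, which must be removed first. A pure-Python sketch of how a rank check detects the redundancy:

```python
def matrix_rank(rows, tol=1e-10):
    """Rank via Gaussian elimination with partial pivoting; rank < n_cols
    means some column is a linear combination of the others (the
    'rank deficient' situation lmer reports)."""
    A = [list(map(float, r)) for r in rows]
    n_rows, n_cols = len(A), len(A[0])
    rank, col = 0, 0
    while rank < n_rows and col < n_cols:
        pivot = max(range(rank, n_rows), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < tol:
            col += 1                      # no pivot in this column
            continue
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(rank + 1, n_rows):
            f = A[r][col] / A[rank][col]
            A[r] = [a - f * c for a, c in zip(A[r], A[rank])]
        rank += 1
        col += 1
    return rank

# third column = first + second, so this 3-column matrix has rank 2
X = [[1, 0, 1], [0, 1, 1], [1, 1, 2], [2, 1, 3]]
print(matrix_rank(X))  # → 2
```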

Accuracy Score ValueError: Can't Handle mix of binary and continuous target

Submitted by 左心房为你撑大大i on 2019-11-26 08:59:47

Question: I'm using linear_model.LinearRegression from scikit-learn as a predictive model. It works, and it's perfect. The problem is evaluating the predicted results using the accuracy_score metric. This is my true data:

    array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])

My predicted data:

    array([ 0.07094605,  0.1994941 ,  0.19270157,  0.13379635,  0.04654469,
            0.09212494,  0.19952108,  0.12884365,  0.15685076, -0.01274453,
            0.32167554,  0.32167554, -0.10023553,  0.09819648, -0.06755516,  0.25390082,
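accuracy_score compares discrete class labels, but LinearRegression returns continuous scores, hence the ValueError. The scores must be thresholded into labels first (or a classifier such as LogisticRegression used instead). A pure-Python sketch of the thresholding step (the helper and threshold are illustrative):

```python
def accuracy(truth, scores, threshold=0.5):
    """Binarize continuous scores, then compare against 0/1 labels.

    Mimics the needed preprocessing: accuracy_score rejects a mix of
    binary truth and continuous predictions, so threshold first.
    """
    pred = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

truth = [1, 1, 0, 0]
scores = [0.7, 0.2, 0.4, 0.1]
print(accuracy(truth, scores))  # → 0.75
```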

scikit-learn & statsmodels - which R-squared is correct?

Submitted by 霸气de小男生 on 2019-11-26 08:29:10

Question: I'd like to choose the best algorithm for future use. I found some solutions, but I didn't understand which R-squared value is correct. To check, I divided my data into test and training sets, and printed the two different R-squared values below.

    import statsmodels.api as sm
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    lineer = LinearRegression()
    lineer.fit(x_train, y_train)
    lineerPredict = lineer.predict(x_test)
    scoreLineer = r2_score(y_test,
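Both libraries use the same definition, R² = 1 − SS_res/SS_tot; discrepancies usually come from what is being scored (training vs. test data, or a statsmodels fit without an intercept) rather than from the formula itself. Computing it by hand on held-out predictions makes the comparison unambiguous:

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, the textbook definition; evaluate it on
    the same (test) data for every model you compare."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# toy held-out values and predictions
print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
```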

predict.lm() with an unknown factor level in test data

Submitted by 限于喜欢 on 2019-11-26 08:08:14

Question: I am fitting a model to factor data and predicting. If newdata in predict.lm() contains a single factor level that is unknown to the model, all of predict.lm() fails and returns an error. Is there a good way to have predict.lm() return a prediction for the factor levels the model knows and NA for unknown factor levels, instead of only an error? Example code:

    foo <- data.frame(response = rnorm(3), predictor = as.factor(c("A", "B", "C")))
    model <- lm(response ~ predictor, foo)
    foo.new <- data
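One workaround in spirit: predict only for levels the model has coefficients for, and return a missing value for unseen levels. A pure-Python sketch of that behaviour (level_means and the data are hypothetical, standing in for the per-level fitted values of a factor-only model):

```python
def predict_by_level(level_means, newdata):
    """Return the fitted value for known factor levels and None for levels
    the model never saw, instead of failing outright -- the behaviour the
    question wants from predict.lm()."""
    return [level_means.get(lv) for lv in newdata]

# per-level fitted values from a hypothetical training fit on A, B, C
fit = {"A": 1.2, "B": 0.4, "C": -0.7}
print(predict_by_level(fit, ["A", "D", "C"]))  # → [1.2, None, -0.7]
```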