linear-regression

Coefficient table does not have NA rows in rank-deficient fit; how to insert them?

Submitted by 天大地大妈咪最大 on 2019-12-01 23:23:53

library(lmPerm)
x <- lmp(formula = a ~ b * c + d + e, data = df, perm = "Prob")
summary(x)
# truncated output, I can see `NA` rows here!
# Coefficients: (1 not defined because of singularities)
#      Estimate Iter Pr(Prob)
# b       5.874   51    1.000
# c     -30.060  281    0.263
# b:c        NA   NA       NA
# d1    -31.333   60    0.633
# d2     33.297  165    0.382
# d3    -19.096   51    1.000
# e       1.976   NA       NA

I want to pull out the Pr(Prob) results for everything, but

y <- summary(x)$coef[, "Pr(Prob)"]
# (Intercept)           b           c          d1          d2
#  0.09459459  1.00000000  0.26334520  0.63333333  0.38181818
#          d3           e
#  1.00000000          NA

This is not what I want. I need the b:c row, too, in the
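
A minimal sketch of one way to get a full-length vector (assuming coef() on the lmp fit, like on an lm fit, keeps the aliased b:c term as an NA entry): index the summary table by the complete set of coefficient names, so the missing rows come back as NA.

co <- coef(x)                          # named vector; aliased terms are NA
tab <- summary(x)$coef                 # table without the aliased rows
p <- setNames(rep(NA_real_, length(co)), names(co))
p[rownames(tab)] <- tab[, "Pr(Prob)"]  # fill in the rows that do exist
p                                      # now has an NA entry for b:c as well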

Why does the predicted polynomial change drastically when only the resolution of the prediction grid changes?

Submitted by  ̄綄美尐妖づ on 2019-12-01 22:24:45

Why do I get different predictions from the exact same model when I only change the step size of the prediction grid (0.001 vs 0.01)?

set.seed(0)
n_data = 2000
x = runif(n_data) - 0.5
y = 0.1 * sin(x * 30) / x + runif(n_data)
plot(x, y)

poly_df = 5
x_exp = as.data.frame(cbind(y, poly(x, poly_df)))
fit = lm(y ~ ., data = x_exp)

x_plt1 = seq(-1, 1, 0.001)
x_plt_exp1 = as.data.frame(poly(x_plt1, poly_df))
lines(x_plt1, predict(fit, x_plt_exp1), lwd = 3, col = 2)

x_plt2 = seq(-1, 1, 0.01)
x_plt_exp2 = as.data.frame(poly(x_plt2, poly_df))
lines(x_plt2, predict(fit, x_plt_exp2), lwd = 3, col = 3)

李哲源 answered: This is a coding / programming problem, as on my quick run I
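
The root cause is that poly(x_plt1, poly_df) builds a fresh orthogonal basis on each new grid instead of reusing the basis fitted on the training x. A minimal sketch of the usual remedy (my own rewrite, not the answerer's code): keep poly() inside the formula so predict() reconstructs the training-time basis for any new grid.

set.seed(0)
n_data <- 2000
x <- runif(n_data) - 0.5
y <- 0.1 * sin(x * 30) / x + runif(n_data)

fit <- lm(y ~ poly(x, 5))   # the orthogonal basis is stored with the model

plot(x, y)
x_new1 <- seq(-1, 1, 0.001)
x_new2 <- seq(-1, 1, 0.01)
lines(x_new1, predict(fit, newdata = data.frame(x = x_new1)), lwd = 3, col = 2)
lines(x_new2, predict(fit, newdata = data.frame(x = x_new2)), lwd = 3, col = 3)
# both grids now trace the same curve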

ValueError: continuous-multioutput is not supported

Submitted by 寵の児 on 2019-12-01 22:07:30

I want to run several regression types (Lasso, Ridge, ElasticNet and SVR) on a dataset with around 5,000 rows and 6 features, doing linear regression with GridSearchCV for cross-validation. The code is extensive, but here are some critical parts:

def splitTrainTestAdv(df):
    y = df.iloc[:, -5:]   # last 5 columns
    X = df.iloc[:, :-5]   # except for the last 5 columns
    # Scaling and sampling
    X = StandardScaler().fit_transform(X)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=0)
    return X_train, X_test, y_train, y_test

def performSVR(x_train, y_train, X_test, parameter):
    C =
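
The error is raised because y here has five columns (a continuous multi-output target), and SVR, like most scikit-learn regressors and scorers, only accepts a single continuous target. A minimal sketch of one common workaround, wrapping the estimator in MultiOutputRegressor so a separate SVR is fitted per target column (the data and parameter values below are placeholders, not the poster's):

import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

X = np.random.rand(100, 6)            # 6 features (placeholder data)
y = np.random.rand(100, 5)            # 5 continuous targets
model = MultiOutputRegressor(SVR(C=1.0, kernel="rbf"))
model.fit(X, y)                       # no "continuous-multioutput" error
print(model.predict(X[:3]).shape)     # (3, 5): one prediction per target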

How to plot a linear regression on a double logarithmic R plot?

Submitted by 我只是一个虾纸丫 on 2019-12-01 21:55:13

I have the following data:

someFactor = 500
x = c(1:250)
y = x^-.25 * someFactor

which I show in a double logarithmic plot:

plot(x, y, log = "xy")

Now I "find out" the slope of the data using a linear model:

model = lm(log(y) ~ log(x))
model

which gives:

Call:
lm(formula = log(y) ~ log(x))

Coefficients:
(Intercept)       log(x)
      6.215       -0.250

Now I'd like to plot the linear regression as a red line, but abline does not work:

abline(model, col = "red")

What is the easiest way to add a regression line to my plot?

lines(log(x), exp(predict(model, newdata = list(x = log(x)))), col = "red")

The range of values for x
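
A minimal sketch of a direct way to draw the fit on the log-log plot (reusing the question's x, y and model): the axes are log-scaled but the plotting coordinates are still the raw values, so back-transform the fitted log(y) with exp() and plot it against x.

plot(x, y, log = "xy")
lines(x, exp(fitted(model)), col = "red")
# equivalently:
# lines(x, exp(predict(model, newdata = data.frame(x = x))), col = "red")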

Extract Formula From lm with Coefficients (R)

Submitted by 核能气质少年 on 2019-12-01 20:52:33

Question: I have an lm object and want to extract the formula with its coefficients. I know how to extract the formula without coefficients, and how to get the coefficients without the formula, but not how to get e.g. y ~ 10 + 1.25b as opposed to y ~ b or a table of what the intercept, b etc. equal. This is the code I'm working with currently:

a = c(1, 2, 5)
b = c(12, 15, 20)
model = lm(a ~ b)
summary(model)
formula = formula(model)
formula
coefficients(model)

What I'd like to get from the above is y ~ -5.326
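
A minimal sketch of one way to build such a string by hand (rounding to three digits is an arbitrary choice): paste the intercept and the named slope coefficients from coef(model) into an equation.

co <- round(coef(model), 3)
eqn <- paste0("y ~ ", co[1], " + ",
              paste(co[-1], names(co)[-1], sep = "*", collapse = " + "))
eqn  # roughly "y ~ -5.327 + 0.51*b" for the toy data above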

Aggregate linear regression

Submitted by 天大地大妈咪最大 on 2019-12-01 20:26:37

Sorry, I am quite new to R, but I have a dataframe with gamelogs for multiple players. I am trying to get the slope coefficient of each player's points over all of their games. I have seen that aggregate can use operators like sum and average, and getting coefficients out of a linear regression is pretty simple as well. How do I combine these?

a <- c("player1", "player1", "player1", "player2", "player2", "player2")
b <- c(1, 2, 3, 4, 5, 6)
c <- c(15, 12, 13, 4, 15, 9)
gamelogs <- data.frame(name = a, game = b, pts = c)

I want this to become:

name      pts slope
player1   -.4286
player2   .08242

You can also do some magic
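
A minimal sketch of one way to do this in base R (the slope values in the question's expected output presumably come from the poster's real data, not the toy frame above): split the data by player and fit a small lm() per group.

slopes <- sapply(split(gamelogs, gamelogs$name),
                 function(d) coef(lm(pts ~ game, data = d))[["game"]])
data.frame(name = names(slopes), slope = unname(slopes))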

linear regression in R without copying data in memory?

Submitted by 雨燕双飞 on 2019-12-01 20:10:25

The standard way of doing a linear regression is something like this:

l <- lm(Sepal.Width ~ Petal.Length + Petal.Width, data = iris)

and then use predict(l, new_data) to make predictions, where new_data is a dataframe with columns matching the formula. But lm() returns an lm object, which is a list that contains crap-loads of stuff that is mostly irrelevant in most situations. This includes a copy of the original data, and a bunch of named vectors and arrays the length/size of the data:

R> str(l)
List of 12
 $ coefficients : Named num [1:3] 3.587 -0.257 0.364
  ..- attr(*, "names")= chr [1:3] "
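
A minimal sketch of the usual first step (not a full answer to the memory question): tell lm() not to keep the model frame, which is typically the largest part of the object, while predict() keeps working because it only needs the terms and the fitted coefficients.

l_small <- lm(Sepal.Width ~ Petal.Length + Petal.Width, data = iris,
              model = FALSE, x = FALSE, y = FALSE)
object.size(l_small)                     # compare with object.size(l)
predict(l_small, newdata = head(iris))   # prediction still works with newdata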

Keras regression clip values

Submitted by 南楼画角 on 2019-12-01 19:07:16

I want to clip values; how could I do that? I tried using this:

from keras.backend.tensorflow_backend import clip
from keras.layers.core import Lambda
...
model.add(Dense(1))
model.add(Activation('linear'))
model.add(Lambda(lambda x: clip(x, min_value=200, max_value=1000)))

But no matter where I put my Lambda + clip, it does not affect anything.

It actually has to be implemented as a loss, at the model.compile step:

from keras import backend as K

def clipped_mse(y_true, y_pred):
    return K.mean(K.square(K.clip(y_pred, 0., 1900.) - K.clip(y_true, 0., 1900.)), axis=-1)

model.compile(loss
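
A minimal sketch of how that compile step might look end to end (layer sizes and the optimizer are placeholders, not taken from the answer):

from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

def clipped_mse(y_true, y_pred):
    # clip both prediction and target into [0, 1900] before taking the MSE
    return K.mean(K.square(K.clip(y_pred, 0., 1900.) - K.clip(y_true, 0., 1900.)), axis=-1)

model = Sequential()
model.add(Dense(16, input_dim=6, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss=clipped_mse, optimizer='adam')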

Calculating the number of dots that lie above and below the regression line with R [closed]

Submitted by 南笙酒味 on 2019-12-01 17:39:11

How do I calculate the number of dots that lie above and below the regression line on a scatter plot?

data = read.csv("info.csv")
par(pty = "s")
plot(data$col1, data$col2, xlab = "xaxis", ylab = "yaxis", xlim = c(0, 1), ylim = c(0, 1),
     cex.lab = 1.5, cex.axis = 1.5, col.lab = "red", col = "blue", pch = 19)
abline(a = -1.21, b = 2.21)

For a fitted line, count the signs of the residuals:

x <- 1:10
set.seed(1)
y <- 2 * x + rnorm(10)
plot(y ~ x)
fit <- lm(y ~ x)
abline(fit)
resi <- resid(fit)
# below the fit:
sum(resi < 0)
# above the fit:
sum(resi > 0)

Edit: If you did (for some unknown reason) something like this:

x <- 1:10
set.seed(1)
y <- 2 * x + rnorm(10)
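
For the question's own plot, where the line comes from abline(a = -1.21, b = 2.21) rather than from lm(), a minimal sketch (the column names col1 and col2 are taken from the question; the data itself is not shown): compare each observed y with the line's value at the corresponding x.

line_y <- -1.21 + 2.21 * data$col1
sum(data$col2 > line_y)   # points above the line
sum(data$col2 < line_y)   # points below the line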