regression

Java-R integration?

﹥>﹥吖頭↗ submitted on 2019-11-27 06:20:34
I have a Java app which needs to perform partial least squares regression (PLSR). It would appear there are no Java implementations of PLSR out there. Weka might have had something like it at some point, but it is no longer in the API. On the other hand, I have found a good R implementation, which comes with an added bonus: it was used by the people whose results I want to replicate, so there is less chance that things will go wrong because of differences in the way PLSR is implemented. The question is: is there a good enough (and simple to use) package that enables Java to call R, pass in some …
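No answer text survives in this excerpt, but the usual bridges are Rserve, rJava/JRI, or simply shelling out to Rscript. A minimal sketch of the R side only, assuming the pls package and hypothetical CSV file names and a hypothetical "response" column for the Java-to-R exchange:

library(pls)                                   # provides plsr()
dat  <- read.csv("input_from_java.csv")        # hypothetical file written by the Java app
fit  <- plsr(response ~ ., data = dat, ncomp = 5, validation = "CV")
pred <- predict(fit, ncomp = 5)                # fitted values at 5 components
write.csv(data.frame(predicted = as.numeric(pred)),
          "output_for_java.csv", row.names = FALSE)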

Stepwise regression using p-values to drop variables with nonsignificant p-values

我是研究僧i submitted on 2019-11-27 06:07:51
I want to perform a stepwise linear regression using p-values as the selection criterion: at each step, drop the variable with the highest (i.e. most non-significant) p-value, and stop when all remaining p-values fall below some threshold alpha. I am fully aware that I should use the AIC (e.g. the step or stepAIC command) or some other criterion instead, but my boss has no grasp of statistics and insists on using p-values. If necessary, I could program my own routine, but I am wondering if there is an already implemented version of this. Show your boss the following: set.seed(100 …
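The excerpt ends just as the answer starts; a minimal backward-elimination sketch of the p-value rule described above (assuming purely numeric predictors, so coefficient names match term names) might look like this:

backward_p <- function(formula, data, alpha = 0.05) {
  fit <- lm(formula, data = data)
  repeat {
    pv <- summary(fit)$coefficients[, 4]              # Pr(>|t|) for each term
    pv <- pv[names(pv) != "(Intercept)"]
    if (length(pv) == 0 || max(pv) < alpha) break     # everything left is significant
    worst <- names(pv)[which.max(pv)]                 # least significant predictor
    fit   <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}
set.seed(100)
d <- data.frame(y = rnorm(50), x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
summary(backward_p(y ~ x1 + x2 + x3, data = d))

The seed matches the set.seed(100) the truncated answer begins with, but the toy data frame here is an assumption, not the original answer's example.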

large-scale regression in R with a sparse feature matrix

风格不统一 submitted on 2019-11-27 05:30:17
Question: I'd like to do large-scale regression (linear/logistic) in R with many (e.g. 100k) features, where each example is relatively sparse in the feature space, e.g. ~1k non-zero features per example. It seems like the SparseM package's slm should do this, but I'm having difficulty converting from the sparseMatrix format to an slm-friendly format. I have a numeric vector of labels y and a sparseMatrix of features X ∈ {0,1}. When I try model <- slm(y ~ X) I get the following error: Error in model …
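The excerpt cuts off before any fix for the slm() call, but a commonly suggested alternative is glmnet, which accepts a Matrix::sparseMatrix (dgCMatrix) directly; with lambda = 0 it is essentially an unpenalized linear (or, with family = "binomial", logistic) fit. A hedged sketch with stand-in dimensions:

library(Matrix)
library(glmnet)
set.seed(1)
n <- 1000; p <- 5000                                 # stand-ins for the real 100k features
X <- rsparsematrix(n, p, density = 0.01)             # ~1% non-zero entries
y <- rnorm(n)
fit  <- glmnet(X, y, family = "gaussian", lambda = 0)
beta <- coef(fit)                                    # sparse vector of coefficients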

lme4::lmer reports “fixed-effect model matrix is rank deficient”, do I need a fix and how to?

∥☆過路亽.° submitted on 2019-11-27 04:06:18
I am trying to run a mixed-effects model that predicts F2_difference with the rest of the columns as predictors, but I get an error message that says fixed-effect model matrix is rank deficient so dropping 7 columns / coefficients. From this link, Fixed-effects model is rank deficient, I think I should use findLinearCombos from the R package caret. However, when I try findLinearCombos(data.df), it gives me the error message Error in qr.default(object) : NA/NaN/Inf in foreign function call (arg 1) In addition: Warning message: In qr.default(object) : NAs introduced by coercion. My data does not …
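The truncated error suggests findLinearCombos was handed raw, partly non-numeric columns; it needs an all-numeric matrix. One hedged workaround is to run it on the fixed-effects model matrix itself, which is what lmer actually inspects (the formula and column names below are placeholders, not the asker's actual ones):

library(caret)
X <- model.matrix(~ predictor1 + predictor2 + predictor3, data = data.df)
combos <- findLinearCombos(X)
colnames(X)[combos$remove]    # the columns lmer reports as rank deficient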

Why does lm run out of memory while matrix multiplication works fine for coefficients?

让人想犯罪 __ submitted on 2019-11-27 03:20:28
Question: I am trying to do fixed-effects linear regression with R. My data has the columns dte, yr, id, v1, v2 (values omitted here). I decided to simply do this by making yr a factor and using lm: lm(v1 ~ factor(yr) + v2 - 1, data = df). However, this seems to run out of memory. My factor has 20 levels and df is 14 million rows, which takes about 2 GB to store; I am running this on a machine with 22 GB dedicated to this process. I then decided to try things the old-fashioned way: create dummy …
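lm() keeps the model frame, a dense model matrix, the QR decomposition, fitted values and residuals, which is why memory runs out long before the 21 coefficients are themselves a problem. A hedged sketch of a lighter route, using a sparse model matrix and the normal equations (reasonable here because the factor dummies make X very sparse, though it assumes X'X is well conditioned):

library(Matrix)
X    <- sparse.model.matrix(~ factor(yr) + v2 - 1, data = df)   # 20 dummy columns + v2
XtX  <- crossprod(X)                                            # small 21 x 21 matrix
Xty  <- crossprod(X, df$v1)
beta <- solve(XtX, Xty)                                         # same coefficients lm() would give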

Compute projection / hat matrix via QR factorization, SVD (and Cholesky factorization?)

若如初见. submitted on 2019-11-27 02:48:49
Question: I'm trying to calculate in R a projection matrix P of an arbitrary N x J matrix S: P = S (S'S)^-1 S'. I've been trying to perform this with the following function: P <- function(S){ output <- S %*% solve(t(S) %*% S) %*% t(S); return(output) }. But when I use this I get errors that look like this: # Error in solve.default(t(S) %*% S, t(S), tol = 1e-07) : # system is computationally singular: reciprocal condition number = 2.26005e-28. I think that this is a result of numerical underflow and/or …
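Since the title asks about QR and SVD: both avoid forming and inverting S'S, which is where the condition number gets squared and solve() gives up. A minimal sketch, assuming S has full column rank:

proj_qr <- function(S) {
  Q <- qr.Q(qr(S))        # N x J matrix with orthonormal columns, S = Q R
  tcrossprod(Q)           # Q %*% t(Q) is the projection / hat matrix
}
proj_svd <- function(S) {
  U <- svd(S)$u           # thin SVD, S = U D V'
  tcrossprod(U)           # U %*% t(U), the same projection
}
set.seed(1)
S <- matrix(rnorm(20 * 3), 20, 3)
max(abs(proj_qr(S) - proj_svd(S)))   # agreement to ~1e-15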

How to return predicted values, residuals, and R-squared from lm.fit in R?

不想你离开。 submitted on 2019-11-27 02:23:34
Question: this piece of code will return the coefficients: intercept, slope1, slope2. set.seed(1); n = 10; y = rnorm(n); x1 = rnorm(n); x2 = rnorm(n); lm.ft = function(y, x1, x2) return(lm(y ~ x1 + x2)$coef); res = list(); for (i in 1:n) { x1.bar = x1 - x1[i]; x2.bar = x2 - x2[i]; res[[i]] = lm.ft(y, x1.bar, x2.bar) }. If I type: > res[[1]] I get: (Intercept) x1 x2 -0.44803887 0.06398476 -0.62798646. How can we return predicted values, residuals, R-squared, etc.? I need something general to extract whatever I need from the summary. Answer 1: There are a …
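The excerpt stops as the answer begins; the simplest general pattern is to return the whole fitted object rather than just $coef and extract pieces afterwards. Note this sketch uses lm(), which keeps everything; the low-level lm.fit() returns fitted values and residuals but no R-squared:

lm.ft <- function(y, x1, x2) lm(y ~ x1 + x2)   # return the full lm object
set.seed(1)
n <- 10
y <- rnorm(n); x1 <- rnorm(n); x2 <- rnorm(n)
fit <- lm.ft(y, x1, x2)
coef(fit)                  # the coefficients shown above
fitted(fit)                # predicted values
residuals(fit)             # residuals
summary(fit)$r.squared     # R-squared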

fitting data with numpy

一曲冷凌霜 submitted on 2019-11-27 00:27:36
Let me start by saying that what I get may not be what I expect, and perhaps you can help me here. I have the following data: >>> x array([ 3.08, 3.1 , 3.12, 3.14, 3.16, 3.18, 3.2 , 3.22, 3.24, 3.26, 3.28, 3.3 , 3.32, 3.34, 3.36, 3.38, 3.4 , 3.42, 3.44, 3.46, 3.48, 3.5 , 3.52, 3.54, 3.56, 3.58, 3.6 , 3.62, 3.64, 3.66, 3.68]) >>> y array([ 0.000857, 0.001182, 0.001619, 0.002113, 0.002702, 0.003351, 0.004062, 0.004754, 0.00546 , 0.006183, 0.006816, 0.007362, 0.007844, 0.008207, 0.008474, 0.008541, 0.008539, 0.008445, 0.008251, 0.007974, 0.007608, 0.007193, 0.006752, 0.006269, 0.005799, 0.005302, …
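The question is about numpy (numpy.polyfit would be the usual tool there), but for consistency with the R examples elsewhere in this digest, here is the analogous quadratic least-squares fit in R; the degree-2 choice and the synthetic stand-in y values are assumptions based only on the single-peaked shape of the data above:

x <- seq(3.08, 3.68, by = 0.02)               # same grid as the x array above
y <- 0.0085 - 0.03 * (x - 3.4)^2              # synthetic stand-in, NOT the asker's y
fit <- lm(y ~ poly(x, 2, raw = TRUE))         # quadratic fit, like numpy.polyfit(x, y, 2)
coef(fit)                                     # intercept, linear and quadratic terms
-coef(fit)[2] / (2 * coef(fit)[3])            # vertex: estimated location of the peak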

get x-value given y-value: general root finding for linear / non-linear interpolation function

回眸只為那壹抹淺笑 submitted on 2019-11-26 23:04:19
I am interested in a general root-finding problem for an interpolation function. Suppose I have the following (x, y) data: set.seed(0); x <- 1:10 + runif(10, -0.1, 0.1); y <- rnorm(10, 3, 1), as well as a linear interpolation and a cubic spline interpolation: f1 <- approxfun(x, y); f3 <- splinefun(x, y, method = "fmm"). How can I find the x-values where these interpolation functions cross a horizontal line y = y0? The following is a graphical illustration with y0 = 2.85: par(mfrow = c(1, 2)); curve(f1, from = x[1], to = x[10]); abline(h = 2.85, lty = 2); curve(f3, from = x[1], to = x[10]); abline(h = …
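The excerpt is cut off before any answer, but a standard recipe is to scan a fine grid for sign changes of f(x) - y0 and polish each bracket with uniroot(); this is a hedged sketch (a coarse grid can miss crossings that are very close together or merely tangent), reusing x, y, f1 and f3 from the question:

roots_at <- function(f, y0, from, to, n = 1000) {
  xs <- seq(from, to, length.out = n)
  g  <- f(xs) - y0
  i  <- which(g[-1] * g[-length(g)] < 0)       # sign change between xs[i] and xs[i + 1]
  vapply(i, function(k)
    uniroot(function(x) f(x) - y0, c(xs[k], xs[k + 1]))$root,
    numeric(1))
}
roots_at(f1, 2.85, min(x), max(x))   # crossings of the linear interpolant
roots_at(f3, 2.85, min(x), max(x))   # crossings of the cubic spline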

TensorFlow deep neural network for regression always predicts the same results in one batch

☆樱花仙子☆ submitted on 2019-11-26 22:33:11
Question: I use TensorFlow to implement a simple multi-layer perceptron for regression. The code is modified from the standard MNIST classifier; I only changed the output cost to MSE (using tf.reduce_mean(tf.square(pred - y))) and some input/output size settings. However, if I train the network for regression, after several epochs the outputs within a batch are all exactly the same. For example: target: 48.129, estimated: 42.634; target: 46.590, estimated: 42.634; target: 34.209, estimated: 42.634; target: 69.677, …
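The excerpt ends before any diagnosis, but the symptom (every estimate identical, sitting near the middle of the targets) is the classic signature of the network collapsing to a constant prediction; for an MSE loss that constant is the mean of the training targets, and common culprits are unscaled targets, a too-large learning rate, or dead units. A quick R check (kept in R like the other sketches in this digest) of why the collapsed constant is the target mean, using only the target values quoted above:

targets <- c(48.129, 46.590, 34.209, 69.677)    # targets quoted in the question
best_const <- optimize(function(c) mean((targets - c)^2),
                       interval = range(targets))$minimum
c(best_constant = best_const, target_mean = mean(targets))   # essentially identical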