regression

Java-R integration?

﹥>﹥吖頭↗ submitted on 2019-11-27 06:20:34
I have a Java app which needs to perform partial least squares regression (PLSR). It would appear there are no Java implementations of PLSR out there. Weka might have had something like it at some point, but it is no longer in the API. On the other hand, I have found a good R implementation, which comes with an added bonus: it was used by the people whose results I want to replicate, so there is less chance that things will go wrong because of differences in the way PLSR is implemented. The question is: is there a good enough (and simple to use) package that enables Java to call R, pass in some …
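No answer text survives in this excerpt, but the usual bridges are Rserve, rJava/JRI, or simply shelling out to Rscript. A minimal sketch of the R side only, assuming the pls package and hypothetical CSV file names and a hypothetical "response" column for the Java-to-R exchange:

library(pls)                                   # provides plsr()
dat  <- read.csv("input_from_java.csv")        # hypothetical file written by the Java app
fit  <- plsr(response ~ ., data = dat, ncomp = 5, validation = "CV")
pred <- predict(fit, ncomp = 5)                # fitted values at 5 components
write.csv(data.frame(predicted = as.numeric(pred)),
          "output_for_java.csv", row.names = FALSE)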

Stepwise regression using p-values to drop variables with nonsignificant p-values

我是研究僧i submitted on 2019-11-27 06:07:51
I want to perform a stepwise linear regression using p-values as the selection criterion: at each step, drop the variable with the highest (i.e. most non-significant) p-value, and stop when all remaining p-values fall below some threshold alpha. I am fully aware that I should use the AIC (e.g. the step or stepAIC command) or some other criterion instead, but my boss has no grasp of statistics and insists on using p-values. If necessary, I could program my own routine, but I am wondering if there is an already implemented version of this. Show your boss the following: set.seed(100 …
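The excerpt ends just as the answer starts; a minimal backward-elimination sketch of the p-value rule described above (assuming purely numeric predictors, so coefficient names match term names) might look like this:

backward_p <- function(formula, data, alpha = 0.05) {
  fit <- lm(formula, data = data)
  repeat {
    pv <- summary(fit)$coefficients[, 4]              # Pr(>|t|) for each term
    pv <- pv[names(pv) != "(Intercept)"]
    if (length(pv) == 0 || max(pv) < alpha) break     # everything left is significant
    worst <- names(pv)[which.max(pv)]                 # least significant predictor
    fit   <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}
set.seed(100)
d <- data.frame(y = rnorm(50), x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
summary(backward_p(y ~ x1 + x2 + x3, data = d))

The seed matches the set.seed(100) the truncated answer begins with, but the toy data frame here is an assumption, not the original answer's example.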

large-scale regression in R with a sparse feature matrix

风格不统一 submitted on 2019-11-27 05:30:17
Question: I'd like to do large-scale regression (linear/logistic) in R with many (e.g. 100k) features, where each example is relatively sparse in the feature space, e.g. ~1k non-zero features per example. It seems like the SparseM package's slm should do this, but I'm having difficulty converting from the sparseMatrix format to an slm-friendly format. I have a numeric vector of labels y and a sparseMatrix of features X ∈ {0,1}. When I try model <- slm(y ~ X) I get the following error: Error in model …
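The excerpt cuts off before any fix for the slm() call, but a commonly suggested alternative is glmnet, which accepts a Matrix::sparseMatrix (dgCMatrix) directly; with lambda = 0 it is essentially an unpenalized linear (or, with family = "binomial", logistic) fit. A hedged sketch with stand-in dimensions:

library(Matrix)
library(glmnet)
set.seed(1)
n <- 1000; p <- 5000                                 # stand-ins for the real 100k features
X <- rsparsematrix(n, p, density = 0.01)             # ~1% non-zero entries
y <- rnorm(n)
fit  <- glmnet(X, y, family = "gaussian", lambda = 0)
beta <- coef(fit)                                    # sparse vector of coefficients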

lme4::lmer reports “fixed-effect model matrix is rank deficient”, do I need a fix and how to?

∥☆過路亽.° submitted on 2019-11-27 04:06:18
I am trying to run a mixed-effects model that predicts F2_difference with the rest of the columns as predictors, but I get an error message that says fixed-effect model matrix is rank deficient so dropping 7 columns / coefficients. From this link, Fixed-effects model is rank deficient, I think I should use findLinearCombos from the R package caret. However, when I try findLinearCombos(data.df), it gives me the error message Error in qr.default(object) : NA/NaN/Inf in foreign function call (arg 1) In addition: Warning message: In qr.default(object) : NAs introduced by coercion. My data does not …
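The truncated error suggests findLinearCombos was handed raw, partly non-numeric columns; it needs an all-numeric matrix. One hedged workaround is to run it on the fixed-effects model matrix itself, which is what lmer actually inspects (the formula and column names below are placeholders, not the asker's actual ones):

library(caret)
X <- model.matrix(~ predictor1 + predictor2 + predictor3, data = data.df)
combos <- findLinearCombos(X)
colnames(X)[combos$remove]    # the columns lmer reports as rank deficient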

Why does lm run out of memory while matrix multiplication works fine for coefficients?

让人想犯罪 __ submitted on 2019-11-27 03:20:28
Question: I am trying to do fixed-effects linear regression with R. My data has the columns dte, yr, id, v1, v2 (values omitted here). I decided to simply do this by making yr a factor and using lm: lm(v1 ~ factor(yr) + v2 - 1, data = df). However, this seems to run out of memory. My factor has 20 levels and df is 14 million rows, which takes about 2 GB to store; I am running this on a machine with 22 GB dedicated to this process. I then decided to try things the old-fashioned way: create dummy …
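lm() keeps the model frame, a dense model matrix, the QR decomposition, fitted values and residuals, which is why memory runs out long before the 21 coefficients are themselves a problem. A hedged sketch of a lighter route, using a sparse model matrix and the normal equations (reasonable here because the factor dummies make X very sparse, though it assumes X'X is well conditioned):

library(Matrix)
X    <- sparse.model.matrix(~ factor(yr) + v2 - 1, data = df)   # 20 dummy columns + v2
XtX  <- crossprod(X)                                            # small 21 x 21 matrix
Xty  <- crossprod(X, df$v1)
beta <- solve(XtX, Xty)                                         # same coefficients lm() would give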

Compute projection / hat matrix via QR factorization, SVD (and Cholesky factorization?)

若如初见. submitted on 2019-11-27 02:48:49
Question: I'm trying to calculate in R a projection matrix P of an arbitrary N x J matrix S: P = S (S'S)^-1 S'. I've been trying to perform this with the following function: P <- function(S){ output <- S %*% solve(t(S) %*% S) %*% t(S); return(output) }. But when I use this I get errors that look like this: # Error in solve.default(t(S) %*% S, t(S), tol = 1e-07) : # system is computationally singular: reciprocal condition number = 2.26005e-28. I think that this is a result of numerical underflow and/or …
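Since the title asks about QR and SVD: both avoid forming and inverting S'S, which is where the condition number gets squared and solve() gives up. A minimal sketch, assuming S has full column rank:

proj_qr <- function(S) {
  Q <- qr.Q(qr(S))        # N x J matrix with orthonormal columns, S = Q R
  tcrossprod(Q)           # Q %*% t(Q) is the projection / hat matrix
}
proj_svd <- function(S) {
  U <- svd(S)$u           # thin SVD, S = U D V'
  tcrossprod(U)           # U %*% t(U), the same projection
}
set.seed(1)
S <- matrix(rnorm(20 * 3), 20, 3)
max(abs(proj_qr(S) - proj_svd(S)))   # agreement to ~1e-15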

How to return predicted values, residuals, and R-squared from lm.fit in R?

不想你离开。 submitted on 2019-11-27 02:23:34
Question: this piece of code will return the coefficients: intercept, slope1, slope2. set.seed(1); n = 10; y = rnorm(n); x1 = rnorm(n); x2 = rnorm(n); lm.ft = function(y, x1, x2) return(lm(y ~ x1 + x2)$coef); res = list(); for (i in 1:n) { x1.bar = x1 - x1[i]; x2.bar = x2 - x2[i]; res[[i]] = lm.ft(y, x1.bar, x2.bar) }. If I type: > res[[1]] I get: (Intercept) x1 x2 -0.44803887 0.06398476 -0.62798646. How can we return predicted values, residuals, R-squared, etc.? I need something general to extract whatever I need from the summary. Answer 1: There are a …
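The excerpt stops as the answer begins; the simplest general pattern is to return the whole fitted object rather than just $coef and extract pieces afterwards. Note this sketch uses lm(), which keeps everything; the low-level lm.fit() returns fitted values and residuals but no R-squared:

lm.ft <- function(y, x1, x2) lm(y ~ x1 + x2)   # return the full lm object
set.seed(1)
n <- 10
y <- rnorm(n); x1 <- rnorm(n); x2 <- rnorm(n)
fit <- lm.ft(y, x1, x2)
coef(fit)                  # the coefficients shown above
fitted(fit)                # predicted values
residuals(fit)             # residuals
summary(fit)$r.squared     # R-squared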

fitting data with numpy

一曲冷凌霜 submitted on 2019-11-27 00:27:36
Let me start by saying that what I get may not be what I expect, and perhaps you can help me here. I have the following data: >>> x array([ 3.08, 3.1 , 3.12, 3.14, 3.16, 3.18, 3.2 , 3.22, 3.24, 3.26, 3.28, 3.3 , 3.32, 3.34, 3.36, 3.38, 3.4 , 3.42, 3.44, 3.46, 3.48, 3.5 , 3.52, 3.54, 3.56, 3.58, 3.6 , 3.62, 3.64, 3.66, 3.68]) >>> y array([ 0.000857, 0.001182, 0.001619, 0.002113, 0.002702, 0.003351, 0.004062, 0.004754, 0.00546 , 0.006183, 0.006816, 0.007362, 0.007844, 0.008207, 0.008474, 0.008541, 0.008539, 0.008445, 0.008251, 0.007974, 0.007608, 0.007193, 0.006752, 0.006269, 0.005799, 0.005302, …
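The question is about numpy (numpy.polyfit would be the usual tool there), but for consistency with the R examples elsewhere in this digest, here is the analogous quadratic least-squares fit in R; the degree-2 choice and the synthetic stand-in y values are assumptions based only on the single-peaked shape of the data above:

x <- seq(3.08, 3.68, by = 0.02)               # same grid as the x array above
y <- 0.0085 - 0.03 * (x - 3.4)^2              # synthetic stand-in, NOT the asker's y
fit <- lm(y ~ poly(x, 2, raw = TRUE))         # quadratic fit, like numpy.polyfit(x, y, 2)
coef(fit)                                     # intercept, linear and quadratic terms
-coef(fit)[2] / (2 * coef(fit)[3])            # vertex: estimated location of the peak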

get x-value given y-value: general root finding for linear / non-linear interpolation function

回眸只為那壹抹淺笑 submitted on 2019-11-26 23:04:19
I am interested in a general root-finding problem for an interpolation function. Suppose I have the following (x, y) data: set.seed(0); x <- 1:10 + runif(10, -0.1, 0.1); y <- rnorm(10, 3, 1), as well as a linear interpolation and a cubic spline interpolation: f1 <- approxfun(x, y); f3 <- splinefun(x, y, method = "fmm"). How can I find the x-values where these interpolation functions cross a horizontal line y = y0? The following is a graphical illustration with y0 = 2.85: par(mfrow = c(1, 2)); curve(f1, from = x[1], to = x[10]); abline(h = 2.85, lty = 2); curve(f3, from = x[1], to = x[10]); abline(h = …
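The excerpt is cut off before any answer, but a standard recipe is to scan a fine grid for sign changes of f(x) - y0 and polish each bracket with uniroot(); this is a hedged sketch (a coarse grid can miss crossings that are very close together or merely tangent), reusing x, y, f1 and f3 from the question:

roots_at <- function(f, y0, from, to, n = 1000) {
  xs <- seq(from, to, length.out = n)
  g  <- f(xs) - y0
  i  <- which(g[-1] * g[-length(g)] < 0)       # sign change between xs[i] and xs[i + 1]
  vapply(i, function(k)
    uniroot(function(x) f(x) - y0, c(xs[k], xs[k + 1]))$root,
    numeric(1))
}
roots_at(f1, 2.85, min(x), max(x))   # crossings of the linear interpolant
roots_at(f3, 2.85, min(x), max(x))   # crossings of the cubic spline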

TensorFlow deep neural network for regression always predicts the same results in one batch

☆樱花仙子☆ submitted on 2019-11-26 22:33:11
Question: I use TensorFlow to implement a simple multi-layer perceptron for regression. The code is modified from the standard MNIST classifier; I only changed the output cost to MSE (using tf.reduce_mean(tf.square(pred - y))) and some input/output size settings. However, if I train the network for regression, after several epochs the outputs within a batch are all exactly the same. For example: target: 48.129, estimated: 42.634; target: 46.590, estimated: 42.634; target: 34.209, estimated: 42.634; target: 69.677, …
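The excerpt ends before any diagnosis, but the symptom (every estimate identical, sitting near the middle of the targets) is the classic signature of the network collapsing to a constant prediction; for an MSE loss that constant is the mean of the training targets, and common culprits are unscaled targets, a too-large learning rate, or dead units. A quick R check (kept in R like the other sketches in this digest) of why the collapsed constant is the target mean, using only the target values quoted above:

targets <- c(48.129, 46.590, 34.209, 69.677)    # targets quoted in the question
best_const <- optimize(function(c) mean((targets - c)^2),
                       interval = range(targets))$minimum
c(best_constant = best_const, target_mean = mean(targets))   # essentially identical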