regression

efficient looping logistic regression in R

Submitted by 你离开我真会死。 on 2019-12-19 04:08:35
Question: I'm trying to run a separate logistic regression analysis for each of ~400k predictor variables, and I would like to capture the output of each run in a row/column of an output table. My data is organised in two parts. I have a 400000 x 189 double matrix (mydatamatrix) that contains the observations/data for each of my 400000 predictor variables measured in 189 individuals (P1). I also have a second 189 x 20 data frame (mydataframe) containing the outcome variable and another predictor …
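
A minimal sketch of one way to structure the loop, assuming mydatamatrix has one row per predictor and mydataframe holds the outcome plus the other covariate; the column names outcome and P2 below are placeholders for illustration, not names from the original post:

```r
## Hypothetical sketch: fit one logistic regression per predictor row and
## collect the coefficient and p-value of interest into a preallocated table.
## 'outcome' and 'P2' are assumed column names in mydataframe.
n_pred  <- nrow(mydatamatrix)
results <- data.frame(estimate = numeric(n_pred), p.value = numeric(n_pred))

for (i in seq_len(n_pred)) {
  dat <- data.frame(outcome = mydataframe$outcome,
                    P2      = mydataframe$P2,
                    P1      = mydatamatrix[i, ])          # i-th predictor
  fit <- glm(outcome ~ P1 + P2, data = dat, family = binomial)
  sm  <- coef(summary(fit))
  results$estimate[i] <- sm["P1", "Estimate"]
  results$p.value[i]  <- sm["P1", "Pr(>|z|)"]
}
```

Preallocating the results table and extracting only the numbers you need from summary() keeps the loop reasonably lean; for 400k fits, parallelising the loop may still be necessary.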

Get hat matrix from QR decomposition for weighted least square regression

Submitted by 一笑奈何 on 2019-12-18 17:15:33
Question: I am trying to extend the lwr() function of the McSptial package, which fits weighted regressions for non-parametric estimation. At the core of the lwr() function, a matrix is inverted using solve() instead of a QR decomposition, which causes numerical instability. I would like to change this but can't figure out how to get the hat matrix (or other derived quantities) from the QR decomposition afterwards. With this data: set.seed(0); xmat <- matrix(rnorm(500), nrow=50) ## model matrix y <- rowSums(rep(2:11 …
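
For reference, a minimal stand-alone sketch (my own illustration, not code from McSptial) of how the weighted-least-squares hat matrix can be obtained from a QR decomposition; the weights w are made up for the example:

```r
## Weighted least squares: H = W^(-1/2) Q Q' W^(1/2), where Q comes from the
## QR decomposition of sqrt(W) X, so that fitted values are H %*% y.
set.seed(0)
xmat <- matrix(rnorm(500), nrow = 50)   # model matrix, as in the question
w    <- runif(50, 0.5, 2)               # hypothetical positive weights

Xw <- sqrt(w) * xmat                    # row-scale by sqrt(weights)
Q  <- qr.Q(qr(Xw))
H  <- diag(1 / sqrt(w)) %*% tcrossprod(Q) %*% diag(sqrt(w))

## sanity check against the textbook formula H = X (X'WX)^(-1) X'W
H_direct <- xmat %*% solve(crossprod(xmat, w * xmat), t(w * xmat))
all.equal(H, H_direct)
```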

mgcv: how to specify interaction between smooth and factor?

Submitted by 与世无争的帅哥 on 2019-12-18 17:03:29
Question: In R, I would like to fit a gam model with categorical variables. I thought I could do it the same way as with lm (cat is the categorical variable): lm(data = df, formula = y ~ x1*cat + x2 + x3). But I can't do things like gam(data = df, formula = y ~ s(x1)*cat + s(x2) + x3), whereas the following works: gam(data = df, formula = y ~ cat + s(x1) + s(x2) + x3). How do I add a categorical variable …
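
In mgcv, the usual way to write a smooth-factor interaction is the by argument of s(), together with the factor's main effect. A minimal sketch, assuming df contains y, x1, x2, x3 and a factor cat:

```r
## One smooth of x1 per level of the factor 'cat', plus the factor main effect.
library(mgcv)
df$cat <- as.factor(df$cat)
fit <- gam(y ~ cat + s(x1, by = cat) + s(x2) + x3, data = df)
summary(fit)
```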

Piecewise regression with a quadratic polynomial and a straight line joining smoothly at a break point

Submitted by 南楼画角 on 2019-12-18 16:59:38
Question: I want to fit a piecewise regression with one break point xt, such that for x < xt we have a quadratic polynomial and for x >= xt we have a straight line. The two pieces should join smoothly, with continuity up to the 1st derivative at xt. Here's a picture of what it may look like. I have parametrized my piecewise regression function so that a, b, c and xt are the parameters to be estimated. I want to compare this model with a quadratic polynomial regression over the whole range in terms of …
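
One possible parametrization (a sketch of the general idea, not necessarily the formula the question refers to): y = a + b*(x - xt) + c*pmin(x - xt, 0)^2 is quadratic left of xt, linear right of it, and automatically continuous with a continuous first derivative at the break point. For a fixed xt the model is linear in a, b, c, so xt can be profiled over a grid:

```r
## Fit the piecewise model for a fixed break point xt.
fit_piecewise <- function(xt, x, y) {
  X <- cbind(1, x - xt, pmin(x - xt, 0)^2)  # intercept, slope, left-only quadratic term
  lm(y ~ X - 1)
}

## hypothetical data just to make the sketch runnable
set.seed(1)
x <- seq(0, 10, length.out = 200)
y <- 1 + 0.5 * (x - 4) - 0.3 * pmin(x - 4, 0)^2 + rnorm(200, sd = 0.2)

## profile the residual sum of squares over candidate break points
grid   <- seq(1, 9, by = 0.1)
rss    <- sapply(grid, function(xt) deviance(fit_piecewise(xt, x, y)))
xt_hat <- grid[which.min(rss)]
xt_hat
```

Because both this model and the whole-range quadratic are fitted by least squares, they can be compared via residual sum of squares or an information criterion.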

R smooth.spline(): smoothing spline is not smooth but overfitting my data

Submitted by 丶灬走出姿态 on 2019-12-18 16:54:29
Question: I have several data points that seem suitable for fitting a spline through. When I do this, I get a rather bumpy fit, as if it is overfitting, which is not what I understand as smoothing. Is there a special option/parameter for getting back a really smooth spline function? Using the penalty parameter of smooth.spline didn't have any visible effect; maybe I did it wrong. Here are the data and code: results <- structure( list( beta = c( 0.983790622281964, 0.645152464354322 …
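
A minimal sketch with made-up data (not the results object above): in smooth.spline() the amount of smoothing is controlled mainly by spar or df rather than by penalty alone, and forcing a larger spar or a small df gives a visibly smoother curve than the automatic choice.

```r
set.seed(42)
x <- seq(0, 1, length.out = 50)
y <- sin(2 * pi * x) + rnorm(50, sd = 0.3)      # hypothetical noisy data

fit_default <- smooth.spline(x, y)              # smoothing chosen automatically
fit_spar    <- smooth.spline(x, y, spar = 0.9)  # force heavier smoothing
fit_df      <- smooth.spline(x, y, df = 5)      # or fix the effective degrees of freedom

plot(x, y)
lines(predict(fit_default), col = "grey50")
lines(predict(fit_spar),    col = "red")
lines(predict(fit_df),      col = "blue")
```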

Machine learning algorithms (3): the Ridge and Lasso algorithms

Submitted by 情到浓时终转凉″ on 2019-12-18 16:33:03
1. Overview

1-1. Ridge Regression: The previous post covered linear regression. Now consider this question: what if the data has more features than sample points? Can linear regression and the earlier methods still be used for prediction? The answer is no, because the input data matrix is then not of full rank, and a rank-deficient matrix causes problems when it is inverted. Ridge regression was introduced to solve this problem. Shrinkage methods can remove unimportant parameters and therefore make the data easier to understand; moreover, compared with plain linear regression, shrinkage often achieves better predictive performance.

1-2. Lasso Regression: Besides Ridge, another regularised form of linear regression is the Lasso. Like ridge regression, the Lasso constrains the coefficients so that they are pushed towards 0.

2. How the algorithms work

2-1. Ridge regression: Ridge regression addresses some of the problems of ordinary least squares by penalising the size of the coefficients: it is least squares with a second-order (L2) regularisation term added. It is mainly suitable when overfitting is severe or when the variables are strongly multicollinear. The ridge estimator is biased, and this bias is accepted in exchange for lower variance, so the key to ridge regression is finding a reasonable value of α that balances the model's variance (the variance of the regression coefficients) against its bias (the difference between predicted and true values).

2-2. Lasso regression: Ridge regression cannot drop variables entirely, whereas the LASSO regression model …
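
A minimal sketch of both penalties using the glmnet package (my choice of library, not mentioned in the post): the ridge objective adds a penalty proportional to Σβ² to the least-squares loss, the Lasso adds one proportional to Σ|β|. Note that the post's α corresponds to glmnet's lambda (the penalty strength), while glmnet's alpha argument only switches between the two penalties.

```r
library(glmnet)

set.seed(1)
X <- matrix(rnorm(100 * 20), nrow = 100)   # hypothetical design matrix
y <- X[, 1] - 2 * X[, 2] + rnorm(100)

ridge <- cv.glmnet(X, y, alpha = 0)  # L2 penalty: shrinks all coefficients
lasso <- cv.glmnet(X, y, alpha = 1)  # L1 penalty: can set coefficients exactly to 0

coef(ridge, s = "lambda.min")        # penalty strength chosen by cross-validation
coef(lasso, s = "lambda.min")
```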

sklearn LogisticRegression and changing the default threshold for classification

Submitted by 徘徊边缘 on 2019-12-18 14:52:36
Question: I am using LogisticRegression from the sklearn package and have a quick question about classification. I built a ROC curve for my classifier, and it turns out that the optimal threshold for my training data is around 0.25. I'm assuming that the default threshold when creating predictions is 0.5. How can I change this default setting to find out what the accuracy of my model is when doing 10-fold cross-validation? Basically, I want my model to predict a '1' for anyone greater than 0.25, not …
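
A minimal sketch with made-up data (my own illustration): predict() in scikit-learn always uses a 0.5 cut-off, so one approach is to take predict_proba() and apply the custom threshold yourself inside the cross-validation loop.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=500, random_state=0)  # hypothetical data
threshold = 0.25

scores = []
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    proba = clf.predict_proba(X[test_idx])[:, 1]   # P(class == 1)
    preds = (proba > threshold).astype(int)        # custom cut-off instead of 0.5
    scores.append(accuracy_score(y[test_idx], preds))

print(np.mean(scores))
```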

Multi-output regression

Submitted by ∥☆過路亽.° on 2019-12-18 13:34:53
Question: I have been looking into multi-output regression for the last few weeks. I am working with the scikit-learn package. My machine learning problem has an input of 3 features and needs to predict two output variables. Some ML models in the sklearn package support multioutput regression natively. If a model does not support this, the sklearn multioutput regression wrapper can be used to convert it; the multioutput class fits one regressor per target. Does the multioutput regressor class or …
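
A minimal sketch with made-up data: MultiOutputRegressor wraps any single-output estimator and fits one independent copy per target column, so correlations between the two outputs are not modelled.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 3))                      # 3 input features
Y = np.column_stack([X.sum(axis=1),           # 2 output variables
                     X[:, 0] - X[:, 2]])

model = MultiOutputRegressor(GradientBoostingRegressor()).fit(X, Y)
print(model.predict(X[:5]).shape)             # (5, 2): one column per target
```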

Fit a no-intercept model in caret

Submitted by 这一生的挚爱 on 2019-12-18 13:05:49
Question: In R, I specify a model with no intercept as follows:

data(iris)
lmFit <- lm(Sepal.Length ~ 0 + Petal.Length + Petal.Width, data=iris)
> round(coef(lmFit),2)
Petal.Length  Petal.Width
        2.86        -4.48

However, if I fit the same model with caret, the resulting model includes an intercept:

library(caret)
caret_lmFit <- train(Sepal.Length~0+Petal.Length+Petal.Width, data=iris, "lm")
> round(coef(caret_lmFit$finalModel),2)
(Intercept)  Petal.Length  Petal.Width
       4.19          0.54        -0.32

How do I tell caret::train …
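
A possible workaround, assuming caret's "lm" method exposes an intercept tuning parameter (as recent caret versions do), so the no-intercept request goes through tuneGrid rather than the formula:

```r
library(caret)
data(iris)

caret_noint <- train(
  Sepal.Length ~ 0 + Petal.Length + Petal.Width,
  data     = iris,
  method   = "lm",
  tuneGrid = data.frame(intercept = FALSE)   # ask caret's lm method for a no-intercept fit
)
round(coef(caret_noint$finalModel), 2)
```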