regression

How to calculate cross validation error for ridge regression model?

送分小仙女□ Submitted on 2020-04-16 05:49:28
Question: I am trying to fit a ridge regression model on the white wine dataset. I want to use the entire dataset for training and use 10-fold CV to calculate the test error rate. That's the main question: how to calculate the CV test error for a ridge logistic regression model. I calculated the best value of lambda (also using CV), and now I want to find the CV test error rate. Currently, my code for calculating the said CV test error is: cost1 <- function(good, pi=0) mean(abs(good-pi) > 0.5) ridge
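
A minimal sketch of one way to get that 10-fold CV test error with glmnet's built-in cross-validation, assuming the white-wine data sits in a data frame wine with a binary column good (both names are placeholders taken from the question's cost function, not verified):

    library(glmnet)

    # Predictor matrix (drop the intercept column) and binary response.
    x <- model.matrix(good ~ ., data = wine)[, -1]
    y <- wine$good

    # alpha = 0 gives the ridge penalty; family = "binomial" makes it a
    # logistic model; type.measure = "class" reports the 10-fold CV
    # misclassification rate.
    set.seed(1)
    cv.ridge <- cv.glmnet(x, y, alpha = 0, family = "binomial",
                          nfolds = 10, type.measure = "class")

    cv.ridge$lambda.min                                   # best lambda by CV
    cv.ridge$cvm[cv.ridge$lambda == cv.ridge$lambda.min]  # CV error at that lambda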

How are envfit results created?

血红的双手。 Submitted on 2020-04-16 04:01:49
Question: I have a question regarding how to recreate the results of the envfit() function in the vegan package. Here is an example of envfit() being used with an ordination and an environmental vector: data(varespec) data(varechem) ord <- metaMDS(varespec) chem.envfit <- envfit(ord, varechem, choices = c(1,2), permutations = 999) chem.scores.envfit <- as.data.frame(scores(chem.envfit, display = "vectors")) chem.scores.envfit "The values that you see in the table are the standardised coefficients
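
A minimal sketch of one way to approximate those vector scores by hand, continuing the question's own example: regress each (scaled) environmental variable on the two ordination axes and normalize the coefficients to unit length, which yields the arrow's direction cosines. Whether scores() further scales the arrows by the square root of each variable's r2 is worth checking against your vegan version:

    library(vegan)
    data(varespec); data(varechem)
    ord <- metaMDS(varespec)

    site.sc <- scores(ord, display = "sites", choices = 1:2)

    # Regression coefficients of one environmental variable on the two axes,
    # normalized to unit length, give that variable's arrow direction.
    b <- coef(lm(scale(varechem$N) ~ site.sc[, 1] + site.sc[, 2]))[-1]
    b / sqrt(sum(b^2))  # compare with chem.scores.envfit["N", ]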

Animate points and regression line along date with gganimate()

和自甴很熟 Submitted on 2020-04-16 03:20:12
Question: I'm trying to animate points and a loess regression line so that they appear/are revealed simultaneously along year, but I get an error, as described below with a reprex. This would be the ideal animation: https://user-images.githubusercontent.com/1775316/49728400-f0ba1b80-fc72-11e8-86c5-71ed84b247db.gif Unfortunately, the thread where I found this did not include the accompanying code. See my reprex problem here: #Animate points and regression loess line along with dates at the same
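
Not the asker's missing reprex, but a minimal sketch of how transition_reveal() can drive this, using hypothetical data; note the seq_along() group on the points, without which gganimate shows only the current point rather than accumulating them:

    library(ggplot2)
    library(gganimate)

    df <- data.frame(year = 1980:2019,
                     value = cumsum(rnorm(40)))  # hypothetical data

    p <- ggplot(df, aes(year, value)) +
      # one group per point keeps already-revealed points on screen
      geom_point(aes(group = seq_along(year))) +
      geom_smooth(method = "loess", se = FALSE) +
      transition_reveal(year)  # reveal points and line together along year

    animate(p)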

How to compare regression-based feature selection algorithms with tree-based algorithms?

十年热恋 Submitted on 2020-04-16 02:47:07
Question: I'm trying to compare which feature selection model is more efficient for a specific domain. The current state of the art in this domain (GWAS) is regression-based algorithms (LR, LMM, SAIGE, etc.), but I want to try tree-based algorithms (I'm using LightGBM's LGBMClassifier with boosting_type='gbdt', which cross-validation selected as the most efficient option). I managed to get something like: Regression based alg --------------------- Features P-Values f1 2.49746e-21 f2 5.63324e
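
One hedged way to put the two outputs on a common footing is to compare feature rankings rather than raw scores. The sketch below uses hypothetical score tables (only the layout is assumed) and a rank correlation:

    # Hypothetical score tables for the same features.
    reg  <- data.frame(feature = c("f1", "f2", "f3"),
                       p_value = c(2.5e-21, 5.6e-08, 1.2e-03))
    tree <- data.frame(feature = c("f1", "f2", "f3"),
                       importance = c(310, 122, 45))

    m <- merge(reg, tree, by = "feature")

    # A smaller p-value and a larger importance both mean "more relevant",
    # so rank p-values ascending and importances descending.
    cor(rank(m$p_value), rank(-m$importance), method = "kendall")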

[Artificial Intelligence] Linear Regression

て烟熏妆下的殇ゞ Submitted on 2020-04-15 16:53:09
Contents: 1. Concepts 2. Theory

1. Concepts
Linear regression (Linear Regression) is a linear model that makes predictions through a linear combination of attributes. Its goal is to find a line, a plane, or a higher-dimensional hyperplane that minimizes the error between the predicted and true values.

Plain-language explanation: suppose a bank has many loan customers. When those customers borrowed, the bank recorded their age, monthly salary, and asset information (whether they own a house, a car, financial products, and so on); we call these pieces of information features. If I now apply to that bank for a loan, the bank can build a data model from its existing customers' features to predict how much it can lend me. The data model here is linear regression.

Suppose $x_1$ is salary and $x_2$ is age; then $y$ is the loan amount the bank can offer me, and we can fit a plane to the data (the original post shows a figure here).

2. Theory
With $\theta_1$ the salary coefficient and $\theta_2$ the age coefficient, the fitted plane is $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$, where $\theta_0$ is the bias term.

Source: oschina. Link: https://my.oschina.net/u/4312696/blog/3235756
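
As an appendix to the post above, a minimal sketch of the loan example in R, with made-up data (the coefficients and noise level are arbitrary):

    set.seed(1)
    loans <- data.frame(x1 = runif(100, 3000, 20000),            # monthly salary
                        x2 = sample(22:60, 100, replace = TRUE)) # age
    loans$y <- 500 + 2.1 * loans$x1 + 40 * loans$x2 + rnorm(100, sd = 800)

    fit <- lm(y ~ x1 + x2, data = loans)
    coef(fit)  # estimates of theta_0 (bias), theta_1, theta_2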

[Stanford University 2014 Machine Learning Course Notes] Chapter 6: Classification and Regression

ぃ、小莉子 Submitted on 2020-04-15 10:49:23
In this section and the next few, we begin discussing classification problems. This section explains why linear regression is not a good idea for classification. In classification problems, the variable y you want to predict takes discrete values; we will study an algorithm called logistic regression (Logistic Regression), one of the most popular and most widely used learning algorithms today.

Examples of classification problems: spam classification (deciding whether an email is spam), classifying online transactions (deciding whether a transaction is fraudulent, for example made with a stolen credit card), and tumor classification (deciding whether a tumor is malignant or benign). In these problems, the variable y we are trying to predict takes one of two values, 0 or 1. The class labeled 0 is also called the negative class (Negative Class), and the class labeled 1 the positive class (Positive Class). Generally, the negative class indicates the absence of something, for example no malignant tumor, while the positive class indicates the presence of what we are looking for; which class is positive and which is negative, however, is not strictly prescribed.

We now focus on classification problems with only the two classes 0 and 1, that is, binary classification. So how do we develop a classification algorithm? The training set in this example classifies tumors as malignant or benign; note that malignancy takes only two values, 0 or 1. So, given this training set, one thing we could do is apply the linear regression algorithm we have already learned and fit a straight line to the data. If you fit a straight line to this training set
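
A minimal sketch of the argument in R, with hypothetical tumor data (size = tumor size, y = 1 for malignant): linear regression happily produces values outside [0, 1], while logistic regression keeps predictions in (0, 1):

    size <- c(1, 2, 3, 4, 5, 6, 7, 20)  # one very large tumor
    y    <- c(0, 0, 1, 0, 1, 1, 1, 1)   # 1 = malignant, 0 = benign

    lin <- lm(y ~ size)
    predict(lin, data.frame(size = 20))  # exceeds 1: not a valid probability

    log.fit <- glm(y ~ size, family = binomial)
    predict(log.fit, data.frame(size = 20), type = "response")  # stays in (0, 1)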

Splitting data and running linear regression loop

旧巷老猫 Submitted on 2020-04-14 07:34:27
Question: I have seen a lot of similar questions, but there is one key element of the loop I am trying to write that I am missing. I have a dataset with ~4,000 different keys and ~1,000 observations per key. I have filtered out one key to isolate its observations, run a linear regression, and checked the model assumptions; all looks good. However, I want to loop over the dataset and run that linear regression for each of the keys. Then I will want to store the coefficients,
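
A minimal sketch of the split-then-fit pattern in base R, assuming a data frame df with columns key, x, and y (the formula is a placeholder for the asker's model):

    # One linear model per key.
    fits <- lapply(split(df, df$key), function(d) lm(y ~ x, data = d))

    # Collect the coefficients into one data frame, one row per key.
    coefs <- do.call(rbind, lapply(names(fits), function(k) {
      data.frame(key = k, t(coef(fits[[k]])))
    }))
    head(coefs)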