regression

Choosing the Right ML Algorithm

送分小仙女 Submitted on 2020-02-28 01:59:52
This article shows you how to choose a machine learning algorithm suited to your problem.

Classification

Logistic regression

Logistic regression is a discriminative model with many regularization options (L0, L1, L2, etc.), and you do not have to worry about whether your features are correlated the way you do with naive Bayes. Compared with decision trees and SVMs, you also get a useful probabilistic interpretation, and you can easily update the model with new data (using online gradient descent). Use it if you need a probabilistic framework (for example, to adjust the classification threshold easily, quantify uncertainty, or obtain confidence intervals), or if you expect to fold more training data into the model quickly later on.

Pros:
- Simple to implement and widely used on industrial problems
- Very cheap to compute at classification time, fast, and light on memory
- Conveniently produces probability scores for observations
- Multicollinearity is not a blocker; it can be addressed with L2 regularization
- Performs best when the features are uncorrelated, the decision boundary is linear, and the feature dimension is much smaller than the number of samples

Cons:
- Performance suffers when the feature space is very large
- Prone to underfitting; accuracy is generally not high
- Does not handle large numbers of multi-class features or variables well
- Natively handles only binary classification (the softmax generalization extends it to multi-class) and requires the classes to be linearly separable
- Nonlinear features must be transformed first
- Performs worst when the features are strongly correlated

Links:
- Machine learning: predicting benign/malignant breast cancer tumors
- A roundup of machine learning algorithms, from Bayes to deep learning, with their pros and cons
- Naive Bayes
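The points above can be sketched with scikit-learn, which exposes both the L2 penalty and the probability scores the article mentions (synthetic data; the dataset and parameter values are illustrative, not from the article):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data standing in for a real problem.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# penalty="l2" applies the L2 regularization mentioned above;
# C is the inverse regularization strength.
clf = LogisticRegression(penalty="l2", C=1.0)
clf.fit(X, y)

# The "probabilistic interpretation": per-class probability scores.
proba = clf.predict_proba(X[:3])
print(proba.shape)  # (3, 2) -- one row per sample, one column per class
```

The same `C` knob is what you would tune (e.g. via cross-validation) when multicollinearity or overfitting becomes a concern.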

why Keras 2D regression network has constant output

眉间皱痕 Submitted on 2020-02-25 07:12:21
Question: I am working on a 2D regression deep network with Keras, but the network produces a constant output for every dataset, even when I test with a handmade dataset. In this code I feed the network constant 2D values where the output is a linear function of X (2*X/100), but the output is still constant. import resource import glob import gc rsrc = resource.RLIMIT_DATA soft, hard = resource.getrlimit(rsrc) print ('Soft limit starts as :', soft) resource.setrlimit(rsrc, (4 * 1024 * 1024 * 1024,
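Common causes of a constant prediction in a setup like this are unscaled inputs/targets or a saturating activation on the output layer. As a library-free illustration (plain NumPy gradient descent, not the asker's Keras code), the target mapping y = 2*X/100 is easily learnable once the inputs are scaled:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=200)
y = 2 * x / 100          # the target mapping from the question

# Scale inputs to roughly [0, 1]; unscaled features are a common reason
# training stalls at a constant (mean) prediction.
xs = x / 100.0

# Plain gradient descent on mean squared error for y = w*xs + b.
w, b = 0.0, 0.0
lr = 0.5
for _ in range(2000):
    err = w * xs + b - y
    w -= lr * (err * xs).mean()
    b -= lr * err.mean()

print(round(w, 3), round(b, 3))  # w approaches 2.0, b approaches 0.0
```

If the same scaling is applied before the Keras model (and the output layer uses a linear activation rather than a sigmoid), the constant-output symptom typically disappears.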

How to do 2SLS IV regression using statsmodels python?

孤者浪人 Submitted on 2020-02-22 08:45:25
Question: I'm trying to do two-stage least squares (2SLS) regression in Python using the statsmodels library. from statsmodels.sandbox.regression.gmm import IV2SLS resultIV = IV2SLS(dietdummy['Log Income'], dietdummy.drop(['Log Income', 'Diabetes']), dietdummy.drop(['Log Income', 'Reads Nutri') Reads Nutri is my endogenous variable, my instrument is Diabetes, and my dependent variable is Log Income. Did I do this right? It's much different from the way I would do it in Stata. Also, when I do resultIV.summary() I
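For reference, IV2SLS takes (endog, exog, instrument): exog holds all regressors including the endogenous one, and instrument holds the exogenous regressors plus the excluded instrument. A minimal sketch with synthetic data (variable names are made up, not the asker's dataset):

```python
import numpy as np
from statsmodels.sandbox.regression.gmm import IV2SLS

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)             # excluded instrument
u = rng.normal(size=n)             # unobserved confounder
x = z + u + rng.normal(size=n)     # endogenous regressor
y = 2 * x + u + rng.normal(size=n)  # true coefficient on x is 2

const = np.ones(n)
exog = np.column_stack([const, x])       # constant + endogenous regressor
instr = np.column_stack([const, z])      # constant + instrument

res = IV2SLS(y, exog, instrument=instr).fit()
print(res.params)  # second element should be near 2
```

Note that because x is correlated with u, plain OLS would be biased upward here; the instrument recovers the causal coefficient.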

Cluster standard errors for ordered logit R polr - values deleted in estimation

此生再无相见时 Submitted on 2020-02-05 05:01:05
Question: I am quite new to R and used to pretty basic applications. Now I have encountered a problem I need help with: I am looking for a way to cluster standard errors for an ordered logistic regression (my estimation is similar to this example). I already tried robcov and vcovCL, and they give me similar error messages: Error in meatCL(x, cluster = cluster, type = type, ...) : number of observations in 'cluster' and 'estfun()' do not match Error in u[, ii] <- ui : number of items to replace is not a

Iterating over multiple regression models and data subsets in R

北慕城南 Submitted on 2020-02-03 12:16:08
Question: I am trying to learn how to automate running 3 or more regression models over subsets of a dataset using the purrr and broom packages in R. I have the nest %>% mutate(map()) %>% unnest() flow in mind. I am able to replicate online examples when only one regression model is applied to several data subsets. However, I run into problems when I have more than one regression model in my function. What I tried to do: library(tidyverse) library(broom) estimate_model
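The nest/map/unnest pattern has a straightforward Python analog: iterate a dictionary of model formulas over pandas groupby subsets with statsmodels (a sketch; the formulas and column names are illustrative, not the asker's data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "grp": np.repeat(["a", "b"], 50),
    "x": rng.normal(size=100),
})
df["y"] = 3 * df["x"] + rng.normal(size=100)

# Several candidate models, keyed by name -- the analog of mapping a
# list of model functions over nested data.
formulas = {"linear": "y ~ x", "quadratic": "y ~ x + I(x**2)"}

rows = []
for grp, sub in df.groupby("grp"):
    for name, f in formulas.items():
        fit = smf.ols(f, data=sub).fit()
        rows.append({"grp": grp, "model": name, "r2": fit.rsquared})

results = pd.DataFrame(rows)
print(results)  # one row per (subset, model) pair, like unnest()'s output
```

The key point in either language is that the model specification is data, so adding a fourth model is one more dictionary (or list) entry rather than new plumbing.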

Polynomial Regression nonsense Predictions

佐手、 Submitted on 2020-02-01 05:39:27
Question: Suppose I want to fit a linear regression model with a degree-two (orthogonal) polynomial and then predict the response. Here is the code for the first model (m1): x=1:100 y=-2+3*x-5*x^2+rnorm(100) m1=lm(y~poly(x,2)) prd.1=predict(m1,newdata=data.frame(x=105:110)) Now let's try the same model, but instead of using poly(x,2), I will use its columns, like: m2=lm(y~poly(x,2)[,1]+poly(x,2)[,2]) prd.2=predict(m2,newdata=data.frame(x=105:110)) Let's look at the summaries of m1 and m2. > summary(m1)
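The pitfall in m2 is that splitting poly(x, 2) into its columns prevents predict() from rebuilding the same orthogonal basis for newdata, so the predictions become nonsense. The safe pattern keeps the basis expansion and the model together; a Python analog with a scikit-learn pipeline (illustrative data mirroring the question's simulation):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
x = np.arange(1, 101, dtype=float).reshape(-1, 1)
y = -2 + 3 * x.ravel() - 5 * x.ravel() ** 2 + rng.normal(size=100)

# The pipeline stores the degree-2 expansion, so new x values are
# transformed exactly as the training data was -- the same guarantee
# that lm(y ~ poly(x, 2)) gives predict() in R.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)

x_new = np.arange(105, 111, dtype=float).reshape(-1, 1)
pred = model.predict(x_new)
print(pred[0])  # close to the true value -2 + 3*105 - 5*105**2
```

Manually expanding the polynomial columns outside the model (the m2 approach) discards the stored transformation, which is exactly what goes wrong in the R code.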