regression

zeroinfl “system is computationally singular” despite no correlation between predictors

Submitted by 蹲街弑〆低调 on 2019-12-11 03:17:32
Question: I am trying to model count data on the number of days of absence per worker in a year (the dependent variable). I have a set of predictors, including information about the workers, their jobs, etc., most of which are categorical variables. Consequently, there is a large number of coefficients to estimate (83), but since I have more than 600,000 rows, I don't think this should be a problem. In addition, I have no missing values in my dataset. My dependent variable contains a lot of zero values, so I
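A sketch of how such a singularity is usually diagnosed, assuming a hypothetical data frame `absences` with a count column `days` and factor predictors `job` and `region` (all invented names): with many dummy-coded factors, the culprit is typically aliased levels or empty factor combinations rather than pairwise correlation.

```r
# Sketch: check rank deficiency before blaming correlated predictors.
library(pscl)

# Hypothetical data: `absences` with count `days` and factors `job`, `region`.
X <- model.matrix(~ job + region, data = absences)
qr(X)$rank == ncol(X)        # FALSE signals a linear dependence among dummies

# An intercept-only zero-inflation component often avoids the singular
# Hessian when the full zero model is over-parameterised:
fit <- zeroinfl(days ~ job + region | 1, data = absences, dist = "negbin")
```

Merging or dropping sparse factor levels before fitting is another common fix.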

Outliers with robust regression in R

Submitted by 巧了我就是萌 on 2019-12-11 02:48:11
Question: I am using the lmrob function from the robustbase library in R for robust regression. I call it as rob_reg <- lmrob(y~0+.,dat,method="MM",control=a1). When I want to see the summary I use summary(rob_reg), and one thing robust regression does is identify outliers in the data. Part of the summary output gives me the following: 6508 observations c(49,55,58,77,104,105,106,107,128,134,147,153,...) are outliers with |weight| <= 1.4e-06 ( < 1.6e-06); which lists all the outliers,
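The flagged observations can be extracted programmatically rather than read off the printed summary; a sketch, reusing `dat` and the cutoff reported by `summary()`:

```r
# Sketch: recover outlier indices from the robustness weights directly.
library(robustbase)

rob_reg <- lmrob(y ~ 0 + ., data = dat, method = "MM")
w <- weights(rob_reg, type = "robustness")
outlier_idx <- which(w <= 1.6e-06)     # cutoff as printed by summary()
clean_dat <- dat[-outlier_idx, ]       # data with flagged outliers removed
```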

biglm predict unable to allocate a vector of size xx.x MB

Submitted by 断了今生、忘了曾经 on 2019-12-11 02:36:53
Question: I have this code: library(biglm) library(ff) myData <- read.csv.ffdf(file = "myFile.csv") testData <- read.csv(file = "test.csv") form <- dependent ~ . model <- biglm(form, data=myData) predictedData <- predict(model, newdata=testData) The model is created without problems, but when I make the prediction it runs out of memory: unable to allocate a vector of size xx.x MB. Any hints? Or how can I use ff to reserve memory for the predictedData variable? Answer 1: I have not used the biglm package before.
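One hedged workaround is to score the test file in chunks, so the full test set and prediction vector never have to be materialised at once; the chunk size and file name below are assumptions.

```r
# Sketch: chunked prediction with an already-fitted biglm `model`.
library(biglm)

con <- file("test.csv", open = "r")
header <- strsplit(readLines(con, n = 1), ",")[[1]]
preds <- c()
repeat {
  chunk <- tryCatch(
    read.csv(con, header = FALSE, col.names = header, nrows = 50000),
    error = function(e) NULL)          # connection exhausted
  if (is.null(chunk) || nrow(chunk) == 0) break
  preds <- c(preds, predict(model, newdata = chunk))
}
close(con)
```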

Statistical regression on multi-dimensional data [closed]

Submitted by 纵饮孤独 on 2019-12-11 02:29:46
Question: Closed. This question is off-topic and is not currently accepting answers. Closed 2 years ago. I have a set of data in (x, y, z) format where z is the output of some formula involving x and y. I want to find out what the formula is, and my Internet research suggests that statistical regression is the way to do this. However, all of the examples I have found while researching only deal with two
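For (x, y, z) triples, the standard move is multiple regression with both x and y as predictors; a toy sketch with an invented ground-truth formula:

```r
# Sketch: recover z = f(x, y) by regressing z on polynomial terms in x and y.
set.seed(1)
df <- data.frame(x = runif(100), y = runif(100))
df$z <- 2 * df$x + 3 * df$y^2 + rnorm(100, sd = 0.01)   # toy ground truth

fit <- lm(z ~ poly(x, 2, raw = TRUE) + poly(y, 2, raw = TRUE), data = df)
summary(fit)$r.squared   # near 1 when the polynomial family matches f
```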

GAM with “gp” smoother: how to retrieve the variogram parameters?

Submitted by 我们两清 on 2019-12-11 01:46:13
Question: I am using the following geoadditive model library(gamair) library(mgcv) data(mack) mack$log.net.area <- log(mack$net.area) gm2 <- gam(egg.count ~ s(lon,lat,bs="gp",k=100,m=c(2,10,1)) + s(I(b.depth^.5)) + s(c.dist) + s(temp.20m) + offset(log.net.area), data = mack, family = tw, method = "REML") Here I am using an exponential covariance function with range = 10 and power = 1 (m=c(2,10,1)). How can I retrieve the variogram parameters (nugget, sill) from the results? I couldn't find anything
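A hedged sketch of where the pieces live: with bs = "gp" in mgcv, the range and power are fixed at the values supplied via m (here 10 and 1) rather than estimated, so only variance-like quantities need to be read back from the fit. The variogram interpretation below is an assumption, not documented mgcv output.

```r
# Sketch: range (10) and power (1) were fixed via m = c(2, 10, 1);
# variance pieces can be inspected on the fitted object.
gm2$sig2          # residual scale parameter ~ nugget analogue
gam.vcomp(gm2)    # variance component per smooth ~ partial-sill analogue
```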

How to set intercept_scaling in scikit-learn LogisticRegression

Submitted by 十年热恋 on 2019-12-11 01:19:33
Question: I am using scikit-learn's LogisticRegression object for regularized binary classification. I've read the documentation on intercept_scaling, but I don't understand how to choose this value intelligently. The datasets look like this: 10-20 features, 300-500 replicates. Highly non-Gaussian; in fact, most observations are zeros. The output classes are not necessarily equally likely: in some cases they are almost 50/50, in others more like 90/10. Typically C=0.001 gives good cross
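A hedged sketch of the mechanics: intercept_scaling only matters with the liblinear solver, where the intercept itself is regularized, and scaling it up reduces that shrinkage. The dataset below is synthetic and the values illustrative.

```python
# Sketch: how intercept_scaling interacts with liblinear's penalized intercept.
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=400, n_features=15, random_state=0)

fits = {}
for s in (1.0, 10.0, 100.0):
    clf = LogisticRegression(solver="liblinear", C=0.001,
                             intercept_scaling=s).fit(X, y)
    fits[s] = float(clf.intercept_[0])
# Larger intercept_scaling lets the intercept move more freely under strong
# regularization (small C); solvers like lbfgs leave the intercept
# unpenalized, so the parameter is irrelevant there.
```

A practical heuristic is to raise intercept_scaling until the fitted intercept stops changing, or to switch to a solver that does not penalize it.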

Plotting estimates using ggplot2 & facet_wrap WITHOUT re-fitting models

Submitted by 拥有回忆 on 2019-12-11 00:33:45
Question: I am pretty new to ggplot2 and am looking to produce a figure with multiple scatter plots and their respective regression estimates. However, I am using non-standard regression approaches (e.g. quantile regression and total regression) that are not among the method arguments available in geom_smooth(). I have a list of fitted models and the corresponding data. Below is a working example. require(data.table); require(ggplot2) N <- 1000 # Generate some data DT <- NULL models <- list() #
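A general pattern that avoids refitting: pre-compute fitted values from the already-fitted models, stack them in one table, and draw them with geom_line() inside each facet. The column names `id`, `x`, `y` and a model list named by group are assumptions about the question's data.

```r
# Sketch: plot stored model fits per facet without geom_smooth().
library(data.table); library(ggplot2)

# Assumed: `DT` has columns id, x, y; `models` is a list keyed by id.
pred_dt <- rbindlist(lapply(names(models), function(g) {
  sub <- DT[id == g]
  data.table(id = g, x = sub$x, fit = predict(models[[g]], newdata = sub))
}))

ggplot(DT, aes(x, y)) +
  geom_point() +
  geom_line(data = pred_dt, aes(x, fit), colour = "blue") +
  facet_wrap(~ id)
```

Because geom_line() just draws the supplied values, this works for any model class with a predict() method, including quantile and total regression fits.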

Stata drops variables that “predict failure perfectly” even though the correlation between the variables isn't 1 or -1?

Submitted by 让人想犯罪 __ on 2019-12-10 21:26:31
Question: I am running a logit regression on some data. My dependent variable is binary, as are all but one of my independent variables. When I run my regression, Stata drops many of my independent variables and gives the error: "variable name" != 0 predicts failure perfectly; "variable name" dropped and "a number" obs not used. I know for a fact that some of the dropped variables don't predict failure perfectly. In other words, the dependent variable can take on the value 1 for either the value 1 or 0
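What Stata is flagging is separation (an empty cell in the predictor-outcome cross-tab), which can occur at modest correlations; a minimal sketch in R:

```r
# Sketch: a dummy can "predict failure perfectly" (one empty cross-tab cell)
# while its correlation with the outcome is far from -1.
y <- c(0, 0, 0, 0, 1, 1, 0, 1)
x <- c(1, 1, 1, 1, 0, 0, 0, 0)   # whenever x == 1, y is always 0
table(x, y)                      # the (x = 1, y = 1) cell is empty
cor(x, y)                        # about -0.77, nowhere near -1
```

Note that separation is one-directional: x == 1 perfectly predicting y == 0 is enough for the drop, even though x == 0 predicts nothing.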

Goodness of fit in CCA in R

Submitted by 99封情书 on 2019-12-10 21:07:20
Question: The following are the datasets: mm <- read.csv("https://stats.idre.ucla.edu/stat/data/mmreg.csv") colnames(mm) <- c("Control", "Concept", "Motivation", "Read", "Write", "Math", "Science", "Sex") psych <- mm[, 1:3] # dataset A acad <- mm[, 4:8] # dataset B For the datasets psych and acad, I wanted to do canonical correlation analysis and obtained the canonical correlation coefficients and canonical loadings as follows: require(CCA) cc1 <- cc(psych, acad) I would like to know if there is a
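Two common fit summaries can be computed from the cc() output itself; the significance test below relies on the CCP package, which is an assumption (it is not part of CCA):

```r
# Sketch: goodness-of-fit summaries for a canonical correlation analysis.
library(CCA)
cc1 <- cc(psych, acad)

r2 <- cc1$cor^2        # variance shared by each canonical variate pair
r2 / sum(r2)           # relative importance of each canonical dimension

# Wilks' lambda tests for the number of significant dimensions:
library(CCP)
p.asym(cc1$cor, nrow(psych), ncol(psych), ncol(acad), tstat = "Wilks")
```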

Predictions of ridge regression in R

Submitted by 笑着哭i on 2019-12-10 18:45:04
Question: I've been really stuck on this; I hope someone can help me! I have a dataset with 54 columns and I want to make predictions on a test set with ridge regression. nn <-nrow(longley) index <- 1:nrow(longley) testindex <- sample(index, trunc(length(index)/3)) testset <- longley[testindex,] trainset <-longley[-testindex,] trainset1 <- trainset[,-7] # Fit the ridge regression model: mod <- lm.ridge(y ~., data = trainset, lambda = 0.661) # Predict and evaluate it using an MAE function: mae <- function
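MASS::lm.ridge has no predict() method, so predictions have to be assembled from the coefficients by hand; a sketch, assuming the response column is named y as in the question's formula:

```r
# Sketch: manual prediction from lm.ridge coefficients (coef() returns them
# on the original, unscaled data scale, intercept first).
library(MASS)

mod <- lm.ridge(y ~ ., data = trainset, lambda = 0.661)
X_test <- model.matrix(y ~ ., data = testset)[, -1]   # drop intercept column
pred <- coef(mod)[1] + X_test %*% coef(mod)[-1]
mae <- mean(abs(testset$y - pred))                    # mean absolute error
```

Using model.matrix() on the test set keeps the column order aligned with the coefficient vector, which is the usual source of wrong answers here.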