na

Dealing with missing values for correlations calculation

眉间皱痕 submitted on 2019-11-30 01:29:27
I have a huge matrix with a lot of missing values. I want to get the correlation between variables. 1. Is the solution cor(na.omit(matrix)) better than the one below? cor(matrix, use = "pairwise.complete.obs") I have already selected only variables having more than 20% of missing values. 2. Which method makes the most sense? I would vote for the second option. It sounds like you have a fair amount of missing data, so you would be looking for a sensible multiple imputation strategy to fill in the gaps. See Harrell's text "Regression Modeling Strategies" for a wealth of guidance on 'how's to do
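A minimal sketch of how the two calls differ, assuming a small simulated matrix (the name m and the data are made up for illustration). cor(na.omit(m)) drops every row that contains any NA, which is the same as use = "complete.obs", while "pairwise.complete.obs" uses all rows available for each pair of columns. Note that a pairwise matrix is not guaranteed to be positive semi-definite.

```r
set.seed(42)

# Hypothetical data: 50 rows, 4 variables, 30 cells set to NA at random.
m <- matrix(rnorm(200), ncol = 4,
            dimnames = list(NULL, c("a", "b", "c", "d")))
m[sample(length(m), 30)] <- NA

# Listwise deletion: only rows with no NA at all are used.
rows_kept  <- nrow(na.omit(m))
c_listwise <- cor(na.omit(m))

# Same result via the 'use' argument.
c_complete <- cor(m, use = "complete.obs")

# Pairwise deletion: each correlation uses every row where both
# columns of that pair are observed.
c_pairwise <- cor(m, use = "pairwise.complete.obs")

rows_kept
round(c_listwise - c_pairwise, 3)  # how far the two estimates differ
```

With many scattered NAs, rows_kept can shrink quickly, which is usually the practical argument for the pairwise option or for imputation.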

STL decomposition of time series with missing values for anomaly detection

巧了我就是萌 submitted on 2019-11-30 01:16:34
I am trying to detect anomalous values in a time series of climatic data with some missing observations. Searching the web I found many available approaches. Of those, stl decomposition seems appealing, in the sense of removing trend and seasonal components and studying the remainder. Reading STL: A Seasonal-Trend Decomposition Procedure Based on Loess, stl appears to be flexible in determining the settings for assigning variability, unaffected by outliers and possible to apply despite missing values. However, trying to apply it in R, with four years of observations and defining all the
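One possible workaround, sketched under assumptions: stats::stl() uses na.action = na.fail by default, so a series with NAs is rejected unless the gaps are handled first. The example below fills gaps by linear interpolation with approx() before decomposing. The simulated monthly series and the choice of interpolation are illustrative only and are not taken from the question.

```r
set.seed(1)

# Hypothetical monthly series: four years, with three missing months.
x <- ts(10 + sin(2 * pi * (1:48) / 12) + rnorm(48, sd = 0.2),
        frequency = 12)
x[c(5, 17, 30)] <- NA

# Fill interior gaps by linear interpolation (an assumption, not the
# only option; approx() ignores the NA points when interpolating).
gaps   <- which(is.na(x))
filled <- x
filled[gaps] <- approx(seq_along(x), as.numeric(x), xout = gaps)$y

# Robust STL decomposition; the remainder is what one would inspect
# for anomalous values.
fit       <- stl(filled, s.window = "periodic", robust = TRUE)
remainder <- fit$time.series[, "remainder"]
```

Interpolated points should probably be excluded when flagging anomalies, since their remainder reflects the fill method rather than a real observation.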

model.matrix() with na.action=NULL?

心不动则不痛 submitted on 2019-11-29 22:53:27
I have a formula and a data frame, and I want to extract the model.matrix(). However, I need the resulting matrix to include the NAs that were found in the original dataset. If I were to use model.frame() to do this, I would simply pass it na.action=NULL. However, the output I need is of the model.matrix() format. Specifically, I need only the right-hand side variables, I need the output to be a matrix (not a data frame), and I need factors to be converted to a series of dummy variables. I'm sure I could hack something together using loops or something, but I was wondering if anyone could
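A minimal sketch of one way to do this, assuming a made-up data frame df: build the model frame with na.action = na.pass so no rows are dropped, then hand that model frame to model.matrix(). The variable names y, x, and g are illustrative.

```r
# Hypothetical data with a missing value in a numeric predictor.
df <- data.frame(
  y = c(1, 2, 3, 4),
  x = c(0.5, NA, 1.5, 2.0),
  g = factor(c("a", "b", "b", "a"))
)

# Keep rows containing NA when building the model frame.
mf <- model.frame(y ~ x + g, data = df, na.action = na.pass)

# model.matrix() uses the model frame as given, so the NA row is kept.
# Factors become dummy columns; the result is a numeric matrix.
mm <- model.matrix(y ~ x + g, data = mf)
mm
```

The result includes an intercept column; using a formula such as y ~ x + g - 1 or dropping the first column afterwards are two ways to remove it.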

Replace NA values from a column with 0 in data frame R [duplicate]

亡梦爱人 submitted on 2019-11-29 22:51:55
Possible Duplicate: Set NA to 0 in R
I have a data.frame with a column having NA values. I want to replace NA with 0 or any other value. I have read a lot of threads and tried several methods, but none gave me the result. I have tried the methods below:
a$x[a$x==NA]<-0;
a[,c("x")]<-apply(a[,c("x")],1,function(z){replace(z, is.na(z), 0)});
a$x[is.na(a$x),]<-0;
None of the above methods replaced NA with 0 in column x of data.frame a. Why?
Since nobody so far felt fit to point out why what you're trying doesn't work: NA == NA doesn't return TRUE, it returns NA (since comparing to undefined values
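To make the point about NA == NA concrete, here is a short sketch using a made-up data frame a with a column x. The test has to go through is.na(), and because a$x is a plain vector it takes a single subscript (the extra comma in the third attempt is what breaks it).

```r
# Hypothetical data frame matching the names used in the question.
a <- data.frame(x = c(1, NA, 3, NA))

# Comparison with NA yields NA, never TRUE, so it cannot select rows.
cmp <- NA == NA
cmp

# is.na() returns a proper logical vector that can be used as an index.
a$x[is.na(a$x)] <- 0
a$x
```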

Correct syntax for mutate_if

泄露秘密 submitted on 2019-11-29 20:45:44
I would like to replace NA values with zeros via mutate_if in dplyr. The syntax below:
set.seed(1)
mtcars[sample(1:dim(mtcars)[1], 5), sample(1:dim(mtcars)[2], 5)] <- NA
require(dplyr)
mtcars %>% mutate_if(is.na,0)
mtcars %>% mutate_if(is.na, funs(. = 0))
returns an error:
Error in vapply(tbl, p, logical(1), ...) : values must be length 1, but FUN(X[[1]]) result is length 32
What's the correct syntax for this operation?
I learned this trick from the purrr tutorial, and it also works in dplyr. There are two ways to solve this problem: First, define custom functions outside the pipe, and use it
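A hedged sketch of one syntax that works, assuming dplyr is installed. The error message says the predicate must return a single value per column, whereas is.na() returns one value per element. Using anyNA() as the predicate and a lambda as the replacement function is one way to satisfy that; this is not necessarily the approach the truncated answer goes on to describe.

```r
library(dplyr)

set.seed(1)
df <- mtcars
df[sample(nrow(df), 5), sample(ncol(df), 5)] <- NA

# Predicate: anyNA() gives one TRUE/FALSE per column.
# Function: replace the NA entries of each selected column with 0.
filled <- df %>%
  mutate_if(anyNA, ~ replace(.x, is.na(.x), 0))

sum(is.na(df))      # NAs before
sum(is.na(filled))  # NAs after
```

In recent dplyr versions the scoped verbs are superseded by across(), so mutate(across(where(anyNA), ~ replace(.x, is.na(.x), 0))) is the equivalent newer form.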

R is there a way to find Inf/-Inf values?

六眼飞鱼酱① submitted on 2019-11-29 17:38:15
Question: I'm trying to run a randomForest on a large-ish data set (5000x300). Unfortunately I'm getting an error message as follows:
> RF <- randomForest(prePrior1, postPrior1[,6]
+ ,,do.trace=TRUE,importance=TRUE,ntree=100,,forest=TRUE)
Error in randomForest.default(prePrior1, postPrior1[, 6], , do.trace = TRUE, : NA/NaN/Inf in foreign function call (arg 1)
So I try to find any NAs using:
> df2 <- prePrior1[is.na(prePrior1)]
> df2
character(0)
> df2 <- postPrior1[is.na(postPrior1[,6])]
> df2
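A minimal sketch for locating Inf/-Inf (and NA/NaN) values, assuming a small made-up data frame rather than the question's prePrior1. is.na() does not flag infinite values, while is.finite() is FALSE for Inf, -Inf, NA, and NaN alike. is.finite() has no data frame method, so it is applied column by column here.

```r
# Hypothetical data containing Inf, -Inf and NaN.
df <- data.frame(
  a = c(1, Inf, 3),
  b = c(-Inf, 2, NaN),
  c = c(1, 2, 3)
)

# Count non-finite entries per column.
bad_per_col <- sapply(df, function(col) sum(!is.finite(col)))
bad_per_col

# Row/column positions of every non-finite cell.
bad_cells <- which(!is.finite(as.matrix(df)), arr.ind = TRUE)
bad_cells

# Only the infinite values, excluding NA/NaN.
inf_per_col <- sapply(df, function(col) sum(is.infinite(col)))
```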

Replace all NA with FALSE in selected columns in R

佐手、 submitted on 2019-11-29 11:19:44
Question: I have a question similar to this one, but my dataset is a bit bigger: 50 columns, with 1 column as UID and the other columns carrying either TRUE or NA. I want to change all the NA to FALSE, but I don't want to use an explicit loop. Can plyr do the trick? Thanks.
UPDATE #1 Thanks for the quick reply, but what if my dataset is like below:
df <- data.frame( id = c(rep(1:19),NA), x1 = sample(c(NA,TRUE), 20, replace = TRUE), x2 = sample(c(NA,TRUE), 20, replace = TRUE) )
I only want x1 and x2 to be
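A short base R sketch for the updated data, assuming the goal is to replace NA with FALSE in x1 and x2 only while leaving id untouched. The vector cols is an illustrative name for the selected columns.

```r
set.seed(1)
df <- data.frame(
  id = c(1:19, NA),
  x1 = sample(c(NA, TRUE), 20, replace = TRUE),
  x2 = sample(c(NA, TRUE), 20, replace = TRUE)
)

# Columns to clean; id is deliberately left out.
cols <- c("x1", "x2")

# Index the sub-data-frame with a logical matrix of NA positions.
df[cols][is.na(df[cols])] <- FALSE

colSums(is.na(df))  # only id should still contain an NA
```

No explicit loop or extra package is needed; the same pattern scales to 50 columns by listing or computing their names in cols.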

R: need finite 'ylim' values in function

十年热恋 submitted on 2019-11-29 10:30:27
I'd like to plot the data in data.frame xy for each group (defined by ID). When a year before 1946 is in a group, plot2 should be executed. When the years are between 1946 and 2014, plot1 should be executed. My problem: this works fine without NA values, but as I have data gaps I rely on NAs to define them. This is why I get an error: error in plot.window(need finite 'ylim' values). I tried to put finite=T in plot1 at the y-axis but this gives a subscript out of bounds error. Is there a way to solve this so that the graphics are plotted correctly? In the following is my
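A hedged sketch of one way to guard against this error. It typically appears when a group has no finite y values at all, so the limits can be computed per group and empty groups skipped. The data frame below is a made-up stand-in for xy, and the column name value is an assumption since the question's data is cut off.

```r
# Hypothetical stand-in for the question's xy.
xy <- data.frame(
  ID    = c(1, 1, 1, 2, 2),
  year  = c(1940, 1941, 1942, 1950, 1951),
  value = c(3.1, NA, 4.2, NA, NA)
)

# Finite y-limits for one group, or NULL when nothing can be plotted.
finite_ylim <- function(y) {
  ok <- is.finite(y)
  if (!any(ok)) return(NULL)
  range(y[ok])
}

# One entry per ID; group 2 has only NAs and yields NULL.
limits <- lapply(split(xy$value, xy$ID), finite_ylim)
limits
```

Inside the plotting loop, a group whose entry is NULL would be skipped, and the other entries can be passed as ylim so that NA gaps in the line are kept.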

Treat NA as zero only when adding a number

ε祈祈猫儿з submitted on 2019-11-29 09:26:29
Question: When calculating the sum of two data tables, NA+n=NA.
> dt1 <- data.table(Name=c("Joe","Ann"), "1"=c(0,NA), "2"=c(3,NA))
> dt1
   Name  1  2
1:  Joe  0  3
2:  Ann NA NA
> dt2 <- data.table(Name=c("Joe","Ann"), "1"=c(0,NA), "2"=c(2,3))
> dt2
   Name  1  2
1:  Joe  0  2
2:  Ann NA  3
> dtsum <- rbind(dt1, dt2)[, lapply(.SD, sum), by=Name]
> dtsum
   Name  1  2
1:  Joe  0  5
2:  Ann NA NA
I don't want to substitute all NAs with 0. What I want is NA+NA=NA and NA+n=n to get the following result:
   Name  1  2
1:  Joe  0  5
2:  Ann
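One way to express the rule NA+NA=NA and NA+n=n is a small helper that only returns NA when every input is NA. The sketch below is base R; the helper name sum_or_na is made up for illustration.

```r
# Sum that ignores NAs unless all values are NA.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

r_all_na <- sum_or_na(c(NA, NA))  # both missing: stays NA
r_one_na <- sum_or_na(c(NA, 3))   # one missing: the other value is kept
r_none   <- sum_or_na(c(3, 2))    # nothing missing: ordinary sum
```

In the question's pipeline this helper would take the place of sum, i.e. rbind(dt1, dt2)[, lapply(.SD, sum_or_na), by = Name].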

Fill missing values in the data.frame with the data from the same data frame

孤者浪人 submitted on 2019-11-29 08:41:45
I'm trying to backfill a fully outer-joined table with the nearest preceding column data. The data frame I have looks like this (no rows have both sides as NA, and the table is sorted by date):
date                 X       Y
2012-07-05 00:01:19  0.0122  NA
2012-07-05 03:19:34  0.0121  NA
2012-07-05 03:19:56  0.0121  0.027
2012-07-05 03:20:31  0.0121  NA
2012-07-05 04:19:56  0.0121  0.028
2012-07-05 04:20:31  0.0121  NA
2012-07-05 04:20:50  0.0121  NA
2012-07-05 04:22:29  0.0121  0.027
2012-07-05 04:24:37  0.0121  NA
2012-07-05 20:48:45  0.0121  NA
2012-07-05 23:02:34  NA      0.029
2012-07-05 23:30:45  NA      0.029
With this, I'm looking to.. leave the
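The exact requirement is cut off above, so the following is only a sketch under the assumption that "nearest preceding" means last observation carried forward within each column. The helper locf is a made-up base R function; zoo::na.locf() is a packaged alternative for the same idea.

```r
# Carry the last non-NA value forward; leading NAs stay NA.
locf <- function(x) {
  seen <- cumsum(!is.na(x))        # how many non-NA values so far
  c(NA, x[!is.na(x)])[seen + 1]    # position 1 is the "nothing yet" NA
}

# A few rows in the shape of the question's table (dates omitted).
df <- data.frame(
  X = c(0.0122, 0.0121, 0.0121, NA, NA),
  Y = c(NA, NA, 0.027, NA, 0.029)
)

filled <- df
filled[] <- lapply(df, locf)
filled
```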