na | 易学教程

Dealing with missing values for correlations calculation

I have a huge matrix with a lot of missing values, and I want to get the correlations between its variables. 1. Is cor(na.omit(matrix)) a better solution than cor(matrix, use = "pairwise.complete.obs")? I have already selected only variables having more than 20% of missing values. 2. Which method makes the most sense? Answer: I would vote for the second option. It sounds like you have a fair amount of missing data, so you should be looking for a sensible multiple imputation strategy to fill in the gaps. See Harrell's text "Regression Modeling Strategies" for a wealth of guidance on 'how's to do
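The difference between the two calls in the question can be seen on a small example. This is only a sketch on simulated data; the matrix m and its column names are made up for illustration.

```r
set.seed(42)

# Hypothetical 20 x 3 matrix with NAs in different rows of two columns.
m <- matrix(rnorm(60), nrow = 20, ncol = 3,
            dimnames = list(NULL, c("a", "b", "c")))
m[1:5, "a"] <- NA
m[6:10, "b"] <- NA

# Listwise deletion: na.omit() drops every row containing any NA,
# so all correlations are computed from rows 11-20 only.
cor_listwise <- cor(na.omit(m))

# Pairwise deletion: each pair of columns uses every row in which
# both columns are observed, so a-c uses rows 6-20 and b-c uses
# rows 1-5 and 11-20.
cor_pairwise <- cor(m, use = "pairwise.complete.obs")

print(cor_listwise)
print(cor_pairwise)
```

Pairwise deletion keeps more data per coefficient, but each coefficient is then based on a different subset of rows, which is worth keeping in mind when the resulting matrix is used as a whole.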

STL decomposition of time series with missing values for anomaly detection

I am trying to detect anomalous values in a time series of climatic data with some missing observations. Searching the web I found many available approaches. Of those, stl decomposition seems appealing, in the sense of removing trend and seasonal components and studying the remainder. Reading STL: A Seasonal-Trend Decomposition Procedure Based on Loess , stl appears to be flexible in determining the settings for assigning variability, unaffected by outliers and possible to apply despite missing values. However, trying to apply it in R, with four years of observations and defining all the
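As a rough illustration of the issue: stl() in base R uses na.action = na.fail by default, so a series containing NAs is rejected. One possible workaround, which is an assumption here and not necessarily what the asker or the paper intends, is to fill the gaps by linear interpolation before decomposing. The series below is simulated monthly data.

```r
set.seed(1)

# Hypothetical monthly series: four years, seasonal signal plus noise.
n <- 48
x <- 10 + sin(2 * pi * (1:n) / 12) + rnorm(n, sd = 0.1)
x[c(5, 20)] <- NA                      # two missing observations
y <- ts(x, frequency = 12)

# With the default na.action = na.fail, stl() stops on missing values.
res <- try(stl(y, s.window = "periodic"), silent = TRUE)

# Swapped-in technique: linear interpolation of the interior gaps with
# approx(), then an ordinary STL decomposition of the filled series.
filled <- ts(approx(seq_along(x), x, xout = seq_along(x))$y, frequency = 12)
fit <- stl(filled, s.window = "periodic")

# The remainder component is what one would inspect for anomalies.
remainder <- fit$time.series[, "remainder"]
```

Interpolated points will tend to have small remainders by construction, so they should probably be excluded when flagging anomalies.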

model.matrix() with na.action=NULL?

I have a formula and a data frame, and I want to extract the model.matrix() . However, I need the resulting matrix to include the NAs that were found in the original dataset. If I were to use model.frame() to do this, I would simply pass it na.action=NULL . However, the output I need is of the model.matrix() format. Specifically, I need only the right-hand side variables, I need the output to be a matrix (not a data frame), and I need factors to be converted to a series of dummy variables. I'm sure I could hack something together using loops or something, but I was wondering if anyone could
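One way this might be approached, sketched on a made-up data frame: build the model frame first with na.action = na.pass, then hand that frame to model.matrix(). The data frame df and its columns are hypothetical.

```r
# Hypothetical data with NAs in a numeric column and in a factor.
df <- data.frame(
  x = c(0.5, NA, 1.5, 2.0),
  g = factor(c("a", "b", NA, "a"))
)

# na.pass keeps rows with missing values in the model frame.
mf <- model.frame(~ x + g, data = df, na.action = na.pass)

# model.matrix() reuses the prepared frame, so the NA rows survive
# and the factor is expanded into dummy columns.
mm <- model.matrix(~ x + g, data = mf)
print(mm)
```

The result is a numeric matrix with one row per original observation; rows with a missing predictor carry NA in the affected columns.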

Replace NA values from a column with 0 in data frame R [duplicate]

Possible Duplicate: Set NA to 0 in R. I have a data.frame with a column containing NA values, and I want to replace NA with 0 or any other value. I have gone through a lot of threads and tried the methods below, but none of them gave me the result: a$x[a$x==NA]<-0; a[,c("x")]<-apply(a[,c("x")],1,function(z){replace(z, is.na(z), 0)}); a$x[is.na(a$x),]<-0; None of the above replaced NA with 0 in column x of data.frame a. Why? Answer: Since nobody so far felt fit to point out why what you're trying doesn't work: NA == NA doesn't return TRUE, it returns NA (since comparing to undefined values
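The point about comparisons with NA can be checked directly, along with the usual is.na() idiom. The data frame a below is a small stand-in for the asker's data.

```r
# Hypothetical stand-in for the asker's data frame.
a <- data.frame(x = c(1, NA, 3, NA), y = c("p", "q", "r", "s"))

# Comparing anything with NA yields NA, never TRUE, so this mask
# cannot be used to pick out the missing entries.
mask_wrong <- a$x == NA

# is.na() returns TRUE exactly where values are missing.
a$x[is.na(a$x)] <- 0
print(a)
```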

Correct syntax for mutate_if

I would like to replace NA values with zeros via mutate_if in dplyr. The syntax below:
    set.seed(1)
    mtcars[sample(1:dim(mtcars)[1], 5), sample(1:dim(mtcars)[2], 5)] <- NA
    require(dplyr)
    mtcars %>% mutate_if(is.na,0)
    mtcars %>% mutate_if(is.na, funs(. = 0))
returns the error:
    Error in vapply(tbl, p, logical(1), ...) : values must be length 1, but FUN(X[[1]]) result is length 32
What's the correct syntax for this operation? Answer: I learned this trick from the purrr tutorial, and it also works in dplyr. There are two ways to solve this problem: First, define custom functions outside the pipe, and use it
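The error message suggests the cause: the predicate passed to mutate_if must return a single TRUE or FALSE per column, whereas is.na returns one value per element. A minimal sketch of one working form follows, using a small made-up data frame instead of mtcars; it is not necessarily the approach the truncated answer goes on to describe.

```r
library(dplyr)

# Hypothetical toy data with NAs in numeric and character columns.
df <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, 2, 3),
  label = c("x", NA, "z"),
  stringsAsFactors = FALSE
)

# is.numeric selects whole columns (one logical per column); the
# second argument is the function applied to each selected column.
filled <- df %>%
  mutate_if(is.numeric, function(col) replace(col, is.na(col), 0))

print(filled)
```

Non-numeric columns such as label are left untouched, including their NAs.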

R is there a way to find Inf/-Inf values?

Question: I'm trying to run a randomForest on a large-ish data set (5000x300). Unfortunately I'm getting an error message as follows:
    > RF <- randomForest(prePrior1, postPrior1[,6]
    + ,,do.trace=TRUE,importance=TRUE,ntree=100,,forest=TRUE)
    Error in randomForest.default(prePrior1, postPrior1[, 6], , do.trace = TRUE, : NA/NaN/Inf in foreign function call (arg 1)
So I try to find any NAs using:
    > df2 <- prePrior1[is.na(prePrior1)]
    > df2
    character(0)
    > df2 <- postPrior1[is.na(postPrior1[,6])]
    > df2
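Since is.na() does not flag Inf or -Inf, a check along the following lines may help locate them. This is a sketch on a small made-up numeric data frame, not the asker's prePrior1.

```r
# Hypothetical numeric data containing Inf, -Inf and NA.
df <- data.frame(
  a = c(1, Inf, 3),
  b = c(-Inf, 2, NA),
  c = c(1, 2, 3)
)
m <- as.matrix(df)

# is.finite() is FALSE for NA, NaN, Inf and -Inf alike.
bad_cols <- names(df)[sapply(df, function(col) any(!is.finite(col)))]

# Row/column positions of every non-finite cell.
bad_cells <- which(!is.finite(m), arr.ind = TRUE)

# is.infinite() singles out Inf and -Inf only.
inf_cells <- which(is.infinite(m), arr.ind = TRUE)

print(bad_cols)
print(bad_cells)
```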

Replace all NA with FALSE in selected columns in R

Question: I have a question similar to this one, but my dataset is a bit bigger: 50 columns, with one column as UID and the other columns carrying either TRUE or NA. I want to change all the NA to FALSE, but I don't want to use an explicit loop. Can plyr do the trick? Thanks. UPDATE #1: Thanks for the quick reply, but what if my dataset is like below:
    df <- data.frame(
      id = c(rep(1:19),NA),
      x1 = sample(c(NA,TRUE), 20, replace = TRUE),
      x2 = sample(c(NA,TRUE), 20, replace = TRUE)
    )
I only want x1 and x2 to be
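One base R way to restrict the replacement to chosen columns, sketched on the data frame from the update. The vector cols is introduced here for illustration, and the approach uses lapply instead of plyr.

```r
set.seed(1)
df <- data.frame(
  id = c(1:19, NA),
  x1 = sample(c(NA, TRUE), 20, replace = TRUE),
  x2 = sample(c(NA, TRUE), 20, replace = TRUE)
)

# Only these columns are touched; id keeps its NA.
cols <- c("x1", "x2")
df[cols] <- lapply(df[cols], function(col) replace(col, is.na(col), FALSE))

print(df)
```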

R: need finite 'ylim' values in function

I'd like to plot the data in data.frame xy for each group (defined by ID). When a group contains a year before 1946, plot2 should be executed. When the years are between 1946 and 2014, plot1 should be executed. My problem: this works fine without NA values, but as I have data gaps, I rely on NAs to mark them. This is why I get the error: error in plot.window(need finite 'ylim' values). I tried to put finite=T on the y-axis in plot1, but this gives a subscript out of bounds error. Is there a way to solve this so that the graphics are plotted correctly? In the following is my
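This error typically appears when every y value passed to plot() is NA or otherwise non-finite, so no axis range can be computed. A small helper like the one below (finite_ylim is a hypothetical name, and the asker's actual plotting code is not shown) could compute the limits and signal when a group has nothing to plot.

```r
# Hypothetical helper: returns finite y-axis limits, or NULL when the
# group has no finite values at all.
finite_ylim <- function(y) {
  if (!any(is.finite(y))) {
    return(NULL)
  }
  # finite = TRUE drops NA, NaN, Inf and -Inf before taking the range.
  range(y, finite = TRUE)
}

with_gaps <- c(NA, 2.5, NA, 7.1)
all_missing <- c(NA_real_, NA_real_)

lims_ok <- finite_ylim(with_gaps)
lims_none <- finite_ylim(all_missing)
```

Inside a per-group loop, a NULL result could be used to skip the group, and a non-NULL result could be passed as ylim to the plotting call.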

Treat NA as zero only when adding a number

Question: When calculating the sum of two data tables, NA+n=NA.
    > dt1 <- data.table(Name=c("Joe","Ann"), "1"=c(0,NA), "2"=c(3,NA))
    > dt1
       Name  1  2
    1:  Joe  0  3
    2:  Ann NA NA
    > dt2 <- data.table(Name=c("Joe","Ann"), "1"=c(0,NA), "2"=c(2,3))
    > dt2
       Name  1  2
    1:  Joe  0  2
    2:  Ann NA  3
    > dtsum <- rbind(dt1, dt2)[, lapply(.SD, sum), by=Name]
    > dtsum
       Name  1  2
    1:  Joe  0  5
    2:  Ann NA NA
I don't want to substitute all NAs with 0. What I want is NA+NA=NA and NA+n=n, to get the following result:
       Name  1  2
    1:  Joe  0  5
    2:  Ann
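The rule described in the question (NA+NA=NA, NA+n=n) can be written as a small summing function. The name sum_keep_na is made up for this sketch, which is shown on plain vectors in base R.

```r
# Hypothetical helper: sums while ignoring NAs, but returns NA when
# every value in the group is missing.
sum_keep_na <- function(x) {
  if (all(is.na(x))) {
    return(NA_real_)
  }
  sum(x, na.rm = TRUE)
}

both_missing <- sum_keep_na(c(NA, NA))  # NA + NA
one_missing  <- sum_keep_na(c(NA, 3))   # NA + n
none_missing <- sum_keep_na(c(3, 2))    # n + n
```

In the question's code, such a helper would presumably take the place of sum in lapply(.SD, sum); that substitution is an assumption and is not taken from the original thread.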

Fill missing values in the data.frame with the data from the same data frame

I'm trying to backfill a fully outer-joined table with the nearest preceding column data. The data frame I have looks like this (no rows have both sides as NA, and the table is sorted by date):
    date                 X       Y
    2012-07-05 00:01:19  0.0122  NA
    2012-07-05 03:19:34  0.0121  NA
    2012-07-05 03:19:56  0.0121  0.027
    2012-07-05 03:20:31  0.0121  NA
    2012-07-05 04:19:56  0.0121  0.028
    2012-07-05 04:20:31  0.0121  NA
    2012-07-05 04:20:50  0.0121  NA
    2012-07-05 04:22:29  0.0121  0.027
    2012-07-05 04:24:37  0.0121  NA
    2012-07-05 20:48:45  0.0121  NA
    2012-07-05 23:02:34  NA      0.029
    2012-07-05 23:30:45  NA      0.029
with this, I'm looking to.. leave the
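Filling each gap with the nearest preceding value is often called "last observation carried forward". Since the rest of the question is cut off, the following is only a guess at the intended behaviour: a base R sketch with a hypothetical helper named locf, applied to a shortened version of the Y column.

```r
# Hypothetical helper: carry the last non-missing value forward.
# Leading NAs have no preceding value and therefore stay NA.
locf <- function(x) {
  observed <- !is.na(x)
  # For each position, count how many observed values precede or
  # include it; 0 means "nothing seen yet".
  idx <- cumsum(observed)
  c(NA, x[observed])[idx + 1]
}

y <- c(NA, NA, 0.027, NA, 0.028, NA, NA, 0.027)
y_filled <- locf(y)
print(y_filled)
```

The zoo package offers na.locf() for the same purpose, which may be more convenient when working with whole data frames.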