na

Replacing character values with NA in a data frame

旧街凉风 submitted on 2019-11-26 15:21:27
I have a data frame containing (in random places) a character value (say "foo") that I want to replace with NA. What's the best way to do so across the whole data frame?

c-urchin

This:

    df[df == "foo"] <- NA

One way to nip this in the bud is to convert that character value to NA when you read the data in the first place:

    df <- read.csv("file.csv", na.strings = c("foo", "bar"))

Another option is is.na<-:

    is.na(df) <- df == "foo"

Note that its use may seem a bit counter-intuitive, but it actually assigns NA values to df at the index on the right-hand side.

This could be done with dplyr:
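The two base R assignment idioms quoted in this entry can be tried on a toy data frame. This is a minimal sketch only: the data frame and its column names are made up for illustration, and stringsAsFactors = FALSE is set on the assumption that the columns should stay character rather than become factors.

```r
# Hypothetical toy data; "foo" marks the values to be treated as missing.
df <- data.frame(
  a = c("x", "foo", "y"),
  b = c("foo", "z", "foo"),
  n = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Idiom 1: index with a logical matrix and assign NA.
df1 <- df
df1[df1 == "foo"] <- NA

# Idiom 2: the replacement form of is.na(), which sets NA
# at the positions where the right-hand side is TRUE.
df2 <- df
is.na(df2) <- df2 == "foo"

print(df1)
print(identical(df1, df2))
```

Both forms leave the numeric column n untouched, because comparing a number with "foo" is FALSE everywhere.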

suppress NAs in paste()

删除回忆录丶 submitted on 2019-11-26 14:37:26
Regarding the bounty

Ben Bolker's paste2 solution produces a "" when the strings that are pasted contain NAs in the same position. Like this,

    > paste2(c("a","b", "c", NA), c("A","B", NA, NA))
    [1] "a, A" "b, B" "c"    ""

The fourth element is a "" instead of an NA, like this:

    [1] "a, A" "b, B" "c"    NA

I'm offering up this small bounty for anyone who can fix this.

Original question

I've read the help page ?paste, but I don't understand how to have R ignore NAs. I do the following,

    foo <- LETTERS[1:4]
    foo[4] <- NA
    foo
    [1] "A" "B" "C" NA
    paste(1:4, foo, sep = ", ")

and get

    [1] "1, A" "2, B" "3,
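One possible way to get the behaviour the bounty asks for (skip NAs, but return NA when every input at a position is NA) is sketched below in base R. The helper name paste_skip_na is hypothetical; it is not Ben Bolker's paste2 and not necessarily the accepted fix, just an illustration of the idea.

```r
# Hypothetical helper: paste element-wise, dropping NAs, and
# return NA where all inputs at a position are NA.
paste_skip_na <- function(..., sep = ", ") {
  # Coerce every argument to character and bind them as columns.
  cols <- do.call(cbind, lapply(list(...), as.character))
  unname(apply(cols, 1, function(row) {
    kept <- row[!is.na(row)]
    if (length(kept) == 0) NA_character_ else paste(kept, collapse = sep)
  }))
}

res <- paste_skip_na(c("a", "b", "c", NA), c("A", "B", NA, NA))
print(res)
```

On the input from the bounty text, the third element comes out as "c" and the fourth as NA rather than "".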

How to replace NA values in a table for selected columns

一笑奈何 submitted on 2019-11-26 14:19:43
There are a lot of posts about replacing NA values. I am aware that one could replace NAs in the following table/frame with the following:

    x[is.na(x)] <- 0

But what if I want to restrict it to only certain columns? Let me show you an example. First, let's start with a dataset.

    set.seed(1234)
    x <- data.frame(a=sample(c(1,2,NA), 10, replace=T),
                    b=sample(c(1,2,NA), 10, replace=T),
                    c=sample(c(1:5,NA), 10, replace=T))

Which gives:

        a  b  c
    1   1 NA  2
    2   2  2  2
    3   2  1  1
    4   2 NA  1
    5  NA  1  2
    6   2 NA  5
    7   1  1  4
    8   1  1 NA
    9   2  1  5
    10  2  1  1

Ok, so I only want to restrict the replacement to columns 'a' and 'b'. My
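One base R way to restrict the replacement to chosen columns is to subset the data frame by column name before indexing with is.na(). The sketch below uses a small hand-written data frame rather than the sampled one above (sampled values can differ between R versions); the variable name cols is an assumption for illustration.

```r
# Small explicit data frame with NAs in all three columns.
x <- data.frame(
  a = c(1, NA, 2),
  b = c(NA, 1, NA),
  c = c(NA, 5, 3)
)

# Only these columns should have their NAs replaced.
cols <- c("a", "b")

# Replace NAs with 0 inside the selected columns only.
x[cols][is.na(x[cols])] <- 0

print(x)
```

Column c keeps its NA, while a and b no longer contain any missing values.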

Select NA in a data.table in R

蓝咒 submitted on 2019-11-26 13:57:31
Question

How do I select all the rows that have a missing value in the primary key in a data table?

    DT = data.table(x=rep(c("a","b",NA),each=3), y=c(1,3,6), v=1:9)
    setkey(DT,x)

Selecting for a particular value is easy:

    DT["a",]

Selecting for the missing values seems to require a vector search. One cannot use binary search. Am I correct?

    DT[NA,]        # does not work
    DT[is.na(x),]  # does work

Answer 1:

Fortunately, DT[is.na(x),] is nearly as fast as (e.g.) DT["a",], so in practice, this may not really matter much:

How to delete rows from a dataframe that contain n*NA

旧巷老猫 submitted on 2019-11-26 13:49:16
I have a number of large datasets with ~10 columns and ~200000 rows. Not all columns contain values for each row, although at least one column must contain a value for the row to be present. I would like to set a threshold for how many NAs are allowed in a row. My data frame looks something like this:

    ID  q  r  s  t  u  v  w  x  y  z
    A   1  5 NA  3  8  9 NA  8  6  4
    B   5 NA  4  6  1  9  7  4  9  3
    C  NA  9  4 NA  4  8  4 NA  5 NA
    D   2  2  6  8  4 NA  3  7  1 32

And I would like to be able to delete the rows that contain more than 2 cells containing NA to get

    ID  q  r  s  t  u  v  w  x  y  z
    A   1  5 NA  3  8  9 NA  8  6  4
    B   5 NA  4  6  1  9  7  4  9  3
    D   2  2  6
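A common base R approach to this kind of threshold is to count the NAs per row with rowSums(is.na(...)) and keep the rows at or below the limit. The sketch below rebuilds the four example rows from the question; the variable names max_na and kept are assumptions for illustration.

```r
# The four example rows from the question.
df <- data.frame(
  ID = c("A", "B", "C", "D"),
  q = c(1, 5, NA, 2),
  r = c(5, NA, 9, 2),
  s = c(NA, 4, 4, 6),
  t = c(3, 6, NA, 8),
  u = c(8, 1, 4, 4),
  v = c(9, 9, 8, NA),
  w = c(NA, 7, 4, 3),
  x = c(8, 4, NA, 7),
  y = c(6, 9, 5, 1),
  z = c(4, 3, NA, 32),
  stringsAsFactors = FALSE
)

# Maximum number of NAs tolerated in a row.
max_na <- 2

# Count NAs per row and keep rows at or below the threshold.
kept <- df[rowSums(is.na(df)) <= max_na, ]

print(kept)
```

Row C has four NAs and is dropped; rows A, B and D remain, matching the desired result described in the question.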

Creating (and Accessing) a Sparse Matrix with NA default entries

别等时光非礼了梦想. submitted on 2019-11-26 13:46:08
Question

After learning about the options for working with sparse matrices in R, I want to use the Matrix package to create a sparse matrix from the following data frame and have all other elements be NA.

         s    r d
    1 1089 3772 1
    2 1109  190 1
    3 1109 2460 1
    4 1109 3071 2
    5 1109 3618 1
    6 1109   38 7

I know I can create a sparse matrix with the following, accessing elements as usual:

    > library(Matrix)
    > Y <- sparseMatrix(s,r,x=d)
    > Y[1089,3772]
    [1] 1
    > Y[1,1]
    [1] 0

but if I want to have the default value to be

Remove NA values from a vector

拥有回忆 submitted on 2019-11-26 12:45:59
I have a huge vector which has a couple of NA values, and I'm trying to find the max value in that vector (the vector is all numbers), but I can't do this because of the NA values. How can I remove the NA values so that I can compute the max?

Trying ?max, you'll see that it actually has a na.rm = argument, set by default to FALSE. (That's the common default for many other R functions, including sum(), mean(), etc.) Setting na.rm=TRUE does just what you're asking for:

    d <- c(1, 100, NA, 10)
    max(d, na.rm=TRUE)

If you do want to remove all of the NAs, use this idiom instead:

    d <- d[!is.na(d)
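The difference between ignoring NAs inside max() and dropping them from the vector can be seen side by side in this short base R sketch. It reuses the vector d from the answer; d_clean is a name introduced here for illustration.

```r
d <- c(1, 100, NA, 10)

# With the default na.rm = FALSE, a single NA makes the result NA.
max_default <- max(d)

# Ignoring NAs only for this one computation.
max_ignoring_na <- max(d, na.rm = TRUE)

# Dropping the NAs from the vector itself, using logical indexing.
d_clean <- d[!is.na(d)]

print(max_default)
print(max_ignoring_na)
print(d_clean)
```

The na.rm route leaves d unchanged, which is usually preferable when positions in the vector still matter elsewhere.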

Subset of rows containing NA (missing) values in a chosen column of a data frame

99封情书 submitted on 2019-11-26 12:35:47
Question

We have a data frame from a CSV file. The data frame DF has columns that contain observed values and a column (Var2) that contains the date at which a measurement has been taken. If the date was not recorded, the CSV file contains the value NA, for missing data.

    Var1 Var2
    10   2010/01/01
    20   NA
    30   2010/03/01

We would like to use the subset command to define a new data frame new_DF such that it only contains rows that have an NA value in the column (Var2). In the example given, only Row
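A minimal sketch of the selection described above, using subset() with is.na() on the chosen column. The data frame is typed in by hand here instead of being read from a CSV file, which is an assumption made to keep the example self-contained.

```r
# Hand-built stand-in for the data frame read from the CSV file.
DF <- data.frame(
  Var1 = c(10, 20, 30),
  Var2 = c("2010/01/01", NA, "2010/03/01"),
  stringsAsFactors = FALSE
)

# Keep only the rows where Var2 is missing.
new_DF <- subset(DF, is.na(Var2))

print(new_DF)
```

Note that a test such as Var2 == NA would not work here, because any comparison with NA yields NA rather than TRUE; is.na() is the reliable check.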

How to delete columns that contain ONLY NAs?

ぃ、小莉子 submitted on 2019-11-26 12:04:46
Question

I have a data.frame containing some columns with all NA values. How can I delete them from the data.frame? Can I use the function na.omit(...) specifying some additional arguments?

Answer 1:

One way of doing it:

    df[, colSums(is.na(df)) != nrow(df)]

If the count of NAs in a column is equal to the number of rows, it must be entirely NA. Or similarly:

    df[colSums(!is.na(df)) > 0]

Answer 2:

Here is a dplyr solution:

    df %>% select_if(~sum(!is.na(.)) > 0)

Answer 3:

It seems like you want to remove ONLY columns
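The colSums() approach from Answer 1 can be checked on a toy data frame, as in the sketch below. The data and the names keep and df_clean are made up for illustration; drop = FALSE is added as a precaution so the result stays a data frame even if only one column survives.

```r
# Toy data: column b is entirely NA, a and c are only partly NA.
df <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, NA),
  c = c("x", NA, "z"),
  stringsAsFactors = FALSE
)

# A column is all-NA when its NA count equals the number of rows.
keep <- colSums(is.na(df)) != nrow(df)

# Keep the remaining columns.
df_clean <- df[, keep, drop = FALSE]

print(names(df_clean))
```

Columns that contain some NAs but also some values, like a and c here, are kept.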

Replace NA in column with value in adjacent column

不问归期 submitted on 2019-11-26 11:50:48
This question is related to a post with a similar title (replace NA in an R vector with adjacent values). I would like to scan a column in a data frame and replace NAs with the value in the adjacent cell. In the aforementioned post, the solution did not replace the NA with the value from the adjacent vector (e.g. the adjacent element in the data matrix); it was a conditional replacement with a fixed value. Below is a reproducible example of my problem:

    UNIT <- c(NA,NA, 200, 200, 200, 200, 200, 300, 300, 300,300)
    STATUS <-c('ACTIVE','INACTIVE','ACTIVE','ACTIVE','INACTIVE','ACTIVE','INACTIVE',
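Since the reproducible example is cut off in this excerpt, the sketch below uses a made-up data frame to show one base R way of filling NAs in one column from the same row of a neighbouring column. The column names primary and fallback are hypothetical and do not come from the original post.

```r
# Hypothetical data: 'primary' has gaps, 'fallback' is the adjacent column.
df <- data.frame(
  primary  = c(NA, 200, NA, 300),
  fallback = c(100, 999, 150, 999)
)

# Rows where the primary value is missing.
missing <- is.na(df$primary)

# Copy the adjacent value into those rows only.
df$primary[missing] <- df$fallback[missing]

print(df)
```

Rows that already had a value in primary are left unchanged, so this is a row-wise fill rather than a fixed-value replacement.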