na

replace values with NA across multiple columns if a condition is met in R

a 夏天 posted on 2021-01-28 18:53:23
Question: I'm trying to replace values with NA across multiple columns if a condition is met. Here's a sample dataset:

    library(tidyverse)
    sample <- tibble(id = 1:6, team_score = 5:10,
                     cent_dept_test_agg = c(1, 2, 3, 4, 5, 6),
                     cent_dept_blue_agg = c(15:20),
                     num_in_dept = c(1, 1, 2, 5, 100, 6))

I want the columns whose names match cent_dept_.*_agg to be NA when num_in_dept is 1, so it looks like this:

    library(tidyverse)
    solution <- tibble(id = 1:6, team_score = 5:10, cent_dept_test_agg = c(NA,
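
One possible approach to the question above, as a minimal base-R sketch rather than a tidyverse answer: select the matching columns by name with a regular expression, then assign NA to the rows where the condition holds. The name `sample_df` is an assumption made here to avoid masking `base::sample`.

```r
# Base-R sketch; the question itself uses tibble/tidyverse.
# `sample_df` is a hypothetical name for the question's `sample` data.
sample_df <- data.frame(
  id = 1:6,
  team_score = 5:10,
  cent_dept_test_agg = c(1, 2, 3, 4, 5, 6),
  cent_dept_blue_agg = 15:20,
  num_in_dept = c(1, 1, 2, 5, 100, 6)
)

# Columns whose names match the pattern given in the question.
agg_cols <- grep("^cent_dept_.*_agg$", names(sample_df), value = TRUE)

# which() drops any NA in the condition, so NA rows are left untouched.
rows <- which(sample_df$num_in_dept == 1)

# Blank out the matching columns for the selected rows only.
sample_df[rows, agg_cols] <- NA

print(sample_df)
```

Only rows 1 and 2 have num_in_dept equal to 1, so only those rows are set to NA in the two cent_dept_*_agg columns.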

R: variable exclusion from formula not working in presence of missing data

狂风中的少年 posted on 2021-01-28 14:12:49
Question: I'm building a model in R while excluding the 'office' column in the formula (it sometimes contains hints of the class I predict). I'm training on 'train' and predicting on 'test':

    > model <- randomForest::randomForest(tc ~ . - office, data=train, importance=TRUE, proximity=TRUE)
    > prediction <- predict(model, test, type = "class")

The prediction came back as all NAs:

    > head(prediction)
    [1] <NA> <NA> <NA> <NA> <NA> <NA>
    Levels: 2668 2752 2921 3005

The reason is that test$office contains NAs:

    >
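
One workaround that is sometimes used in this situation is to drop the unwanted column from the data itself instead of subtracting it in the formula, so its NAs cannot reach the model at all. The sketch below is only an illustration under stated assumptions: the toy `train` and `test` frames and the column `x` are made up, and `lm` stands in for `randomForest::randomForest` so the example runs in base R.

```r
# Toy stand-ins for the question's `train` and `test`; all values are made up.
train <- data.frame(
  tc = c(1, 2, 3, 4, 5, 6),
  x = c(2, 4, 5, 8, 11, 12),
  office = c("a", "b", "a", "b", "a", "b"),
  stringsAsFactors = FALSE
)
test <- data.frame(
  x = c(3, 7),
  office = c(NA, "b"),
  stringsAsFactors = FALSE
)

# Remove the column from both data sets before fitting and predicting.
drop_cols <- "office"
train_sub <- train[, setdiff(names(train), drop_cols), drop = FALSE]
test_sub <- test[, setdiff(names(test), drop_cols), drop = FALSE]

# lm() is a stand-in here; the formula no longer needs "- office".
model <- lm(tc ~ ., data = train_sub)
prediction <- predict(model, newdata = test_sub)

print(prediction)
```

The same subsetting could be applied before calling randomForest with `tc ~ .`, although that is not verified here.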

Apply a function to multiple dataframes

狂风中的少年 posted on 2021-01-28 12:32:08
Question: I have many dataframes where missing values are denoted by the character string 'NA', which R does not treat as missing. The lengthy solution would be to apply the following to each dataframe:

    mydf[mydf == 'NA'] <- NA

I want to apply the above to many dataframes. Consider the following example:

    set.seed(123)
    A=as.data.frame(matrix(sample(c('NA',1:10),10*10,T),10))
    B=as.data.frame(matrix(sample(c('NA',LETTERS[1:10]),10*10,T),10))
    C=as.data.frame(matrix(sample(c('NA'
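
A common pattern for this kind of task is to put the data frames in a list and use `lapply`. The sketch below assumes small hypothetical frames (`df1`, `df2`) in place of the question's A, B and C, and wraps the one-line replacement in a helper named `na_strings_to_na`, which is also an invented name.

```r
# Hypothetical small data frames standing in for the question's A, B, C.
df1 <- data.frame(x = c("NA", "1", "2"), y = c("a", "NA", "c"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(z = c("NA", "NA", "B"), stringsAsFactors = FALSE)

# Collect the frames in a named list so one call can process them all.
dfs <- list(df1 = df1, df2 = df2)

# Helper: turn the literal string "NA" into a real missing value.
na_strings_to_na <- function(df) {
  df[df == "NA"] <- NA
  df  # return the modified copy
}

dfs <- lapply(dfs, na_strings_to_na)

print(dfs$df1)
print(dfs$df2)
```

If the data frames live as separate objects, `mget(c("A", "B", "C"))` can collect them into a named list first.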

How to Apply functions to specific set of columns in data frame in R to replace NAs

我与影子孤独终老i posted on 2021-01-27 21:50:55
Question: I have a data set in which I want to replace NAs in different columns differently. Here is a dummy data set and the code to reproduce it.

    test <- data.frame(ID = c(1:5),
                       FirstName = c(NA,"Sid",NA,"Harsh","CJ"),
                       LastName = c("Snow",NA,"Lapata","Khan",NA),
                       BillNum = c(6:10),
                       Phone = c(1213,3123,3123,NA,NA),
                       Married = c("Yes","Yes",NA,"NO","Yes"),
                       ZIP = c(1111,2222,333,444,555),
                       Gender = c("M",NA,"F",NA,"M"),
                       Address = c("A","B",NA,"C","D"))

    > test
    ID FirstName LastName BillNum Phone Married
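
One way to treat columns differently is to keep a named list that maps each column to its own fill value and loop over it. The question is cut off before it says which replacement each column should get, so the values in `fills` below are purely hypothetical, and only a subset of the question's columns is used.

```r
# Subset of the question's dummy data; stringsAsFactors = FALSE keeps text
# columns as character so new values can be assigned freely.
test <- data.frame(
  ID = 1:5,
  FirstName = c(NA, "Sid", NA, "Harsh", "CJ"),
  Phone = c(1213, 3123, 3123, NA, NA),
  Married = c("Yes", "Yes", NA, "NO", "Yes"),
  stringsAsFactors = FALSE
)

# Hypothetical per-column replacements (not taken from the question).
fills <- list(FirstName = "Unknown", Phone = 0, Married = "No")

# Replace NAs column by column, each with its own value.
for (col in names(fills)) {
  missing <- is.na(test[[col]])
  test[[col]][missing] <- fills[[col]]
}

print(test)
```

Columns not listed in `fills` are left untouched, which makes it easy to see which rule applies to which column.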

How to create new column with all non-NA values from multiple other columns?

喜夏-厌秋 posted on 2021-01-27 19:37:41
Question: I would like to create a column d that includes all the non-NA values from the other columns. I tried ifelse, but cannot figure out how to nest it properly so that the value in column c is included as well. Perhaps something other than ifelse should be used? Here is a "dummy" dataframe:

    a <- c(NA, NA, NA, "A", "B", "A", NA, NA)
    b <- c("D", "A", "C", NA, NA, NA, NA, NA)
    c <- c(NA, NA, NA, NA, NA, NA, "C", NA)
    data <- data.frame(a, b, c)

I would like the d column to look
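
A minimal base-R sketch of one way to do this without hand-nesting ifelse: fold a two-argument "take the first non-NA" helper over the columns with `Reduce`. The helper name `first_non_na` is invented here, and the sketch assumes that when several columns hold a value in the same row, the leftmost one should win (the sample data never has more than one).

```r
# The question's dummy data, built directly inside data.frame().
data <- data.frame(
  a = c(NA, NA, NA, "A", "B", "A", NA, NA),
  b = c("D", "A", "C", NA, NA, NA, NA, NA),
  c = c(NA, NA, NA, NA, NA, NA, "C", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep x where present, otherwise fall back to y.
first_non_na <- function(x, y) ifelse(is.na(x), y, x)

# Fold the helper across a, b, c from left to right.
data$d <- Reduce(first_non_na, data[c("a", "b", "c")])

print(data)
```

Rows where every column is NA, such as the last one, stay NA in d.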

Removing Columns Named “NA”

核能气质少年 posted on 2021-01-27 19:36:08
Question: I'm dealing with some RNA-seq count data for which I have ~60,000 columns containing gene names and 24 rows containing sample names. When I did some gene name conversions I was left with a bunch of columns that are named NA. I know that R handles NA differently than a typical column name, and my question is how to remove these columns. Here is an example of my data.

      "Gene1" "Gene2" "Gene3" NA "Gene4"
    1 10 11 12 10 15
    2 13 12 50 40 30
    3 34 23 23 21 22

I would like it to end up like "Gene1"
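
A minimal sketch of one way to drop such columns, assuming the counts are held in a matrix (the question does not say whether it is a matrix or a data frame) and that the unwanted names are either a true missing name or the literal string "NA". The object name `counts` is hypothetical.

```r
# The question's example values, stored as a matrix (an assumption).
counts <- matrix(
  c(10, 11, 12, 10, 15,
    13, 12, 50, 40, 30,
    34, 23, 23, 21, 22),
  nrow = 3, byrow = TRUE
)
colnames(counts) <- c("Gene1", "Gene2", "Gene3", NA, "Gene4")

# Keep a column only if its name is neither missing nor the string "NA".
# For a missing name the first test is FALSE, so the combined result is FALSE.
keep <- !is.na(colnames(counts)) & colnames(counts) != "NA"

# drop = FALSE keeps the result a matrix even if one column remains.
cleaned <- counts[, keep, drop = FALSE]

print(cleaned)
```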

Properties of pmatch function

大城市里の小女人 posted on 2021-01-27 14:59:15
Question: I don't understand the behavior of the built-in function pmatch (partial string matching). The description provides the following example:

    pmatch("m", c("mean", "median", "mode")) # returns NA instead of 1,2,3

but using:

    pmatch("m", "mean") # returns 1, as I would have expected.

Could anybody explain this behavior to me?

Answer 1: As per the documentation:

    nomatch: the value to be returned at non-matching or multiply partially matching positions. Note that it is coerced to integer.

The nomatch
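
The behavior described in the quoted documentation can be checked with a few calls. This is a short illustrative sketch; `choices` is a name introduced here, and `charmatch` and `startsWith` are base R functions shown as related alternatives that the excerpt above does not mention.

```r
choices <- c("mean", "median", "mode")

# "m" is a prefix of all three entries, so the partial match is ambiguous
# and pmatch falls back to nomatch (NA by default).
ambiguous <- pmatch("m", choices)

# "me" still matches both "mean" and "median", so it is ambiguous too.
still_ambiguous <- pmatch("me", choices)

# "mo" is a prefix of exactly one entry, so its index is returned.
unique_partial <- pmatch("mo", choices)

# An exact match wins even though "mean" is not a prefix of anything else.
exact <- pmatch("mean", choices)

# charmatch reports ambiguity as 0 instead of NA.
ambiguous_charmatch <- charmatch("m", choices)

# To get every entry that starts with "m", test the prefix directly.
all_prefix_matches <- which(startsWith(choices, "m"))

print(list(ambiguous, still_ambiguous, unique_partial, exact,
           ambiguous_charmatch, all_prefix_matches))
```

With a single-element table, as in `pmatch("m", "mean")`, there is only one candidate, so the match is unique and the result is 1.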

Conditionally selecting columns in dplyr where certain proportion of values is NA

假装没事ソ posted on 2021-01-18 05:21:05
Question:

Data

I'm working with a data set resembling the data.frame generated below:

    set.seed(1)
    dta <- data.frame(observation = 1:20,
                      valueA = runif(n = 20),
                      valueB = runif(n = 20),
                      valueC = runif(n = 20),
                      valueD = runif(n = 20))
    dta[2:5,3] <- NA
    dta[2:10,4] <- NA
    dta[7:20,5] <- NA

The columns have NA values, with the last column having more than 60% of its observations NA.

    > sapply(dta, function(x) {table(is.na(x))})
    $observation
    FALSE
       20
    $valueA
    FALSE
       20
    $valueB
    FALSE  TRUE
       16     4
    $valueC
    FALSE  TRUE
       11     9
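
A minimal base-R sketch of one way to keep only the columns whose share of NA values is at or below a threshold. The title asks about dplyr, so this is an alternative rather than the requested answer; the 0.6 cutoff mirrors the 60% mentioned in the question, and `na_share` and `kept` are names introduced here.

```r
# Recreate the question's data.
set.seed(1)
dta <- data.frame(
  observation = 1:20,
  valueA = runif(n = 20),
  valueB = runif(n = 20),
  valueC = runif(n = 20),
  valueD = runif(n = 20)
)
dta[2:5, 3] <- NA
dta[2:10, 4] <- NA
dta[7:20, 5] <- NA

# Proportion of missing values per column.
na_share <- colMeans(is.na(dta))

# Keep columns with at most 60% missing; drop = FALSE keeps a data frame.
kept <- dta[, na_share <= 0.6, drop = FALSE]

print(na_share)
print(names(kept))
```

Here valueD has 14 of 20 values missing (70%), so it is the only column dropped.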
