I have a large data.frame of character data that I want to convert based on what is commonly called a dictionary in other languages.
Currently I am going about it li
We can also use dplyr::case_when
library(dplyr)
foo %>%
  mutate_all(~case_when(. == "AA" ~ "0101",
                        . == "AC" ~ "0102",
                        . == "AG" ~ "0103",
                        TRUE ~ .))
#  snp1 snp2 snp3
#1 0101 0101 <NA>
#2 0103   AT   GG
#3 0101 0103   GG
#4 0101 0101   GC
case_when checks each condition in order and returns the corresponding value for the first one that is TRUE. We can add more conditions if needed, and with TRUE ~ . we keep the values as they are if none of the conditions is matched. If we want to change them to NA instead, we can remove the last line.
foo %>%
  mutate_all(~case_when(. == "AA" ~ "0101",
                        . == "AC" ~ "0102",
                        . == "AG" ~ "0103"))
#  snp1 snp2 snp3
#1 0101 0101 <NA>
#2 0103 <NA> <NA>
#3 0101 0103 <NA>
#4 0101 0101 <NA>
This changes a value to NA if none of the above conditions is satisfied.
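Since dplyr 1.0.0, mutate_all is superseded by across(). A minimal sketch of the same replacement in that style follows; the sample foo here is an assumption made up for illustration, since the real data is defined in the question, and res is just a name chosen for the result.

```r
library(dplyr)

# Hypothetical sample data standing in for the question's `foo`
foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"),
                  snp2 = c("AA", "AT", "AG", "AA"),
                  snp3 = c(NA, "GG", "GG", "GC"),
                  stringsAsFactors = FALSE)

# Apply the same case_when() to every column via across()
res <- foo %>%
  mutate(across(everything(),
                ~ case_when(.x == "AA" ~ "0101",
                            .x == "AC" ~ "0102",
                            .x == "AG" ~ "0103",
                            TRUE ~ .x)))  # fallback: keep the original value
```

Dropping the TRUE ~ .x line gives the NA-on-no-match behaviour described above.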
Another option, using only base R, is to create a lookup data frame with the old and new values, unlist the data frame, match its values against the old values, get the corresponding new values, and assign them back.
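# Lookup table mapping old values to their replacements
lookup <- data.frame(old_val = c("AA", "AC", "AG"),
                     new_val = c("0101", "0102", "0103"),
                     stringsAsFactors = FALSE)
# Values that are absent from lookup$old_val become NA
foo[] <- lookup$new_val[match(unlist(foo), lookup$old_val)]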
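The match() lookup returns NA for any value that is not in the lookup table. If unmatched values should be kept instead, one possible base R variant uses a named character vector as the dictionary. This is only a sketch: dict and the sample foo are assumptions introduced for illustration, not part of the original data.

```r
# Hypothetical sample data standing in for the question's `foo`
foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"),
                  snp2 = c("AA", "AT", "AG", "AA"),
                  snp3 = c(NA, "GG", "GG", "GC"),
                  stringsAsFactors = FALSE)

# Named vector used as a dictionary: names are old values, elements are new ones
dict <- c(AA = "0101", AC = "0102", AG = "0103")

# Replace column by column, touching only values that have a dictionary entry
foo[] <- lapply(foo, function(col) {
  hit <- col %in% names(dict)           # NA and unknown codes give FALSE
  col[hit] <- unname(dict[col[hit]])    # look up replacements by name
  col
})
```

Because only the matched positions are overwritten, values such as "AT" or "GG" and existing NAs pass through unchanged.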