I have a data frame in R that contains the gene ids of paralogous genes in Arabidopsis, looking something like this:
gene_x gene_y
AT1
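Another tidyverse-centric approach, this time using purrr: sort each pair so that the order of the two ids does not matter, collapse it into a single key, and keep only the first row for each key.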
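library(tidyverse)

# Build an order-independent key for a pair: sort the ids, then join them with "."
c_sort_collapse <- function(...) {
  c(...) %>%
    sort() %>%
    str_c(collapse = ".")
}

mydf %>%
  mutate(x_y = map2_chr(gene_x, gene_y, c_sort_collapse)) %>%  # e.g. "AT1.AT2"
  distinct(x_y, .keep_all = TRUE) %>%                          # first row per key
  select(-x_y)                                                 # drop the helper column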
#>   gene_x gene_y
#> 1    AT1    AT2
#> 2    AT3    AT4
#> 3    AT1    AT3
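
For comparison, the same order-independent key can be sketched in base R, without purrr. This is only a minimal sketch and not part of the answer above: the `mydf` below is made-up example data (the question's full table is not shown here), and `key` and `result` are names chosen for illustration. It relies on `pmin()` and `pmax()` comparing the ids as character strings, so the columns are assumed to be character rather than factor.

```r
# Hypothetical example data: rows 4-6 repeat rows 1-3 with the ids swapped.
mydf <- data.frame(
  gene_x = c("AT1", "AT3", "AT1", "AT2", "AT4", "AT3"),
  gene_y = c("AT2", "AT4", "AT3", "AT1", "AT3", "AT1"),
  stringsAsFactors = FALSE
)

# Order-independent key: the smaller id of each pair first, the larger second.
key <- paste(pmin(mydf$gene_x, mydf$gene_y),
             pmax(mydf$gene_x, mydf$gene_y),
             sep = ".")

# Keep only the first row seen for each unordered pair.
result <- mydf[!duplicated(key), ]
```

With this example data, rows 4 to 6 are reversed copies of rows 1 to 3, so `result` keeps only the first three rows.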