Deleting reversed duplicates with R

野的像风 2020-11-27 22:32

I have a data frame in R that contains the gene ids of paralogous genes in Arabidopsis, looking something like this:

gene_x    gene_y
AT1            


        
3 Answers
  •  囚心锁ツ
    2020-11-27 23:12

    Another tidyverse-centric approach, this time using purrr: build an order-independent key for each pair, then keep only the first row per key.

    library(tidyverse)
    
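    # Build an order-independent key for a pair: sort the ids, then join them with "."
    c_sort_collapse <- function(...){
      c(...) %>% 
        sort() %>% 
        str_c(collapse = ".")
    }
    
    mydf %>% 
      # x_y is identical for (a, b) and (b, a)
      mutate(x_y = map2_chr(gene_x, gene_y, c_sort_collapse)) %>% 
      # keep the first row for each key, then drop the helper column
      distinct(x_y, .keep_all = TRUE) %>% 
      select(-x_y)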
    #>   gene_x gene_y
    #> 1    AT1    AT2
    #> 2    AT3    AT4
    #> 3    AT1    AT3
    
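For comparison, the same "order-independent key" idea can be sketched in base R without any packages. This is only an illustration under stated assumptions: the question's table is truncated, so the `mydf` below is hypothetical sample data made up to match the shape of the output shown above, and the names `first`, `second`, and `deduped` are introduced here, not taken from the thread.

```r
# Hypothetical sample data (the original table is truncated in the question).
mydf <- data.frame(
  gene_x = c("AT1", "AT3", "AT1", "AT2", "AT4"),
  gene_y = c("AT2", "AT4", "AT3", "AT1", "AT3"),
  stringsAsFactors = FALSE
)

# Order each pair alphabetically so (AT2, AT1) and (AT1, AT2) share one key.
first  <- pmin(mydf$gene_x, mydf$gene_y)
second <- pmax(mydf$gene_x, mydf$gene_y)

# Keep only the first occurrence of each unordered pair.
deduped <- mydf[!duplicated(data.frame(first, second)), ]
print(deduped)
```

On this sample, the two reversed rows (AT2/AT1 and AT4/AT3) are dropped and three rows remain. Note that `pmin()` and `pmax()` compare character vectors, so the columns should be character rather than factor for this sketch to work.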
