In genomics research, you often end up with strings that contain duplicate gene names. I would like to find an efficient way to keep only the unique gene names in each string. This is an exa
Based on the example shown, perhaps:
gsub("(\\w+);\\1", "\\1", genes)
# [1] "GSTP1;APC"
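Note that the regex above only collapses a name that is immediately repeated, so a duplicate separated by another gene would be left in place, and because `\\1` is not anchored it can also match the start of a longer name (for instance, "APC;APC2" would come back as "APC2"). If that matters for your data, a split/unique/paste approach may be safer. Here is a minimal sketch; the input vector and the helper name `unique_genes` are made up for illustration, since the full example string is not shown, and it assumes the names are separated by a plain ";".

```r
# Hypothetical input; the real strings may look different.
genes <- c("GSTP1;GSTP1;APC", "APC;GSTP1;APC", "APC;APC2")

# Split each string on the separator, drop repeated names
# (keeping the first occurrence), and paste the rest back together.
unique_genes <- function(x, sep = ";") {
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(parts, function(p) paste(unique(p), collapse = sep), character(1))
}

unique_genes(genes)
# [1] "GSTP1;APC" "APC;GSTP1" "APC;APC2"
```

This compares whole names rather than matching text patterns, so the order of first appearance is kept and "APC" and "APC2" stay distinct.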