We can also use a vectorized approach with regex. After pasting the elements of each row of the dataset together (do.call(paste0, ...)), we match any single character and capture it as a group ((.)). A positive lookahead ((?=.*?\\1), where \\1 is the backreference to the captured group) restricts the match to characters that appear again later in the string, and each such match is replaced with an empty string (""). In effect, only the last occurrence of each distinct character remains. Then, with nchar, we count the number of characters in the resulting string, which is the number of unique characters in the row.
example$count <- nchar(gsub("(.)(?=.*?\\1)", "", do.call(paste0, example), perl = TRUE))
example$count
#[1] 2 1 3 3 2 1
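To make the intermediate steps visible, here is a minimal sketch on a small made-up data frame. The name df and its contents are hypothetical, since the original example data is not shown here. It also assumes every cell holds a single character; with longer values the regex would count distinct characters rather than distinct values.

```r
# Hypothetical data: one character per cell (an assumption of this sketch).
df <- data.frame(a = c("x", "y", "z"),
                 b = c("x", "y", "w"),
                 c = c("y", "y", "z"),
                 stringsAsFactors = FALSE)

# Step 1: paste the cells of each row into one string per row.
rows <- do.call(paste0, df)
rows
#[1] "xxy" "yyy" "zwz"

# Step 2: remove every character that occurs again later in the same string,
# so only the last occurrence of each distinct character is kept.
deduped <- gsub("(.)(?=.*?\\1)", "", rows, perl = TRUE)
deduped
#[1] "xy" "y"  "wz"

# Step 3: the remaining length is the number of unique characters per row.
counts <- nchar(deduped)
counts
#[1] 2 1 2

# Cross-check against a plain row-wise count of unique values.
stopifnot(all(counts == apply(df, 1, function(r) length(unique(r)))))
```

The perl = TRUE argument is needed because lookahead assertions are not supported by R's default regex engine.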