I have a list of strings which contain random characters such as:

list=list()
list[1] = "djud7+dg[a]hs667"
list[2] = "7fd*hac11(5)"
list[3] = "2tu,g7gka5"

I'd like to know which numbers are present at least once (unique()) in this list. The solution of my example is:

solution: c(7,667,11,5,2)

If someone has a method that does not consider 11 as "eleven" but as "one and one", it would also be useful. The solution in this condition would be:

solution: c(7,6,1,5,2)

(I found this post on a related subject: Extracting numbers from vectors of strings)

For the second case (individual digits), you can use gsub to remove everything from the string that's not a digit, then split the result into single characters as follows:

unique(as.numeric(unlist(strsplit(gsub("[^0-9]", "", unlist(ll)), ""))))
# [1] 7 6 1 5 2

For the first case (whole numbers), similarly using strsplit, this time splitting on runs of non-digits:

unique(na.omit(as.numeric(unlist(strsplit(unlist(ll), "[^0-9]+")))))
# [1]   7 667  11   5   2

PS: don't name your variable list (as there's an inbuilt function list). I've named your data as ll.
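To make the two one-liners above easier to follow, here is a minimal sketch that breaks them into intermediate steps. It assumes the data is stored in ll as described; the names digits_only and pieces are only illustrative and are not part of the original answer.

```r
# Example data, stored as ll (assumed to match the question's list)
ll <- list("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

# Individual digits: strip every non-digit, then split into single characters
digits_only <- gsub("[^0-9]", "", unlist(ll))
digits_only
# [1] "7667" "7115" "275"
single_digits <- unique(as.numeric(unlist(strsplit(digits_only, ""))))

# Whole numbers: split on runs of non-digits. A string that starts with a
# non-digit yields a leading "", which as.numeric() converts to NA,
# hence the NA filtering step.
pieces <- unlist(strsplit(unlist(ll), "[^0-9]+"))
whole_numbers <- as.numeric(pieces)
whole_numbers <- unique(whole_numbers[!is.na(whole_numbers)])
```

Looking at digits_only and pieces before the final conversion is a quick way to check that the pattern does what you expect on your own data.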

Here is yet another answer, this one using gregexpr to find the numbers, and regmatches to extract them:

l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

temp1 <- gregexpr("[0-9]", l)   # Individual digits
temp2 <- gregexpr("[0-9]+", l)  # Numbers with any number of digits

as.numeric(unique(unlist(regmatches(l, temp1))))
# [1] 7 6 1 5 2
as.numeric(unique(unlist(regmatches(l, temp2))))
# [1]   7 667  11   5   2
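As a small illustration of what regmatches returns before unlist flattens it, the sketch below keeps the per-string structure. The variable names matches and flat are assumptions made for this example only.

```r
l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

# regmatches() returns a list with one character vector per input string
matches <- regmatches(l, gregexpr("[0-9]+", l))
length(matches)
# [1] 3

# Flattening and de-duplicating gives the numbers that occur at least once
flat <- unlist(matches)
result <- as.numeric(unique(flat))
```

Keeping matches around can be handy if you later need to know which string a given number came from.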

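A solution using stringi:

library(stringi)

# extract the numbers (unlist() turns the question's list into a character vector):
nums <- stri_extract_all_regex(unlist(list), "[0-9]+")

# flatten into a vector and keep the unique numbers:
nums <- unlist(nums)
nums <- unique(nums)

And that's your first solution (as character strings; wrap it in as.numeric() if you need numbers).

For the second solution, note that substr(x, 1, 1) would only pick up the first digit of each number, so I would split every number into its individual digits instead:

nums_digits <- unique(unlist(strsplit(nums, "")))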

sgibb

You could use strsplit (see ?strsplit), as suggested in @Arun's answer to "Extracting numbers from vectors (of strings)":

l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

## split string at non-digits
s <- strsplit(l, "[^[:digit:]]")

## convert strings to numeric ("" become NA)
solution <- as.numeric(unlist(s))

## remove NA and duplicates
solution <- unique(solution[!is.na(solution)])
# [1]   7 667  11   5   2

A stringr solution with str_match_all and the pipe operator (%>%). For the first solution:

library(stringr)
str_match_all(ll, "[0-9]+") %>% unlist %>% unique %>% as.numeric

Second solution:

str_match_all(ll, "[0-9]") %>% unlist %>% unique %>% as.numeric

(Note: I've also called the list ll)

Use strsplit with a pattern that matches anything other than the digits 0-9:

For the example you have provided, do this:

tmp <- sapply(list, function (k) strsplit(k, "[^0-9]"))

Then simply take a union of all "sets" in the list, like so:

tmp <- Reduce(union, tmp)

Then you only have to remove the empty string.
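The whole approach, including the final clean-up, could be sketched as follows. This version stores the data in ll rather than list, and uses lapply with [[1]] so each element is a plain character vector; both choices are assumptions made for the sketch, not part of the answer above.

```r
ll <- list("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

# Split each string at every non-digit character;
# consecutive non-digits produce empty strings
tmp <- lapply(ll, function(k) strsplit(k, "[^0-9]")[[1]])

# Union of all pieces across the strings (duplicates collapse here)
tmp <- Reduce(union, tmp)

# Drop the empty string and convert to numeric
nums <- as.numeric(tmp[tmp != ""])
nums
# [1]   7 667  11   5   2
```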

Check out the str_extract_numbers() function from the strex package.

pacman::p_load(strex)
list=list()
list[1] = "djud7+dg[a]hs667"
list[2] = "7fd*hac11(5)"
list[3] = "2tu,g7gka5"
charvec <- unlist(list)
print(charvec)
#> [1] "djud7+dg[a]hs667" "7fd*hac11(5)"     "2tu,g7gka5"
str_extract_numbers(charvec)
#> [[1]]
#> [1]   7 667
#> 
#> [[2]]
#> [1]  7 11  5
#> 
#> [[3]]
#> [1] 2 7 5
unique(unlist(str_extract_numbers(charvec)))
#> [1]   7 667  11   5   2

Created on 2018-09-03 by the reprex package (v0.2.0).

Source: https://stackoverflow.com/questions/17009628/extracting-unique-numbers-from-string-in-r
