Extracting unique numbers from string in R

谁说胖子不能爱 submitted on 2019-11-26 06:38:01

Question


I have a list of strings which contain random characters such as:

list=list()
list[1] = "djud7+dg[a]hs667"
list[2] = "7fd*hac11(5)"
list[3] = "2tu,g7gka5"

I'd like to know which numbers are present at least once (unique()) in this list. The solution of my example is:

solution: c(7,667,11,5,2)

If someone has a method that does not consider 11 as "eleven" but as "one and one", it would also be useful. The solution in this condition would be:

solution: c(7,6,1,5,2)

(I found this post on a related subject: Extracting numbers from vectors of strings)


Answer 1:


For the second variant (individual digits), you can use gsub to remove everything from the string that is not a digit, then split the result into single characters:

unique(as.numeric(unlist(strsplit(gsub("[^0-9]", "", unlist(ll)), ""))))
# [1] 7 6 1 5 2

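For the first variant (whole numbers), use strsplit to split on runs of non-digits instead; the empty strings this produces become NA and are dropped by na.omit: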

unique(na.omit(as.numeric(unlist(strsplit(unlist(ll), "[^0-9]+")))))
# [1]   7 667  11   5   2

PS: don't name your variable list (as there's an inbuilt function list). I've named your data as ll.
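For reference, here is a self-contained sketch of both one-liners. It assumes ll is a list holding the three example strings (the answer does not show its definition), and the names digits and numbers are only illustrative:

```r
# Assumption: `ll` is the question's data, renamed to avoid masking list()
ll <- list("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

# Individual digits: strip non-digits, then split into single characters
digits <- unique(as.numeric(unlist(strsplit(gsub("[^0-9]", "", unlist(ll)), ""))))

# Whole numbers: split on runs of non-digits, drop the NAs from empty pieces
numbers <- unique(na.omit(as.numeric(unlist(strsplit(unlist(ll), "[^0-9]+")))))

print(digits)
print(numbers)
```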




Answer 2:


Here is yet another answer, this one using gregexpr to find the numbers, and regmatches to extract them:

l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

temp1 <- gregexpr("[0-9]", l)   # Individual digits
temp2 <- gregexpr("[0-9]+", l)  # Numbers with any number of digits

as.numeric(unique(unlist(regmatches(l, temp1))))
# [1] 7 6 1 5 2
as.numeric(unique(unlist(regmatches(l, temp2))))
# [1]   7 667  11   5   2



Answer 3:


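A solution using stringi:

library(stringi)

# Extract the numbers (unlist() turns the list into a character vector):
nums <- stri_extract_all_regex(unlist(list), "[0-9]+")

# Flatten to a vector and keep the unique numbers:
nums <- unlist(nums)
nums <- unique(nums)

That gives the first solution (as character strings; wrap the result in as.numeric() if you need numbers).

For the second solution I would use substr to take the first digit of each number. This matches the expected output for this example, but it only looks at leading digits, so it would miss the 7 in a number such as 67:

nums_first <- unique(substr(nums, 1, 1))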
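The substr call above keeps only the first character of each number. Here is a minimal base R sketch that instead splits every extracted number into all of its digits. It assumes nums holds the unique numbers as character strings; they are written out by hand so the snippet runs without stringi, and the name all_digits is illustrative:

```r
# Assumption: result of the extraction step, typed in directly for this sketch
nums <- c("7", "667", "11", "5", "2")

# Split each number into single characters and keep each digit once
all_digits <- as.numeric(unique(unlist(strsplit(nums, ""))))
print(all_digits)

# Contrast: taking only the first character drops the 7 in "67"
first_only <- unique(substr(c("67", "5"), 1, 1))
print(first_only)
```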



Answer 4:


You could use ?strsplit (as suggested in @Arun's answer in Extracting numbers from vectors (of strings)):

l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

## split string at non-digits
s <- strsplit(l, "[^[:digit:]]")

## convert strings to numeric ("" become NA)
solution <- as.numeric(unlist(s))

## remove NA and duplicates
solution <- unique(solution[!is.na(solution)])
# [1]   7 667  11   5   2



Answer 5:


A stringr solution with str_match_all and the pipe operator. For the first solution:

library(stringr)
str_match_all(ll, "[0-9]+") %>% unlist %>% unique %>% as.numeric

Second solution:

str_match_all(ll, "[0-9]") %>% unlist %>% unique %>% as.numeric

(Note: I've also called the list ll)




Answer 6:


Use strsplit with a pattern that matches anything other than the digits 0-9.

For the example you have provided, do this:

tmp <- sapply(list, function (k) strsplit(k, "[^0-9]"))

Then simply take a union of all the "sets" in the list, like so:

tmp <- Reduce(union, tmp)

Then you only have to remove the empty string.
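The steps above can be sketched end to end as follows. The names strings, pieces, all_pieces and numbers are illustrative and not from the original answer, and the input is assumed to be a plain character vector rather than a list:

```r
# Assumption: the question's data as a character vector
strings <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

# Split every string at each non-digit character (leaves many "" pieces)
pieces <- strsplit(strings, "[^0-9]")

# Union of all pieces across the strings, duplicates removed
all_pieces <- Reduce(union, pieces)

# Drop the empty string and convert what remains to numbers
numbers <- as.numeric(all_pieces[all_pieces != ""])
print(numbers)
```

Splitting on "[^0-9]+" instead would produce fewer empty pieces, but a leading non-digit still yields one, so the filtering step is needed either way.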




Answer 7:


Check out the str_extract_numbers() function from the strex package.

pacman::p_load(strex)
list=list()
list[1] = "djud7+dg[a]hs667"
list[2] = "7fd*hac11(5)"
list[3] = "2tu,g7gka5"
charvec <- unlist(list)
print(charvec)
#> [1] "djud7+dg[a]hs667" "7fd*hac11(5)"     "2tu,g7gka5"
str_extract_numbers(charvec)
#> [[1]]
#> [1]   7 667
#> 
#> [[2]]
#> [1]  7 11  5
#> 
#> [[3]]
#> [1] 2 7 5
unique(unlist(str_extract_numbers(charvec)))
#> [1]   7 667  11   5   2

Created on 2018-09-03 by the reprex package (v0.2.0).



Source: https://stackoverflow.com/questions/17009628/extracting-unique-numbers-from-string-in-r
