Question
I have this vector:
vector <- c("XXXX-X-X", "---X-X-X", "--X---XX", "--X-X--X", "-X---XX-", "-X--X--X", "X-----XX", "X----X-X", "X---XX--", "XX--X---", "---X-XXX", "--X-XX-X")
For each string, I want to find the maximum number of consecutive occurrences of X. So, my expected result would be:
4, 1, 2, 1, 2, 1, 2, 1, 2, 2, 3, 2
Answer 1:
In base R, we can split each string into individual characters and then use rle to find the longest run of "X".
sapply(strsplit(vector, ""), function(x) {
  # run-length encode the characters, then keep only the runs of "X"
  inds <- rle(x)
  max(inds$lengths[inds$values == "X"])
})
#[1] 4 1 2 1 2 1 2 1 2 2 3 2
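To make the rle step concrete, here is a minimal sketch that decomposes a single string into runs. The helper name max_run is hypothetical (it is not part of the original answer), and returning 0 for a string with no "X" is an assumption about the desired behavior; the original code would return -Inf with a warning in that case.

```r
# Look at what rle() returns for one of the input strings.
x <- strsplit("--X-XX-X", "")[[1]]
runs <- rle(x)
runs$values   # "-" "X" "-" "X" "-" "X"
runs$lengths  # 2 1 1 2 1 1

# Hypothetical helper: longest run of `ch` in string `s`,
# returning 0 (an assumed convention) when `ch` never occurs.
max_run <- function(s, ch = "X") {
  runs <- rle(strsplit(s, "")[[1]])
  hits <- runs$lengths[runs$values == ch]
  if (length(hits) == 0) 0L else max(hits)
}

max_run("--X-XX-X")  # 2
max_run("--------")  # 0
```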
Answer 2:
Here is a slightly different approach. We can split each term in the input vector on any number of dashes. Then, find the substring with the greatest length.
sapply(vector, function(x) {
  # split on runs of dashes, then take the longest remaining piece
  max(nchar(unlist(strsplit(x, "-+"))))
})
XXXX-X-X ---X-X-X --X---XX --X-X--X -X---XX- -X--X--X X-----XX X----X-X
4 1 2 1 2 1 2 1
X---XX-- XX--X--- ---X-XXX --X-XX-X
2 2 3 2
I suspect that X really just represents any non-dash character, so we don't need to check for it explicitly. If you really do want to count only X, then we can remove all non-X characters from each piece before counting:
sapply(vector, function(x) {
  # drop every character other than X from each piece before measuring it
  max(nchar(gsub("[^X]", "", unlist(strsplit(x, "-+")))))
})
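As a small illustration of how the two variants above differ, the sketch below runs them on made-up inputs that contain a character other than X and the dash. The function names count_any and count_x and the test strings are hypothetical. Note that the X-only variant counts the X characters within each dash-delimited piece, so Xs separated by some other character are still counted together.

```r
# Variant 1: any non-dash character counts toward the run.
count_any <- function(x) max(nchar(unlist(strsplit(x, "-+"))))

# Variant 2: strip non-X characters from each piece, then measure it.
count_x <- function(x) {
  max(nchar(gsub("[^X]", "", unlist(strsplit(x, "-+")))))
}

count_any("XY-X")  # 2, because "XY" is the longest piece
count_x("XY-X")    # 1, because only the X in "XY" is kept
count_x("XYX-X")   # 2, although the two Xs in "XYX" are not adjacent
```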
Answer 3:
Use strapply from the gsubfn package to extract the runs of X (the matches of "X+") and apply nchar to each one. This produces a list with one vector of run lengths per string. Then use sapply to take the max of each vector.
library(gsubfn)
sapply(strapply(vector, "X+", nchar), max)
## [1] 4 1 2 1 2 1 2 1 2 2 3 2
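If installing gsubfn is not an option, roughly the same idea can be sketched in base R with gregexpr and regmatches. This is an alternative to the answer's code, not a description of how strapply works internally. The extra 0L passed to max is an assumption about what to return for strings that contain no X.

```r
vector <- c("XXXX-X-X", "---X-X-X", "--X---XX", "--X-X--X",
            "-X---XX-", "-X--X--X", "X-----XX", "X----X-X",
            "X---XX--", "XX--X---", "---X-XXX", "--X-XX-X")

# Extract every run of X from each string (a list of character vectors).
x_runs <- regmatches(vector, gregexpr("X+", vector))

# Longest run per string; the 0L keeps max() well-defined when there is no match.
result <- sapply(x_runs, function(m) max(nchar(m), 0L))
result
# [1] 4 1 2 1 2 1 2 1 2 2 3 2
```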
Answer 4:
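Here are a couple of tidyverse alternatives. Both use map_dbl from purrr, and the first also uses str_detect from stringr:
library(purrr)
library(stringr)
# count how many of the patterns "X", "XX", ..., "XXXXXXXX" occur in each string
map_dbl(vector, ~sum(str_detect(., strrep("X", 1:8))))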
# [1] 4 1 2 1 2 1 2 1 2 2 3 2
map_dbl(strsplit(vector,"-"), ~max(nchar(.)))
# [1] 4 1 2 1 2 1 2 1 2 2 3 2
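The first alternative relies on a counting trick: a string whose longest run of X has length k contains "X", "XX", and so on up to k Xs, and nothing longer, so the number of matching patterns equals k. The sketch below shows this for a single string in base R, with grepl standing in for str_detect. The upper bound of 8 assumes no string is longer than 8 characters, as in the question's data.

```r
s <- "---X-XXX"

# Candidate patterns: "X", "XX", ..., "XXXXXXXX".
patterns <- strrep("X", 1:8)

# For each pattern, test whether it occurs literally in s.
found <- vapply(patterns, grepl, logical(1), x = s, fixed = TRUE)
found

# The number of patterns found equals the longest run of X.
sum(found)  # 3
```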
Source: https://stackoverflow.com/questions/53521119/count-the-maximum-of-consecutive-letters-in-a-string