Question
I have a character vector which looks like this
"9/14/2007,,,,88.22" "9/21/2007,,,,92.53" "9/28/2007,,,,92" "10/5/2007,,,,92.85"
Now I need to remove everything up to and including the 4 commas, so that the result looks like this:
"88.22" "92.53" "92" "92.85"
I have tried the following code
gsub("[^0-9.]", "", x)
where x is my character vector, but this keeps the digits before the commas (which come from the dates):
"914200788.22" "921200792.53" "928200792" "105200792.85"
Also, the number of characters to remove isn't always the same, but the last character to remove is always the last comma. Maybe this helps with the solution.
Answer 1:
Your regex removes every character that is not a digit or a period, so the digits of the date survive. Try substituting everything before and including the four commas:
> vec = c("9/14/2007,,,,88.22", "9/21/2007,,,,92.53", "9/28/2007,,,,92", "10/5/2007,,,,92.85")
> sub(".*,,,,", "", vec)
[1] "88.22" "92.53" "92" "92.85"
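The question also mentions that the amount of text to remove can vary, while the last character to remove is always the last comma. A minimal sketch for that case, assuming the wanted value never contains a comma itself: because `.*` is greedy, the pattern `.*,` matches through the final comma regardless of how many commas precede it. The second element of the sample vector below is made up for illustration.

```r
# Sample input; the second element has a different number of commas
# (hypothetical data, not from the question).
x <- c("9/14/2007,,,,88.22", "9/21/2007,,92.53", "9/28/2007,,,,92")

# ".*" is greedy, so ".*," consumes everything through the last comma.
result <- sub(".*,", "", x)
print(result)
# [1] "88.22" "92.53" "92"
```

If numbers rather than strings are needed, the result can be wrapped in as.numeric().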
Answer 2:
With stringr str_extract:
string = c("9/14/2007,,,,88.22", "9/21/2007,,,,92.53", "9/28/2007,,,,92", "10/5/2007,,,,92.85")
library(stringr)
str_extract(string, "\\d+[.]?\\d+$")
Or
str_extract(string, "(?<=,{4}).*")
Base R equivalent:
unlist(regmatches(string, gregexpr("\\d+[.]?\\d+$", string)))
unlist(regmatches(string, gregexpr("(?<=,{4}).*", string, perl = TRUE)))
sapply(strsplit(string, ",,,,"), `[`, 2)
Notes:
$ matches the end of the string.
(?<=,{4}) is a positive lookbehind which checks whether .* comes after 4 commas. This requires Perl-style regex, which is why perl = TRUE is required for the second Base R example.
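One caveat worth illustrating: the pattern `\\d+[.]?\\d+$` needs at least two digits at the end of the string, so a single-digit value such as "5" would not match. The sketch below uses a possible alternative pattern (my own variation, not part of the answer) that makes the decimal part optional as a group. The third element of the sample vector is invented to show the difference.

```r
# Sample input; the last element ends in a single digit (hypothetical data).
vals <- c("9/14/2007,,,,88.22", "9/28/2007,,,,92", "10/5/2007,,,,5")

# The pattern from the answer requires two digits, so the last element fails.
original_matches <- grepl("\\d+[.]?\\d+$", vals)
print(original_matches)
# [1]  TRUE  TRUE FALSE

# Alternative: one or more digits, optionally followed by "." and more digits.
pattern <- "\\d+(\\.\\d+)?$"
last_num <- regmatches(vals, regexpr(pattern, vals))
print(last_num)
# [1] "88.22" "92"    "5"
```

Note that regmatches() silently drops elements with no match, so the output can be shorter than the input if some strings do not end in a number.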
Answer 3:
Read the vector as a csv, then refer to the column. To get the last one without knowing how many original columns there are, we can reverse it and take the first.
rev(read.table(text = x, sep = ","))[[1]]
# [1] 88.22 92.53 92.00 92.85
Data:
x <- scan(text='"9/14/2007,,,,88.22" "9/21/2007,,,,92.53" "9/28/2007,,,,92" "10/5/2007,,,,92.85"', what="")
Source: https://stackoverflow.com/questions/46274010/delete-parts-of-a-character-vector