Question
Pretend I have a vector:
testVector <- c("I have 10 cars", "6 cars", "You have 4 cars", "15 cars")
Is there a way to parse this vector so that I can store just the numeric values:
10, 6, 4, 15
If the problem were just "15 cars" and "6 cars", I know how to parse that, but I'm having difficulty with the strings that have text in front too! Any help is greatly appreciated.
Answer 1:
We can use str_extract with the pattern \\d+, which matches one or more digits. It can also be written as [0-9]+.
library(stringr)
as.numeric(str_extract(testVector, "\\d+"))
#[1] 10 6 4 15
If there are multiple numbers in a string, we use str_extract_all, which will return a list as output.
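As a minimal sketch of the multiple-numbers case, the example below runs str_extract_all on a made-up vector (multiVector is a hypothetical input, not part of the original question) and converts each list element to numeric. It assumes the stringr package is installed.

```r
library(stringr)

# Hypothetical input: some strings hold several numbers, one holds none
multiVector <- c("I have 10 cars and 2 bikes", "6 cars", "no numbers here")

# str_extract_all returns a list with one character vector per input string
nums <- str_extract_all(multiVector, "\\d+")

# Convert each element to numeric; a string without digits gives numeric(0)
nums_numeric <- lapply(nums, as.numeric)
nums_numeric
```

Because the result is a list, it keeps one entry per input string even when a string has no match or has several.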
This can also be done with base R (no external packages used):
as.numeric(regmatches(testVector, regexpr("\\d+", testVector)))
#[1] 10 6 4 15
Or using gsub from base R:
as.numeric(gsub("\\D+", "", testVector))
#[1] 10 6 4 15
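The two base R approaches behave differently once the input is less regular than testVector. The sketch below uses a made-up vector (messy is hypothetical) to show that gsub glues all digits in a string together, while regmatches with regexpr silently drops strings that have no match. The helper first_num is an illustrative name, not a base R function; it is one possible way to keep the output the same length as the input.

```r
# Hypothetical input: two numbers in one string, and one string with no digits
messy <- c("I have 10 cars and 2 bikes", "no cars", "15 cars")

# gsub removes every non-digit, so "10" and "2" are glued into "102"
glued <- as.numeric(gsub("\\D+", "", messy))
glued
# 102 NA 15

# regmatches + regexpr keeps only strings that matched, so the length shrinks
matched <- regmatches(messy, regexpr("\\d+", messy))
matched
# "10" "15"

# Illustrative helper: first number per string, NA where there is no match
first_num <- function(x) {
  m <- regexpr("\\d+", x)                 # -1 marks "no match"
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)        # fill only the matched positions
  as.numeric(out)
}
first_num(messy)
# 10 NA 15
```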
BTW, some helper functions are just thin wrappers around gsub. For example, here is the source of extract_numeric:
function (x)
{
    as.numeric(gsub("[^0-9.-]+", "", as.character(x)))
}
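The pattern [^0-9.-]+ keeps digits, dots, and minus signs, so it handles decimals and negative values but can misfire on stray punctuation. The sketch below reuses that same gsub call under a hypothetical name (keep_numeric_chars) on made-up inputs to show both sides.

```r
# Same body as the function above, under an illustrative name
keep_numeric_chars <- function(x) {
  as.numeric(gsub("[^0-9.-]+", "", as.character(x)))
}

# Decimals and negative signs survive
ok <- keep_numeric_chars(c("-3.5 degrees", "price: 12.75", "15 cars"))
ok
# -3.5 12.75 15

# A stray dot is kept too: "Fig. 42" becomes ".42"
stray_dot <- keep_numeric_chars("Fig. 42")
stray_dot
# 0.42

# A date keeps its hyphens, so "2016-08-02" cannot be converted and gives NA
date_like <- suppressWarnings(keep_numeric_chars("2016-08-02"))
date_like
# NA
```

If the input only ever contains whole numbers, the stricter \\D+ pattern avoids these surprises.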
So, if we need a function, we can create one (without using any external packages)
ext_num <- function(x) {
    # Drop every run of non-digit characters, then convert to numeric
    as.numeric(gsub("\\D+", "", x))
}
ext_num(testVector)
#[1] 10 6 4 15
Answer 2:
For this particular common task, there's a nice helper function in tidyr called extract_numeric:
library(tidyr)
extract_numeric(testVector)
## [1] 10 6 4 15
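Note that extract_numeric has since been deprecated in tidyr, and its deprecation message points to readr::parse_number instead. A minimal sketch of that replacement, assuming the readr package is installed:

```r
library(readr)

testVector <- c("I have 10 cars", "6 cars", "You have 4 cars", "15 cars")

# parse_number skips surrounding text and parses the first number it finds
parsed <- parse_number(testVector)
parsed
```

Unlike the gsub-based approach, parse_number reads only the first number in each string rather than joining all digits together.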
Answer 3:
This might also come in handy.
testVector <- gsub("[[:alpha:]]","",testVector)
testVector <- gsub(" ","",testVector)
> testVector
[1] "10" "6" "4" "15"
Source: https://stackoverflow.com/questions/38711057/extracting-a-number-of-a-string-of-varying-lengths