Extracting a number of a string of varying lengths [duplicate]

Posted by 半腔热情 on 2019-12-12 01:46:50

Question


Pretend I have a vector:

testVector <- c("I have 10 cars", "6 cars", "You have 4 cars", "15 cars")

Is there a way to go about parsing this vector, so I can store just the numerical values:

10, 6, 4, 15

If the problem were just "15 cars" and "6 cars", I know how to parse that, but I'm having difficulty with the strings that have text in front too! Any help is greatly appreciated.


Answer 1:


We can use str_extract with the pattern \\d+, which matches one or more digits. It can also be written as [0-9]+.

library(stringr)
as.numeric(str_extract(testVector, "\\d+"))
#[1] 10  6  4 15

If there are multiple numbers in a string, we use str_extract_all, which will return a list with one element per input string.


This can also be done with base R (no external packages used):

as.numeric(regmatches(testVector, regexpr("\\d+", testVector)))
#[1] 10  6  4 15
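
When a string can hold more than one number, a base R counterpart to str_extract_all is to pair regmatches with gregexpr instead of regexpr. The following is a minimal sketch; the vector multiVector and its contents are made up for illustration. It also shows that the regexpr form drops elements that have no match, so its result can be shorter than the input.

```r
# Hypothetical input: one string with two numbers, one with none
multiVector <- c("I have 10 cars and 2 bikes", "6 cars", "no cars")

# gregexpr finds every match; regmatches returns one character vector per element
allMatches <- regmatches(multiVector, gregexpr("\\d+", multiVector))
allNumbers <- lapply(allMatches, as.numeric)

# regexpr finds only the first match, and elements without a match are dropped
firstNumbers <- as.numeric(regmatches(multiVector, regexpr("\\d+", multiVector)))
```

Here allNumbers is a list of length three whose last element is empty, while firstNumbers holds only two values.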

Or using gsub from base R

as.numeric(gsub("\\D+", "", testVector))
#[1] 10  6  4 15
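
One caveat with this gsub approach is that it deletes every non-digit character, so digits from separate numbers run together and decimal points are lost. A small sketch with made-up inputs (the vector tricky is an assumption for illustration):

```r
# Hypothetical inputs chosen to show where stripping non-digits goes wrong
tricky <- c("3 cats and 4 dogs", "price 2.5", "no digits")

stripped <- gsub("\\D+", "", tricky)   # "34" "25" ""
result <- as.numeric(stripped)         # 34 25 NA
```

So this form is safest when each string is known to contain exactly one whole number.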

Incidentally, some helper functions simply wrap gsub. For example, this is the body of extract_numeric:

function(x) {
  as.numeric(gsub("[^0-9.-]+", "", as.character(x)))
}
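
The character class [^0-9.-] in that body keeps dots and minus signs in addition to digits, which is what lets it handle decimals and negative values. A sketch comparing it with \\D+ on made-up strings (keep_signed and readings are hypothetical names):

```r
# Hypothetical helper with the same body as the function shown above
keep_signed <- function(x) {
  as.numeric(gsub("[^0-9.-]+", "", as.character(x)))
}

readings <- c("temperature is -3.5 degrees", "gain of 12.25 points")

keep_signed(readings)                    # -3.50 12.25
as.numeric(gsub("\\D+", "", readings))   # 35 1225
```

Stray hyphens or periods elsewhere in the text are kept as well, so the cleaned string may not always parse as a number.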

So, if we need a function, we can create one (without using any external packages)

ext_num <- function(x) {
  as.numeric(gsub("\\D+", "", x))
}
ext_num(testVector)
#[1] 10  6  4 15



Answer 2:


For this particular common task, there's a nice helper function in tidyr called extract_numeric (later tidyr releases deprecate it in favor of readr::parse_number):

library(tidyr)

extract_numeric(testVector)
## [1] 10  6  4 15



Answer 3:


This might also come in handy.

# Remove all letters, then all spaces
testVector <- gsub("[A-Za-z]", "", testVector)
testVector <- gsub(" ", "", testVector)

> testVector
[1] "10" "6"  "4"  "15"
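
The result above is still a character vector. If numeric values are wanted, the letters and spaces can be removed in one call with POSIX character classes and the result converted. A minimal sketch, starting again from the original strings (carStrings is a hypothetical name used to avoid overwriting testVector):

```r
carStrings <- c("I have 10 cars", "6 cars", "You have 4 cars", "15 cars")

# [[:alpha:]] matches letters and [[:space:]] matches whitespace
digitsOnly <- gsub("[[:alpha:][:space:]]", "", carStrings)
carCounts <- as.numeric(digitsOnly)
```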


Source: https://stackoverflow.com/questions/38711057/extracting-a-number-of-a-string-of-varying-lengths
