Extract last 4-digit number from a series in R using stringr

一世执手 submitted on 2019-11-27 14:51:19

The stringi package has convenient functions that operate on specific parts of a string. So you can find the last occurrence of four consecutive digits with the following.

library(stringi)

x <- c("2005-", "2003-", "1984-1992, 1996-")

stri_extract_last_regex(x, "\\d{4}")
# [1] "2005" "2003" "1996"

Other ways to get the same result are:

stri_sub(x, stri_locate_last_regex(x, "\\d{4}"))
# [1] "2005" "2003" "1996"

## or, since these count as words
stri_extract_last_words(x)
# [1] "2005" "2003" "1996"

## or if you prefer a matrix result
stri_match_last_regex(x, "\\d{4}")
#      [,1]  
# [1,] "2005"
# [2,] "2003"
# [3,] "1996"

You can use base R sub for this quite easily:

years1 <- c('2005-', '2003-', '1984-1992, 1996-')
sub('.*(\\d{4}).*', '\\1', years1)

## [1] "2005" "2003" "1996"

The pattern to be matched here is .* (zero or more of any character) followed by \\d{4} (four consecutive numerals, which we capture by enclosing in parentheses), followed by zero or more characters.

sub replaces the matched pattern with the value in the second argument. In this case, \\1 indicates that we want to replace the whole matched pattern with the first captured substring (i.e. the four consecutive numerals).

The regex is greedy here, so the leading .* consumes any earlier matches of \\d{4}. Only the last sequence of four consecutive numerals is captured.
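
The effect of greediness can be checked by comparing the greedy pattern with a lazy variant. The following is a minimal sketch in base R. The lazy pattern .*? and the extra 'no year' input are illustrative additions, not part of the original answer, and perl = TRUE is set only so that the lazy quantifier is interpreted by PCRE.

```r
years1 <- c('2005-', '2003-', '1984-1992, 1996-')

# Greedy .* consumes as much as possible, so the capture is the LAST four-digit run
greedy <- sub('.*(\\d{4}).*', '\\1', years1, perl = TRUE)
greedy
# [1] "2005" "2003" "1996"

# Lazy .*? consumes as little as possible, so the capture is the FIRST four-digit run
lazy <- sub('.*?(\\d{4}).*', '\\1', years1, perl = TRUE)
lazy
# [1] "2005" "2003" "1984"

# Caveat: sub() returns its input unchanged when the pattern does not match
unmatched <- sub('.*(\\d{4}).*', '\\1', 'no year', perl = TRUE)
unmatched
# [1] "no year"
```

Because sub returns unmatched input as-is, it may be worth checking with grepl first if some strings might not contain a four-digit number.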

The end-of-string anchor $ asserts that the current position is the end of the string.

A pattern with $ placed directly after the four digits therefore says: match exactly four digits at the very end of the string. Unfortunately, the engine matches four digits, then tries to assert the end-of-string position and fails, because the digits are not at the end of the string (a hyphen follows them). It then backtracks and retries from later positions, failing each time.
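
The failure can be reproduced with a small sketch in base R. The pattern \\d{4}$ is assumed here to be the anchored attempt under discussion (the original pattern is not quoted in this answer), and perl = TRUE is used so that the behaviour follows PCRE.

```r
years1 <- c('2005-', '2003-', '1984-1992, 1996-')

# Assumed anchored pattern: four digits immediately followed by end of string
anchored <- grepl('\\d{4}$', years1, perl = TRUE)
anchored
# [1] FALSE FALSE FALSE

# Every string ends in "-", so no four-digit run sits at the very end
last_char <- substring(years1, nchar(years1))
last_char
# [1] "-" "-" "-"
```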

To fix this, you can greedily consume all characters until the last set of digits.

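years1 <- c('2005-', '2003-', '1984-1992, 1996-')
## \\K is PCRE-only syntax, so this uses base R with perl = TRUE
## (the perl() wrapper from older stringr releases is deprecated)
regmatches(years1, regexpr('.*\\K\\d{4}', years1, perl = TRUE))
# [1] "2005" "2003" "1996"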

\\d{4}[^\\d]*$

Try this; it should do it for you. See the demo:

https://regex101.com/r/kG5pN6/2
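
As a hedged sketch of how this pattern might be applied in R: on its own, \\d{4}[^\\d]*$ also matches whatever non-digit characters trail the digits, so the digits still need to be isolated, for example with a capture group or a lookahead. Both variants below are illustrative adaptations, not taken from the linked demo, and use base R with perl = TRUE.

```r
years1 <- c('2005-', '2003-', '1984-1992, 1996-')

# The full match includes the trailing non-digit characters
full <- regmatches(years1, regexpr('\\d{4}[^\\d]*$', years1, perl = TRUE))
full
# [1] "2005-" "2003-" "1996-"

# Variant 1: capture the digits and replace the whole string with the capture
captured <- sub('.*(\\d{4})[^\\d]*$', '\\1', years1, perl = TRUE)
captured
# [1] "2005" "2003" "1996"

# Variant 2: move the trailing part into a lookahead so it is not part of the match
lookahead <- regmatches(years1, regexpr('\\d{4}(?=[^\\d]*$)', years1, perl = TRUE))
lookahead
# [1] "2005" "2003" "1996"
```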
