Negative lookahead in regex to exclude percentage (%) in R

Submitted by 99封情书 on 2021-01-27 18:00:30

Question


I want to extract decimal numbers (at least one digit on both sides of the decimal point), but not ones that are followed by a percent sign. I believe this calls for a negative lookahead, so the regex can check whether the number is followed by %.

For example, I want to extract "123.123" but not "123.123%".

I have tried a dozen syntax arrangements but cannot find one that works. This successfully extracts the decimal pattern:

library(stringr)
c("123.123%", "123.123") %>% str_extract_all(., "\\d+\\.\\d+")

But I want to adapt it to return only the second item (since the first contains a percent sign).

I have tried various combinations of the following:

c("123.123%", "123.123") %>% str_extract_all(., "\\d+\\.\\d+(!?=%)")
c("123.123%", "123.123") %>% str_extract_all(., "\\d+\\.\\d+[!?%]")
c("123.123%", "123.123") %>% str_extract_all(., "\\d+\\.\\d+!?%")
c("123.123%", "123.123") %>% str_extract_all(., "\\d+\\.\\d+!?\\%")
c("123.123%", "123.123") %>% str_extract_all(., "\\d+\\.\\d+(!?=\\%)")
# etc

Answer 1:


You may use

"\\d+\\.\\d++(?!%)"

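The \d++ part matches one or more digits possessively, so the engine cannot backtrack into them. The (?!%) negative lookahead is therefore checked only once, after all those digits are matched, and fails the match if a % follows. With a plain \d+, the engine would give back the last digit and match 123.12, because that is followed by 3 rather than %.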
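To see why the possessive quantifier matters, here is a small sketch that runs the same lookahead with an ordinary greedy \d+. It assumes only stringr; the variable name `naive` is just for illustration.

```r
library(stringr)

x <- c("123.123%", "123.123")

# Ordinary greedy quantifier plus a negative lookahead.
# On "123.123%" the lookahead fails after "123.123", so the engine
# backtracks one digit: "123.12" is followed by "3", not "%", and matches.
naive <- str_extract_all(x, "\\d+\\.\\d+(?!%)")
print(naive)
```

The first element comes back as "123.12" instead of character(0), which is exactly the partial match that the possessive \d++ rules out.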

The same can be written without a possessive quantifier as "\\d+\\.\\d+(?![%\\d])": here (?![%\\d]) also fails the match if a digit immediately follows the current position, which blocks the same backtracking.
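One way to check that the two forms behave the same is to run both over a few inputs, including numbers embedded in a sentence. The sample vector and variable names below are made up for illustration.

```r
library(stringr)

x <- c("123.123%", "123.123", "up 4.5% from 10.25 units")

# Possessive quantifier: the trailing digits are never given back.
possessive <- "\\d+\\.\\d++(?!%)"
# Plain quantifier: the lookahead also rejects a following digit,
# so backtracking to a shorter run of digits cannot succeed.
digit_guard <- "\\d+\\.\\d+(?![%\\d])"

res_possessive <- str_extract_all(x, possessive)
res_digit_guard <- str_extract_all(x, digit_guard)

print(res_possessive)
print(identical(res_possessive, res_digit_guard))
```

Both patterns should give character(0) for the percentage, "123.123" for the bare number, and "10.25" (but not "4.5") from the sentence.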

R demo:

> library(stringr)
> c("123.123%", "123.123") %>% str_extract_all(., "\\d+\\.\\d++(?!%)")
[[1]]
character(0)

[[2]]
[1] "123.123"
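If you would rather not depend on stringr, the same pattern should also work in base R as long as you pass perl = TRUE: PCRE supports possessive quantifiers and lookaheads, while base R's default regex engine does not support lookarounds. A minimal sketch:

```r
x <- c("123.123%", "123.123")

# perl = TRUE switches gregexpr() to PCRE, which understands both
# the possessive \d++ and the (?!%) negative lookahead.
matches <- gregexpr("\\d+\\.\\d++(?!%)", x, perl = TRUE)
res <- regmatches(x, matches)
print(res)
```

regmatches() returns character(0) for elements with no match, so the result has the same shape as the str_extract_all() output.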



Answer 2:


Could we just use an end-of-string anchor? If nothing else can follow the number, that may be enough:

c("123.123%", "123.123") %>% str_extract_all(., "\\d+\\.\\d+$")

[[1]]
character(0)

[[2]]
[1] "123.123"
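The catch is the condition in that sentence: `$` only matches at the very end of the string, so a number followed by anything at all (not just %) is dropped. A small illustration with made-up inputs:

```r
library(stringr)

x <- c("123.123%", "123.123", "10.25 units", "total 10.25")

# `$` anchors the match to the end of the string: only a number that is
# the last thing in the string can be extracted.
res <- str_extract_all(x, "\\d+\\.\\d+$")
print(res)
```

Here "10.25 units" yields character(0) even though no % is involved, while "total 10.25" still yields "10.25".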




Answer 3:


We can fix this by anchoring the pattern with ^ at the start and $ at the end of the string:

c("123.123%", "123.123") %>% 
      str_extract_all(., "^[0-9]+\\.[0-9]+$")
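With both anchors the pattern must match the entire string, so it acts as a whole-value check rather than an extractor. In that case str_subset() (or str_detect() for a logical vector) may read more naturally; the inputs and variable names below are illustrative.

```r
library(stringr)

x <- c("123.123%", "123.123", "total 10.25", "7.5")

# ^ and $ together: the whole element must be digits, a dot, and digits.
pattern <- "^[0-9]+\\.[0-9]+$"

is_decimal <- str_detect(x, pattern)  # one TRUE/FALSE per element
kept <- str_subset(x, pattern)        # only the elements that match
print(is_decimal)
print(kept)
```

Note that "total 10.25" is rejected entirely, which is fine if each element is meant to be a single number but not if numbers are embedded in text.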


Source: https://stackoverflow.com/questions/54552393/negative-lookahead-in-regex-to-exclude-percentage-in-r
