Negative lookahead in regex to exclude percentage (%) in R

Question

I wish to extract decimal numbers (at least one digit on each side of the decimal point), but not numbers that are followed by a percentage sign. I therefore believe I need a negative lookahead, so the pattern can check whether the number is followed by a percentage sign.

For clarity, I would want to extract "123.123", but would not like to extract "123.123%"

I have tried a dozen syntax arrangements but cannot find the one that works. This successfully extracts the decimal pattern.

c("123.123%", "123.123") %>% str_extract_all(., "\\d+\\.\\d+")

But I want to adapt it to return the second item only (since the first contains a percentage sign).

I have tried various combinations of the following:

c("123.123%", "123.123") %>% str_extract_all(., "\\d+\\.\\d+(!?=%)")
c("123.123%", "123.123") %>% str_extract_all(., "\\d+\\.\\d+[!?%]")
c("123.123%", "123.123") %>% str_extract_all(., "\\d+\\.\\d+!?%")
c("123.123%", "123.123") %>% str_extract_all(., "\\d+\\.\\d+!?\\%")
c("123.123%", "123.123") %>% str_extract_all(., "\\d+\\.\\d+(!?=\\%)")
# etc

Answer 1:

You may use

"\\d+\\.\\d++(?!%)"

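The \d++(?!%) part matches 1 or more digits possessively, so the (?!%) negative lookahead is executed only once, after all those digits are matched, and fails the match if there is a % after them. With a plain \d+, the engine would backtrack by one digit and return "123.12" from "123.123%", because the character after "123.12" is "3", not "%".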

The same can be written without a possessive quantifier as "\\d+\\.\\d+(?![%\\d])", where the (?![%\\d]) will also fail the match if there is a digit immediately to the right of the current location.

R demo:

> library(stringr)
> c("123.123%", "123.123") %>% str_extract_all(., "\\d+\\.\\d++(?!%)")
[[1]]
character(0)

[[2]]
[1] "123.123"
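
As a rough illustration of why the possessive quantifier (or the extra \d inside the lookahead) matters, the sketch below runs three variants on the question's input. The variable names naive, possessive and no_digit are only illustrative and not part of the original answer.

```r
library(stringr)

x <- c("123.123%", "123.123")

# Plain greedy quantifier: the engine gives back one digit so that the
# lookahead sees "3" instead of "%", and a truncated number is returned.
naive <- str_extract_all(x, "\\d+\\.\\d+(?!%)")

# Possessive quantifier: the digits are never given back, so the lookahead
# is only tested right after the last digit.
possessive <- str_extract_all(x, "\\d+\\.\\d++(?!%)")

# Greedy quantifier with a lookahead that also rejects a following digit.
no_digit <- str_extract_all(x, "\\d+\\.\\d+(?![%\\d])")

print(naive)       # first element is "123.12", not character(0)
print(possessive)  # first element is character(0)
print(no_digit)    # same result as the possessive version
```

The first variant is the one most people reach for first, and it is the truncated "123.12" result that makes it look as if the lookahead is being ignored.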

Answer 2:

Could we simply anchor the match at the end of the string? If nothing else can follow the number, an end-of-string anchor ($) may be enough.

c("123.123%", "123.123") %>% str_extract_all(., "\\d+\\.\\d+$")

[[1]]
character(0)

[[2]]
[1] "123.123"

Answer 3:

We can fix this by adding ^ and $ to the pattern, which anchor the match to the start and end of the string, so only strings consisting entirely of the number will match:

c("123.123%", "123.123") %>% 
      str_extract_all(., "^[0-9]+\\.[0-9]+$")
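
The anchored patterns rely on each string containing nothing but the number. The sketch below checks that assumption against a hypothetical sentence (not from the question) in which numbers are embedded in longer text, and compares it with the lookahead pattern from the first answer. The variable names are illustrative only.

```r
library(stringr)

anchored <- "^[0-9]+\\.[0-9]+$"

# The question's input: every element is a single number, so anchoring works.
on_question <- str_extract_all(c("123.123%", "123.123"), anchored)

# Hypothetical input where the numbers sit inside longer text.
sentence <- "rate 12.5% on 123.123 units"
in_text_anchored  <- str_extract_all(sentence, anchored)
in_text_lookahead <- str_extract_all(sentence, "\\d+\\.\\d++(?!%)")

print(on_question)        # character(0), then "123.123"
print(in_text_anchored)   # character(0): the whole string is not a number
print(in_text_lookahead)  # "123.123": the percentage is skipped
```

If the data always holds one value per element, the anchored form is arguably the simpler choice; the lookahead form is the one that would generalise to free text.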

Source: https://stackoverflow.com/questions/54552393/negative-lookahead-in-regex-to-exclude-percentage-in-r

Tags

regex

stringr