R: split only when special regex condition doesn't match

Submitted by 泄露秘密 on 2019-12-12 11:58:30

Question


How would you split after every and/ERT, but only when the word that follows it does not contain "/V", in a string such as:

text <- c("faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT not else/VHGB propositions one and/ERT two/CDF and/ERT three/ABC")

# my attempt - doesn't work
> strsplit(text, "(?<=and/ERT)\\s(?!./V.)", perl=TRUE)
                                    ^^^^

# Expected return
[[1]]    
[1] "faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT"
[2] "not else/VHGB propositions one and/ERT"
[3] "two/CDF and/ERT"            
[4] "three/ABC"    

Answer 1:


Actually, you need to approach this in another way:

(?<=and/ERT)\\s(?!\\S+/V)
                  ^^^^

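Use \\S+ here rather than .*: because .* can run across spaces, it would block the split whenever /V appears anywhere later in the string, even two words ahead.

\\S+ matches one or more non-whitespace characters, so the lookahead only inspects the word directly after the space.

Finally, the trailing period in your lookahead can be dropped.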
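
As a rough check, the pattern above can be passed to strsplit() together with the sample text from the question. The names `pattern` and `parts` are only illustrative and do not come from the original post.

```r
# Sample text from the question
text <- c("faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT not else/VHGB propositions one and/ERT two/CDF and/ERT three/ABC")

# Split at the space after "and/ERT" unless the next word carries a /V tag
pattern <- "(?<=and/ERT)\\s(?!\\S+/V)"
parts <- strsplit(text, pattern, perl = TRUE)[[1]]
print(parts)
```

perl = TRUE is required because lookbehind and lookahead assertions are PCRE features. The printed vector should correspond to the four pieces listed in the question.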

regex101 demo




Answer 2:


You made only one small mistake, but it was enough to break the whole pattern:

(?<=and/ERT)\\s(?![^\\s/]+/V)
                  ^^^^^^^^
                  matches one or more characters that are neither whitespace nor a forward slash /

By the way, the dot . after the /V is not needed.

Edit: I have made some edits according to @smerny's comment and your edit.
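
To see where excluding the slash matters, here is a small sketch comparing this pattern with the \\S+ variant from answer 1. The input string `tricky` is a made-up example that does not appear in the question, and the variable names are illustrative.

```r
loose  <- "(?<=and/ERT)\\s(?!\\S+/V)"        # \S+ may run across a slash
strict <- "(?<=and/ERT)\\s(?![^\\s/]+/V)"    # stops at the first slash

# Hypothetical input: the word "a/b/VX" contains two slashes
tricky <- "one and/ERT a/b/VX and/ERT c/VB"

loose_parts  <- strsplit(tricky, loose, perl = TRUE)[[1]]
strict_parts <- strsplit(tricky, strict, perl = TRUE)[[1]]
print(loose_parts)
print(strict_parts)
```

With \\S+, the lookahead finds /V after "a/b", so no split happens before that word. With [^\\s/]+, only the text up to the first slash is considered, so the string is split there. On the sample text from the question, where every word has at most one slash, both patterns should behave the same.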




Answer 3:


Try this:

(?<=and/ERT)\\s(?![a-zA-Z]+/V)

The problem was that your /V was preceded and followed by exactly one arbitrary character (the dots), while your example has more than one character between the space and the /V.

[a-zA-Z]+/V makes sure that the only thing between the space and the /V is a single word consisting of letters. I believe this is your requirement based on your description and examples given.
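
The difference can be illustrated by testing the two lookahead bodies directly against the words that follow and/ERT in the sample text. This is only a sketch: the anchoring with ^ and the names `original`, `fixed`, and `next_words` are assumptions made for the demonstration.

```r
# Lookahead bodies, anchored at the start of the word that follows the space
original <- "^./V."          # one arbitrary character, then /V, then one more character
fixed    <- "^[a-zA-Z]+/V"   # one or more letters, then /V

# Words that directly follow "and/ERT " in the sample text
next_words <- c("something/VBN", "else/VHGB", "as/VVFIN", "not", "two/CDF", "three/ABC")

hits_original <- grepl(original, next_words, perl = TRUE)
hits_fixed    <- grepl(fixed, next_words, perl = TRUE)
print(data.frame(next_words, hits_original, hits_fixed))
```

The original body never matches, because none of these words has a slash as its second character, so the negative lookahead always succeeds and every and/ERT triggers a split. The letter-based body matches only the three /V-tagged words, which are exactly the places where the split should be suppressed.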

Demo



Source: https://stackoverflow.com/questions/18719809/r-split-only-when-special-regex-condition-doesnt-match
