JavaScript regex pattern match multiple strings ( AND, OR ) against single string

Submitted by 醉酒当歌 on 2019-12-22 06:35:32

Question


I need to filter a collection of strings based on a rather complex query - in its "raw" form it looks like this:

nano* AND (regulat* OR *toxic* OR ((risk OR hazard) AND (exposure OR release)) )

An example of one of the strings to match against:

Workshop on the Second Regulatory Review on Nanomaterials, 30 January 2013, Brussels

So, I need to match using AND, OR and wildcard characters - so, I presume I'll need to use a regex in JavaScript.

I have it all looping correctly, filtering and generally working, but I'm 100% sure my regex is wrong - and some results are being omitted wrongly - here it is:

/(nano[a-zA-Z])?(regulat[a-zA-Z]|[a-zA-Z]toxic[a-zA-Z]|((risk|hazard)*(exposure|release)))/i

Any help would be greatly appreciated - I really can't abstract my mind correctly to understand this syntax!

UPDATE:

A few people have pointed out the importance of the order in which the regex is constructed. However, I have no control over the text strings that will be searched, so I need a solution that works regardless of the order in which the terms appear.

UPDATE:

I eventually used a PHP solution, due to the deprecation of Twitter API 1.0. See pastebin for an example function (I know it's better to paste code here, but there's a lot...):

function: http://pastebin.com/MpWSGtHK usage: http://pastebin.com/pP2AHEvk

Thanks for all help


Answer 1:


A single regex is not the right tool for this, IMO:

/^(?=.*\bnano)(?=(?:.*\bregulat|.*toxic|(?=.*(?:\brisk\b|\bhazard\b))(?=.*(?:\bexposure\b|\brelease\b))))/i.test(subject)

returns true if the string fulfills the criteria you set forth, but I find nested lookaheads quite incomprehensible. If JavaScript supported commented regexes, it would look like this:

^                 # Anchor search to start of string
(?=.*\bnano)      # Assert that the string contains a word that starts with nano
(?=               # AND assert that the string contains...
 (?:              #  either
  .*\bregulat     #   a word starting with regulat
 |                #  OR
  .*toxic         #   any word containing toxic
 |                #  OR
  (?=             #   assert that the string contains
   .*             #    any string
   (?:            #    followed by
    \brisk\b      #    the word risk
   |              #    OR
    \bhazard\b    #    the word hazard
   )              #    (end of inner OR alternation)
  )               #   (end of first AND condition)
  (?=             #   AND assert that the string contains
   .*             #    any string
   (?:            #    followed by
    \bexposure\b  #    the word exposure
   |              #    OR
    \brelease\b   #    the word release
   )              #    (end of inner OR alternation)
  )               #   (end of second AND condition)
 )                #  (end of outer OR alternation)
)                 # (end of lookahead assertion)
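JavaScript has no free-spacing regex mode, but a similar effect can be approximated by assembling the pattern from commented string fragments. This is only a sketch: the names `parts`, `queryRegex` and `sample` are illustrative and not part of the original answer, and the pattern is a slightly condensed but equivalent form of the one above.

```javascript
// Each fragment is a string literal, so backslashes must be doubled.
const parts = [
  '^',                                   // anchor search to start of string
  '(?=.*\\bnano)',                       // a word starting with "nano"
  '(?=',                                 // AND one of the following:
    '.*\\bregulat',                      //   a word starting with "regulat"
    '|.*toxic',                          //   OR any word containing "toxic"
    '|(?=.*\\b(?:risk|hazard)\\b)',      //   OR (the word "risk" or "hazard")
     '(?=.*\\b(?:exposure|release)\\b)', //      AND (the word "exposure" or "release")
  ')'                                    // end of lookahead assertion
];
const queryRegex = new RegExp(parts.join(''), 'i');

const sample = 'Workshop on the Second Regulatory Review on Nanomaterials, 30 January 2013, Brussels';
console.log(queryRegex.test(sample)); // true
```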

Note that the entire regex is composed of lookahead assertions, so the match result itself will always be the empty string.
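To illustrate the empty match, here is a small sketch using a simplified lookahead-only pattern (the names `lookaheadOnly`, `text` and `m` are illustrative):

```javascript
const lookaheadOnly = /^(?=.*\bnano)(?=.*\bregulat)/i;
const text = 'Second Regulatory Review on Nanomaterials';

const m = text.match(lookaheadOnly);
console.log(m !== null);   // true: both assertions hold
console.log(m[0] === '');  // true: lookaheads consume no characters
console.log(m.index);      // 0: the zero-width match sits at the start anchor
```

So test() is enough for a yes/no answer, but extracting the words that matched needs a separate step.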

Instead of one large pattern, you could combine several simple regexes:

if (/\bnano/i.test(str) &&
    (
        /\bregulat|toxic/i.test(str) ||
        (
            /\b(?:risk|hazard)\b/i.test(str) &&
            /\b(?:exposure|release)\b/i.test(str)
        )
    )
) {
    /* all tests pass */
}
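Since the goal is to filter a collection, the same checks can be wrapped in a predicate and passed to Array.prototype.filter. This is a minimal sketch: the function name `matchesQuery` and all sample titles except the first are made up for illustration.

```javascript
// Hypothetical helper: true when `str` satisfies
// nano* AND (regulat* OR *toxic* OR ((risk OR hazard) AND (exposure OR release)))
function matchesQuery(str) {
  return /\bnano/i.test(str) && (
    /\bregulat|toxic/i.test(str) ||
    (/\b(?:risk|hazard)\b/i.test(str) && /\b(?:exposure|release)\b/i.test(str))
  );
}

// Sample data (invented, apart from the first title taken from the question).
const titles = [
  'Workshop on the Second Regulatory Review on Nanomaterials, 30 January 2013, Brussels',
  'Nanoparticle release and hazard assessment',
  'Ecotoxicity of nanosilver',
  'Risk and exposure in chemical plants',  // no "nano" word
  'Nanotubes in electronics'               // "nano" but nothing else
];

const hits = titles.filter(matchesQuery);
console.log(hits.length); // 3
```

Because each condition is tested independently, the order in which the terms appear in the string does not matter.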



Answer 2:


A regular expression matches its parts in sequence as it moves through the string. Your pattern has "nano" before "regulat", but they appear in the opposite order in the test string. Instead of using regexen to do this, I'd stick with plain old string searching:

var s = str.toLowerCase(); // indexOf is case-sensitive, so normalize first
if (s.indexOf('nano') > -1) {
    if (s.indexOf('regulat') > -1 || s.indexOf('toxic') > -1
        || ((s.indexOf('risk') > -1 || s.indexOf('hazard') > -1)
        && (s.indexOf('exposure') > -1 || s.indexOf('release') > -1)
    )) {
        /* all tests pass */
    }
}

If you want to actually capture the words (e.g. get "Regulatory" from where "regulat" is), I would split the sentence by word breaks and inspect individual words.
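The word-splitting idea could be sketched as follows. The helper name `findMatchingWords` and the choice to split on runs of non-word characters are assumptions for illustration; note that \W only treats ASCII letters, digits and underscore as word characters.

```javascript
// Hypothetical helper: returns the whole words in `str` that contain `stem`,
// compared case-insensitively.
function findMatchingWords(str, stem) {
  const needle = stem.toLowerCase();
  return str
    .split(/\W+/)                 // split on runs of non-word characters
    .filter(function (word) {
      return word.toLowerCase().indexOf(needle) > -1;
    });
}

const title = 'Workshop on the Second Regulatory Review on Nanomaterials, 30 January 2013, Brussels';
console.log(findMatchingWords(title, 'regulat')); // ["Regulatory"]
console.log(findMatchingWords(title, 'nano'));    // ["Nanomaterials"]
```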



Source: https://stackoverflow.com/questions/15090829/javascript-regex-pattern-match-multiple-strings-and-or-against-single-strin
