Get contents of brackets using regex in a list of values

Submitted by 孤街醉人 on 2019-12-23 01:55:09

Question


I'm looking for a regex (ColdFusion or Java) that reliably gets me the contents between the brackets for each (param \d+). I've tried dozens of different regexes, and the closest I've got is this one:

\(param \d+\) = \[(type='[^']*', class='[^']*', value='(?:[^']|'')*', sqltype='[^']*')\]

That would be perfect if the string I get back from CF escaped single quotes inside the value parameter, but it doesn't, so the pattern fails miserably. I also tried a negative lookahead, like so:

\[(type='[^']*', class='[^']*', value='(?:(?!', sqltype).)*', sqltype='[^']*')\]

That works well, unless for some perverse reason a piece of code quite literally has , sqltype in the value. I find it hard to believe I can't simply tell regex to scoop out the contents of every open and closed bracket it finds, but then again, I don't know enough regex to know its limits.

Here's an example string of what I'm trying to parse:

(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar'] , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar']
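To make the lookahead limitation concrete, here is a minimal Java sketch that runs the negative-lookahead pattern above over the sample string. The class and method names (LookaheadLimitDemo, bracketContents) are made up for illustration; the pattern itself is unchanged apart from Java string escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadLimitDemo {
    // The sample string from the question, on a single line.
    static final String SAMPLE =
        "(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , "
      + "(param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar'] , "
      + "(param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar']";

    // The negative-lookahead pattern from the question.
    static final Pattern LOOKAHEAD = Pattern.compile(
        "\\[(type='[^']*', class='[^']*', value='(?:(?!', sqltype).)*', sqltype='[^']*')\\]");

    // Collects the bracket contents (group 1) of every match.
    static List<String> bracketContents(String params) {
        List<String> found = new ArrayList<>();
        Matcher m = LOOKAHEAD.matcher(params);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        for (String contents : bracketContents(SAMPLE)) {
            System.out.println(contents);
        }
    }
}
```

Params 1 and 2 are found, but param 3 is skipped: the value scan has to stop at the first ', sqltype it meets, and what follows there is = ' rather than ='.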

For the curious, this is a sub-question to Copyable ColdFusion SQL Exception.

EDIT

This is my attempt at implementing @Mena's answer in CF9.1. Sadly it doesn't finish processing the string. I had to replace the \\ with \ just to get it to run at first, but my implementation might still be at fault.

This is the string given (pipes are just to denote boundary):

| (param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly], really?', sqltype='cf_sql_varchar'] , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype ', sqltype='cf_sql_varchar'] | 

This is my implementation:

    <cfset var outerPat = createObject("java","java.util.regex.Pattern").compile(javaCast("string", "\((.+?)\)\s?\=\s?\[(.+?)\](\s?,|$)"))>
    <cfset var innerPat = createObject("java","java.util.regex.Pattern").compile(javaCast("string", "(.+?)\s?\=\s?'(.+?)'\s?,\s?"))>
    <cfset var outerMatcher = outerPat.matcher(javaCast("string", arguments.params))>

    <cfdump var="Start"><br />
    <cfloop condition="outerMatcher.find()">     
        <cfdump var="#outerMatcher.group(1)#"> (<cfdump var="#outerMatcher.group(2)#">)<br />
        <cfset var innerMatcher = innerPat.matcher(javaCast("string", outerMatcher.group(2)))>
        <cfloop condition="innerMatcher.find()">
            <cfoutput>|__</cfoutput><cfdump var="#innerMatcher.group(1)#"> --> <cfdump var="#innerMatcher.group(2)#"><br />
        </cfloop>
        <br />
    </cfloop>
    <cfabort>

And this is what printed:

Start 
param 1 ( type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer' )
|__ type --> IN 
|__ class --> java.lang.Integer 
|__ value --> 47 

param 2 ( type='IN', class='java.lang.String', value='asf , O'Reilly )
|__ type --> IN 
|__ class --> java.lang.String 

End

Answer 1:


Here's a Java regex pattern that works for your sample input.

(?x)

# lookbehind to check for start of string or previous param
# java lookbehinds must have max length, so limits sqltype
(?<=^|sqltype='cf_sql_[a-z]{1,16}']\ ,\ )

# capture the full string for replacing in the orig sql
# and just the position to verify against the match position
(\(param\ (\d+)\))

\ =\ \[

# type and class won't contain quotes
   type='([^']++)'
,\ class='([^']++)'

# match any non-quote, then lazily keep going
,\ value='([^']++.*?)'

# sqltype is always alphanumeric
,\ sqltype='cf_sql_[a-z]+'

\]

# lookahead to check for end of string or next param
(?=$|\ ,\ \(param\ \d+\)\ =\ \[)

(The (?x) flag enables comment mode, which ignores unescaped whitespace and anything between a hash and the end of the line.)
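For readers working in plain Java rather than CFML, the same pattern can be compiled with Pattern.COMMENTS, which is the flag form of (?x). This is only a sketch: the class and method names are illustrative, and the sample string is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentModeParamRegex {
    static final String SAMPLE =
        "(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , "
      + "(param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar'] , "
      + "(param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar']";

    // Same pattern as above; literal spaces are escaped because COMMENTS mode drops bare whitespace.
    static final Pattern PARAM = Pattern.compile(String.join("\n",
        "# lookbehind: start of input, or the tail of the previous param",
        "(?<=^|sqltype='cf_sql_[a-z]{1,16}']\\ ,\\ )",
        "(\\(param\\ (\\d+)\\))          # group 1: marker, group 2: position",
        "\\ =\\ \\[",
        "type='([^']++)'                 # group 3: type",
        ",\\ class='([^']++)'            # group 4: class",
        ",\\ value='([^']++.*?)'         # group 5: value, quotes allowed",
        ",\\ sqltype='cf_sql_[a-z]+'",
        "\\]",
        "# lookahead: end of input, or the start of the next param",
        "(?=$|\\ ,\\ \\(param\\ \\d+\\)\\ =\\ \\[)"
    ), Pattern.COMMENTS);

    // Returns the captured value (group 5) of every param found.
    static List<String> values(String params) {
        List<String> found = new ArrayList<>();
        Matcher m = PARAM.matcher(params);
        while (m.find()) {
            found.add(m.group(5));
        }
        return found;
    }

    public static void main(String[] args) {
        for (String value : values(SAMPLE)) {
            System.out.println("[" + value + "]");
        }
    }
}
```

Because the value group is lazy and the sqltype part must be followed by either the end of input or the next (param N) marker, the third value comes back with its embedded ', sqltype= fragment intact.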

And here's that pattern implemented in CFML (tested on CF9,0,1,274733). It uses cfRegex (a library which makes it easier to work with Java regex in CFML) to get the results of that pattern, and then does a couple of checks to make sure the expected number of params is found.

<cfsavecontent variable="Input">
(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer']
 , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar']
 , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar']
</cfsavecontent>
<cfset Input = trim(Input).replaceall('\n','')>

<cfset cfcatch = 
    { params = input
    , sql = 'SELECT stuff FROM wherever WHERE (param 3) is last param'
    }/>

<cfsavecontent variable="ParamRx">(?x)

    # lookbehind to check for start or previous param
    # java lookbehinds must have max length, so limits sqltype
    (?<=^|sqltype='cf_sql_[a-z]{1,16}']\ ,\ )

    # capture the full string for replacing in the orig sql
    # and just the position to verify against the match position
    (\(param\ (\d+)\))

    \ =\ \[

    # type and class won't contain quotes
       type='([^']++)'
    ,\ class='([^']++)'

    # match any non-quote, then lazily keep going if needed
    ,\ value='([^']++.*?)'

    # sqltype is always alphanumeric
    ,\ sqltype='cf_sql_[a-z]+'

    \]

    # lookahead to check for end or next param
    (?=$|\ ,\ \(param\ \d+\)\ =\ \[)

</cfsavecontent>

<cfset FoundParams = new Regex(ParamRx).match
    ( text = cfcatch.params
    , returntype = 'full'
    )/>

<cfset LastParamPos = cfcatch.sql.lastIndexOf('(param ') + 7 />
<cfset LastParam = ListFirst( Mid(cfcatch.sql,LastParamPos,3) , ')' ) />

<cfif LastParam NEQ ArrayLen(FoundParams) >
    <cfset ProblemsDetected = true />
<cfelse>
    <cfset ProblemsDetected = false />

    <cfloop index="i" from=1 to=#ArrayLen(FoundParams)# >

        <cfif i NEQ FoundParams[i].Groups[2] >
            <cfset ProblemsDetected = true />
        </cfif>

    </cfloop>
</cfif>

<cfif ProblemsDetected>
    <big>Something went wrong!</big>
<cfelse>
    <big>All seems fine</big>
</cfif>

<cfdump var=#FoundParams# />

This will actually work if you embed an entire param inside the value of another param. It fails if you embed two (or more), but at least the checks should detect this failure.
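The two checks in the CFML above (the number of matches equals the number in the last (param N) marker of the SQL, and the matched positions run 1, 2, 3, ... in order) could be written in Java roughly as follows. This is a sketch under assumptions: ParamSanityCheck, lastParamInSql and problemsDetected are hypothetical names, and the list of found positions is assumed to come from group 2 of the pattern.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParamSanityCheck {
    private static final Pattern MARKER = Pattern.compile("\\(param (\\d+)\\)");

    // Number in the last "(param N)" marker of the SQL text, or 0 if there is none.
    static int lastParamInSql(String sql) {
        int last = 0;
        Matcher m = MARKER.matcher(sql);
        while (m.find()) {
            last = Integer.parseInt(m.group(1));
        }
        return last;
    }

    // True when the match count or the sequence of positions looks wrong.
    static boolean problemsDetected(String sql, List<Integer> foundPositions) {
        if (lastParamInSql(sql) != foundPositions.size()) {
            return true;
        }
        for (int i = 0; i < foundPositions.size(); i++) {
            if (foundPositions.get(i) != i + 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String sql = "SELECT stuff FROM wherever WHERE (param 3) is last param";
        System.out.println(problemsDetected(sql, Arrays.asList(1, 2, 3)));
        System.out.println(problemsDetected(sql, Arrays.asList(1, 2)));
    }
}
```

A param swallowed into another param's value shows up as a short list, and a mis-split shows up as an out-of-sequence position, so either way the caller can fall back to showing the raw string.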

Here's what the dump output should look like:

[image: cfdump of FoundParams, not preserved]

Hopefully everything here makes sense - let me know if you have any questions.




Answer 2:


I would probably use a dedicated parser for that, but here's an example of how to do it with two Patterns and nested loops:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// the input String
String input = "(param 1) = " +
        "[type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , " +
        "(param 2) = " +
        "[type='IN', class='java.lang.String', value='asf , O'Reilly, really?', " +
        "sqltype='cf_sql_varchar'] , " +
        "(param 3) = " +
        "[type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , " +
        "[]can break it ', sqltype= ', sqltype='cf_sql_varchar']";

// the Pattern defining the round-bracket expression and the following 
// square-bracket list. Both values within the brackets are grouped for back-reference
// note that what prevents the 3rd case from breaking is that the closing square bracket 
// is expected to be either followed by optional space + comma, or end of input
Pattern outer = Pattern.compile("\\((.+?)\\)\\s?\\=\\s?\\[(.+?)\\](\\s?,|$)");

// the Pattern defining the key-value pairs within the square-bracket groups
// note that both key and value are grouped for back-reference
Pattern inner = Pattern.compile("(.+?)\\s?\\=\\s?'(.+?)'\\s?,\\s?");
Matcher outerMatcher = outer.matcher(input);
// iterating over the outer Pattern (type x) = [myKey = myValue, ad lib.], or end of input
while (outerMatcher.find()) {
    System.out.println(outerMatcher.group(1));
    Matcher innerMatcher = inner.matcher(outerMatcher.group(2));
    // iterating over the inner Pattern myKey = myValue
    while (innerMatcher.find()) {
        System.out.println("\t" + innerMatcher.group(1) + " --> " + innerMatcher.group(2));
    }
}

Output:

param 1
    type --> IN
    class --> java.lang.Integer
    value --> 47
param 2
    type --> IN
    class --> java.lang.String
    value --> asf , O'Reilly, really?
param 3
    type --> IN
    class --> java.lang.String
    value --> Th[is]is Ev'ery'thing That , []can break it 
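The string in the question's EDIT differs from this input in two ways that matter to the outer pattern: a value contains ], and the whole string ends with a space after the final ]. The sketch below compares the outer pattern as written with a stricter variant that only accepts a closing bracket followed by the next (param N) = [ marker or by trailing whitespace. The STRICTER pattern and the class name are assumptions of mine, not part of the original answer, and a value that itself contains a complete ] , (param N) = [ sequence would still defeat it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OuterBoundarySketch {
    // The string from the question's EDIT, including its leading and trailing space.
    static final String EDIT_INPUT =
        " (param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , "
      + "(param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly], really?', sqltype='cf_sql_varchar'] , "
      + "(param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype ', sqltype='cf_sql_varchar'] ";

    // The outer pattern from the answer, unchanged.
    static final Pattern ORIGINAL = Pattern.compile(
        "\\((.+?)\\)\\s?\\=\\s?\\[(.+?)\\](\\s?,|$)");

    // Hypothetical variant: "]" must be followed by the next param marker or only whitespace.
    static final Pattern STRICTER = Pattern.compile(
        "\\((.+?)\\)\\s?=\\s?\\[(.+?)\\](?=\\s*,\\s*\\(param \\d+\\)\\s*=\\s*\\[|\\s*$)");

    // Collects the square-bracket contents (group 2) of every match.
    static List<String> contents(Pattern outer, String params) {
        List<String> found = new ArrayList<>();
        Matcher m = outer.matcher(params);
        while (m.find()) {
            found.add(m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(contents(ORIGINAL, EDIT_INPUT).size());
        System.out.println(contents(STRICTER, EDIT_INPUT).size());
    }
}
```

With the original pattern, param 2 is cut off at O'Reilly and param 3 is never matched, which is consistent with the truncated output shown in the question; the stricter lookahead recovers all three bracket groups for this particular string.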


Source: https://stackoverflow.com/questions/18152435/get-contents-of-brackets-using-regex-in-a-list-of-values
