Matching brackets in a string

小蘑菇 2020-12-03 03:47

What is the most efficient or elegant method for matching brackets in a string such as:

"f @ g[h[[i[[j[2], k[[1, m[[1, n[2]]]]]]]]]] // z"
9 answers
  • 2020-12-03 04:24

    I can offer a heavy approach (not too elegant). Below is my implementation of a bare-bones Mathematica parser, based on the rather general functionality of a breadth-first parser that I developed mostly to implement an HTML parser. It only works for strings containing the FullForm of the code, with the possible exception of double brackets, which I use here:

    ClearAll[listSplit, reconstructIntervals, groupElements, 
    groupPositions, processPosList, groupElementsNested];
    
    listSplit[x_List, lengthlist_List, headlist_List] := 
      MapThread[#1 @@ Take[x, #2] &, {headlist, 
        Transpose[{Most[#] + 1, Rest[#]} &[
          FoldList[Plus, 0, lengthlist]]]}];
    
    reconstructIntervals[listlen_Integer, ints_List] := 
      Module[{missed, startint, lastint},
        startint  = If[ints[[1, 1]] == 1, {}, {1, ints[[1, 1]] - 1}];
        lastint = 
           If[ints[[-1, -1]] == listlen, {}, {ints[[-1, -1]] + 1, listlen}];
        missed = 
          Map[If[#[[2, 1]] - #[[1, 2]] > 1, {#[[1, 2]] + 1, #[[2, 1]] - 1}, {}] &, 
          Partition[ints, 2, 1]];
        missed = Join[missed, {lastint}];
        Prepend[Flatten[Transpose[{ints, missed}], 1], startint]];
    
    groupElements[lst_List, poslist_List, headlist_List] /; 
     And[OrderedQ[Flatten[Sort[poslist]]], Length[headlist] == Length[poslist]] := 
      Module[{totalheadlist, allints, llist},
        totalheadlist = 
         Append[Flatten[Transpose[{Array[Sequence &, {Length[headlist]}], headlist}], 1], Sequence];
      allints = reconstructIntervals[Length[lst], poslist];
      llist = Map[If[# === {}, 0, 1 - Subtract @@ #] &, allints];
      listSplit[lst, llist, totalheadlist]];
    
      (* To work on general heads, we need this *)
    
    groupElements[h_[x__], poslist_List, headlist_List] := 
       h[Sequence @@ groupElements[{x}, poslist, headlist]];
    
    (* If we have a single head *)
    groupElements[expr_, poslist_List, head_] := 
        groupElements[expr, poslist, Table[head, {Length[poslist]}]];
    
    
    groupPositions[plist_List] :=
         Reap[Sow[Last[#], {Most[#]}] & /@ plist, _, List][[2]];
    
    
    processPosList[{openlist_List, closelist_List}] :=
       Module[{opengroup, closegroup, poslist},
        {opengroup, closegroup} = groupPositions /@ {openlist, closelist} ;
        poslist =  Transpose[Transpose[Sort[#]] & /@ {opengroup, closegroup}];
        If[UnsameQ @@ poslist[[1]],
           Return[(Print["Unmatched lists!", {openlist, closelist}]; {})],
           poslist = Transpose[{poslist[[1, 1]], Transpose /@ Transpose[poslist[[2]]]}]
        ]
    ];
    
    groupElementsNested[nested_, {openposlist_List, closeposlist_List}, head_] /; Head[head] =!= List := 
     Fold[
      Function[{x, y}, 
        MapAt[groupElements[#, y[[2]], head] &, x, {y[[1]]}]], 
      nested, 
      Sort[processPosList[{openposlist, closeposlist}], 
       Length[#2[[1]]] < Length[#1[[1]]] &]];
    
    ClearAll[parse, parsedToCode, tokenize, Bracket ];
    
    (* "tokenize" our string *)
    tokenize[code_String] := 
     Module[{n = 0, tokenrules},
       tokenrules = {"[" :> {"Open", ++n}, "]" :> {"Close", n--}, 
           Whitespace | "" ~~ "," ~~ Whitespace | ""};
       DeleteCases[StringSplit[code, tokenrules], "", Infinity]];
    
    (* parses the "tokenized" string in the breadth-first manner starting 
       with the outermost brackets, using Fold and  groupElementsNested*)
    
    parse[preparsed_] := 
      Module[{maxdepth = Max[Cases[preparsed, _Integer, Infinity]], 
       popenlist, parsed, bracketPositions},
       bracketPositions[expr_, brdepth_Integer] := {Position[expr, {"Open", brdepth}], 
       Position[expr, {"Close", brdepth}]};  
       parsed = Fold[groupElementsNested[#1, bracketPositions[#1, #2], Bracket] &,
                   preparsed, Range[maxdepth]];
       parsed =  DeleteCases[parsed, {"Open" | "Close", _}, Infinity];
       parsed = parsed //. h_[x___, y_, Bracket[z___], t___] :> h[x, y[z], t]];
    
     (* convert our parsed expression into a code that Mathematica can execute *)
     parsedToCode[parsed_] :=
     Module[{myHold},
       SetAttributes[myHold, HoldAll];   
       Hold[Evaluate[
         MapAll[# //. x_String :> ToExpression[x, InputForm, myHold] &, parsed] /.
          HoldPattern[Sequence[x__][y__]] :> x[y]]] //. myHold[x___] :> x
    
     ];
    

    (note the use of MapAll in the last function). Now, here is how you can use it :)

    In[27]:= parsed = parse[tokenize["f[g[h[[i[[j[2], k[[1, m[[1, n[2]]]]]]]]]]]"]]
    
    Out[27]= {"f"["g"["h"[Bracket[
     "i"[Bracket["j"["2"], 
       "k"[Bracket["1", "m"[Bracket["1", "n"["2"]]]]]]]]]]]}
    
    In[28]:= parsed //. a_[Bracket[b__]] :> "Part"[a, b]
    
    
    Out[28]= {"f"["g"["Part"["h", 
    "Part"["i", "j"["2"], 
     "Part"["k", "1", "Part"["m", "1", "n"["2"]]]]]]]}
    

    Now you can use parsedToCode:

    In[35]:= parsedToCode[parsed//.a_[Bracket[b__]]:>"Part"[a,b]]//FullForm
    
    Out[35]//FullForm= Hold[List[f[g[Part[h,Part[i,j[2],Part[k,1,Part[m,1,n[2]]]]]]]]]
    

    EDIT

    Here is the addition needed to perform only the character replacement, as requested:

    Clear[stringify, part, parsedToString];
    stringify[x_String] := x;
    stringify[part[open_, x___, close_]] := 
       part[open, Sequence @@ Riffle[Map[stringify, {x}], ","], close];
    stringify[f_String[x___]] := {f, "[",Sequence @@ Riffle[Map[stringify, {x}], ","], "]"};
    
    parsedToString[parsed_] := 
     StringJoin @@ Flatten[Apply[stringify, 
      parsed //. Bracket[x__] :> part["yourOpenChar", x, "yourCloseChar"]] //. 
        part[x__] :> x];
    

    Here is how we can use it:

    In[70]:= parsedToString[parsed]
    
    Out[70]= "f[g[h[yourOpenChari[yourOpenCharj[2],k[yourOpenChar1,m[\
      yourOpenChar1,n[2]yourCloseChar]yourCloseChar]yourCloseChar]\
       yourCloseChar]]]"
    
  • 2020-12-03 04:25

    Here is my attempt. The pasted ASCII code is pretty unreadable due to the presence of special characters so I first provide a picture of how it looks in MMA.

    Basically what it does is this: Opening brackets are always uniquely identifiable as single or double. The problem lies in the closing brackets. Opening brackets always have the pattern string-of-characters-containing-no-brackets + [ or [[. It is impossible to have either a [ following a [[ or vice versa without other characters in-between (at least, not in error-free code).

    So, we use this as a hook and start by looking for bracket pairs that are certain to match, namely the ones that don't have any other brackets in between. Since we know the type, either "[...]" or "[[...]]", we can replace the latter with the double-bracket symbols and the former with unused characters (I use smileys). This is done so that they no longer play a role in the next iteration of the pattern-matching process.

    We repeat until all brackets are processed and finally the smileys are converted to single brackets again.

    You see, the explanation takes more characters than the code does ;-).

    ASCII:

    s = "f @ g[hh[[i[[jj[2], k[[1, m[[1, n[2]]]]]]]]]] // z";
    
    myRep[s_String] :=
     StringReplace[s,
      {
       Longest[y : Except["[" | "]"] ..] ~~ "[" ~~ 
         Longest[x : Except["[" | "]"] ..] ~~ "]" :> 
        y <> "\[HappySmiley]" <> x <> "\[SadSmiley]",
       Longest[y : Except["[" | "]"] ..] ~~ "[" ~~ Whitespace ... ~~ "[" ~~
          Longest[x : Except["[" | "]"] ..] ~~ "]" ~~ Whitespace ... ~~ 
         "]" :> y <> "\[LeftDoubleBracket]" <> x <> "\[RightDoubleBracket]"
       }
      ]
    
    StringReplace[FixedPoint[myRep, s], {"\[HappySmiley]" -> "[","\[SadSmiley]" -> "]"}]
    

    Oh, and the Whitespace part is there because in Mathematica the two characters of a double bracket need not be adjacent: a[ [1] ] is just as legal as a[[1]].
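
    As a small illustration of why the pattern allows optional whitespace, the sketch below checks a string pattern of the same shape against a few spellings of the opening double bracket. The name openDouble is made up for this example and does not appear in the answer's code.

```mathematica
(* Hypothetical name, for illustration only.
   "Whitespace ..." is RepeatedNull[Whitespace], so the gap between the
   two brackets may be empty or contain any run of whitespace. *)
openDouble = "[" ~~ Whitespace ... ~~ "[";

(* Test the adjacent form and two spaced-out forms *)
StringMatchQ[{"[[", "[ [", "[  \t["}, openDouble]
```

    All three spellings are intended to match, which is why the replacement rule treats them alike.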

  • 2020-12-03 04:25

    Edit

    tl;dr version:

    I'm on track for inadvertently solving the base problem, but regular expressions can't count brackets so use a stack implementation.

    Longer version:

    My esteemed colleagues are correct: the best way to approach this problem is a stack implementation. Regular expressions may be able to change [[ and ]] into [ and ] respectively, provided the string contains as many [[ as ]]. However, if the point of the exercise is to use the text within matching brackets, regex isn't the way to go: regular expressions cannot count brackets, and nesting logic is too complex for a simple regex to account for. In a nutshell, I believe regular expressions can address the basic requirement, which was to change matching [[ ]] into matching [ ], but you should really use a stack because it makes the resulting string easier to manipulate.

    And sorry, I completely missed the Mathematica tag! I'll leave my answer here, though, just in case someone gets excited and jumps the gun like I did.

    End Edit
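
    The stack approach recommended in the edit is not spelled out in this answer, so here is a minimal sketch of one way it could look in Mathematica. The function name replacePartBrackets is hypothetical. The sketch assumes error-free input and no whitespace between the two characters of "[[" or "]]", and it returns $Failed on unbalanced brackets.

```mathematica
ClearAll[replacePartBrackets];

(* Hypothetical helper, not taken from any of the answers.
   The stack is a linked list {top, rest}; each entry records whether the
   bracket that is still open is "single" or "double". *)
replacePartBrackets[s_String] :=
 Catch@Module[{chars = Characters[s], n, i = 1, stack = {}, out = {}, top},
   n = Length[chars];
   While[i <= n,
    Which[
     (* "[[" always opens a Part: push "double" and consume both characters *)
     i < n && chars[[i]] === "[" && chars[[i + 1]] === "[",
     stack = {"double", stack}; out = {out, "\[LeftDoubleBracket]"}; i += 2,

     (* a lone "[" opens an ordinary function call *)
     chars[[i]] === "[",
     stack = {"single", stack}; out = {out, "["}; i++,

     (* "]": the top of the stack says which kind of bracket is closing *)
     chars[[i]] === "]",
     If[stack === {}, Throw[$Failed]];
     {top, stack} = stack;
     If[top === "double",
      If[i == n || chars[[i + 1]] =!= "]", Throw[$Failed]];
      out = {out, "\[RightDoubleBracket]"}; i += 2,
      out = {out, "]"}; i++],

     (* any other character is copied through unchanged *)
     True,
     out = {out, chars[[i]]}; i++]];
   (* leftover entries mean some bracket was never closed *)
   If[stack =!= {}, Throw[$Failed]];
   StringJoin[out]]

replacePartBrackets["f @ g[h[[i[[j[2], k[[1, m[[1, n[2]]]]]]]]]] // z"]
```

    Because each closing bracket is resolved against the top of the stack, a run such as "]]" at the end of f[g[1]] is read as two single brackets rather than as one double bracket.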

    A regular expression utilising reluctant quantifiers should be able to progressively locate the [[ and ]] tokens in a string. It pairs each [[ with the nearest following ]], so, as noted in the edit above, it does not verify nesting.

    The required regex would be along the lines of \[\[(?:(?!\]\]).)*?\]\], which in plain English is:

    • \[\[ matches one literal instance of [[ (the brackets must be escaped, because an unescaped [ opens a character class)
    • (?:(?!\]\]).)*? reluctantly consumes one character at a time, as long as the next two characters are not ]]
    • \]\] matches the closing ]]

    To change the double square brackets into single square brackets, identify groups within the regex by adding parentheses around the first and third parts:

    (\[\[)(?:(?!\]\]).)*?(\]\])
    

    This allows you to select the [[ and ]] tokens, and then replace them with [ or ].
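
    For readers working in Mathematica, the replacement step might be sketched as follows. This is an illustration under assumptions rather than a tested solution to the original question: the name doubleToSingle is made up, the brackets are escaped, and the text between the brackets is captured as a second group so that it can be put back.

```mathematica
(* Hypothetical helper, for illustration only.
   The regex pairs each "[[" with the nearest following "]]", so it is
   only reliable when double brackets are not nested. *)
doubleToSingle[s_String] :=
 StringReplace[s,
  (* group 1: "[[", group 2: the text in between, group 3: "]]" *)
  RegularExpression["(\\[\\[)((?:(?!\\]\\]).)*?)(\\]\\])"] -> "[$2]"]

doubleToSingle["a[[1]] + b[[2]]"]
```

    On nested input such as the string in the question, the first [[ gets paired with the wrong ]], which is the limitation described in the edit at the top of this answer.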
