Regex filter for STL messages

对着背影说爱祢 提交于 2020-01-16 05:16:06

问题


Given the follow STL error:

./poly_power/poly_class.cpp:496: error: no matching function for call to ‘state_operator< polynomial< variable_term< polynomial< variable_term< std::basic_string<char, std::char_traits< char>, std::allocator< char> >, int> >, polynomial< variable_term<std::basic_string< char, std::char_traits< char>, std::allocator<char> >, int> > > >, polynomial<variable_term< polynomial< variable_term< std::basic_string< char, std::char_traits< char>, std::allocator< char> >, int> >, polynomial< variable_term< std::basic_string< char, std::char_traits< char>, std::allocator< char> >, int> > > > >::operation( std::pair< const std::basic_string< char, std::char_traits< char>, std::allocator< char> >, state_vector_term< polynomial< variable_term< polynomial< variable_term< std::basic_string< char, std::char_traits< char>, std::allocator< char> >, int> >, polynomial< variable_term< std::basic_string< char, std::char_traits< char>, std::allocator< char> >, int> > > >, polynomial< variable_term< polynomial< variable_term< std::basic_string< char, std::char_traits< char>, std::allocator< char> >, int> >, polynomial< variable_term< std::basic_string< char, std::char_traits< char>, std::allocator< char> >, int> > > > > >&)’

how can I use a reg-ex expression to simplify it to something that looks like:

./poly_power/poly_class.cpp:496: error: no matching function for call to ‘state_operator<...>::operation( std::pair<...>&)’

ie. Convert everything within the outer pair of <> into .... I am aware of STLFilt, a perl script that does something similar, but I think it would be pedagogic to see how this could be done in pure regex.

Bonus

Paraitimze the expression so that it works on the nth level of <>. The 1st level would be example above, while the 2nd level would show something like state_operator<polynomial<...>, polynomial<...> >.


回答1:


What you have here is an irregular grammar that is near-impossible to match with a regular expression.

Edit:

I have experimented around a bit, and, well, it is possible (provided that angle brackets occur nowhere else in the string, that they are balanced (one closing bracket for each opening one), and that you can define an upper limit to the nesting involved):

Match <...> (without nesting):

<[^<>]*+>

Match <...<...>...<...>...> (one level of nesting):

<[^<>]*+(?:<[^<>]*+>[^<>]*+)*+[^<>]*+>

Match with up to two levels of nesting:

<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+>[^<>]*+)*+>[^<>]*+)*+[^<>]*+>

Up to three levels:

<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+>[^<>]*+)*+>[^<>]*+)*+>[^<>]*+)*+[^<>]*+>

Up to four levels:

<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+>[^<>]*+)*+>[^<>]*+)*+>[^<>]*+)*+>[^<>]*+)*+[^<>]*+>

Up to five levels:

<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+>[^<>]*+)*+>[^<>]*+)*+>[^<>]*+)*+>[^<>]*+)*+>[^<>]*+)*+[^<>]*+>

etc.

To get the bonus point, here's the last one broken up into a verbose regex, so you can see a bit easier how it is constructed:

<             # Match the first opening <
[^<>]*+       # Match any non-<>-characters possessively
  (?:         # Match the following zero or more times:
  <             # Match a <
  [^<>]*+       # etc. etc. etc.
    (?:
    <
    [^<>]*+
      (?:
      <
      [^<>]*+
        (?:
        <
        [^<>]*+
          (?:          # innermost level:
          <            # Match a <
          [^<>]*+      # Match any non-<> characters
          >            # Match a >
          [^<>]*+      # Match any non-<> characters
          )*+          # any number of times, possessively
        >            # then back one level: Match a >
        [^<>]*+      # etc. etc. etc.
        )*+
      >
      [^<>]*+
      )*+
    >
    [^<>]*+
    )*+
  >
  [^<>]*+
  )*+
[^<>]*+
>               # Match the final closing >

Isn't this perfectly horrible?

For your example, it seems you need seven levels of nesting. If you want a quick regex constructor, it could look like this (in Python):

def innerregex(nesting):
    if nesting == 0:
        return ""
    else:
        return "(?:<[^<>]*+" + innerregex(nesting-1) + ">[^<>]*+)*+"

def makeregex(nesting):
    return "<[^<>]*+" + innerregex(nesting) + ">"

makeregex(2) will return a regex that can match <<<><>>> correctly. makeregex(7) should work on the entire string, makeregex(6) will only take the innermost matches, and so on.



来源:https://stackoverflow.com/questions/3856790/regex-filter-for-stl-messages

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!