Regex to match top-level delimiters in a multidimensional string

忘掉有多难 2020-12-22 07:13

I have a file organized as a large multidimensional structure, similar to JSON but not close enough for me to use a JSON library.

The data looks something

3 Answers
  • 2020-12-22 07:18

    I think you might get somewhere using preg_split, matching on [a-zA-Z0-9][[:blank:]]+{ and }. You can then construct your array by walking through the result with a recursive function that goes one level deeper when it meets an opening brace and returns to the parent level on a closing brace.
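
    As a rough illustration of that idea, here is a minimal sketch. It rests on several assumptions: the sample input is made up, the split pattern is a simplification of the one suggested above (it splits on the braces and also on `;`), and `parseTokens` is a hypothetical helper name, not an existing function.

    ```php
    <?php
    // Hypothetical helper: consumes tokens and builds nested arrays.
    function parseTokens(array &$tokens): array
    {
        $result = [];
        $name = null;
        while ($tokens) {
            $token = array_shift($tokens);
            if ($token === '{') {
                // Opening brace: go one level deeper under the pending name.
                $result[(string) $name] = parseTokens($tokens);
                $name = null;
            } elseif ($token === '}') {
                // Closing brace: return to the parent level.
                return $result;
            } elseif ($token === ';') {
                // End of a plain statement such as "delta;".
                if ($name !== null) {
                    $result[] = $name;
                }
                $name = null;
            } else {
                $name = $token;
            }
        }
        return $result;
    }

    // Assumed sample data; the real file format may differ.
    $input = "alpha { beta { charlie; } delta; } foxtrot { golf; hotel; }";

    // Keep the delimiters as tokens and drop empty pieces.
    $tokens = preg_split(
        '/\s*([{};])\s*/',
        trim($input),
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
    $tree = parseTokens($tokens);
    // $tree: ['alpha' => ['beta' => ['charlie'], 0 => 'delta'], 'foxtrot' => ['golf', 'hotel']]
    ```

    Blocks become string keys and plain statements become numeric entries; duplicate block names at the same level would overwrite each other in this sketch.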

    Otherwise, the cleanest solution would be to implement an ANTLR grammar!

  • 2020-12-22 07:21

    Sure, you can do this with regular expressions.

    preg_match_all(
        '/([^\s]+)\s*{((?:[^{}]*|(?R))*)}/',
        $yourStuff,
        $matches,
        PREG_SET_ORDER
    );
    

    This gives me the following in $matches:

    [1]=>
    string(5) "alpha"
    [2]=>
    string(46) "
    beta {
        charlie;
    }
    delta;
    "
    

    and

    [1]=>
    string(7) "foxtrot"
    [2]=>
    string(22) "
    golf;
    hotel;
    "
    

    Breaking it down a little bit.

    ([^\s]+)                # non-whitespace (block name)
    \s*                     # whitespace (between name and block)
    {                       # literal brace
        (                   # begin capture
            (?:             # don't create another capture set
                [^{}]*      # everything not a brace
                |(?R)       # OR recurse into the whole pattern
            )*              # zero or more times
        )                   # end capture
    }                       # literal brace
    

    Note that this works fine on braces nested n levels deep.
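
    The pattern only hands back each top-level block's body as a string, so to get a nested array you could apply it again to every captured body. The following is a minimal sketch of that, not a tested parser: `parseBlocks` is a hypothetical helper, the sample input is assumed (modeled on the output shown above), and leftover text is assumed to consist of plain `name;` statements.

    ```php
    <?php
    // Hypothetical helper: applies the pattern above recursively.
    function parseBlocks(string $text): array
    {
        $pattern = '/([^\s]+)\s*{((?:[^{}]*|(?R))*)}/';
        $tree = [];

        // Each match is one block at the current level: [1] = name, [2] = body.
        preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
        foreach ($matches as $m) {
            $tree[$m[1]] = parseBlocks($m[2]);
        }

        // Whatever remains once the blocks are removed is treated as "name;" statements.
        $rest = trim((string) preg_replace($pattern, '', $text));
        foreach (preg_split('/\s*;\s*/', $rest, -1, PREG_SPLIT_NO_EMPTY) as $leaf) {
            $tree[] = $leaf;
        }
        return $tree;
    }

    // Assumed sample data.
    $input = "alpha {\n    beta {\n        charlie;\n    }\n    delta;\n}\n"
           . "foxtrot {\n    golf;\n    hotel;\n}\n";
    $tree = parseBlocks($input);
    // $tree: ['alpha' => ['beta' => ['charlie'], 0 => 'delta'], 'foxtrot' => ['golf', 'hotel']]
    ```

    In this sketch blocks are always listed before plain statements within a level, so the original ordering between the two kinds is not preserved. On large inputs the nested quantifier may also run into pcre.backtrack_limit.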

  • 2020-12-22 07:28

    You can't[1] do this with regular expressions.

    Alternatively, if you want to match blocks from the deepest level outward, you can use \{[^\{\}]*?\} with preg_replace_callback() to store each block's value, and return null (which is treated as an empty replacement) to erase it from the string. The callback will need to take care of nesting the value accordingly.

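    $hierarchicalStorage = []; // assumed starting point; use whatever structure you need
    do {
        // Each pass matches only the innermost blocks (those containing no braces).
        $string = \preg_replace_callback('#\{[^\{\}]*?\}#', function($block)
        use(&$hierarchicalStorage) {
            // $block[0] holds the matched text, braces included;
            // do your magic with $hierarchicalStorage in here
            return null; // treated as an empty replacement
        }, $string, -1, $count);
    } while ($count > 0 && !empty($string)); // also stop once nothing matches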
    

    Incomplete, not tested, and no warranty.

    This approach requires that the string be wrapped in {} as well; otherwise the final match never happens, and a loop that only stops on an empty string will run forever.
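
    One way the callback could handle the nesting is sketched below. This is a variant rather than the exact approach above: instead of erasing each block it leaves a placeholder of the form `@N`, and a second step expands the placeholders. The sample input, the `@N` placeholder format (which assumes the data never contains `@`), and the helper name `expandBlock` are all assumptions made for illustration.

    ```php
    <?php
    // Assumed sample data, wrapped in {} as required above.
    $string = "{ alpha { beta { charlie; } delta; } foxtrot { golf; hotel; } }";
    $storage = []; // flat list of block bodies, innermost first

    do {
        // Each pass replaces the innermost blocks with a placeholder "@N",
        // where N is the block's index in $storage.
        $string = preg_replace_callback('#\{([^{}]*)\}#', function ($block) use (&$storage) {
            $storage[] = trim($block[1]);
            return '@' . (count($storage) - 1);
        }, $string, -1, $count);
    } while ($count > 0);

    // Hypothetical helper: turns a stored body back into a nested array.
    function expandBlock(int $id, array $storage): array
    {
        $tree = [];
        // A body is a sequence of "name @N" (block) or "name;" (statement) items.
        preg_match_all('/(\S+?)\s*(?:@(\d+)|;)/', $storage[$id], $items, PREG_SET_ORDER);
        foreach ($items as $item) {
            if (($item[2] ?? '') !== '') {
                $tree[$item[1]] = expandBlock((int) $item[2], $storage); // nested block
            } else {
                $tree[] = $item[1];                                      // plain statement
            }
        }
        return $tree;
    }

    // The outermost wrapper is stored last, so it is the root.
    $tree = expandBlock(count($storage) - 1, $storage);
    // $tree: ['alpha' => ['beta' => ['charlie'], 0 => 'delta'], 'foxtrot' => ['golf', 'hotel']]
    ```

    Because the loop stops when a pass makes no replacements, it terminates even if the input is not wrapped in {}, although the last stored entry is then no longer the root.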

    This is an awful lot of (inefficient) work for something that can just as easily be solved with a well-known exchange/storage format such as JSON.

    [1] I was going to put "you can, but...", however I'll just say once again, "You can't".[2]

    [2] Don't.
