I have a file organized as a large multidimensional structure, similar to JSON, but not close enough for me to use a JSON library.
The data looks something
I think you might get somewhere using preg_split, matching something like [a-zA-Z0-9]+[[:blank:]]*{ and }. You can then build your array by walking through the result with a recursive function that goes one level deeper when it meets an opening brace and comes back up one level on a closing brace.
Otherwise, the cleanest solution would be to implement an ANTLR grammar!
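To make the preg_split idea a bit more concrete, here is a minimal sketch. The sample input, the `item;` leaf syntax, and the helper name `parseTokens()` are all assumptions for illustration, since the real data isn't shown. It splits on braces and semicolons, then walks the tokens recursively.

```php
<?php
// Assumed sample input: "name { ... }" blocks containing "item;" leaves.
$input = 'alpha { beta { charlie; } delta; } foxtrot { golf; hotel; }';

// Split on braces (kept as tokens via the capture group) and on
// semicolons (dropped). Empty pieces are discarded.
$tokens = preg_split(
    '/\s*([{}])\s*|\s*;\s*/',
    trim($input),
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

// Hypothetical helper: consumes tokens from $pos until the matching "}".
function parseTokens(array $tokens, int &$pos): array
{
    $result = [];
    while ($pos < count($tokens)) {
        $token = $tokens[$pos++];
        if ($token === '}') {
            return $result;                 // closing brace: back up one level
        }
        if (($tokens[$pos] ?? null) === '{') {
            $pos++;                         // skip the opening brace
            $result[$token] = parseTokens($tokens, $pos); // one level deeper
        } else {
            $result[] = $token;             // plain leaf item
        }
    }
    return $result;
}

$pos = 0;
$tree = parseTokens($tokens, $pos);
```

For the assumed input, `$tree` should come out as a nested array keyed by block name, with leaf items stored under numeric keys. Malformed input (unbalanced braces, anonymous blocks) is not handled here.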
Sure, you can do this with regular expressions.
preg_match_all(
'/([^\s]+)\s*{((?:[^{}]*|(?R))*)}/',
$yourStuff,
$matches,
PREG_SET_ORDER
);
This gives me the following in matches:
[1]=>
string(5) "alpha"
[2]=>
string(46) "
beta {
charlie;
}
delta;
"
and
[1]=>
string(7) "foxtrot"
[2]=>
string(22) "
golf;
hotel;
"
Breaking it down a little bit.
([^\s]+)      # non-whitespace (block name)
\s*           # whitespace (between name and block)
{             # literal opening brace
(             # begin capture
  (?:         # non-capturing group
    [^{}]*    # everything that is not a brace
    |(?R)     # OR recurse into the whole pattern
  )*          # zero or more times
)             # end capture
}             # literal closing brace
Just for your information, this works fine on n-deep levels of braces.
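If you want an actual nested array rather than the raw captures, one option is to re-apply the pattern to each captured body. The sketch below is only that, a sketch: `parseBlocks()` is a made-up helper name, the sample input is assumed, leaves are assumed to be `item;` entries, and the name part of the pattern is tightened slightly to `[^\s{};]+` so that braces and semicolons never end up in a block name.

```php
<?php
// Hypothetical helper: turns "name { ... }" text into a nested array.
function parseBlocks(string $text): array
{
    // Variant of the recursive pattern; braces are escaped for clarity.
    $pattern = '/([^\s{};]+)\s*\{((?:[^{}]*|(?R))*)\}/';
    $result = [];

    // Each match is one block at this level: [1] = name, [2] = body.
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $result[$match[1]] = parseBlocks($match[2]); // recurse into the body
    }

    // Whatever remains once the blocks are removed is treated as leaves.
    $rest = preg_replace($pattern, '', $text) ?? '';
    $leaves = preg_split('/\s*;\s*/', trim($rest), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($leaves as $leaf) {
        $result[] = $leaf;
    }
    return $result;
}

// Assumed sample input, shaped like the matches shown above.
$input = 'alpha { beta { charlie; } delta; } foxtrot { golf; hotel; }';
$tree = parseBlocks($input);
```

Note that blocks are collected before leaves at each level, so the original ordering between a leaf and a sibling block is not preserved in this version.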
You can't [1] do this with regular expressions.
Alternatively, if you want to match deep-to-shallow blocks, you can use \{[^\{\}]*?\} and preg_replace_callback() to store the value, and return null to erase it from the string. The callback will need to take care of nesting the value accordingly.
$hierarchicalStorage = ...;
do {
    $string = \preg_replace_callback('#\{[^\{\}]*?\}#', function ($block)
    use (&$hierarchicalStorage) {
        // $block[0] holds the matched innermost {...} text;
        // do your magic with $hierarchicalStorage in here
        return null; // erases the matched block from the string
    }, $string);
} while (!empty($string));
Incomplete, not tested, and no warranty.
This approach requires that the string be wrapped in {} as well, otherwise the final match won't happen and you'll loop forever.
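For what it's worth, here is one way the deep-to-shallow idea could be fleshed out. It is a variation rather than the exact approach described: instead of erasing each innermost block, the callback leaves an `@N;` placeholder behind and stores the parsed block under index N, and the loop stops via the `$count` argument of preg_replace_callback() when nothing matches anymore. The helper name `parseInnermostFirst()`, the placeholder syntax, the `item;` leaf format, and the sample input are all assumptions, and it will misbehave if real items look like `@0`.

```php
<?php
// Hypothetical helper: collapses innermost blocks first, one pass at a time.
function parseInnermostFirst(string $text): array
{
    $store = [];  // placeholder index => [block name, parsed children]

    // Expands a brace-free body into an array, resolving "@N" placeholders.
    $expand = function (string $body) use (&$store): array {
        $items = [];
        $parts = preg_split('/\s*;\s*/', trim($body), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($parts as $item) {
            if (preg_match('/^@(\d+)$/', $item, $m)) {
                [$name, $children] = $store[(int) $m[1]];
                $items[$name] = $children;   // previously collapsed block
            } else {
                $items[] = $item;            // plain leaf item
            }
        }
        return $items;
    };

    // Matches only blocks whose body contains no braces (the innermost ones).
    $pattern = '#([^\s{};]+)\s*\{([^{}]*)\}#';
    do {
        $text = preg_replace_callback(
            $pattern,
            function (array $block) use (&$store, $expand): string {
                $store[] = [$block[1], $expand($block[2])];
                return '@' . (count($store) - 1) . ';';
            },
            $text,
            -1,
            $count   // number of replacements made in this pass
        );
    } while ($count > 0);

    return $expand($text);
}

// Assumed sample input.
$tree = parseInnermostFirst('alpha { beta { charlie; } delta; } foxtrot { golf; hotel; }');
```

Because the loop ends when a pass makes no replacements, this version does not need the input to be wrapped in an outer pair of braces.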
This is an awful lot of (inefficient) work for something that can just as easily be solved with a well-known exchange/storage format such as JSON.
[1] I was going to put "you can, but...", however I'll just say once again, "You can't". [2]
[2] Don't.