I have this function to parse bbcode -> html:
$this->text = preg_replace(array(
'/\[b\](.*?)\[\/b\]/ms',
'/\[i\](.*?)\[\/i\]/ms',
Don't.
Instead, store both the original unparsed text and the processed parsed text. Yes, this doubles the storage requirement, but it also makes it blindingly easy to:
If you know for certain that the HTML you want to convert back was generated from BBCode by your own method, then do the following:
Swap the two arrays you pass to preg_replace.
In the array with the HTML code, do the following for every element: Prepend # to the string. Append #s. Replace \1 (and \2, and so on) with (.*?).
For the array with the BBCode patterns, do the following for every element: Remove the / at the beginning and the /ms at the end. Replace \s with a literal space. Remove all \. Remove all ?. Replace the first (.*) in the string with $1 and the second with $2.
This should do. If any problems: Ask ;)
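As a rough illustration, here is what those steps produce for the two rules visible in the question. The HTML replacement strings (`<b>\1</b>`, `<i>\1</i>`) and the function names are assumptions, since the question does not show its second array. Treat this as a sketch, not a drop-in solution.

```php
<?php
// Forward direction: the BBCode patterns from the question.
const BBCODE_PATTERNS = ['/\[b\](.*?)\[\/b\]/ms', '/\[i\](.*?)\[\/i\]/ms'];
// Assumed HTML replacements (the question's second array is not shown).
const HTML_REPLACEMENTS = ['<b>\1</b>', '<i>\1</i>'];

// Reverse direction, derived by hand with the steps above:
// HTML strings get "#" in front and "#s" behind, and \1 becomes (.*?).
const HTML_PATTERNS = ['#<b>(.*?)</b>#s', '#<i>(.*?)</i>#s'];
// BBCode patterns lose delimiters, backslashes and "?"; (.*) becomes $1.
const BBCODE_REPLACEMENTS = ['[b]$1[/b]', '[i]$1[/i]'];

// Hypothetical helper: BBCode to HTML.
function regexBbcodeToHtml(string $text): string
{
    return preg_replace(BBCODE_PATTERNS, HTML_REPLACEMENTS, $text);
}

// Hypothetical helper: HTML back to BBCode.
function regexHtmlToBbcode(string $html): string
{
    return preg_replace(HTML_PATTERNS, BBCODE_REPLACEMENTS, $html);
}
```

This only round-trips HTML that looks exactly like the forward output. If a replacement string contains regex metacharacters (a `?` or `.` in a URL, say), they would need escaping before being reused as a pattern.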
It's pretty safe to say that it's nigh impossible to reliably convert HTML to BBCode with just a slew of regexes. Use a parser (DOMDocument, for instance), remove invalid elements and attributes using XPath queries and inspection, and then walk the tree recursively, building the BBCode string as you go (or simply skip invalid tags and attributes during the walk).
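A minimal sketch of the parser approach might look like this. It takes the second option (skipping unknown tags during the walk rather than removing them with XPath first). The tag whitelist, the dropped-tag list and the function names are all assumptions made for illustration.

```php
<?php
// Hypothetical whitelist: HTML tag name => BBCode tag name.
const TAG_MAP = ['b' => 'b', 'strong' => 'b', 'i' => 'i', 'em' => 'i', 'u' => 'u'];
// Tags whose content is dropped entirely (assumed list).
const DROP_TAGS = ['script', 'style'];

// Recursively turn a DOM node into a BBCode string.
function nodeToBbcode(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        return $node->nodeValue; // plain text is copied as-is
    }
    $name = strtolower($node->nodeName);
    if (in_array($name, DROP_TAGS, true)) {
        return '';
    }
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= nodeToBbcode($child);
    }
    $tag = TAG_MAP[$name] ?? null;
    // Unknown tags are ignored, but their text content is kept.
    return $tag === null ? $inner : "[$tag]{$inner}[/$tag]";
}

// Hypothetical entry point: parse an HTML fragment and convert it.
function domHtmlToBbcode(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    // The meta tag tells libxml to treat the input as UTF-8.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    return $body === null ? '' : nodeToBbcode($body);
}
```

Attributes are ignored here. Tags such as `[url=...]` or `[img]` would need their own cases that read and validate the relevant attribute before emitting it.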