Question
I have this function to parse bbcode -> html:
$this->text = preg_replace(array(
'/\[b\](.*?)\[\/b\]/ms',
'/\[i\](.*?)\[\/i\]/ms',
'/\[u\](.*?)\[\/u\]/ms',
'/\[img\](.*?)\[\/img\]/ms',
'/\[email\](.*?)\[\/email\]/ms',
'/\[url\="?(.*?)"?\](.*?)\[\/url\]/ms',
'/\[size\="?(.*?)"?\](.*?)\[\/size\]/ms',
'/\[youtube\](.*?)\[\/youtube\]/ms',
'/\[color\="?(.*?)"?\](.*?)\[\/color\]/ms',
'/\[quote\](.*?)\[\/quote\]/ms',
'/\[list\=(.*?)\](.*?)\[\/list\]/ms',
'/\[list\](.*?)\[\/list\]/ms',
'/\[\*\]\s?(.*?)\n/ms'
),array(
'<strong>\1</strong>',
'<em>\1</em>',
'<u>\1</u>',
'<img src="\1" alt="\1" />',
'<a href="mailto:\1">\1</a>',
'<a href="\1">\2</a>',
'<span style="font-size:\1%">\2</span>',
'<object width="450" height="350"><param name="movie" value="\1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="\1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="450" height="350"></embed></object>',
'<span style="color:\1">\2</span>',
'<blockquote>\1</blockquote>',
'<ol start="\1">\2</ol>',
'<ul>\1</ul>',
'<li>\1</li>'
),$original);
The problem is: how do I reverse this, i.e. go from html -> bbcode?
My regex skills are poor :(
Thanks.
Answer 1:
Don't.
Instead, store both the original unparsed text and the processed parsed text. Yes, this doubles the storage requirement, but it also makes it blindingly easy to:
- Allow user edits of the original without parsing the BBCode back out
- Allow quotes of other user posts, again without parsing
- Change the HTML each BBCode generates (just re-parse every post)
- Switch BBCode engines down the line (again, just re-parse every post)
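A minimal sketch of that idea, assuming a post is a plain array with two fields. The names bbcode_to_html, make_post, rerender, body_raw and body_html are hypothetical, and the parser is cut down to two of the question's rules:

```php
<?php
// Hypothetical helper: a two-rule subset of the question's parser.
function bbcode_to_html(string $raw): string
{
    return preg_replace(
        array('/\[b\](.*?)\[\/b\]/ms', '/\[i\](.*?)\[\/i\]/ms'),
        array('<strong>$1</strong>', '<em>$1</em>'),
        $raw
    );
}

// Hypothetical post record: keep the raw BBCode next to the rendered HTML.
function make_post(string $raw): array
{
    return array(
        'body_raw'  => $raw,                 // used for the edit and quote forms
        'body_html' => bbcode_to_html($raw), // used when displaying the post
    );
}

// After changing the parser, only body_raw is needed to rebuild the HTML.
function rerender(array $post): array
{
    $post['body_html'] = bbcode_to_html($post['body_raw']);
    return $post;
}

$post = make_post('[b]Hello[/b] [i]world[/i]');
echo $post['body_html'], "\n";
```

Editing and quoting read body_raw directly, so nothing ever has to be converted back from HTML.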
Answer 2:
It's pretty safe to say it's nigh impossible to build a reliable way to convert HTML to BBCode with just a slew of regexes. Use a parser (DOMDocument, for instance), remove invalid elements and attributes with XPath queries and inspection, and then recursively walk the tree, building a BBCode string as you go (or just ignore invalid tags and attributes along the way).
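A rough sketch of the recursive walk, taking the "just ignore invalid tags" route rather than an XPath clean-up pass. The function names node_to_bbcode and html_to_bbcode are made up for this example, and only a handful of the question's tags are mapped:

```php
<?php
// Convert one DOM node and its children to BBCode.
// Unknown tags are dropped, but their text content is kept.
function node_to_bbcode(DOMNode $node): string
{
    if ($node instanceof DOMCharacterData) {
        // Text and CDATA nodes contribute their value; comments contribute nothing.
        return $node instanceof DOMText ? $node->nodeValue : '';
    }

    $name = strtolower($node->nodeName);
    if ($name === 'script' || $name === 'style') {
        return ''; // never carry script or style content over
    }

    $inner = '';
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $inner .= node_to_bbcode($child);
        }
    }

    if (!($node instanceof DOMElement)) {
        return $inner;
    }

    switch ($name) {
        case 'strong':
        case 'b':
            return '[b]' . $inner . '[/b]';
        case 'em':
        case 'i':
            return '[i]' . $inner . '[/i]';
        case 'u':
            return '[u]' . $inner . '[/u]';
        case 'blockquote':
            return '[quote]' . $inner . '[/quote]';
        case 'img':
            return '[img]' . $node->getAttribute('src') . '[/img]';
        case 'a':
            $href = $node->getAttribute('href');
            if (strncmp($href, 'mailto:', 7) === 0) {
                return '[email]' . substr($href, 7) . '[/email]';
            }
            return '[url=' . $href . ']' . $inner . '[/url]';
        default:
            return $inner; // unknown or unsupported tag: keep the text only
    }
}

function html_to_bbcode(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    // The XML declaration is a common trick to make libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $body = $doc->getElementsByTagName('body')->item(0);
    return $body === null ? '' : node_to_bbcode($body);
}

echo html_to_bbcode('<strong>bold</strong> and <a href="http://example.com">link</a>'), "\n";
```

Because the walk follows the tree, nested tags come out correctly nested, which is where a regex-only approach tends to break down.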
Answer 3:
If you know for certain that the HTML you want to de-bbcode was en-bbcoded using your method, then do the following:
- Swap the two arrays you pass to preg_replace.
- In the array with the HTML code, do the following for every element: prepend # to the string, append #s, and replace \1 (and \2, and so on) with (.*?).
- In the array with the BBCodes, do the following for every element: remove the / at the beginning and the /ms at the end, replace \s with a space, remove all \, remove all ?, and then replace the first (.*) in the string with $1 and the second with $2.
That should do it. If you run into any problems, ask ;)
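For illustration, here is what those steps produce for a few of the question's rules, written out by hand. The function name html_to_bbcode_regex is hypothetical, and this only holds for HTML generated by exactly those replacements:

```php
<?php
// Hand-reversed subset of the question's arrays.
// The mailto rule must stay before the generic link rule, as in the original order.
function html_to_bbcode_regex(string $html): string
{
    return preg_replace(
        array(
            '#<strong>(.*?)</strong>#s',
            '#<em>(.*?)</em>#s',
            '#<u>(.*?)</u>#s',
            '#<a href="mailto:(.*?)">(.*?)</a>#s',
            '#<a href="(.*?)">(.*?)</a>#s',
            '#<span style="color:(.*?)">(.*?)</span>#s',
            '#<blockquote>(.*?)</blockquote>#s',
        ),
        array(
            '[b]$1[/b]',
            '[i]$1[/i]',
            '[u]$1[/u]',
            '[email]$1[/email]',
            '[url="$1"]$2[/url]',
            '[color="$1"]$2[/color]',
            '[quote]$1[/quote]',
        ),
        $html
    );
}

echo html_to_bbcode_regex('<strong>Hi</strong> <a href="http://example.com">site</a>'), "\n";
```

The lazy (.*?) groups stop at the first closing tag, so nested tags of the same kind (a quote inside a quote, for example) will not come back out correctly.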
Source: https://stackoverflow.com/questions/3272439/bbcode-unparser-regex-help