bbcode unparser regex help

Posted by 孤人 on 2019-12-19 04:19:05

Question


I have this function to parse bbcode -> html:

  $this->text = preg_replace(array(
    '/\[b\](.*?)\[\/b\]/ms', 
    '/\[i\](.*?)\[\/i\]/ms',
    '/\[u\](.*?)\[\/u\]/ms',
    '/\[img\](.*?)\[\/img\]/ms',
    '/\[email\](.*?)\[\/email\]/ms',
    '/\[url\="?(.*?)"?\](.*?)\[\/url\]/ms',
    '/\[size\="?(.*?)"?\](.*?)\[\/size\]/ms',
    '/\[youtube\](.*?)\[\/youtube\]/ms',
    '/\[color\="?(.*?)"?\](.*?)\[\/color\]/ms',    
    '/\[quote\](.*?)\[\/quote\]/ms',
    '/\[list\=(.*?)\](.*?)\[\/list\]/ms',
    '/\[list\](.*?)\[\/list\]/ms',
    '/\[\*\]\s?(.*?)\n/ms'
   ),array(
    '<strong>\1</strong>',
    '<em>\1</em>',
    '<u>\1</u>',
    '<img src="\1" alt="\1" />',
    '<a href="mailto:\1">\1</a>',
    '<a href="\1">\2</a>',
    '<span style="font-size:\1%">\2</span>',
    '<object width="450" height="350"><param name="movie" value="\1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="\1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="450" height="350"></embed></object>',
    '<span style="color:\1">\2</span>',
    '<blockquote>\1</blockquote>',
    '<ol start="\1">\2</ol>',
    '<ul>\1</ul>',
    '<li>\1</li>'
   ),$original);

The problem is how to unparse this, i.e. convert the generated HTML back to BBCode.

My regex skills are poor :(

Thanks.


Answer 1:


Don't.

Instead, store both the original unparsed text and the processed parsed text. Yes, this doubles the storage requirement, but it also makes it blindingly easy to:

  1. Allow user edits of the original without parsing the BBCode back out
  2. Allow quotes of other user posts, again without parsing
  3. Change the HTML each BBCode generates (just re-parse every post)
  4. Switch BBCode engines down the line (again, just re-parse every post)
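
As a rough illustration of the points above, here is a minimal sketch of keeping both versions side by side. The names renderBbcode(), makePostRecord(), body_bbcode and body_html are hypothetical, and renderBbcode() only handles two tags as a stand-in for the full preg_replace call in the question.

```php
<?php
// Hypothetical stand-in for the question's parser (two tags only, for brevity).
function renderBbcode(string $raw): string
{
    return preg_replace(
        ['/\[b\](.*?)\[\/b\]/ms', '/\[i\](.*?)\[\/i\]/ms'],
        ['<strong>\1</strong>', '<em>\1</em>'],
        $raw
    );
}

// Build the record to persist. The raw BBCode is the source of truth;
// the HTML is a cache that can be regenerated from it at any time.
function makePostRecord(string $raw): array
{
    return [
        'body_bbcode' => $raw,               // used for editing and quoting
        'body_html'   => renderBbcode($raw), // used for display
    ];
}

$post = makePostRecord('[b]Hello[/b] [i]world[/i]');
```

Editing or quoting a post would read body_bbcode, so nothing ever has to be converted back from HTML. Changing the parser only requires re-running renderBbcode() over the stored body_bbcode values.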



Answer 2:


It's pretty safe to say it's nigh impossible to build a reliable way to convert HTML to BBCode with just a slew of regexes. Use a parser (DOMDocument, for instance), remove invalid elements and attributes with XPath queries and inspection, and then recursively walk the tree, building the BBCode string as you go (or simply ignore invalid tags and attributes during the walk).
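
A minimal sketch of that recursive walk might look like the following. It takes the simpler route of ignoring unknown tags during the walk rather than removing them with XPath first. The function names htmlToBbcode() and nodeToBbcode() are made up for this example, only some of the question's tags are covered, and the input is assumed to be ASCII (DOMDocument::loadHTML needs an explicit encoding hint for anything else).

```php
<?php
// Convert one DOM node (and its subtree) to BBCode.
function nodeToBbcode(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        return $node->nodeValue;
    }
    // Convert the children first, then wrap the result according to the tag.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= nodeToBbcode($child);
    }
    if (!($node instanceof DOMElement)) {
        return $inner;
    }
    switch (strtolower($node->tagName)) {
        case 'strong':     return "[b]{$inner}[/b]";
        case 'em':         return "[i]{$inner}[/i]";
        case 'u':          return "[u]{$inner}[/u]";
        case 'blockquote': return "[quote]{$inner}[/quote]";
        case 'ul':         return "[list]{$inner}[/list]";
        case 'li':         return "[*]{$inner}\n";
        case 'img':        return '[img]' . $node->getAttribute('src') . '[/img]';
        case 'a':
            $href = $node->getAttribute('href');
            if (strpos($href, 'mailto:') === 0) {
                return '[email]' . substr($href, 7) . '[/email]';
            }
            return "[url={$href}]{$inner}[/url]";
        default:
            return $inner; // unknown tag: keep its text, drop the markup
    }
}

// Parse an HTML fragment and convert it to BBCode.
function htmlToBbcode(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy HTML
    $doc->loadHTML('<div>' . $html . '</div>');   // wrapper keeps bare text in one place
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $body = $doc->getElementsByTagName('body')->item(0);
    return $body === null ? '' : nodeToBbcode($body);
}
```

For example, htmlToBbcode('<strong>Hi</strong> and <em>there</em>') should give [b]Hi[/b] and [i]there[/i]. Because the walk follows the tree, nested tags are closed in the right order, which is where the regex approach tends to break down.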




Answer 3:


If you know for certain that the HTML you want to convert back was generated from BBCode by your method, then do the following:

Swap the two arrays you pass to preg_replace.

In the array with the HTML code, do the following for every element: Prepend # to the string. Append #s. Replace \1 (and \2, and so on) with (.*?).

In the array with the BBCode patterns, do the following for every element: Remove the / at the beginning and the /ms at the end. Replace \s with a space. Remove all \. Remove all ?. Replace the first (.*) in the string with $1 and the second with $2.

This should do. If any problems: Ask ;)
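
Applied to a few of the question's tags, the recipe above would give something like the sketch below. The function name unparseBbcode() is made up for this example, and only a subset of the tags is shown.

```php
<?php
// Reverse of the question's preg_replace call for a subset of tags.
// The HTML strings become patterns (# delimiters, s modifier), and the
// BBCode patterns become plain replacement strings.
function unparseBbcode(string $html): string
{
    $htmlPatterns = [
        '#<strong>(.*?)</strong>#s',
        '#<em>(.*?)</em>#s',
        '#<a href="mailto:(.*?)">(.*?)</a>#s', // must run before the generic link
        '#<a href="(.*?)">(.*?)</a>#s',
        '#<span style="color:(.*?)">(.*?)</span>#s',
        '#<blockquote>(.*?)</blockquote>#s',
    ];
    $bbcodeReplacements = [
        '[b]$1[/b]',
        '[i]$1[/i]',
        '[email]$1[/email]',
        '[url="$1"]$2[/url]',
        '[color="$1"]$2[/color]',
        '[quote]$1[/quote]',
    ];
    // preg_replace applies the pairs in array order, one after another.
    return preg_replace($htmlPatterns, $bbcodeReplacements, $html);
}
```

This only holds up for HTML that the original function produced. Nested tags of the same kind (a span inside a span, for instance) will likely be paired with the wrong closing tag, since (.*?) stops at the first closing tag it finds.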



Source: https://stackoverflow.com/questions/3272439/bbcode-unparser-regex-help
