Remove MS Word “HTML” using PHP [duplicate]

Posted by 我们两清 on 2019-12-07 05:49:15

Question


Possible Duplicate:
What is the best free way to clean up Word HTML?
PHP to clean-up pasted Microsoft input

I allow clients to enter notes in a rich text editor, and I recently upgraded to CKEditor 3.x, which by default strips MS Word classes, styles, and comments when users paste into the editor. So moving forward I'm all set.

I now need to clean up five years' worth of existing notes, some of which have MS Word-generated HTML embedded. I need to loop through this body of text and clean it.

I do not need to strip out all span tags, only those identified as written by Microsoft.
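To make that requirement concrete, here is a minimal regex-based sketch in plain PHP. The function name `stripWordMarkup` and the choice of markers (conditional comments, `o:`/`w:`/`v:`/`m:`/`st1:` tags, `Mso*` classes, `mso-*` style declarations) are assumptions about what typical Word output looks like, not something taken from the question's data. Regular expressions are not a real HTML parser, so treat this as a starting point only.

```php
<?php
// Hypothetical helper: removes common Word-specific markup and leaves other spans alone.
function stripWordMarkup(string $html): string
{
    // 1. Drop conditional comments such as <!--[if gte mso 9]> ... <![endif]-->.
    $html = preg_replace('/<!--\[if\b.*?<!\[endif\]-->/is', '', $html) ?? $html;

    // 2. Drop Office-namespaced tags (<o:p>, <w:...>, <v:...>, <m:...>, <st1:...>), keeping inner text.
    $html = preg_replace('/<\/?(?:o|w|v|m|st1):[^>]*>/i', '', $html) ?? $html;

    // 3. Drop class attributes whose single value starts with "Mso" (e.g. MsoNormal).
    $html = preg_replace('/\s+class\s*=\s*(["\']?)Mso\w*\1/i', '', $html) ?? $html;

    // 4. Remove mso-* declarations from style attributes; keep every other declaration.
    $html = preg_replace_callback(
        '/\s+style\s*=\s*(["\'])(.*?)\1/is',
        static function (array $m): string {
            $kept = [];
            foreach (explode(';', $m[2]) as $decl) {
                $decl = trim($decl);
                if ($decl !== '' && stripos($decl, 'mso-') !== 0) {
                    $kept[] = $decl;
                }
            }
            // If nothing is left, remove the whole style attribute.
            return $kept ? ' style=' . $m[1] . implode('; ', $kept) . $m[1] : '';
        },
        $html
    ) ?? $html;

    // 5. Unwrap spans that ended up with no attributes at all, innermost first.
    do {
        $html = preg_replace(
            '/<span\s*>((?:(?!<\/?span\b).)*)<\/span>/is',
            '$1',
            $html,
            -1,
            $count
        ) ?? $html;
    } while ($count > 0);

    return $html;
}
```

A span such as `<span class="note">` passes through unchanged, because none of the patterns match it. Known gaps in this sketch: a class attribute that mixes `Mso*` with other class names is left as-is, and step 5 also unwraps attribute-less spans that did not come from Word.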

I've tried using HTMLCleaner, but it does not remove the MS-generated HTML. http://word2cleanhtml.com does exactly what I want; however, the developers are not currently offering the API for public use (as of July 9, 2012).

I've looked for such a class off and on for the last few weeks and am not having much luck. Have any of you found a useful class you'd like to share?


Answer 1:


http://htmlpurifier.org/

This will do what you want.
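The answer gives no code, so the following is only a sketch of how HTML Purifier might be applied to one note. It assumes the library was installed through Composer (package `ezyang/htmlpurifier`) next to this script; the function name `purifyNote` and the particular allowlist are illustrative choices, not part of the answer.

```php
<?php
// Assumption: HTML Purifier is available through Composer's autoloader.
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require_once $autoload;
}

// Hypothetical helper: cleans one note with a strict element allowlist.
function purifyNote(string $dirtyHtml): string
{
    $config = HTMLPurifier_Config::createDefault();

    // Only these elements (and href on links) survive. Everything else,
    // including class and style attributes and Word's <o:p> tags, is dropped
    // while the text inside is kept.
    $config->set('HTML.Allowed', 'p,br,strong,em,ul,ol,li,a[href]');

    // Skip the on-disk definition cache so the sketch needs no writable
    // directory; for a large batch job, caching is usually worth enabling.
    $config->set('Cache.DefinitionImpl', null);

    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirtyHtml);
}
```

Note that this allowlist is blunter than what the question asks for: it removes every span, not only the ones Word wrote. To keep ordinary spans, the allowlist would need to be widened, and the result checked against real samples of the stored notes. When looping over many notes, building the purifier once and reusing it avoids repeating the setup cost on every call.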



Source: https://stackoverflow.com/questions/11400260/remove-ms-word-html-using-php
