PHP to clean-up pasted Microsoft input


I have a site where users can post stuff (as in forums, comments, etc) using a customised implementation of TinyMCE. A lot of them like to copy & paste from Word, which

4 Answers

不思量自难忘°

2020-11-29 12:00

The website http://word2cleanhtml.com/ does a good job of converting from Word. I'm using it from PHP by scraping, to process some legacy HTML, and so far it's working pretty well (the resulting code is very clean). Of course, since it's an external service, it's not a good fit for online processing like in your case.

If you try it and it returns many 400 errors, try filtering the HTML with Tidy first.

孤街浪徒

2020-11-29 12:02
In my case, this worked just fine:
```
$text = strip_tags($text, '<a>');
```
Rather than trying to pull out the stuff you don't want, such as embedded Word XML, you can just specify your allowed tags.
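
As a slightly fuller sketch of the same idea, the whitelist can be wrapped in a small helper. The function name `keepAllowedTags` and the default tag list are assumptions made for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper (the name and default whitelist are illustrative only).
// strip_tags() removes every tag that is not listed in $allowed and keeps the text.
function keepAllowedTags(string $html, string $allowed = '<p><a><b><i><ul><ol><li><br>'): string
{
    return trim(strip_tags($html, $allowed));
}

// Typical Word residue: a styled <span> and an empty <o:p> element.
$pasted = '<p class="MsoNormal"><span style="font-family:Calibri">Hello <o:p></o:p><b>world</b></span></p>';

echo keepAllowedTags($pasted, '<p><b>');
// prints: <p class="MsoNormal">Hello <b>world</b></p>
```

Note that tags on the whitelist keep their attributes, so `class="MsoNormal"` survives here. If attributes also need to go, a stricter filter is required.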
我在风中等你

2020-11-29 12:11
In my case, there was a pattern. The unwanted part always started with `<!-` and ended with `[endif]-->`.
So my solution was to cut this block out and keep everything before and after it:
```
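// Keep everything before the first comment opener.
$begin = explode('<!-', $string, 2)[0];
// strrchr() only looks at the first character of its needle, so find the last full marker with strrpos().
$marker = '[endif]-->';
$pos = strrpos($string, $marker);
$end = ($pos === false) ? '' : substr($string, $pos + strlen($marker));
echo $begin . $end;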
```
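
The snippet above removes a single span, from the first `<!-` to the last `[endif]-->`. When the pasted text contains several such blocks with wanted content between them, a regex-based variant may be safer. This is a sketch rather than the answerer's method: it assumes the unwanted blocks are Word conditional comments of the form `<!--[if ...]> ... <![endif]-->`, and `stripConditionalComments` is a hypothetical name.

```php
<?php
// Hypothetical helper: removes each conditional-comment block separately,
// so text sitting between two blocks is preserved.
function stripConditionalComments(string $html): string
{
    // "s" lets "." match newlines, "i" ignores case, ".*?" stops at the nearest end marker.
    $clean = preg_replace('~<!--\[if\b.*?<!\[endif\]-->~is', '', $html);

    // preg_replace() returns null on failure; fall back to the original input.
    return $clean ?? $html;
}

$pasted = 'Hello <!--[if gte mso 9]><xml><w:WordDocument></w:WordDocument></xml><![endif]-->'
        . ' world <!--[if gte mso 10]><style>p{}</style><![endif]-->!';

echo stripConditionalComments($pasted);
```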
佛祖请我去吃肉

2020-11-29 12:16

HTML Purifier will create standards-compliant markup and filter out many possible attacks (such as XSS).

For faster cleanups that don't require XSS filtering, I use the PECL extension Tidy, which is a binding for the Tidy HTML utility.
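
A minimal sketch of what a Tidy-based cleanup could look like, assuming the `tidy` extension is installed. The helper name `cleanWithTidy` and the choice of options are assumptions for illustration; `word-2000`, `bare`, `show-body-only`, and `wrap` are standard Tidy configuration options, but the exact output depends on the Tidy version.

```php
<?php
// Hypothetical helper (illustrative name), not part of the tidy extension itself.
function cleanWithTidy(string $html): string
{
    // The tidy extension is optional; return the input unchanged when it is missing.
    if (!function_exists('tidy_repair_string')) {
        return $html;
    }

    $options = [
        'word-2000'      => true, // ask Tidy to strip the surplus markup Word inserts
        'bare'           => true, // strip Microsoft-specific HTML, output plain spaces for non-breaking ones
        'show-body-only' => true, // emit a fragment instead of a full document
        'wrap'           => 0,    // disable line wrapping
    ];

    $clean = tidy_repair_string($html, $options, 'utf8');

    // tidy_repair_string() returns false on failure; fall back to the input.
    return is_string($clean) ? $clean : $html;
}

echo cleanWithTidy('<p class="MsoNormal">Hello</p>');
```

As the answer notes, this does not filter XSS, so it should not be the only step for untrusted input.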

If those don't help you, I suggest you switch to FCKEditor which has this feature built-in.
