Converting a Word document into usable HTML in PHP

轻奢々 2020-12-17 03:41

I have a set of Word documents which I want to publish using a PHP tool I've written. I copy and paste the Word documents into a text box and then save them into MySQL usin

5 answers
  •  忘掉有多难
    2020-12-17 04:16

    I think all these answers miss one vital point. Windows uses its own flavour of Latin-1 (Windows-1252), so if you paste special characters (like curly quotes) into a form on a Windows machine and the text is sent to a Unix (or any non-Microsoft) box, whether to a database or anything else, some of those characters do not match anything the receiving system expects, hence the garbled output. This means that even if you have a UTF-8 database and use htmlentities, some nasties will still get through: Windows-1252 puts printable characters at bytes 0x80 to 0x9F, which ISO-8859-1 reserves for control codes and which are not valid UTF-8 on their own. The characters themselves do exist in Unicode; it is the byte positions that are Microsoft-only. I would love to know of a slick solution. What I do is keep a manual list of the Microsoft-only character codes I have encountered, paired with an (also manual) list of their UTF-8 equivalents, and do a str_replace for all of them. THEN you can do whatever you want with the text (iconv, htmlentities, saving straight into a UTF-8 database); it no longer matters.

    My grasp of all this is a little shaky. See http://www.cs.tut.fi/~jkorpela/www/windows-chars.html for an excellent explanation, which I have mutilated into short form above. If someone has a better solution (surely there is one out there!) for how to PHPify what that article explains, I would love to hear it!
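
The replacement step described in this answer could be sketched in PHP roughly as follows. This is a minimal sketch, not a definitive fix: the function name cp1252ToUtf8 is hypothetical, and it assumes that any input which is not already valid UTF-8 was encoded as Windows-1252. It uses strtr() rather than str_replace(), because str_replace() with arrays applies its replacements one after another and can re-replace bytes it has just inserted.

```php
<?php
// Hypothetical helper (not from the original post): convert pasted text that
// may be Windows-1252 into UTF-8 before it is escaped or stored.
function cp1252ToUtf8(string $text): string
{
    // Valid UTF-8 (which includes plain ASCII) is returned untouched, so real
    // UTF-8 continuation bytes are never mistaken for Windows-1252 characters.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }

    static $map = null;
    if ($map === null) {
        // Bytes 0x80-0x9F: printable in Windows-1252, control codes in ISO-8859-1.
        $map = [
            "\x80" => "\u{20AC}", "\x82" => "\u{201A}", "\x83" => "\u{0192}",
            "\x84" => "\u{201E}", "\x85" => "\u{2026}", "\x86" => "\u{2020}",
            "\x87" => "\u{2021}", "\x88" => "\u{02C6}", "\x89" => "\u{2030}",
            "\x8A" => "\u{0160}", "\x8B" => "\u{2039}", "\x8C" => "\u{0152}",
            "\x8E" => "\u{017D}", "\x91" => "\u{2018}", "\x92" => "\u{2019}",
            "\x93" => "\u{201C}", "\x94" => "\u{201D}", "\x95" => "\u{2022}",
            "\x96" => "\u{2013}", "\x97" => "\u{2014}", "\x98" => "\u{02DC}",
            "\x99" => "\u{2122}", "\x9A" => "\u{0161}", "\x9B" => "\u{203A}",
            "\x9C" => "\u{0153}", "\x9E" => "\u{017E}", "\x9F" => "\u{0178}",
        ];
        // Five bytes are unassigned in Windows-1252; map them to U+FFFD.
        foreach ([0x81, 0x8D, 0x8F, 0x90, 0x9D] as $byte) {
            $map[chr($byte)] = "\u{FFFD}";
        }
        // Bytes 0xA0-0xFF match ISO-8859-1: the code point equals the byte
        // value, which always encodes as two UTF-8 bytes.
        for ($byte = 0xA0; $byte <= 0xFF; $byte++) {
            $map[chr($byte)] = chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
    }

    // strtr() makes a single pass and never rescans its own output.
    return strtr($text, $map);
}

// Usage: normalise first, then escape or store.
$clean = cp1252ToUtf8("\x93Smart quotes\x94 \x96 and a dash");
echo htmlspecialchars($clean, ENT_QUOTES, 'UTF-8'), PHP_EOL;
```

The validity check is only a heuristic: a short Windows-1252 string can happen to be valid UTF-8, so where the form's encoding is known (for example from the page's charset) it is safer to rely on that than to guess.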
