How to clean up a Microsoft Word HTML document?

Submitted by 梦想与她 on 2019-12-21 04:29:14

Question


I have quite a big document in HTML format that was generated by Microsoft Word. It is very messy and bloated (unknown tags, unknown namespaces, and so on).

Is there any way to convert it into plain HTML syntax?


Answer 1:


Try HTML Tidy. I hear it works quite well on HTML generated by MS Word (definitely at least up to Word 2000, but probably on more recent versions too).




Answer 2:


This isn't really a programming question, but (at least recent versions of) Word can save to "Web Page, Filtered", which removes Office-specific tags and properties and only leaves the tags necessary for the document to be rendered in a web browser. So, if you have Word, you could try using it to open the HTML document and save it in that format.
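If Word itself is not available, the same idea (dropping Office-specific tags and properties) can be roughly approximated in code. The following is a minimal sketch, not what Word's filter actually does: the function name `strip_word_markup` is made up for illustration, it relies on regular expressions rather than a real HTML parser, and it removes every `style` attribute wholesale, which may be more aggressive than you want.

```python
import re


def strip_word_markup(html: str) -> str:
    """Remove common Word-specific markup from an HTML string (rough sketch)."""
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", html,
                  flags=re.S | re.I)
    # Opening and closing tags with a namespace prefix, e.g. <o:p>, </w:WordDocument>
    html = re.sub(r"</?[a-z]\w*:[^>]*>", "", html, flags=re.I)
    # Namespace declarations such as xmlns:o="urn:schemas-microsoft-com:office:office"
    html = re.sub(r'\s+xmlns(:\w+)?="[^"]*"', "", html, flags=re.I)
    # Word's own class names (MsoNormal, MsoListParagraph, ...), quoted or not
    html = re.sub(r'\s+class="?Mso\w*"?', "", html, flags=re.I)
    # Inline style attributes (assumption: all inline styling can be discarded)
    html = re.sub(r'''\s+style=("[^"]*"|'[^']*')''', "", html, flags=re.I)
    return html
```

For example, passing `<p class=MsoNormal style='margin:0'>Hello<o:p></o:p></p>` through this function leaves `<p>Hello</p>`. For anything beyond a one-off cleanup, a proper parser or a dedicated tool is likely to be more robust than regular expressions.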




Answer 3:


You're probably looking for HTML Tidy, which has adapters in pretty much every language out there. It has options to clean up Microsoft Word HTML output (and many other features).
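As a rough illustration of the Word-related options, here is a sketch that calls the `tidy` command-line program from Python. It assumes `tidy` is installed and on the `PATH`; the helper names `build_tidy_command` and `tidy_word_html` are made up for this example, and the chosen options (`word-2000`, `bare`, `clean`, `drop-proprietary-attributes`) are only one plausible combination, so check them against the documentation of your Tidy version.

```python
import shutil
import subprocess


def build_tidy_command(path: str) -> list[str]:
    """Build a tidy command line with options aimed at Word-generated HTML."""
    return [
        "tidy",
        "-quiet",                                # suppress non-essential output
        "--word-2000", "yes",                    # strip Word 2000 specific markup
        "--bare", "yes",                         # strip other Microsoft-specific HTML
        "--clean", "yes",                        # replace presentational markup with CSS
        "--drop-proprietary-attributes", "yes",  # drop non-standard attributes
        "--show-warnings", "no",
        path,
    ]


def tidy_word_html(path: str) -> str:
    """Run tidy on the given file and return the cleaned HTML."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy executable not found on PATH")
    result = subprocess.run(build_tidy_command(path),
                            capture_output=True, text=True)
    # Tidy exits with 0 (ok), 1 (warnings) or 2 (errors)
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout
```

Calling `tidy_word_html("document.html")` (a placeholder file name) returns the cleaned markup as a string, which can then be written to a new file.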




Answer 4:


Try the Cleanup HTML online tool to clean up Word HTML.



Source: https://stackoverflow.com/questions/1054468/how-to-clean-up-microsoft-html-doc
