Batch conversion of docx to clean HTML

Submitted by 为君一笑 on 2019-11-28 19:35:13

This looks like just what you need: http://msdn.microsoft.com/en-us/library/ff628051(v=office.14).aspx

The author Eric White blogged about his experiences developing that tool. You can see that list of posts on his blog here: http://blogs.msdn.com/b/ericwhite/archive/2008/10/20/eric-white-s-blog-s-table-of-contents.aspx#Open_XML_to_XHtml

Since I'm a big fan of Aspose.Words, a commercial library to create/process Word documents, I would do something like:

  1. Open the Word document with Aspose.Words.
  2. Save the Word document as HTML.
  3. Use something like SgmlReader or HTML Agility Pack (or even regular expressions, if they are suitable) to remove unwanted HTML tags/attributes.
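
To make step 3 a bit more concrete, here is a rough sketch of whitelist-based cleanup. It is written in Python with only the standard library, as a stand-in for SgmlReader or HTML Agility Pack, not as an example of either. The whitelists and the names `HtmlCleaner` and `clean_html` are my own assumptions; adjust them to whatever "clean" means for your documents.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelists (not from any library): tags to keep, and the
# attributes to keep on each of them. Everything else is stripped.
ALLOWED_TAGS = {"p", "h1", "h2", "h3", "ul", "ol", "li", "a", "em",
                "strong", "table", "tr", "td", "th", "br"}
ALLOWED_ATTRS = {"a": {"href"}}
VOID_TAGS = {"br"}                  # tags that never get a closing tag
DROP_CONTENT = {"style", "script"}  # tags whose inner text is discarded too


class HtmlCleaner(HTMLParser):
    """Re-emits only whitelisted tags/attributes; text is kept and escaped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip_depth = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip_depth += 1
            return
        if self._skip_depth or tag not in ALLOWED_TAGS:
            return  # drop the tag itself, but keep any text inside it
        keep = ALLOWED_ATTRS.get(tag, set())
        parts = [tag]
        for name, value in attrs:
            if name in keep:
                parts.append('{}="{}"'.format(name, escape(value or "", quote=True)))
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
            return
        if self._skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append("</" + tag + ">")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(escape(data, quote=False))


def clean_html(html_text):
    """Return html_text with non-whitelisted tags and attributes removed."""
    cleaner = HtmlCleaner()
    cleaner.feed(html_text)
    cleaner.close()
    return "".join(cleaner.out)
```

For a batch run you would loop over the HTML files saved in step 2 and pass each file's contents through `clean_html`. A whitelist tends to be safer than regular expressions here, because Word's HTML output nests `span` and `font` tags in ways that are hard to match reliably with patterns.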

Since you wrote that you work at a university, I'm not sure whether commercial packages are an option, though.

Hi, I'm not sure what the rules are on promoting your own solutions, so do let me know if I am out of line.

I am a web developer who had the same issues, so I created my own tool: http://www.convertwordtohtml.com

We are also working on a new version that will have even better conversion quality and one-click conversion: for example, you can right-click a Word file and it will be converted directly to HTML, with the code placed on the clipboard. The current version also supports command-line access, and the new version will include a server version too.

There is a free trial version downloadable from the site, and if you have any questions, do contact me any time.
