I would like to convert doc/docx documents to semantic HTML.
Some wishes/requirements:
Semantic HTML such that headers in the document are
I wrote a utility which implements the requirements you listed, excluding images, graphs and maths formulas. It's beta quality (i.e., it works on my machine). I published it at http://www.modeltext.com/word