I would like to convert doc/docx documents to semantic HTML.
Some wishes/requirements:
Semantic HTML such that headers in the document are
" headers in the document are "
I think this is impossible, because MS Word only writes down the result, in different visual styles,
just like printed text on paper; the original structural information is not recorded.
Your other wishes can be met. There are two commercial tools that can do this (don't trust the free or online tools; they don't do the real work):
1. Word Cleaner by Zapadoo
www.zapadoo.com
2. HTML Cleaner for Word by wonder Studio
www.htmlcleaner.com
I prefer the second one, which was released just last year. You can try them both.