I've been manually converting articles into Markdown syntax for a few days now, and it's getting rather tedious. Some of these are 3 or 4 pages, italics and other emphasized t
We had the same problem: we needed to convert Word documents to Markdown. Some of them were complicated and (very) large, with math equations, images and so on. So I made this script, which converts them using a number of different tools: https://github.com/Versal/word2markdown
Because it chains several tools together, it is a bit more error-prone than a single converter, but it can be a good starting point if you have more complicated documents. Hope it can be helpful! :)
Update: It currently only works on Mac OS X, and you need to have some requirements installed (Word, Pandoc, HTML Tidy, git, node/npm). For it to work properly, you also need to open an empty Word document and do: File->Save As Webpage->Compatibility->Encoding->UTF-8. Word then keeps this encoding as the default. See the README for more details on setup.
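Before cloning, it can save some time to check that the command-line requirements are actually on your PATH. This is only a rough sketch and not part of the repository: the `missing_tools` helper is a hypothetical name, and the command names (`pandoc`, `tidy`, `git`, `node`, `npm`) are the usual ones for the tools listed above, so adjust them if yours are installed differently. Word itself is a GUI application and can't be checked this way.

```shell
#!/bin/sh
# Hypothetical helper (not from the word2markdown repo):
# print each named command that is not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed command names for the requirements listed above.
missing=$(missing_tools pandoc tidy git node npm)
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All command-line requirements found."
fi
```

Anything it lists needs to be installed (or added to your PATH) before you run the conversion script.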
Then run this in the console:
$ git clone https://github.com/Versal/word2markdown.git
$ cd word2markdown
$ npm install
(copy over the Word files, for example, "document.docx")
$ ./doc-to-md.sh document.docx document_files > document.md
Then you can find the Markdown in document.md and the images in the directory document_files.
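If you have a whole batch of articles, the single-file command can be wrapped in a loop. The following is a minimal sketch, assuming the `.docx` files have been copied into the word2markdown directory and that `doc-to-md.sh` is called exactly as shown above. The `convert_all` function and the `CONVERTER` variable are names I made up for illustration; they are not part of the repository.

```shell
#!/bin/sh
# Hypothetical batch wrapper (not from the word2markdown repo).
# CONVERTER defaults to the script from the repository.
CONVERTER="${CONVERTER:-./doc-to-md.sh}"

# Convert every .docx file in the current directory, using the same
# naming scheme as the single-file example: NAME.docx -> NAME.md,
# with images in NAME_files.
convert_all() {
    for doc in *.docx; do
        # Skip the literal pattern when no .docx files are present.
        [ -e "$doc" ] || continue
        base="${doc%.docx}"
        "$CONVERTER" "$doc" "${base}_files" > "${base}.md"
        echo "converted: $doc -> ${base}.md"
    done
}

# Usage, from inside the word2markdown directory:
#   convert_all
```

Since the chain of tools can fail on individual documents, it is worth checking each generated .md file afterwards; an empty one probably means the conversion of that document went wrong.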
It's perhaps a bit complicated now, so I would welcome any contributions that make this easier or make this work on other operating systems! :)