Quickly Convert (.rtf|.doc) Files to Markdown Syntax with PHP

轮回少年 2021-01-29 18:50

I've been manually converting articles into Markdown syntax for a few days now, and it's getting rather tedious. Some of these are 3 or 4 pages, italics and other emphasized t

7 answers
  •  無奈伤痛
    2021-01-29 19:28

    We had the same problem: we needed to convert Word documents to Markdown. Some were complicated and very large, with math equations, images, and so on, so I wrote a script that chains several tools to do the conversion: https://github.com/Versal/word2markdown

    Because it uses a chain of several tools it is a bit more error-prone, but it can be a good starting point if you have more complicated documents. Hope it can be helpful! :)

    Update: It currently works only on Mac OS X and requires Word, Pandoc, HTML Tidy, git, and node/npm. For it to work properly, you also need to open an empty Word document and choose File->Save As Webpage->Compatibility->Encoding->UTF-8, which saves UTF-8 as the default encoding. See the README for setup details.
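
Before cloning, it may help to confirm that the command-line requirements listed above are on the PATH. The following is a minimal sketch and not part of word2markdown: the function name `missing_tools` is made up for illustration, and the binary names (`tidy` for HTML Tidy, `node` and `npm` for Node.js) are assumptions about a typical install. Word itself is a desktop application and cannot be checked this way.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every command in the
# argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The answer says the script works only on Mac OS X, so warn elsewhere.
if [ "$(uname -s)" != "Darwin" ]; then
    echo "warning: word2markdown is reported to work only on Mac OS X" >&2
fi

# Assumed binary names for the listed requirements (Word is not checked).
missing=$(missing_tools pandoc tidy git node npm)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "missing tools:" $missing >&2
fi
```

If nothing is printed, the command-line requirements appear to be installed; the Word encoding step still has to be done by hand.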

    Then run this in the console:

    $ git clone git@github.com:Versal/word2markdown.git
    $ cd word2markdown
    $ npm install
    # copy the Word files over first, for example "document.docx"
    $ ./doc-to-md.sh document.docx document_files > document.md
    

    Then you can find the Markdown in document.md and images in the directory document_files.
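
For a folder of documents, the single-file command above could be wrapped in a loop. This is a rough sketch under two assumptions: that `doc-to-md.sh` can be invoked once per file, and that it is run from inside the word2markdown directory. The helper names `md_name`, `files_dir`, and `convert_all` are hypothetical and do not come from the repository.

```shell
#!/bin/sh
# Derive the output names used in the answer from a Word file name:
# "document.docx" -> "document.md" and "document_files".
md_name()   { printf '%s.md\n' "${1%.*}"; }
files_dir() { printf '%s_files\n' "${1%.*}"; }

# Convert every .docx file in the current directory, one at a time,
# using the same argument order as the single-file command.
convert_all() {
    for doc in *.docx; do
        # With no matches the glob stays literal, so skip it.
        [ -e "$doc" ] || continue
        ./doc-to-md.sh "$doc" "$(files_dir "$doc")" > "$(md_name "$doc")"
    done
}
```

Calling `convert_all` after copying the Word files over should leave one `.md` file and one `_files` directory per document, provided each individual conversion succeeds.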

    It's perhaps a bit complicated now, so I would welcome any contributions that make this easier or make this work on other operating systems! :)
