Question
I'd like to be able to take an existing HTML snippet and convert it to markdown. I've tried pandoc for this purpose:
pandoc test.html -o test.md
where test.html looked like this:
Hello
<!-- more -->
and some more text
<h2>some heading</h2>
The result was this:
Hello and some more text
some heading
------------
So pandoc not only converts the tags that have a direct equivalent in markdown; it also removes tags that I would like to retain as raw HTML (e.g., HTML comments, iframe tags, and so on).
- How can I convert HTML to markdown in a way that any tags that don't have an equivalent in markdown are retained as raw HTML?
- More generally, how can I control how the HTML to markdown conversion is done?
In particular, I'd be interested in command-line program options. For example, perhaps there are options that can be supplied to pandoc.
Answer 1:
After a bit more searching, I read about the --parse-raw option in a thread on table parsing.
Adding --parse-raw seemed to stop pandoc from stripping the HTML tags that have no markdown equivalent:
pandoc test.html -o test.md --parse-raw
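A note for newer installations: as far as I know, --parse-raw was removed in pandoc 2.0, and the raw_html extension on the HTML reader takes its place. The sketch below recreates the test.html from the question and runs the conversion that way. It assumes pandoc 2.0 or later is on the PATH; the explicit -t markdown and the availability check are my own additions, not part of the original answer.

```shell
# Recreate the sample input from the question.
cat > test.html <<'EOF'
Hello
<!-- more -->
and some more text
<h2>some heading</h2>
EOF

# Assumption: pandoc 2.0 or later, where --parse-raw is no longer accepted
# and enabling raw_html on the HTML reader plays the same role.
if command -v pandoc >/dev/null 2>&1; then
  pandoc -f html+raw_html -t markdown test.html -o test.md
  # Show the result so you can check which tags survived as raw HTML.
  cat test.md
else
  echo "pandoc is not installed; skipping conversion" >&2
fi
```

On an older pandoc (before 2.0), the --parse-raw command above is the one to use instead. Either way, inspect test.md to confirm that the comment and any other tags you care about came through.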
Source: https://stackoverflow.com/questions/16248986/how-to-convert-html-to-markdown-while-retaining-non-markdown-html-tags