How to convert HTML to Markdown while retaining non-markdown HTML tags?

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-03 15:52:16

Question


I'd like to be able to take an existing HTML snippet and convert it to markdown. I've tried pandoc for this purpose:

pandoc test.html -o test.md

where test.html looked like this:

Hello

<!-- more -->

and some more text

<h2>some heading</h2>       

The result was this:

Hello and some more text

some heading
------------

Thus, pandoc not only converts the tags that have a direct Markdown equivalent; it also removes tags that I would like to retain as HTML (e.g., HTML comments, iframe tags, and so on).

  • How can I convert HTML to markdown in a way that any tags that don't have an equivalent in markdown are retained as raw HTML?
  • More generally, how can I control how the HTML-to-Markdown conversion is done?

In particular, I'd be interested in command-line program options. For example, perhaps there are options that can be supplied to pandoc.


Answer 1:


After a bit more searching, I read about the --parse-raw option in a thread on table parsing.

With the --parse-raw option added, pandoc appeared to keep the HTML tags that have no Markdown equivalent instead of stripping them.

pandoc test.html -o test.md --parse-raw
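
For readers on a newer pandoc: as far as I know, --parse-raw was removed in pandoc 2.0 and replaced by the raw_html reader extension. The sketch below recreates the test.html from the question and runs what I believe is the equivalent command. The file names are the ones from the question; the version assumption and the guard around the pandoc call are mine, so treat this as a starting point rather than a verified recipe.

```shell
# Recreate the sample input from the question.
cat > test.html <<'EOF'
Hello

<!-- more -->

and some more text

<h2>some heading</h2>
EOF

# Assumption: pandoc 2.0 or later, where the raw_html extension on the
# HTML reader takes the place of the old --parse-raw flag.
# The guard simply skips the conversion if pandoc is not installed.
if command -v pandoc >/dev/null 2>&1; then
  pandoc -f html+raw_html -t markdown test.html -o test.md
  cat test.md
fi
```

Check the resulting test.md to confirm that the comment and any other raw tags survived. On pandoc 1.x, the --parse-raw form shown above is the one to use.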


Source: https://stackoverflow.com/questions/16248986/how-to-convert-html-to-markdown-while-retaining-non-markdown-html-tags
