Why does pandoc keep span and div tags when converting HTML to Markdown?

Question

I'm a pandoc newbie, so I must be missing something obvious. I'm trying to convert an HTML file generated by MS Word to Markdown. Here is a test HTML file:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title></title>
</head>
<body>
  <div class="Section1">
    <p class="Question"><span style="FONT-SIZE: 10pt">Today</span> <span style=
    "FONT-SIZE: 10pt">is</span> <span lang="HR" style=
    "FONT-SIZE: 10pt; mso-ansi-language: HR">a</span><span style=
    "FONT-SIZE: 10pt">nice</span> <span style="FONT-SIZE: 10pt">day</span> 
    </p>
  </div>
</body>
</html>

and I try to convert it with:

pandoc -f html -t markdown test.html -o test.md

I was expecting "Today is a nice day", but got:

<div class="Section1">

<span style="FONT-SIZE: 10pt">Today</span> <span
style="FONT-SIZE: 10pt">is</span> <span lang="HR"
style="FONT-SIZE: 10pt; mso-ansi-language: HR">a</span><span
style="FONT-SIZE: 10pt">nice</span> <span
style="FONT-SIZE: 10pt">day</span>

</div>

Why was the div kept? Why were the spans kept?

Answer 1:

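Pandoc keeps these tags because of extensions that are enabled by default: with native_divs and native_spans, the HTML reader turns each div and span into a native Div or Span element, and the Markdown writer then emits those elements as raw HTML. You need to turn off some extensions. Either on the HTML input side: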

$ pandoc -f html-native_divs-native_spans -t markdown test.html -o test.md

Or on the markdown output side:

$ pandoc -f html -t markdown-raw_html-native_divs-native_spans-fenced_divs-bracketed_spans test.html -o test.md
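
Both commands rely on the same syntax: a format name followed by -EXTENSION for each extension to disable. As a minimal sketch of that, the snippet below builds such a format string and, if pandoc is installed, runs it on a shortened version of the test HTML. The helper name disable_exts and the variable READER are hypothetical, introduced only for illustration; they are not part of pandoc.

```shell
# Hypothetical helper (not part of pandoc): append "-EXT" to a base format
# name for every extension that should be disabled.
# Usage: disable_exts BASE ext1 ext2 ...
disable_exts() {
  spec=$1
  shift
  for ext in "$@"; do
    spec="${spec}-${ext}"
  done
  printf '%s\n' "$spec"
}

# Build the reader spec used in the first command of the answer.
READER=$(disable_exts html native_divs native_spans)

# Run the conversion only when pandoc is actually available.
if command -v pandoc >/dev/null 2>&1; then
  printf '%s\n' '<div class="Section1"><p><span style="FONT-SIZE: 10pt">Today</span> <span style="FONT-SIZE: 10pt">is</span></p></div>' \
    | pandoc -f "$READER" -t markdown
fi
```

The same helper could build the writer-side format, for example disable_exts markdown raw_html native_divs native_spans fenced_divs bracketed_spans, which yields the string passed to -t in the second command.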

Source: https://stackoverflow.com/questions/35807092/why-pandoc-keeps-span-and-div-tags-when-converting-html-to-markdown

Tags

html

Markdown

pandoc
