How to convert HTML with MathJax into LaTeX using pandoc?

Anonymous (unverified), submitted 2019-12-03 02:16:02

Question:

I have some HTML documents with MathJax equations, and I want to convert them to latex, and then to pdf. I'd like to use pandoc.

However, pandoc replaces $ with \$ and it replaces \ in formulas with \textbackslash{}.

Is it possible to get pandoc to pass Mathjax formulas literally from html to latex?

Answer 1:

With the latest version of pandoc (1.12.2), you can do this:

pandoc -f html+tex_math_dollars+tex_math_single_backslash -t latex 

Much nicer! If you don't want to convert math delimited by \( and \), just do

pandoc -f html+tex_math_dollars -t latex 


Answer 2:

It's not an easy task. Here's a solution that should work, provided you only use $ and $$ as math delimiters, and assuming your document doesn't contain any other uses of $. (If you can't assume that, you can try adjusting the perl regex in what follows.)
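To see why stray dollar signs matter, here is a minimal Python sketch that applies the same pattern used by the perl one-liner in this answer to two sample strings. The sample strings and the name MATH are made up for illustration; only the regular expression comes from the answer.

```python
import re

# Same pattern as in the perl one-liner: one or two dollars, a run of
# non-dollar characters, then one or two dollars.
MATH = re.compile(r"(\$\$?[^$]+\$\$?)")

# Hypothetical inputs: one with math only, one with a currency "$" as well.
clean = "The area is $\\pi r^2$."
stray = "It costs $5, and the area is $\\pi r^2$."

# With no other dollars around, the formula is found as expected.
print(MATH.findall(clean))

# With a stray dollar, the match runs from "$5" to the opening "$" of the
# formula, so the wrong text is treated as math and the formula is missed.
print(MATH.findall(stray))
```

In the second case the only match is the text between the price and the start of the formula, which is why the approach assumes that $ appears nowhere except as a math delimiter.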

Step 1: Install the Haskell Platform, if you don't have it already, then run 'cabal install pandoc' to get the pandoc library. (If you installed pandoc with the binary installer, you only have the executable, not the Haskell library.)

Step 2: Now write a small Haskell script -- we'll call it fixmath.hs:

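import Text.Pandoc

main = toJsonFilter fixmath

-- Apply the inline fix first, then the block fix, everywhere in the document.
fixmath :: Block -> Block
fixmath = bottomUp fixmathBlock . bottomUp fixmathInline

-- Turn an inline "<!--MATH...-->" comment into raw TeX, dropping the trailing "-->".
fixmathInline :: Inline -> Inline
fixmathInline (RawInline "html" ('<':'!':'-':'-':'M':'A':'T':'H':xs)) =
  RawInline "tex" $ take (length xs - 3) xs
fixmathInline x = x

-- Same for block-level comments.
fixmathBlock :: Block -> Block
fixmathBlock (RawBlock "html" ('<':'!':'-':'-':'M':'A':'T':'H':xs)) =
  RawBlock "tex" $ take (length xs - 3) xs
fixmathBlock x = x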

Compile this:

ghc --make fixmath.hs 

This will give you an executable fixmath. Now, assuming your input file is input.html, the following command should convert it to LaTeX with the math intact, putting the result in output.tex:

cat input.html | \
perl -0pe 's/(\$\$?[^\$]+\$\$?)/\<!--MATH$1\-->/gm' | \
pandoc -s --parse-raw -f html -t json | \
./fixmath | \
pandoc -f json -t latex -s > output.tex

The perl one-liner wraps each math span in a special HTML comment marked "MATH". The first pandoc call parses the HTML into a JSON representation of the corresponding Pandoc data structure. Then fixmath transforms this structure, changing the special HTML comments into raw LaTeX blocks and inlines. (See Scripting with pandoc for an explanation.) Finally, the second pandoc call converts the JSON to LaTeX.
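The two text transformations in this pipeline can be sketched in plain Python, outside pandoc. This is only an illustration under stated assumptions: wrap_math and unwrap_math are hypothetical names, not part of pandoc or of the answer's code, and the sketch works on strings rather than on the Pandoc JSON structure.

```python
import re

# Same pattern as the perl one-liner in the pipeline above.
MATH = re.compile(r"(\$\$?[^$]+\$\$?)")

def wrap_math(html: str) -> str:
    """Hide each math span inside an HTML comment marked MATH (the perl step)."""
    return MATH.sub(lambda m: "<!--MATH" + m.group(1) + "-->", html)

def unwrap_math(comment: str) -> str:
    """Strip the '<!--MATH' prefix and '-->' suffix, in the spirit of fixmath."""
    prefix, suffix = "<!--MATH", "-->"
    if comment.startswith(prefix) and comment.endswith(suffix):
        return comment[len(prefix):-len(suffix)]
    return comment  # ordinary comments and other text are left untouched

# Hypothetical sample input.
sample = r"<p>Euler: $e^{i\pi} + 1 = 0$</p>"
wrapped = wrap_math(sample)
print(wrapped)

# What the filter would hand to the LaTeX writer as raw TeX.
print(unwrap_math(r"<!--MATH$e^{i\pi} + 1 = 0$-->"))
```

Because the formula travels through the HTML reader as a comment, pandoc never sees the $ or \ characters as ordinary text, so it has nothing to escape.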


