Latex command substitution using regexp in python

Posted by 房东的猫 on 2019-12-11 15:06:23

Question


I wrote a very ugly script to parse some lines of LaTeX in Python and do string substitution. I'm here because I want to write something to be proud of, and learn :P

More specifically, I'd like to change:

  • \ket{(.*)} into |(.*)\rangle
  • \bra{(.*)} into \langle(.*)|

To this end, I wrote a very, very ugly script. The intended use is something like this:

cat file.tex | python script.py > new_file.tex

So what I did is the following. It works, but it is not nice at all, and I'm wondering if you could give me a suggestion; even a link to the right command to use is OK. Note that I use recursion because once I have found the first "\ket{" I know that I want to replace the first "}" that follows (i.e. I'm sure there are no other subcommands within "\ket{"). But again, it's not the right way of parsing LaTeX.

import re
import sys

def recursion_ket(string_input):
    # Base case: no \ket{ left on this line.
    match = re.search(r"\\ket{", string_input)
    if not match:
        return string_input
    # Split around the first \ket{ and close it at the first "}" that follows.
    head = string_input[:match.start()]
    tail = string_input[match.end():]
    # Not a raw string: the "\r" below is a carriage return, hence the patch-up on output.
    tail = re.sub(r"}", "\rangle", tail, 1)
    return recursion_ket(head + '|' + tail)

if __name__ == '__main__':
    with open(sys.argv[1]) as f:
        content = f.readlines()
    new = [recursion_ket(line) for line in content]
    with open(sys.argv[2], 'w') as z:
        for i in new:
            # Turn carriage-return / backspace characters back into \r and \b.
            z.write(i.replace("\r", '\\r').replace("\b", '\\b'))

Which I know is very ugly, and it's definitely not the right way of doing it. Probably it's because I come from Perl and am not used to Python regexps.

  • First problem: is it possible to use a regexp to substitute just the "border" of a matching string and leave the inside as it is? I want to leave the content of \command{xxx} unchanged.

  • Second problem: the \r. Apparently, when I print each string to the terminal or to a file, I need to make sure \r is not interpreted as a carriage return. I have tried automatic escaping, but it's not what I need: it also escapes the \n with another \, which is not what I want.


Answer 1:


To answer your questions,

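  • First problem: you can use (named) capture groups to keep the inside of the match and rewrite only its borders.
  • Second problem: in Python 3, you can use a raw string such as r"\btree", so the backslash is kept as-is instead of starting an escape sequence.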
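A minimal regex-only sketch of both points, assuming (as the question states) that the argument of \ket{...} and \bra{...} contains no nested braces. The names KET, BRA, the group name body, and replace_bra_ket are hypothetical, chosen only for this illustration:

```python
import re

# Raw strings keep every backslash literal; "body" is the named group
# holding the untouched inside of the command (no nested braces assumed).
KET = re.compile(r"\\ket\{(?P<body>[^{}]*)\}")
BRA = re.compile(r"\\bra\{(?P<body>[^{}]*)\}")

def replace_bra_ket(text):
    # In a replacement template, \\ is a literal backslash and
    # \g<body> re-inserts the captured group unchanged.
    text = KET.sub(r"|\g<body>\\rangle", text)
    text = BRA.sub(r"\\langle\g<body>|", text)
    return text

print(replace_bra_ket(r"\bra{\psi} and \ket{0} and \ket{1}"))
# prints: \langle\psi| and |0\rangle and |1\rangle
```

Because the patterns and templates are raw strings, no \r or \b characters are ever created, so nothing needs patching on output. One caveat of the substitution as specified in the question: a \bra body that starts with a letter would fuse with \langle (for example \langlex|), so a space after \langle in the template may be wanted.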

Using a LaTeX parser like github.com/alvinwan/TexSoup, we can simplify the code a bit. I know the OP asked for regex, but if the OP is tool-agnostic, a parser would be more robust.

Nice Function

We can abstract this into a replace function

def replaceTex(soup, command, replacement):
    for node in soup.find_all(command):
        node.replace(replacement.format(args=node.args))

Then, use this replaceTex function in the following way

>>> soup = TexSoup(r"\section{hello} text \bra{(.)} haha \ket{(.)}lol")
>>> replaceTex(soup, 'bra', r"\langle{args[0]}|")
>>> replaceTex(soup, 'ket', r"|{args[0]}\rangle")
>>> soup
\section{hello} text \langle(.)| haha |(.)\ranglelol

Demo

Here's a self-contained demonstration, based on TexSoup:

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r"\section{hello} text \bra{(.)} haha \ket{(.)}lol")
>>> soup
\section{hello} text \bra{(.)} haha \ket{(.)}lol
>>> soup.ket.replace(r"|{args[0]}\rangle".format(args=soup.ket.args))
>>> soup.bra.replace(r"\langle{args[0]}|".format(args=soup.bra.args))
>>> soup
\section{hello} text \langle(.)| haha |(.)\ranglelol
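For the pipeline usage described in the question (cat file.tex | python script.py > new_file.tex), a regex-only filter could be sketched as follows, without TexSoup. It uses a replacement function instead of a template, so the returned strings are inserted literally and no escape processing is involved. The names BRAKET, expand, and convert_stream are hypothetical, and the sketch again assumes the command body contains no braces:

```python
import io
import re

# One pattern for both commands; group 1 is the command name,
# group 2 is the body (assumed to contain no braces).
BRAKET = re.compile(r"\\(bra|ket)\{([^{}]*)\}")

def expand(match):
    command, body = match.group(1), match.group(2)
    if command == "ket":
        return "|" + body + r"\rangle"
    return r"\langle" + body + "|"

def convert_stream(src, dst):
    # Rewrite line by line so the filter works on piped input of any size.
    for line in src:
        dst.write(BRAKET.sub(expand, line))

# Small in-memory demonstration of the filter.
out = io.StringIO()
convert_stream(io.StringIO("a \\ket{0}\n\\bra{\\psi} b\n"), out)
print(out.getvalue(), end="")
```

In a real script, calling convert_stream(sys.stdin, sys.stdout) after import sys would give the filter behaviour from the question.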


Source: https://stackoverflow.com/questions/47250967/latex-command-substitution-using-regexp-in-python
