Question
I wrote a very ugly Python script to parse some lines of LaTeX and do string substitution. I'm here because I want to write something to be proud of, and to learn :P
More specifically, I'd like to change:
\ket{(.*)} into |(.*)\rangle
\bra{(.*)} into \langle(.*)|
To this end, I wrote a very very ugly script. The intended use is to do a thing like this:
cat file.tex | python script.py > new_file.tex
So here is what I did. It works, but it's not nice at all, and I'm wondering whether you could give me a suggestion; even a link to the right command to use would be fine. Note that I use recursion because once I have found the first "\ket{", I know that I want to replace the first "}" that follows it (i.e. I'm sure there are no other subcommands within "\ket{"). But again, this is not the right way to parse LaTeX.
import re
import sys


def recursion_ket(string_input, string_output=""):
    # Stop recursing once no "\ket{" is left in the line.
    match = re.search(r"\\ket\{", string_input)
    if not match:
        return string_input
    else:
        # Replace the first "\ket{" with "|" ...
        string_output = re.sub(r"\\ket\{", '|', string_input, 1)
        head, _, tail = string_output.partition('|')
        # ... then the first "}" after that "|" with "\rangle".
        # "\rangle" is not a raw string here, so "\r" becomes a carriage return.
        string_output_second = re.sub(r"}", "\rangle", tail, 1)
        string_output = head + '|' + string_output_second
        string_output = recursion_ket(string_output, string_output)
        return string_output


if __name__ == '__main__':
    with open(sys.argv[1]) as f:
        content = f.readlines()
    new = []
    for line in content:
        new.append(recursion_ket(line))
    with open(sys.argv[2], 'w') as z:
        for i in new:
            # Turn the accidental control characters back into backslash sequences.
            z.write(i.replace("\r", '\\r').replace("\b", '\\b'))
I know this is very ugly, and it's definitely not the right way of doing it. Probably that's because I come from Perl and I'm not used to Python regexps.
First problem: is it possible to use a regexp to substitute just the "border" of a matching string and leave the inside as it is? I want to leave the content of \command{xxx} unchanged.
Second problem: the \r. When I print each string to the terminal or write it to a file, I need to make sure \r is not interpreted as a carriage return. I tried automatic escaping, but it is not what I need: it also escapes \n with an extra backslash, which I don't want.
Answer 1:
To answer your questions,
- First problem: You can use (named) groups
- Second problem: In Python, you can use a raw string such as r"\btree", which keeps the backslash literal instead of treating \b as an escape sequence.
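As a rough sketch of both points, using only the standard-library re module: a named group captures the argument so that only the surrounding \ket{...} or \bra{...} is rewritten, and raw strings keep \r and \b from turning into control characters. The function name convert_bra_ket and the pattern [^{}]* (which assumes the argument contains no nested braces, as the question states) are illustrative choices, not part of the original post.

```python
import re

# Hypothetical names; the "body" group holds whatever sits between the braces.
KET = re.compile(r"\\ket\{(?P<body>[^{}]*)\}")
BRA = re.compile(r"\\bra\{(?P<body>[^{}]*)\}")

# Without the r prefix, "\rangle" starts with a carriage return character.
assert "\rangle"[0] == "\r" and r"\rangle"[0] == "\\"


def convert_bra_ket(text):
    r"""Rewrite \ket{x} as |x\rangle and \bra{x} as \langle x| (without the space)."""
    # A function replacement returns the new text literally, so the backslash
    # in \rangle and \langle needs no extra escaping for re.sub.
    text = KET.sub(lambda m: "|" + m.group("body") + r"\rangle", text)
    text = BRA.sub(lambda m: r"\langle" + m.group("body") + "|", text)
    return text


print(convert_bra_ket(r"\ket{\psi} and \bra{\phi}"))
```

For the pipeline in the question, the same function could be applied to sys.stdin.read() and the result written to sys.stdout. Note that the target form \langle(.*)| has no space, so an argument that starts with a letter would fuse with \langle; adding a space after \langle in the replacement avoids that.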
Using a LaTeX parser such as TexSoup (github.com/alvinwan/TexSoup), we can simplify the code a bit. I know OP asked for a regex, but if OP is tool-agnostic, a parser would be more robust.
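To illustrate the robustness point without assuming anything about TexSoup's internals, here is a minimal hand-written sketch that finds the closing brace by counting depth, so an argument like \frac{1}{2} is handled where a simple "first }" rule would cut it short. The function replace_command and its template parameter are hypothetical, and the sketch ignores escaped braces (\{ and \}), comments, and optional arguments, which a real parser would handle.

```python
def replace_command(text, name, template):
    r"""Replace every \name{...} in text, matching braces by nesting depth.

    template is a str.format template that receives the argument as {arg}.
    """
    opener = "\\" + name + "{"
    out = []
    pos = 0
    while True:
        start = text.find(opener, pos)
        if start == -1:
            out.append(text[pos:])
            return "".join(out)
        # Scan forward, tracking brace depth, to find the matching "}".
        depth = 1
        i = start + len(opener)
        while i < len(text) and depth:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth:
            # Unbalanced braces: leave the remainder untouched.
            out.append(text[pos:])
            return "".join(out)
        arg = text[start + len(opener):i - 1]
        out.append(text[pos:start])
        out.append(template.format(arg=arg))
        pos = i


print(replace_command(r"\ket{\frac{1}{2}} x", "ket", r"|{arg}\rangle"))
```

Nested occurrences of the same command inside an argument are copied through unchanged in this sketch.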
Nice Function
We can abstract this into a replace function:
def replaceTex(soup, command, replacement):
    for node in soup.find_all(command):
        node.replace(replacement.format(args=node.args))
Then use this replaceTex function in the following way:
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r"\section{hello} text \bra{(.)} haha \ket{(.)}lol")
>>> replaceTex(soup, 'bra', r"\langle{args[0]}|")
>>> replaceTex(soup, 'ket', r"|{args[0]}\rangle")
>>> soup
\section{hello} text \langle(.)| haha |(.)\ranglelol
Demo
Here's a self-contained demonstration, based on TexSoup:
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r"\section{hello} text \bra{(.)} haha \ket{(.)}lol")
>>> soup
\section{hello} text \bra{(.)} haha \ket{(.)}lol
>>> soup.ket.replace(r"|{args[0]}\rangle".format(args=soup.ket.args))
>>> soup.bra.replace(r"\langle{args[0]}|".format(args=soup.bra.args))
>>> soup
\section{hello} text \langle(.)| haha |(.)\ranglelol
Source: https://stackoverflow.com/questions/47250967/latex-command-substitution-using-regexp-in-python