How to convert escaped characters in Python?

Submitted by 我的未来我决定 on 2019-11-28 00:13:04
>>> escaped_str = 'One \\\'example\\\''
>>> print escaped_str.encode('string_escape')
One \\\'example\\\'
>>> print escaped_str.decode('string_escape')
One 'example'

Several similar codecs are available, such as rot13 and hex.

The above is Python 2.x. Since you said (below, in a comment) that you're using Python 3.x: decoding a Unicode string object is roundabout there, but still possible. Python 3 has no "string_escape" codec, so use "unicode_escape" instead:

Python 3.3a0 (default:b6aafb20e5f5, Jul 29 2011, 05:34:11) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> escaped_str = "One \\\'example\\\'"
>>> import codecs
>>> print(codecs.getdecoder("unicode_escape")(escaped_str)[0])
One 'example'

I assume the question is really:

I have a string that is formatted as if it were a part of Python source code. How can I safely interpret it so that \n within the string is transformed into a newline, quotation marks are expected on either end, etc.?

Try ast.literal_eval.

>>> import ast
>>> print ast.literal_eval(raw_input())
"hi, mom.\n This is a \"weird\" string, isn't it?"
hi, mom.
 This is a "weird" string, isn't it?
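The transcript above is Python 2 (print statement, raw_input). A rough Python 3 sketch of the same idea follows. The helper name parse_string_literal and its error messages are made up for illustration; the only library call is ast.literal_eval, which raises ValueError or SyntaxError on malformed input and never executes code.

```python
import ast

def parse_string_literal(text):
    """Interpret text as a quoted Python string literal (hypothetical helper)."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError("not a valid Python literal: %r" % (text,)) from exc
    # literal_eval also accepts numbers, lists, dicts, etc.; reject those here.
    if not isinstance(value, str):
        raise ValueError("expected a string literal, got %s" % type(value).__name__)
    return value

# The raw triple-quoted literal keeps the backslashes exactly as typed.
source = r'''"hi, mom.\n This is a \"weird\" string, isn't it?"'''
print(parse_string_literal(source))
```

Note that the input must include its surrounding quotation marks; a bare word such as abc is rejected.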

For comparison, going the other way:

>>> print repr(raw_input())
"hi, mom.\n This is a \"weird\" string, isn't it?"
'"hi, mom.\\n This is a \\"weird\\" string, isn\'t it?"'

Unpaired backslashes are just artifacts of the representation; they are not actually stored in the string. Trying to strip them by hand is likely to introduce errors.
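One way to check this for yourself is to count backslashes in the stored string and in its repr. This is a small sketch (Python 3 syntax) reusing the string from the transcript above; the variable name stored is only illustrative.

```python
# The literal below is exactly the repr shown in the transcript above.
stored = '"hi, mom.\\n This is a \\"weird\\" string, isn\'t it?"'

print(stored)                    # the text as it is actually stored
print(stored.count('\\'))        # backslashes really in the string
print(repr(stored).count('\\'))  # backslashes shown by repr()
print("\\'" in stored)           # is there a backslash before the apostrophe?
```

The stored string holds three backslashes, while its repr shows seven: each stored backslash is doubled, and the one in front of the apostrophe exists only in the representation.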

If all you want is to remove each backslash that is not itself escaped (that is, not preceded by an odd number of backslashes), you could try a while loop:

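escaped_str = 'One \\\'example\\\''
chars = []
i = 0
while i < len(escaped_str):
    # A backslash escapes the next character: keep that character, drop the backslash.
    # A trailing backslash with nothing after it is kept as-is.
    if escaped_str[i] == '\\' and i + 1 < len(escaped_str):
        chars.append(escaped_str[i+1])
        i += 2
    else:
        chars.append(escaped_str[i])
        i += 1
fixed_str = ''.join(chars)
print(fixed_str)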
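The same left-to-right scan can be sketched with a regular expression. The function name strip_escaping_backslashes is hypothetical. Like the loop, this only drops the backslash and keeps the character after it, so \n becomes a plain n rather than a newline.

```python
import re

def strip_escaping_backslashes(text):
    # Hypothetical helper: drop each backslash and keep the character it escapes.
    # re.DOTALL lets "." also match a newline that follows a backslash.
    return re.sub(r'\\(.)', r'\1', text, flags=re.DOTALL)

print(strip_escaping_backslashes('One \\\'example\\\''))
```

Because re.sub consumes matches without overlap, a doubled backslash collapses to a single one, and a lone trailing backslash is left untouched.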

Examine your variables afterwards and you'll see why what you're trying to do doesn't make sense.

...But on a side note, I'm almost 100% certain that doing it "the same way Python's lexical parser" does it does not involve a parser, so to speak. A parser is for grammars, which describe the way you fit words together.

You're thinking of lexical content verification maybe, which is often specified using regular expressions. Parsers are an altogether more challenging and powerful beast, and not something you want to mess around with for the purposes of linear string manipulation.

SingleNegationElimination already mentioned this, but here is an example:

In Python 3:

>>> escaped_str = 'One \\\'example\\\''
>>> print(escaped_str.encode('ascii', 'ignore').decode('unicode_escape'))
One 'example'
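
One caveat with encode('ascii', 'ignore') is that it silently drops any non-ASCII characters. A possible variation, offered as a sketch and not as part of the original answer, encodes with 'latin-1' and the 'backslashreplace' error handler so those characters survive the round trip. The helper name unescape and the sample string are made up for illustration.

```python
def unescape(text):
    # Hypothetical helper: characters outside Latin-1 are written out as \uXXXX
    # escapes by 'backslashreplace', then restored by the 'unicode_escape' decoder.
    return text.encode('latin-1', 'backslashreplace').decode('unicode_escape')

# Real non-ASCII characters mixed with literal backslash escapes.
sample = 'caf\u00e9 \\\'ok\\\' \u4e2d\\n'

print(repr(unescape(sample)))
print(repr(sample.encode('ascii', 'ignore').decode('unicode_escape')))
```

The first call keeps the accented and CJK characters, while the 'ignore' version loses them. An edge case to be aware of: a literal backslash placed directly before a character outside Latin-1 is not handled correctly by this variation.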