Question
I want to convert strings containing escaped characters to their normal form, the same way Python's lexical parser does:
>>> escaped_str = 'One \\\'example\\\''
>>> print(escaped_str)
One \'example\'
>>> normal_str = normalize_str(escaped_str)
>>> print(normal_str)
One 'example'
Of course, the boring way would be to replace every known escape sequence one by one: http://docs.python.org/reference/lexical_analysis.html#string-literals
How would you implement normalize_str() in the above code?
Answer 1:
>>> escaped_str = 'One \\\'example\\\''
>>> print escaped_str.encode('string_escape')
One \\\'example\\\'
>>> print escaped_str.decode('string_escape')
One 'example'
Several similar codecs are available, such as rot13 and hex.
The above is Python 2.x. Since you said (in a comment below) that you're using Python 3.x: decoding a Unicode string object is a bit roundabout, but it's still possible. The "string_escape" codec is gone in Python 3; use "unicode_escape" instead:
Python 3.3a0 (default:b6aafb20e5f5, Jul 29 2011, 05:34:11)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> escaped_str = "One \\\'example\\\'"
>>> import codecs
>>> print(codecs.getdecoder("unicode_escape")(escaped_str)[0])
One 'example'
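For Python 3, the question's normalize_str() could be sketched as a thin wrapper around the same codec. This is a minimal sketch, not a definitive implementation: the ASCII check is an added assumption (it needs Python 3.7+ for str.isascii), included because the codec treats raw bytes as Latin-1 and can garble other characters.

```python
import codecs

def normalize_str(escaped):
    """Interpret backslash escapes the way a Python string literal would."""
    # Assumption: ASCII-only input. unicode_escape decodes raw bytes as
    # Latin-1, so non-ASCII characters could come out garbled.
    if not escaped.isascii():
        raise ValueError("this sketch only handles ASCII input")
    return codecs.decode(escaped, "unicode_escape")

escaped_str = 'One \\\'example\\\''
print(normalize_str(escaped_str))  # One 'example'
```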
Answer 2:
I assume the question is really:
I have a string that is formatted as if it were part of Python source code. How can I safely interpret it so that \n within the string is transformed into a newline, quotation marks are expected on either end, etc.?
Try ast.literal_eval.
>>> import ast
>>> print ast.literal_eval(raw_input())
"hi, mom.\n This is a \"weird\" string, isn't it?"
hi, mom.
This is a "weird" string, isn't it?
For comparison, going the other way:
>>> print repr(raw_input())
"hi, mom.\n This is a \"weird\" string, isn't it?"
'"hi, mom.\\n This is a \\"weird\\" string, isn\'t it?"'
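In Python 3, raw_input() and the print statement are gone, so the same idea might look like the sketch below. The helper name parse_string_literal is hypothetical, and the isinstance check is an added assumption, because ast.literal_eval also accepts numbers, lists, and other literals.

```python
import ast

def parse_string_literal(source_text):
    """Evaluate text that is a quoted Python string literal."""
    value = ast.literal_eval(source_text)
    # literal_eval also accepts numbers, tuples, dicts, etc.; reject those.
    if not isinstance(value, str):
        raise ValueError("expected a string literal")
    return value

quoted = r'''"hi, mom.\n This is a \"weird\" string, isn't it?"'''
print(parse_string_literal(quoted))
```

Note that the input must include its surrounding quotes. The question's unquoted One \'example\' is not a valid literal on its own and raises a SyntaxError here.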
Answer 3:
SingleNegationElimination already mentioned this, but here is an example:
In Python 3:
>>> escaped_str = 'One \\\'example\\\''
>>> print(escaped_str.encode('ascii', 'ignore').decode('unicode_escape'))
One 'example'
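One caveat with the line above: encode('ascii', 'ignore') silently drops every non-ASCII character. A possible variant, sketched below, encodes with Latin-1 and 'backslashreplace' instead so that such characters survive the round trip. The function name decode_escapes is hypothetical, and the sketch assumes the input has no literal backslash directly before a character outside Latin-1.

```python
def decode_escapes(escaped):
    """Decode backslash escapes while keeping non-ASCII characters."""
    # backslashreplace turns characters outside Latin-1 into \u or \U
    # escapes, which unicode_escape then decodes back to the originals.
    return escaped.encode("latin-1", "backslashreplace").decode("unicode_escape")

text = 'na\u00efve \\\'example\\\''
# The ascii/ignore version loses the non-ASCII letter.
print(text.encode("ascii", "ignore").decode("unicode_escape"))  # nave 'example'
# The Latin-1 version keeps it.
print(decode_escapes(text) == "na\u00efve 'example'")           # True
```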
Answer 4:
Unpaired backslashes are just artifacts of the representation and are not actually stored internally. You could cause errors if you try to do this manually.
If your only interest is removing each backslash that is not itself escaped (that is, not preceded by an odd number of backslashes), you could try a while loop:
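escaped_str = 'One \\\'example\\\''
chars = []
i = 0
while i < len(escaped_str):
    # Compare the character at index i, not the index itself.
    if escaped_str[i] == '\\' and i + 1 < len(escaped_str):
        # Drop the backslash and keep the character it escapes.
        chars.append(escaped_str[i+1])
        i += 2
    else:
        chars.append(escaped_str[i])
        i += 1
fixed_str = ''.join(chars)
print(fixed_str)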
Examine your variables afterwards and you'll see why what you're trying to do doesn't make sense.
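The same backslash-stripping loop can also be sketched with a regular expression. This only illustrates the manual approach, not how Python itself decodes literals: it drops each backslash and keeps the following character verbatim, so \n becomes the letter n rather than a newline. The function name strip_backslashes is hypothetical.

```python
import re

def strip_backslashes(escaped):
    """Remove each backslash and keep the character it precedes."""
    # re.DOTALL lets '.' also match a newline that follows a backslash.
    return re.sub(r'\\(.)', r'\1', escaped, flags=re.DOTALL)

print(strip_backslashes('One \\\'example\\\''))  # One 'example'
print(strip_backslashes('a\\\\b'))               # a\b
```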
On a side note, I'm almost 100% certain that doing it "the same way Python's lexical parser does" does not involve a parser, so to speak. A parser is for grammars, which describe the way you fit words together.
Maybe you're thinking of lexical content verification, which is often specified using regular expressions. Parsers are an altogether more challenging and powerful beast, and not something you want to mess around with for linear string manipulation.
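To make the lexical side concrete, the standard tokenize module shows what the tokenizer sees. In this sketch (the sample source line is made up for illustration), the whole literal arrives as a single STRING token with its backslashes still in place; the escapes are only interpreted when the literal is evaluated, here via ast.literal_eval.

```python
import ast
import io
import tokenize

# The source text being tokenized is: x = 'One \'example\''
source = "x = 'One \\'example\\''\n"

# Collect the raw text of every STRING token the tokenizer produces.
string_tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.STRING
]

print(string_tokens[0])                    # 'One \'example\''
print(ast.literal_eval(string_tokens[0]))  # One 'example'
```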
Source: https://stackoverflow.com/questions/6867588/how-to-convert-escaped-characters