How to replace a double backslash with a single backslash in python?

江枫思渺然 submitted on 2019-11-26 22:57:46

Question


I have a string. In that string are double backslashes. I want to replace the double backslashes with single backslashes, so that unicode char codes can be parsed correctly.

(Pdb) p fetched_page
'<p style="text-align:center;" align="center"><strong><span style="font-family:\'Times New Roman\', serif;font-size:115%;">Chapter 0<\\/span><\\/strong><\\/p>\n<p><span style="font-family:\'Times New Roman\', serif;font-size:115%;">Chapter 0 in \\u201cDreaming in Code\\u201d give a brief description of programming in its early years and how and why programmers are still struggling today...'

Inside of this string, you can see escaped unicode character codes, such as:

\\u201c

I want to turn this into:

\u201c

Attempt 1:

fetched_page.replace('\\\\', '\\')

but this doesn't work -- it searches for quadruple backslashes.

Attempt 2:

fetched_page.replace('\\', '\')

But this results in an end of line error.

Attempt 3:

fetched_page.decode('string_escape')

But this had no effect on the text. All the double backslashes remained as double backslashes.


Answer 1:


You can try codecs.escape_decode; it should decode the escape sequences.
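As a rough illustration of that suggestion in Python 3: codecs.escape_decode is not described in the official codecs documentation, so its behaviour should be treated as an assumption that may change between versions. It works on bytes and returns a tuple whose first element is the decoded data. The sample value below is made up for the sketch.

```python
import codecs

# Bytes holding a literal backslash followed by "t" (two separate characters).
escaped = b"tab\\there"

# escape_decode returns a tuple; the decoded bytes are the first element.
result = codecs.escape_decode(escaped)
decoded = result[0]

print(decoded)
```

Note that this handles byte-string escapes such as \t and \n; it does not expand \uXXXX sequences, which is why the later answers turn to unicode_escape.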




Answer 2:


I'm not getting the behaviour you describe:

>>> x = "\\\\\\\\"
>>> print x
\\\\
>>> y = x.replace('\\\\', '\\')
>>> print y
\\

When you see '\\\\' in your output, you're seeing twice as many backslashes as there are in the string, because each one is escaped. The code you wrote should work fine. Try printing out the actual values instead of only looking at how the REPL displays them.
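The gap between what the REPL echoes and what the string actually holds can be checked directly. A minimal sketch in Python 3 syntax (the variable names are illustrative, not taken from the question):

```python
# Four real backslash characters, written with escaped literals.
x = "\\\\\\\\"

# repr() is what the REPL echoes: every backslash is shown doubled.
shown_by_repl = repr(x)

# Replace each pair of backslashes with a single one.
y = x.replace("\\\\", "\\")

print(len(x), shown_by_repl)  # the length counts real characters
print(len(y), y)              # print() shows the real characters
```

Comparing len() with the echoed form is usually the quickest way to tell whether a string really contains doubled backslashes.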




Answer 3:


To expand on Jeremy's answer, your problem in Attempt 2 is that '\' is an illegal string literal: \' escapes the quote mark, so the string never terminates.
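Put differently, a lone backslash has to be written as '\\' in source code. A short sketch, assuming Python 3 and a made-up sample string:

```python
# A single backslash must itself be escaped in a normal string literal.
single = "\\"

# In a raw string, both typed backslashes are kept.
double = r"\\"

# Hypothetical sample containing one doubled backslash.
sample = "a" + double + "b"
fixed = sample.replace(double, single)

print(len(single), len(double))
print(fixed)
```

This is the same call as Attempt 1 in the question, which is why that attempt is already correct when the string truly contains two backslashes in a row.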




Answer 4:


It may be slightly overkill, but...

>>> import re
>>> a = '\\u201c\\u3012'
>>> re.sub(r'\\u[0-9a-fA-F]{4}', lambda x:eval('"' + x.group() + '"'), a)
'“〒'

So yeah, the simplest solution would be ms4py's answer: call codecs.escape_decode on the string and take the result (or the first element of the result if escape_decode returns a tuple, as it seems to in Python 3). In Python 3 you'd want to use codecs.unicode_escape_decode when working with strings (as opposed to bytes objects), though.




Answer 5:


Python 3:

>>> b'\\u201c'.decode('unicode_escape')
'“'

or

>>> '\\u201c'.encode().decode('unicode_escape')
'“'
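One caveat with the second form: .encode() produces UTF-8, while the unicode_escape codec reads its input as Latin-1, so any real non-ASCII character already in the string gets mangled. A commonly used workaround, sketched here with a hypothetical helper name and a made-up sample, is to encode with Latin-1 and backslashreplace first:

```python
def decode_unicode_escapes(text: str) -> str:
    """Expand literal backslash-u sequences while keeping non-ASCII text intact.

    Hypothetical helper for illustration; not part of the standard library.
    """
    # Latin-1 characters map to one byte each; anything else becomes an escape.
    raw = text.encode("latin-1", errors="backslashreplace")
    # unicode_escape reads the bytes as Latin-1 and expands the escapes.
    return raw.decode("unicode_escape")


# A real non-ASCII character next to two literal escape sequences.
sample = "caf\u00e9 \\u201cquoted\\u201d"

naive = sample.encode().decode("unicode_escape")
safe = decode_unicode_escapes(sample)

print(naive)
print(safe)
```

Either way, unicode_escape also interprets other backslash sequences such as \n and \t, so it is only appropriate when every backslash in the input is meant as an escape.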



Answer 6:


Just print it:

>>> a = '\\u201c'
>>> print a
\u201c


Source: https://stackoverflow.com/questions/6752485/how-to-replace-a-double-backslash-with-a-single-backslash-in-python
