Why does json.dumps escape non-ASCII characters with "\uxxxx"?

眼角桃花 2020-12-10 21:37

In Python 2, the function json.dumps() will ensure that all non-ASCII characters are escaped as \uxxxx.
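
A minimal sketch of the behaviour described above. It is written for Python 3 as an assumption for runnability (`json.dumps` has the same `ensure_ascii=True` default there), and the sample string is made up for illustration.

```python
import json

text = "søk ™"  # hypothetical sample containing two non-ASCII characters

escaped = json.dumps(text)                  # ensure_ascii=True is the default
raw = json.dumps(text, ensure_ascii=False)  # keep the characters unescaped

print(escaped)  # "s\u00f8k \u2122"

# Both forms are valid JSON and decode back to the same text.
assert json.loads(escaped) == json.loads(raw) == text
```

Passing `ensure_ascii=False` is the documented way to switch the escaping off when the consumer can handle non-ASCII output.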

Python 2 Json

But isn't

3 Answers
  •  余生分开走
    2020-12-10 21:56

    In a Python 2 byte string, the \u in "\u00f8" isn't actually an escape sequence the way \x is. The \u is a literal r'\u' (a backslash followed by the letter u). But such byte strings can easily be converted to Unicode.

    Demo:

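    # Python 2: plain "..." literals are byte strings, so \u stays as two literal characters.
    s = "\u00f8"
    u = s.decode('unicode-escape')  # interpret the literal \uXXXX text as an escape
    print repr(s), len(s), repr(u), len(u)

    s = "\u2122"
    u = s.decode('unicode-escape')
    print repr(s), len(s), repr(u), len(u)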

    Output:

    '\\u00f8' 6 u'\xf8' 1
    '\\u2122' 6 u'\u2122' 1
    

    As J.F. Sebastian mentions in the comments, inside a Unicode string \u00f8 is a true escape code, i.e., in a Python 3 string or in a Python 2 u"\u00f8" string. Also take heed of his other remarks!
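
To illustrate that last point, here is a rough Python 3 counterpart of the demo above. It is a sketch and not part of the original answer: in Python 3 a plain string literal is already Unicode, so a raw string stands in for the Python 2 byte string, and the variable names are chosen only for illustration.

```python
true_escape = "\u00f8"  # Unicode literal: \u is a real escape, one character
literal = r"\u00f8"     # raw literal: backslash, 'u', and four hex digits

print(ascii(true_escape), len(true_escape))  # '\xf8' 1
print(ascii(literal), len(literal))          # '\\u00f8' 6

# Turn the literal backslash sequence into the character it names.
decoded = literal.encode("ascii").decode("unicode_escape")
print(decoded == true_escape)  # True
```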
