Converting a byte string to a Unicode string

长发绾君心 2020-12-13 13:00

I have code like this:

a = "\u0432"
b = u"\u0432"
c = b"\u0432"
d = c.decode('utf8')

print(type(a), a)
print(type(b), b)
print(type(c), c)
print(type(d), d)


        
2 Answers
  • 2020-12-13 13:54

    In strings (or Unicode objects in Python 2), \u has a special meaning: it says "here comes a Unicode character specified by its code point". Hence u"\u0432" results in the character в.

    The b'' prefix tells you this is a sequence of 8-bit bytes. A bytes object holds no Unicode characters, so the \u code has no special meaning. Hence, b"\u0432" is just the sequence of the six bytes \, u, 0, 4, 3 and 2.
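
    As a quick check of that claim, the sketch below lists the byte values held by the literal. The name c follows the question; byte_values is only illustrative. Recent Python versions warn about the unrecognized \u escape in a bytes literal, so the backslash is spelled explicitly here, which produces the same six bytes.

```python
# Python 3: in a bytes literal, \u is not an escape sequence, so the
# backslash and the five characters after it are stored verbatim.
c = b"\\u0432"  # explicit backslash; same value as b"\u0432" in the question

byte_values = list(c)   # integer value of each byte
print(len(c))           # 6
print(byte_values)      # [92, 117, 48, 52, 51, 50]
print(c == rb"\u0432")  # True: identical to the raw-bytes spelling
```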

    Essentially you have an 8-bit string containing not a Unicode character, but the specification of a Unicode character.

    You can convert this specification using the unicode_escape codec.

    >>> c.decode('unicode_escape')
    'в'
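
    To contrast this with the decode('utf8') call from the question, here is a minimal sketch. The variable names other than c are illustrative. It also shows one caveat worth knowing: unicode_escape treats non-ASCII bytes as Latin-1, so it is only safe when the input is pure ASCII escapes.

```python
c = b"\\u0432"  # the same six bytes as b"\u0432" in the question

# Decoding as UTF-8 maps the six ASCII bytes to six characters.
as_utf8 = c.decode("utf-8")
print(len(as_utf8))                           # 6

# unicode_escape interprets the escape sequence itself.
as_escape = c.decode("unicode_escape")
print(len(as_escape), as_escape == "\u0432")  # 1 True

# Caveat: real UTF-8 bytes mixed with escapes are read as Latin-1.
mixed = "\u0432".encode("utf-8") + c          # b'\xd0\xb2' followed by the escape
garbled = mixed.decode("unicode_escape")
print(ascii(garbled))                         # '\xd0\xb2\u0432'
```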
    
  • 2020-12-13 13:59

    Loved Lennart's answer. It put me on the right track for solving the particular problem I faced. What I added was the ability to produce HTML-compatible code for \u???? specifications in strings. Basically, only one line was needed:

    results = results.replace('\\u','&#x')
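
    One caveat with the plain replace: it emits references such as &#x0432 without the terminating semicolon, so a following character that happens to be a hex digit would be read as part of the reference. A minimal sketch that adds the semicolon with a regular expression, assuming every escape is exactly \u plus four hex digits (the function name u_escapes_to_html is hypothetical):

```python
import re

# Matches a literal backslash-u followed by exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def u_escapes_to_html(text):
    """Rewrite \\uXXXX escapes as terminated HTML references (&#xXXXX;)."""
    return _U_ESCAPE.sub(r"&#x\1;", text)

sample = r"caf\u00e9 \u0432abc"
print(u_escapes_to_html(sample))  # caf&#x00e9; &#x0432;abc
```

    This still treats each \u escape on its own, so characters that JSON writes as a surrogate pair (two consecutive escapes) would not come out as a valid reference.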
    

    This all came about from a need to convert JSON results to something that displays well in a browser. Here is some test code that is integrated with a cloud application:

    # References:
    # http://stackoverflow.com/questions/9746303/how-do-i-send-a-post-request-as-a-json
    # https://docs.python.org/3/library/http.client.html
    # http://docs.python-requests.org/en/v0.10.7/user/quickstart/#custom-headers
    # http://stackoverflow.com/questions/606191/convert-bytes-to-a-python-string
    # http://www.w3schools.com/charsets/ref_utf_punctuation.asp
    # http://stackoverflow.com/questions/13837848/converting-byte-string-in-unicode-string
    
    import urllib.request
    import json
    
    body = [ { "query": "co-development and language.name:English", "page": 1, "pageSize": 100 } ]
    myurl = "https://core.ac.uk:443/api-v2/articles/search?metadata=true&fulltext=false&citations=false&similar=false&duplicate=false&urls=true&extractedUrls=false&faithfulMetadata=false&apiKey=SZYoqzk0Vx5QiEATgBPw1b842uypeXUv"
    req = urllib.request.Request(myurl)
    req.add_header('Content-Type', 'application/json; charset=utf-8')
    jsondata = json.dumps(body)
    jsondatabytes = jsondata.encode('utf-8') # needs to be bytes
    req.add_header('Content-Length', len(jsondatabytes))
    print('\n', jsondatabytes, '\n')
    response = urllib.request.urlopen(req, jsondatabytes)
    results = response.read()
    results = results.decode('utf-8')
    results = results.replace('\\u','&#x') # produces html hex version of \u???? unicode characters
    print(results)
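
    Since the response is JSON, another option is to let the json module resolve the escapes and then encode the text with the xmlcharrefreplace error handler. The sketch below runs offline: raw is a made-up stand-in for the decoded response body, and to_html_ascii is a hypothetical helper name, not part of the original code.

```python
import html
import json

# Hypothetical stand-in for the decoded response body from the API.
raw = '{"title": "caf\\u00e9 \\u0432", "emoji": "\\ud83d\\ude00"}'

# json.loads resolves \uXXXX escapes, joining surrogate pairs into one character.
data = json.loads(raw)

def to_html_ascii(text):
    """Escape HTML metacharacters, then replace non-ASCII characters
    with decimal numeric character references."""
    escaped = html.escape(text)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_html_ascii(data["title"]))  # caf&#233; &#1074;
print(to_html_ascii(data["emoji"]))  # &#128512;
print(to_html_ascii("a < b"))        # a &lt; b
```

    Working on the parsed values rather than the raw text also avoids rewriting a \u sequence that was never an escape in the first place.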
    