parsing XML file gets UnicodeEncodeError (ElementTree) / ValueError (lxml)

后端 未结 3 1479
情深已故
情深已故 2020-12-03 16:14

I send a GET request to the CareerBuilder API :

import requests

url = \"http://api.careerbuilder.com/v1/jobsearch\"
payload = {\'DeveloperKey\': \'MY_DEVLOP         


        
3条回答
  •  南笙
    南笙 (楼主)
    2020-12-03 16:48

    You are using the decoded unicode value. Use r.raw raw response data instead:

    r = requests.get(url, params=payload, stream=True)
    r.raw.decode_content = True
    etree.parse(r.raw)
    

    which will read the data from the response directly; do note the stream=True option to .get().

    Setting the r.raw.decode_content = True flag ensures that the raw socket will give you the decompressed content even if the response is gzip or deflate compressed.

    You don't have to stream the response; for smaller XML documents it is fine to use the response.content attribute, which is the un-decoded response body:

    r = requests.get(url, params=payload)
    xml = etree.fromstring(r.content)
    

    XML parsers always expect bytes as input as the XML format itself dictates how the parser is to decode those bytes to Unicode text.

提交回复
热议问题