Question
I have been trying to parse XML and HTML pages using the lxml and requests packages in Python. I am using the following code for this purpose:
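import requests
from lxml import html  # html.fromstring lives in lxml.html, not lxml.etree

url = ""
req = requests.get(url)
# req.content is the raw response body as bytes
tree = html.fromstring(req.content)
root = tree.xpath('')
for item in root:
    print(item.text)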
This code works fine, but for some web pages the content is not displayed properly and the encoding needs to be set to UTF-8. I don't know how to set the encoding in this code.
Answer 1:
requests automatically decodes content from the server.
Important to understand:
r.content - contains the response content as raw bytes, not yet decoded
r.encoding - contains the encoding that requests uses to decode the response content
r.text - according to the official docs, the already decoded version of r.content
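To illustrate the difference between the raw bytes and the decoded text, here is a minimal sketch that uses only the standard library. The byte string `raw` is a made-up stand-in for r.content, and the two charsets are assumptions chosen for illustration, not values taken from any real server.

```python
# Hypothetical response body: UTF-8 bytes, as r.content would hold them.
raw = "café 中文".encode("utf-8")

# Decoding with the wrong charset does not fail, but yields mojibake.
wrong = raw.decode("iso-8859-1")

# Decoding with the charset the bytes were written in recovers the text.
right = raw.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

This is roughly what goes wrong when a page is UTF-8 but gets decoded with a different charset: the bytes are the same, only the interpretation differs.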
I usually work with r.text, since it is already decoded to Unicode, but you can still decode the content manually using r.content.decode(r.encoding).
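For the code in the question, one possible way to force UTF-8 is to hand lxml a parser with an explicit encoding. This is a sketch rather than the only approach: the helper name `parse_html_bytes` and the sample markup are hypothetical, and the sample bytes stand in for req.content so that the example runs without a network request.

```python
from lxml import html


def parse_html_bytes(raw, encoding="utf-8"):
    """Parse raw HTML bytes, decoding them with the given encoding."""
    # An explicit parser encoding overrides lxml's own charset guess.
    parser = html.HTMLParser(encoding=encoding)
    return html.fromstring(raw, parser=parser)


# Hypothetical sample standing in for req.content (UTF-8 bytes).
sample = "<html><body><p>café 中文</p></body></html>".encode("utf-8")
tree = parse_html_bytes(sample)
for item in tree.xpath("//p"):
    print(item.text)
```

With a real response, the call would be parse_html_bytes(req.content). Alternatively, requests lets you set req.encoding = 'utf-8' before reading req.text, after which html.fromstring(req.text) receives an already decoded string.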
Hope it helps.
Source: https://stackoverflow.com/questions/40447117/parsing-xml-and-html-page-with-lxml-and-requests-package-in-python