Effects of passing headers in a request?

Submitted by 家住魔仙堡 on 2020-08-20 07:06:20

Question


I want to know what difference it makes to pass headers to requests.get, i.e. the difference between requests.get(url, headers=headers) and requests.get(url).
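As a rough, offline illustration, the sketch below builds the requests the same way requests.get does (through Session.prepare_request) but never sends them, so you can see what actually changes on the wire. Without headers, requests fills in its own default User-Agent (python-requests/<version>), which a site such as Amazon can use to treat the request differently; with headers=... your value replaces it. It also shows a common pitfall: the second positional parameter of requests.get is params, so requests.get(url, headers) puts the dict into the query string rather than the HTTP headers. The helper name build is hypothetical, and the Chrome User-Agent string is simply the one from the question.

```python
import requests

URL = "http://www.amazon.in/SanDisk-micro-USB-connector-OTG-enabled-Android/dp/B00RBGYGMO"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
}


def build(**kwargs):
    """Hypothetical helper: prepare (but do not send) a GET request the same
    way requests.get would, so its URL and headers can be inspected offline."""
    with requests.Session() as session:
        return session.prepare_request(requests.Request("GET", URL, **kwargs))


# requests.get(url): the library's default User-Agent is sent.
plain = build()
print(plain.headers["User-Agent"])  # python-requests/<installed version>

# requests.get(url, headers=HEADERS): your User-Agent replaces the default.
with_headers = build(headers=HEADERS)
print(with_headers.headers["User-Agent"])

# requests.get(url, HEADERS): the dict is taken as `params`, i.e. the query string.
positional = build(params=HEADERS)
print(positional.url)  # ...B00RBGYGMO?User-Agent=Mozilla%2F5.0+...
print(positional.headers["User-Agent"])  # still python-requests/<version>
```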

I have these two pieces of code:

from lxml import html
import requests

url = "http://www.amazon.in/SanDisk-micro-USB-connector-OTG-enabled-Android/dp/B00RBGYGMO"

# No headers passed: requests sends its default "python-requests/x.y.z" User-Agent
page = requests.get(url)
tree = html.fromstring(page.text)
XPATH_IMAGE_SOURCE = '//*[@id="main-image-container"]//img/@src'
image_source = tree.xpath(XPATH_IMAGE_SOURCE)
print('type: ', type(image_source[0]))
print(image_source[0])

The output of this is a URL, as you'd expect. But this:

from lxml import html
import requests

url = "http://www.amazon.in/SanDisk-micro-USB-connector-OTG-enabled-Android/dp/B00RBGYGMO"
# Send a browser-like User-Agent instead of the requests default
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
page = requests.get(url, headers=headers)

tree = html.fromstring(page.text)
XPATH_IMAGE_SOURCE = '//*[@id="main-image-container"]//img/@src'
image_source = tree.xpath(XPATH_IMAGE_SOURCE)
print('type: ', type(image_source[0]))
print(image_source[0])

produces output that starts with

data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAoHBwgHBgoIC

I'm guessing this is the actual image embedded as plain data rather than a link to it. Any idea how I could get it as a URL instead? And in what other ways does sending headers affect the response we get?
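The guess is right: a data: URI carries the image bytes inline instead of pointing to a file on a server. A minimal, standard-library-only sketch (the function name decode_data_uri is hypothetical) that splits such a URI into its MIME type and raw bytes. Decoding the first 24 base64 characters quoted above yields the JPEG start-of-image marker FF D8 followed by a JFIF header, which confirms it is JPEG data. The helper strips surrounding whitespace because, as the raw HTML in the answer shows, the src value can begin with an encoded newline (&#10;).

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri):
    """Split a data: URI into (mime_type, payload_bytes)."""
    uri = uri.strip()  # attribute values may carry stray whitespace/newlines
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, _, payload = uri[len("data:"):].partition(",")
    parts = header.split(";")
    mime_type = parts[0] or "text/plain"  # RFC 2397: omitted type means text/plain
    if "base64" in parts[1:]:
        return mime_type, base64.b64decode(payload)
    return mime_type, unquote_to_bytes(payload)  # non-base64 payloads are percent-encoded


# First 24 base64 characters of the src value quoted in the question
mime, data = decode_data_uri("data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQAB")
print(mime)        # image/jpeg
print(data[:3])    # b'\xff\xd8\xff'
print(data[6:11])  # b'JFIF\x00'
```

With the full attribute value, the decoded bytes could be written straight to disk, e.g. with open("image.jpg", "wb").write(data).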

Thank you.


Answer 1:


Save the first snippet's response to an HTML file and open it in your browser. You will see that, without headers, Amazon has blocked you.
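A minimal sketch of that debugging step, using only the standard library. The helper name save_html and the default file name are hypothetical; it writes the HTML to disk, returns a file:// URI, and optionally hands that URI to the webbrowser module so you can see what the server actually sent back.

```python
import webbrowser
from pathlib import Path


def save_html(html_text, path="response.html", open_in_browser=False):
    """Write fetched HTML to disk and return its file:// URI."""
    target = Path(path).resolve()  # as_uri() needs an absolute path
    target.write_text(html_text, encoding="utf-8")
    uri = target.as_uri()
    if open_in_browser:
        webbrowser.open(uri)  # inspect the page the server really returned
    return uri
```

In the first snippet of the question, this would be called as save_html(page.text, open_in_browser=True).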

Use this XPath:

XPATH_IMAGE_SOURCE = '//*[@id="main-image-container"]//img/@data-old-hires'

Output:

type:  <class 'lxml.etree._ElementStringResult'>
http://ecx.images-amazon.com/images/I/617TjMIouyL._SL1274_.jpg

This is the raw HTML:

<img alt=".." src="&#10;data:image/webp;base64,UklGRuYIAABXRUJQVlA4INoIAACQQQCdASosAcsAPrFWpEqkIqQhIxN6gIgWCek6r4bUf/..." 
data-old-hires="http://ecx.images-amazon.com/images/I/617TjMIouyL._SL1274_.jpg"

The image URL is in the data-old-hires attribute.
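Putting the answer together, here is a sketch that prefers data-old-hires and falls back to src only when src holds a real URL rather than an inline data: URI. It is exercised against a small hand-written fragment modeled on the raw markup above, not Amazon's full page, and Amazon's markup may have changed since this answer was written, so treat the XPath as an assumption. The function name extract_image_url is hypothetical.

```python
from lxml import html

# Assumption: same container id as in the question's XPath
XPATH_IMG = '//*[@id="main-image-container"]//img'


def extract_image_url(page_html):
    """Return the product image URL, or None if no usable URL is found."""
    tree = html.fromstring(page_html)
    for img in tree.xpath(XPATH_IMG):
        hires = img.get("data-old-hires")
        if hires:
            return hires
        src = (img.get("src") or "").strip()
        if src and not src.startswith("data:"):  # skip inline base64 images
            return src
    return None


sample = """<div id="main-image-container">
  <img alt=".." src="&#10;data:image/webp;base64,UklGRuYIAABXRUJQ"
       data-old-hires="http://ecx.images-amazon.com/images/I/617TjMIouyL._SL1274_.jpg">
</div>"""
print(extract_image_url(sample))
# http://ecx.images-amazon.com/images/I/617TjMIouyL._SL1274_.jpg
```

With a live response, the call would be extract_image_url(page.text) after fetching the page with browser-like headers.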



Source: https://stackoverflow.com/questions/41611281/effects-of-passing-headers-in-a-requests
