JSON URL sometimes returns a null response

夙愿已清 submitted on 2019-12-23 05:23:59

Question


I'm scraping a website which loads product data from individual JSON files. I found the URLs to the JSONs by inspecting the network traffic.

The problem is this: when I follow the JSON URLs, most of them return a JSON result. But the JSON URLs of products with special characters in their names, e.g. é, return a null response. The data is of course displayed in the browser, but I can't seem to get the JSON response directly.

Any tips?

(I'm trying to find a similar website that behaves the same way so I can post it here as an example.)

EDIT:

Here is an example

Product A url: https://www.boozebud.com/p/hopnationbrewingco/thedamned

WORKS: A's JSON url: https://www.boozebud.com/a/producturl/p/hopnationbrewingco/thedamned

Product B url: https://www.boozebud.com/p/àbloc/superprestigenaturalblondebeer

RETURNS NULL: B's JSON url: https://www.boozebud.com/a/producturl/p/àbloc/superprestigenaturalblondebeer

(Related to my previous unanswered question: scrapy: dealing with special characters in url which might need to be revised in light of this question)


Answer 1:


The problem appears to be the headers. The server seems to be very sensitive to at least the Content-Type header, which it may use internally to decode the incoming URL. Try making the request like this (this is what the site's internal JavaScript does):

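# Inside a Scrapy spider callback (needs: from scrapy import Request).
# The "à" in the path is percent-encoded as UTF-8: %C3%A0.
yield Request('https://www.boozebud.com/a/producturl/p/%C3%A0bloc/superprestigenaturalblondebeer',
              headers={"Content-Type": "application/json; charset=UTF-8"})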
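
As a rough sketch of the idea above, the JSON URL can be derived from a product URL by percent-encoding the path as UTF-8 before the request is sent. The helper name `json_url_for` and the constants `JSON_PREFIX` and `HEADERS` are made up for illustration. The `/a/producturl` prefix is inferred from the two example URLs in the question, and whether the server really depends on the Content-Type header is this answer's guess, not something verified here.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

# Assumption: the JSON endpoint is the product path with this prefix,
# as suggested by the two example URLs in the question.
JSON_PREFIX = "/a/producturl"

# Header suggested in the answer; sent along with every JSON request.
HEADERS = {"Content-Type": "application/json; charset=UTF-8"}


def json_url_for(product_url: str) -> str:
    """Build the percent-encoded JSON URL for a product page URL."""
    parts = urlsplit(product_url)
    # Decode first so an already-encoded path is not encoded twice,
    # then percent-encode non-ASCII characters as UTF-8 ("à" -> "%C3%A0").
    path = quote(unquote(JSON_PREFIX + parts.path), safe="/")
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    print(json_url_for("https://www.boozebud.com/p/àbloc/superprestigenaturalblondebeer"))
```

In a Scrapy callback, the result could then be passed on as `yield Request(json_url_for(product_url), headers=HEADERS)`.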


Source: https://stackoverflow.com/questions/47563095/json-url-sometimes-returns-a-null-response
