Requesting URLs with base64 data encoded

Submitted by 落花浮王杯 on 2019-12-24 19:18:53

Question


I'm trying to request a URL that has base64-encoded data appended to it, like so:

http://www.somepage.com/es_e/bla_bla#eyJkYXRhIjp7ImNvdW50cnlJZCI6IkVTIiwicmVnaW9uSWQiOiI5MjAiLCJkdXJhdGlvbiI6NywibWluUGVyc29ucyI6MX0sImNvbmZpZyI6eyJwYWdlIjoiMCJ9fQ==

What I do is build a JSON object, encode it in base64, and append it to the URL like this:

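import base64
import json
import scrapy
new_data = {"data": {"countryId": "ES", "regionId": "920", "duration": 7, "minPersons": 1}, "config": {"page": 2}}
json_data = json.dumps(new_data)
# On Python 3, b64encode() takes and returns bytes, hence encode()/decode().
new_url = "http://www.somepage.com/es_es/bla_bla#" + base64.b64encode(json_data.encode()).decode()
yield scrapy.Request(url=new_url, callback=self.parse)  # inside a spider callback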
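
A side note on the encoding step: the token in the example URL above decodes to compact JSON (no spaces after `,` and `:`) with "page" as the string "0", whereas json.dumps() adds spaces by default and the snippet passes an integer page. Whether the site cares about either difference is not stated in the question, so treat the following as a minimal sketch for reproducing that exact token; encode_payload and decode_payload are hypothetical helper names.

```python
import base64
import json

# Payload as it appears when the token from the example URL is decoded.
payload = {
    "data": {"countryId": "ES", "regionId": "920", "duration": 7, "minPersons": 1},
    "config": {"page": "0"},
}


def encode_payload(obj: dict) -> str:
    """Serialize to compact JSON, then base64-encode to an ASCII string."""
    compact = json.dumps(obj, separators=(",", ":"))
    return base64.b64encode(compact.encode("utf-8")).decode("ascii")


def decode_payload(token: str) -> dict:
    """Reverse of encode_payload: base64-decode, then parse the JSON."""
    return json.loads(base64.b64decode(token).decode("utf-8"))


token = encode_payload(payload)
print(token)
print(decode_payload(token))
```

Decoding a token captured from the browser with decode_payload is a quick way to check which fields and types the page itself sends.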

The problem is that Scrapy only requests this part of the URL, http://www.somepage.com/es_es/bla_bla, without the encoded data appended to it. However, if I paste new_url into the browser, it shows the result I want, with the encoded data applied!

I don't know what's happening. Can anyone give me a hand?


Answer 1:


After a lot of searching, I read that URLs of this kind, with a # followed by data at the end (i.e. my URL http://www.somepage.com/es_e/bla_bla#eyJkYXRhIjp7ImNvdW50cnlJZCI6IkVTIiwicmVnaW9uSWQiOiI5MjAiLCJkdXJhdGlvbiI6NywibWluUGVyc29ucyI6MX0sImNvbmZpZyI6eyJwYWdlIjoiMCJ9fQ==), are called fragment URLs. The fragment basically indicates a location within a resource, like an anchor (you can read it here). It is handled on the client side and is not sent to the server in the HTTP request, which is why the server only ever sees the part before the #.
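
To make that concrete, here is a small sketch using only the standard library. It splits a URL and shows which part would end up on the HTTP request line. The request_target helper is a hypothetical name, and the fragment is shortened to a placeholder token for readability.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path (plus query, if any) that is sent to the server.

    The fragment is deliberately left out: it stays on the client.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


# Shortened placeholder fragment, not the full token from the question.
url = "http://www.somepage.com/es_es/bla_bla#eyJkYXRh"
parts = urlsplit(url)

print("fragment (client only):", parts.fragment)
print("sent to server:", request_target(url))
```

In a browser, the page's JavaScript reads the fragment and issues its own requests, which is the behaviour a plain HTTP client does not reproduce.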

Then, thanks to this post, I learned that this content has to be loaded by the page itself: the website makes its own requests to fetch the data (outgoing requests). So I looked for those outgoing requests using Firefox Developer Edition (any other tool that shows these requests will do, such as Tamper Data) and built the URL that returns the HTML content I was looking for.

# The base64-encoded JSON is passed in the 'searchRequest=' query parameter instead of after the '#', and voilà!
"http://www.somewebsite.es/?controller=ajaxresults&action=getresults&searchRequest=eyJkYXRhIjp7ImNvdW50cnlJZCI6IkVTIiwicmVnaW9uSWQiOiI5MjAiLCJkdXJhdGlvbiI6N30sImNvbmZpZyI6eyJwYWdlIjoiMCJ9fQ=="
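
Building that URL programmatically could look like the sketch below. It assumes the endpoint and parameter names shown above (controller, action, searchRequest) and that the server accepts a percent-encoded token; build_search_url is a hypothetical helper name. urlencode() escapes the trailing `=` padding as `%3D`, which a server decodes back to the same token.

```python
import base64
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint observed in the browser's outgoing requests (taken from the answer).
AJAX_BASE = "http://www.somewebsite.es/"


def build_search_url(payload: dict) -> str:
    """Base64-encode the compact JSON payload and pass it as a query parameter."""
    compact = json.dumps(payload, separators=(",", ":"))
    token = base64.b64encode(compact.encode("utf-8")).decode("ascii")
    query = urlencode({
        "controller": "ajaxresults",
        "action": "getresults",
        "searchRequest": token,
    })
    return AJAX_BASE + "?" + query


payload = {
    "data": {"countryId": "ES", "regionId": "920", "duration": 7},
    "config": {"page": "0"},
}
search_url = build_search_url(payload)
print(search_url)

# Round trip: recover the token the server would see after URL-decoding.
received_token = parse_qs(urlsplit(search_url).query)["searchRequest"][0]
print(json.loads(base64.b64decode(received_token)))
```

Inside a spider, the result would be passed to scrapy.Request(url=search_url, callback=self.parse), changing "page" in the payload to walk through the result pages.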

I could also achieve this by using the Selenium library, as you can see in this other post, but that isn't the best practice.



Source: https://stackoverflow.com/questions/46025120/requesting-urls-with-base64-data-encoded
