Looking for Requests equivalent of Mechanize capabilities

吃可爱长大的小学妹 提交于 2019-12-04 09:49:02

requests does not fill the same role as Mechanize.

Mechanize loads the actual HTML form and parses this, letting you fill in values for the various elements in the form. When you then ask Mechanize to submit the form, it'll use all information in the form to produce a valid request to the server. This includes any form elements you didn't provide a new value for, using default values if present. This includes any hidden form elements not visible in your browser.

Use a project like robobrowser instead; it wraps requests as well as BeautifulSoup to load webpages, parse out the form elements, help you fill out those elements and submit them back again.

If you want to use just requests, you'll need to make sure you are posting all fields defined by the form. This means you need to look at the method attribute (defaults to GET), the action attribute (defaults to the current URL), and at all the input, select, textarea and button elements. The server may also be expecting additional information in the HTTP request, such as cookies or the Referer (sic) header.

The Mechanize information you printed indicates that it has parsed several more fields from the forms for which you did not provide values, for example. The form in question also contains a hidden input field named form_build_id for example, which the server may be relying on. Mechanize would also have captured any cookies sent with the original form request, and those cookies may also be required for the server to accept the request. robobrowser would take the same context into account.

What the OP wanted can be done using requests and BeautifulSoup like this:

import requests
import urlparse
from bs4 import BeautifulSoup
url = "https://www.euronext.com/en/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3A18d1ee939a63d459d9a2a3b07b8837a7"
resp = requests.get(url, allow_redirects=True)

soup = BeautifulSoup(resp.text, "lxml")
forms = soup.findAll("form")
params = dict([(parm.get("name"), parm.get("value")) for parm in forms[1].findAll("input")])
params.update({'format':'2','date_format':'2'})
formAction = forms[1].get("action")
# Make the relative URL absolute.
formAction = urlparse.urlunsplit(urlparse.urlsplit(url)[0:2] + urlparse.urlsplit(formAction)[2:])

resp = requests.post(formAction, data=params)
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!