mechanize | 易学教程

Scraping a site that requires login username and password on two separate pages

Question: I'm trying to scrape information from my company's intranet so that I can display it on our office wall board via a Dashing dashboard. I'm trying to work with the information provided at This Site. The problem I'm having, other than being a noob, is that to gain access to the information I want to scrape, I need to log in to our intranet by providing my username on one page, then submitting to another page where I provide my password. Once I'm logged in, I can then link and

How to send JSON form data with Mechanize or Faraday in Ruby

Question: I want to retrieve data from a website that uses JSON data to set custom search parameters, which seem to be requested via AJAX. The data transmitted shows up under XHR->Request Payload in Firebug: {"filters": [{"action": "post", "filterName": "Hersteller", "ids": [269], "settingName": "Hersteller", "settingValue": "ValueA"}, {"action": "delete", "filterName": "Modelle", "settingName": "Modelle", "settingValue": ""}]} The site doesn't transmit any POST parameters but only this JSON encoded
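
A minimal Python sketch of sending such a payload as the raw request body, using only the standard library rather than Mechanize or Faraday. The endpoint URL and the helper name are placeholder assumptions; the payload is the one shown in Firebug above. The request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; the real URL is whatever the XHR request targets.
SEARCH_URL = "https://example.com/search/filters"

# Payload copied from the Request Payload shown in the question.
payload = {
    "filters": [
        {
            "action": "post",
            "filterName": "Hersteller",
            "ids": [269],
            "settingName": "Hersteller",
            "settingValue": "ValueA",
        },
        {
            "action": "delete",
            "filterName": "Modelle",
            "settingName": "Modelle",
            "settingValue": "",
        },
    ]
}


def build_json_request(url, data):
    """Build a POST request whose body is the JSON document itself."""
    body = json.dumps(data).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_json_request(SEARCH_URL, payload)
# To actually send it: urllib.request.urlopen(request).read()
```

The key point is that the JSON string is the whole body and the Content-Type header says so; no form-encoded POST parameters are involved.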

Python - Mechanize sessions are not recognized

I'm behind a proxy at a company, and in order to access some intra-sites I first have to log in to a specific site. If I log in to this site from IE or FF, I can then access the intra-sites, and not necessarily from the same browser where I logged in. For example, if I log in from FF, I can access the intra-sites from IE and vice versa (and I get ping replies from the intra-sites). But when I log in using mechanize and keep the session alive (by using sleep), I cannot access any of the intra-sites (and I get a ping timeout). Why are mechanize sessions not recognized?

Verifying br.submit() using Python's Mechanize module

I'm just trying to log in to a website using mechanize. When I print br.form, I can see my credentials entered into the form, but I do not know how to submit the form properly. I call br.submit() and try to verify that it has proceeded to the next page by printing br.title(), but the title that appears is for the login screen, not the post-login screen.

    import mechanize
    from time import sleep

    def reportDownload():
        # Prompt for login credentials
        print("We require your credentials.")
        Username = raw_input("Please enter your username. ")
        Password = raw_input("Please input your password.

How to click link in Mechanize and Nokogiri?

Question: I'm using Mechanize to scrape Google Wallet for order data. I am capturing all the data from the first page; however, I need to follow the link to subsequent pages automatically to get more info. The #purchaseOrderPager-pagerNextButton element moves to the next page so I can pick up more records. I need to click on it to keep going. The element looks like this: <a id="purchaseOrderPager-pagerNextButton" class="kd-button small right" href="purchaseorderlist?startTime=0&... ;currentPageStart=1

SSL errors with Mechanize

I ran these commands in irb:

    require 'mechanize'
    agent = Mechanize.new
    agent.get('https://monabo.lemonde.fr/customer/account/forgotpassword/')

and got this error:

    OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0 state=unknown state: sslv3 alert handshake failure

I tried the same on a Mac, and it works without this error. However, it doesn't work on my computer (running Linux Mint 17). What I tried:

Exporting this variable: export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
Setting this variable: agent.agent.http.ca_file = '/etc/ssl/certs/ca-certificates.crt'
Setting this: OpenSSL::SSL:

python mechanize handle two parameters with same name

I'm logging into a page that, oddly, has one form input called login_email and two form inputs called login_password. I need to set the value of both, but the straightforward call form['login_password'] throws an error:

    File "/Library/Python/2.7/site-packages/mechanize/_form.py", line 3101, in find_control
      return self._find_control(name, type, kind, id, label, predicate, nr)
    File "/Library/Python/2.7/site-packages/mechanize/_form.py", line 3183, in _find_control
      raise AmbiguityError("more than one control matching "+description)
    mechanize._form.AmbiguityError: more than one control matching
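
One way around the ambiguity, sketched in Python: the traceback shows that find_control accepts an nr argument, which selects among controls sharing a name by zero-based position. The helper name below is hypothetical, and the sketch assumes both login_password controls are plain text or password inputs.

```python
def fill_login_form(form, email, password):
    """Set login_email once and both login_password controls.

    `form` is expected to be a mechanize form, for example `br.form`
    after a call to `br.select_form(...)`.
    """
    # Unambiguous name: item assignment works as usual.
    form["login_email"] = email

    # Ambiguous name: pick each matching control by its position (nr).
    for nr in (0, 1):
        control = form.find_control(name="login_password", nr=nr)
        control.value = password
```

With a mechanize browser br that has the form selected, the call would look like fill_login_form(br.form, "user@example.com", "secret"), followed by br.submit().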

HTTP Error 999: Request denied

Question: I am trying to scrape some web pages from LinkedIn using BeautifulSoup, and I keep getting the error "HTTP Error 999: Request denied". Is there a way to avoid this error? If you look at my code, I have tried Mechanize and urllib2, and both give me the same error.

    from __future__ import unicode_literals
    from bs4 import BeautifulSoup
    import urllib2
    import csv
    import os
    import re
    import requests
    import pandas as pd
    import urlparse
    import urllib
    import urllib2
    from BeautifulSoup import

Nokogiri Error: undefined method `radiobutton_with' - Why?

I'm trying to access a form using Mechanize (Ruby). My form has a group of radio buttons, and I want to check one of them. I wrote: target_form = (page/:form).find{ |elem| elem['id'] == 'formid'} target_form.radiobutton_with(:name => "radiobuttonname")[2].check In this line I want to check the radio button with the value 2, but I get an error: undefined method `radiobutton_with' for #<Nokogiri::XML::Element:0x9b86ea> (NoMethodError) The problem occurred because using a Mechanize page as a Nokogiri document (by calling the / method, or search, or xpath, etc.) returns Nokogiri

mechanize open Url python

I am trying to open a URL using mechanize in Python. The code executes with no errors, but nothing actually happens. What am I missing? Also, is there a way to set the browser? This is Python 2.7.

    import mechanize
    url = 'http://www.google.com/'
    op = mechanize.Browser()  # use mechanize's browser
    op.set_handle_robots(False)  # do not fetch or obey robots.txt
    op.open(url)

mechanize doesn't use real browsers; it is a tool for programmatic web browsing. For example, print out the page title after opening the URL:

    >>> import mechanize
    >>> url='http://www.google.com/'
    >>> op = mechanize.Browser()
    >>
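
The idea of checking the page title can be wrapped in a small helper, sketched below. The function name fetch_title and its optional browser parameter are assumptions made for illustration; only Browser(), set_handle_robots(), open() and title() come from mechanize itself.

```python
def fetch_title(url, browser=None):
    """Open `url` with a mechanize browser and return the page title."""
    if browser is None:
        import mechanize  # third-party package, imported lazily

        browser = mechanize.Browser()
        browser.set_handle_robots(False)  # skip robots.txt handling
    # No window opens; the page is fetched and parsed in-process.
    browser.open(url)
    return browser.title()
```

Printing fetch_title('http://www.google.com/') would show the title of the fetched page, which confirms that the request happened even though nothing appears on screen.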