mechanize

Scraping a site that requires login username and password on two separate pages

拥有回忆 提交于 2019-12-06 07:24:04
问题 I'm trying to scrape information from my companies Intranet so that I can display information on our office wall board via dashing dashboard. I'm trying to work with the provided information from:This Site.The problem that I'm having other than being a noob is that in order to gain access to the information I want to scrape, I need to login to our Intranet providing my username on one page then submitting to another so that I can provide my password. Once I'm logged in, I can then link and

How to send JSON form data with Mechanize or Faraday in Ruby

拈花ヽ惹草 提交于 2019-12-06 07:04:51
问题 I want to retrieve data from a website that uses JSON data to set custom search parameters which seem to be requested via AJAX. The data transmitted shows up under XHR->Request Payload in Firebug: {"filters": [{"action": "post", "filterName": "Hersteller", "ids": [269], "settingName": "Hersteller", "settingValue": "ValueA"}, {"action": "delete", "filterName": "Modelle", "settingName": "Modelle", "settingValue": ""}]} The site doesn't transmit any POST parameters but only this JSON encoded

Python - Mechanize sessions are not regonized

只谈情不闲聊 提交于 2019-12-06 06:13:25
I'm behind a proxy in a company. And in order to access some intra-sites, I have to login first to a specific site. The thing is if I login to this specific site from IE or FF, then I can access the intra-sites not necessarily from the same browser where I logged in. For ex: I logged in from FF, then I can access the intra-sites from IE and vice versa (and I get ping reply from the intra-sites). But, when I login using mechanize and while keeping the session alive (by using sleep ), I can not access any of the intra-sites (and I get ping timeout) Why are mechanize session not recognized?

Verifying br.submit() using Python's Mechanize module

爷,独闯天下 提交于 2019-12-06 05:28:59
Just trying to login to a website using mechanize. When I print "br.form", I can see my credentials entered into my form. But I do not know how to actually submit the form properly. I use "br.submit()" and attempt to verify it has proceeded to the next page by printing the br.title(), but the title appearing is for the login screen, and not the post-login screen. import mechanize from time import sleep def reportDownload(): # Prompt for login credentials print("We require your credentials.") Username = raw_input("Please enter your username. ") Password = raw_input("Please input your password.

How to click link in Mechanize and Nokogiri?

那年仲夏 提交于 2019-12-06 04:12:00
问题 I'm using Mechanize to scrape Google Wallet for Order data. I am capturing all the data from the first page, however, I need to automatically link to subsequent pages to get more info. The #purchaseOrderPager-pagerNextButton will move to the next page so I can pick up more records to capture. The element looks like this. I need to click on it to keep going. <a id="purchaseOrderPager-pagerNextButton" class="kd-button small right" href="purchaseorderlist?startTime=0&... ;currentPageStart=1

SSL errors with Mechanize

半腔热情 提交于 2019-12-06 03:29:13
I got those commands on irb require 'mechanize' agent = Mechanize.new agent.get('https://monabo.lemonde.fr/customer/account/forgotpassword/') I got this error: OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0 state=unknown state: sslv3 alert handshake failure I tried on mac, and it works I don't have this error. However, it doesn't work on my computer (running Linux Mint 17). What I tried: Exporting this variable: export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt Setting this variable: agent.agent.http.ca_file = '/etc/ssl/certs/ca-certificates.crt' Setting this: OpenSSL::SSL:

python mechanize handle two parameters with same name

对着背影说爱祢 提交于 2019-12-06 02:41:41
I'm logging into a page where they oddly have a form input called login_email and two form inputs called login_password . I need to set the value of both but the straightforward call form['login_password'] throws an error: File "/Library/Python/2.7/site-packages/mechanize/_form.py", line 3101, in find_control return self._find_control(name, type, kind, id, label, predicate, nr) File "/Library/Python/2.7/site-packages/mechanize/_form.py", line 3183, in _find_control raise AmbiguityError("more than one control matching "+description) mechanize._form.AmbiguityError: more than one control matching

HTTP Error 999: Request denied

你。 提交于 2019-12-06 02:16:54
问题 I am trying to scrape some web pages from LinkedIn using BeautifulSoup and I keep getting error "HTTP Error 999: Request denied". Is there a way around to avoid this error. If you look at my code, I have tried Mechanize and URLLIB2 and both are giving me the same error. from __future__ import unicode_literals from bs4 import BeautifulSoup import urllib2 import csv import os import re import requests import pandas as pd import urlparse import urllib import urllib2 from BeautifulSoup import

Nokogiri Error: undefined method `radiobutton_with' - Why?

百般思念 提交于 2019-12-06 00:10:20
I try to access a form using mechanize (Ruby). On my form I have a gorup of Radiobuttons. So I want to check one of them. I wrote: target_form = (page/:form).find{ |elem| elem['id'] == 'formid'} target_form.radiobutton_with(:name => "radiobuttonname")[2].check In this line I want to check the radiobutton with the value of 2. But in this line, I get an error: : undefined method `radiobutton_with' for #<Nokogiri::XML::Element:0x9b86ea> (NoMethodError) The problem occured because using a Mechanize page as a Nokogiri document (by calling the / method, or search , or xpath , etc.) returns Nokogiri

mechanize open Url python

你。 提交于 2019-12-05 23:43:56
I am trying to open a URL using mechanize in python. The code executes with no errors, but nothing actually happens. What am I missing? Also, is there a way to set the browser? This is python 2.7. import mechanize url='http://www.google.com/' op = mechanize.Browser() # use mecahnize's browser op.set_handle_robots(False) #tell the webpage you're not a robot op.open(url) mechanize doesn't use real browsers - it is a tool for programmatic web-browsing. For example, print out the page title after opening the url: >>> import mechanize >>> url='http://www.google.com/' >>> op = mechanize.Browser() >>