mechanize

Catching timeout errors with ruby mechanize

旧时模样 submitted on 2019-11-29 18:39:58

Question: I have a Mechanize function to log me out of a site, but on very rare occasions it times out. The function goes to a specific page and then clicks a logout button. On the occasions that Mechanize hits a timeout, either while loading the logout page or while clicking the logout button, the code crashes. So I put in a small rescue, and it seems to be working, as seen in the first piece of code below.

```ruby
def logmeout(agent)
  page = agent.get('http://www.example.com/')
  agent.click(page.link
```

Python re - escape coincidental parentheses in regex pattern

十年热恋 submitted on 2019-11-29 18:02:56

I am having trouble with the regex in the following code:

```python
import mechanize
import re

br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
response = br.open("http://www.gfsc.gg/The-Commission/Pages/Regulated-Entities.aspx?auto_click=1")
html = response.read()
br.select_form(nr=0)
#print br.form
br.set_all_readonly(False)
next = re.search(r"""<a href="javascript:__doPostBack('(.*?)','(.*?)')">""", html)
if next:
    print 'group(1):', next.group(1)
    print 'group(2)
```
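For what it's worth, `(` and `)` are regex metacharacters, so the unescaped parentheses around the `__doPostBack` arguments are parsed as an extra capturing group instead of literal characters; the pattern then never matches the page text, and the group numbers shift. A minimal sketch of the fix, using a made-up HTML snippet in place of the real page:

```python
import re

# Hypothetical stand-in for the fetched page; the real HTML
# would come from br.open(...).read()
html = """<a href="javascript:__doPostBack('ctl00$grid','Page$2')">2</a>"""

# The literal ( and ) must be backslash-escaped; only the (.*?)
# groups should remain as capturing parentheses.
pattern = r"""<a href="javascript:__doPostBack\('(.*?)','(.*?)'\)">"""
match = re.search(pattern, html)
if match:
    print(match.group(1))  # ctl00$grid
    print(match.group(2))  # Page$2
```

Alternatively, `re.escape` can be applied to the fixed parts of the pattern before splicing in the capturing groups.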

Python authenticate and launch private page using webbrowser, urllib and CookieJar

坚强是说给别人听的谎言 submitted on 2019-11-29 17:10:45

I want to log in with a CookieJar and launch not the login page but a page that can only be seen after authenticating. I know mechanize does that, but besides not working for me right now, I would rather do this without it. Now I have:

```python
import urllib, urllib2, cookielib, webbrowser
from cookielib import CookieJar

username = 'my_username'
password = 'my_password'
url = 'my_login_page'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'my_username': username, 'my_password': password})
opener.open(url, login_data)
page_to_launch = 'my
```
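The snippet above is Python 2. Here is a hedged Python 3 sketch of the same flow (the URL and field names are placeholders); note also that the `webbrowser` module launches a separate browser process that does not share the script's CookieJar, so the protected page has to be fetched through the same opener rather than handed off to a real browser:

```python
import http.cookiejar
import urllib.parse
import urllib.request

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

# POST the credentials; any Set-Cookie headers in the reply land in cj
login_data = urllib.parse.urlencode({
    'my_username': 'my_username',   # placeholder field names -- copy the
    'my_password': 'my_password',   # real ones from the login form's HTML
}).encode('ascii')
# opener.open('https://example.com/login', login_data)  # placeholder URL

# Subsequent requests through the same opener send the session cookie:
# body = opener.open('https://example.com/members-only').read()
```

Passing the protected URL to `webbrowser.open()` would start a fresh, cookie-less session, which is usually why the login page reappears.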

How do I grab a thumbnail screenshot of many websites?

ぐ巨炮叔叔 submitted on 2019-11-29 17:00:23

I have a list of 2500 websites and need to grab a thumbnail screenshot of each of them. How do I do that? I could try to parse the sites with either Perl or Python; Mechanize would be a good thing. But I am not so experienced with Perl. Here is a Perl solution:

```perl
use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new();
$mech->get('http://google.com');
my $png = $mech->content_as_png();
```

From the docs: Returns the given tab or the current page rendered as a PNG image. All parameters are optional. $tab defaults to the current tab. If the coordinates are given, that rectangle will be cut out. The

mechanize._mechanize.FormNotFoundError: no form matching name 'q'

社会主义新天地 submitted on 2019-11-29 16:57:26

Can anyone help me get this form selection correct? Trying to do a crawl of Google, I get the error:

```
mechanize._mechanize.FormNotFoundError: no form matching name 'q'
```

Unusual, since I have seen several other tutorials using it, and:

```
<f GET http://www.google.com.tw/search application/x-www-form-urlencoded
  <HiddenControl(ie=Big5) (readonly)>
  <HiddenControl(hl=zh-TW) (readonly)>
  <HiddenControl(source=hp) (readonly)>
  <TextControl(q=)>
```

P.S. I don't plan to SLAM Google with requests; I just hope to use an automatic selector to take the effort out of finding academic citation PDFs from time to time.
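In that form listing, `q` is the name of a text *control* inside the form, while the form itself is named `f` (the `<f GET …>` line), so `select_form(name='q')` raises `FormNotFoundError`; the fix is along the lines of `br.select_form(name='f')` (or `nr=0`) followed by `br['q'] = 'search terms'`. A stdlib sketch of the distinction, using a hypothetical reconstruction of the form's HTML:

```python
from html.parser import HTMLParser

# Hypothetical reconstruction of the form behind the dump above
html = '''<form name="f" action="http://www.google.com.tw/search" method="get">
  <input type="hidden" name="hl" value="zh-TW">
  <input type="text" name="q">
</form>'''

class FormInspector(HTMLParser):
    """Collect form names separately from the names of their controls."""
    def __init__(self):
        super().__init__()
        self.form_names = []
        self.control_names = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form':
            self.form_names.append(attrs.get('name'))
        elif tag == 'input':
            self.control_names.append(attrs.get('name'))

p = FormInspector()
p.feed(html)
print(p.form_names)     # ['f'] -> what select_form(name=...) matches
print(p.control_names)  # ['hl', 'q'] -> 'q' is a control, not a form
```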

How to install mechanize for Python 2.7?

旧时模样 submitted on 2019-11-29 16:22:48

Question: I saved mechanize in my Python 2.7 directory. But when I type import mechanize into the Python shell, I get an error message that reads:

```
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    import mechanize
ImportError: No module named mechanize
```

Answer 1: Using pip:

```
pip install mechanize
```

Or download the mechanize distribution archive, open it, and run:

```
python setup.py install
```

Answer 2: Try this on Debian/Ubuntu:

```
sudo apt-get install python-mechanize
```

Answer 3: You need to follow the
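Whichever answer you follow, a quick diagnostic (a generic sketch, not specific to mechanize) shows whether the install worked and, if not, where the interpreter is actually looking:

```python
import sys

try:
    import mechanize
    status = "mechanize imported from %s" % mechanize.__file__
except ImportError:
    # Merely dropping the source into "the Python 2.7 directory" is not
    # enough; the package must live on one of the interpreter's search paths.
    status = "mechanize not importable; search paths: %s" % ", ".join(sys.path)

print(status)
```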

Mechanize and Google App Engine

戏子无情 submitted on 2019-11-29 15:52:32

Question: Has anyone managed to use mechanize with a Google App Engine application?

Answer 1: I have solved this problem; please see: Python Mechanize + GAE python code

Answer 2: I found that someone created this project: gaemechanize. But there was no code at the time of writing.

Source: https://stackoverflow.com/questions/1389893/mechanize-and-google-app-engine

How to properly use mechanize to scrape AJAX sites

半腔热情 submitted on 2019-11-29 15:16:44

I am fairly new to web scraping. There is a site with a table on it whose values are controlled by JavaScript. Those values determine the addresses of future values that my browser is told to request from the JavaScript. The new pages carry JSON responses that the script uses to update the table in my browser. So I wanted to build a class with a mechanize method that takes in a URL and spits out the response body: HTML the first time, then JSON for the remaining iterations. I have something that works, but I want to know if I am doing it
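A stdlib-only sketch of such a class, using `urllib.request` in place of mechanize so it stays self-contained, with `data:` URLs standing in for the real site: the same `fetch` method returns parsed JSON when the response's Content-Type is `application/json`, and the raw body otherwise.

```python
import json
import urllib.request

class Scraper:
    """Fetch a URL; return parsed JSON when labeled as such, raw text otherwise."""

    def __init__(self):
        # mechanize's Browser would slot in here; urllib keeps the sketch dependency-free
        self.opener = urllib.request.build_opener()

    def fetch(self, url):
        with self.opener.open(url) as resp:
            ctype = resp.headers.get_content_type()
            body = resp.read().decode('utf-8')
        return json.loads(body) if ctype == 'application/json' else body

scraper = Scraper()
# data: URLs stand in for the real site so the example is runnable offline
html = scraper.fetch('data:text/html,<table><tr><td>42</td></tr></table>')
rows = scraper.fetch('data:application/json,{"rows":[1,2,3]}')
```

Branching on the Content-Type header avoids hard-coding "first call is HTML, later calls are JSON" into the class, which is fragile if the request order ever changes.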

Screen scraping aspx with Python Mechanize - Javascript form submission

£可爱£侵袭症+ submitted on 2019-11-29 14:42:11

Question: I'm trying to scrape the UK Food Ratings Agency's aspx search results pages (e.g. http://ratings.food.gov.uk/QuickSearch.aspx?q=po30 ) using Mechanize/Python on ScraperWiki ( http://scraperwiki.com/scrapers/food_standards_agency/ ), but I run into a problem when trying to follow "next" page links, which have the form:

```html
<input type="submit" name="ctl00$ContentPlaceHolder1$uxResults$uxNext" value="Next >" id="ctl00_ContentPlaceHolder1_uxResults_uxNext" title="Next >" />
```

The form handler looks
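Since the Next button here is a plain submit control, mechanize should be able to submit it by name; it is only when a pager is wired through a `javascript:__doPostBack('target','argument')` link that you must fill ASP.NET's hidden fields and POST the form yourself. A hedged sketch of building that payload (the viewstate value below is a placeholder; the real one must be copied verbatim from the current page's `__VIEWSTATE` hidden field):

```python
import urllib.parse

def postback_data(viewstate, target, argument='', extra=None):
    """Build the form body ASP.NET expects for __doPostBack(target, argument)."""
    data = {
        '__EVENTTARGET': target,       # first argument of the __doPostBack call
        '__EVENTARGUMENT': argument,   # second argument (often empty)
        '__VIEWSTATE': viewstate,      # copied from the current page's hidden field
    }
    data.update(extra or {})           # e.g. __EVENTVALIDATION on newer ASP.NET
    return urllib.parse.urlencode(data)

payload = postback_data('PLACEHOLDER_VIEWSTATE',
                        'ctl00$ContentPlaceHolder1$uxResults$uxNext')
```

POSTing `payload` back to the same page URL is what the JavaScript pager would have done in the browser.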

Ruby Mechanize https error

﹥>﹥吖頭↗ submitted on 2019-11-29 11:59:39

Question: I'm trying to do the following:

```ruby
page = Mechanize.new.get "https://sis-app.sph.harvard.edu:9030/prod/bwckschd.p_disp_dyn_sched"
```

But I only get this exception:

```
OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0 state=SSLv2/v3 read server hello A: sslv3 alert illegal parameter
    from /Users/amosng/.rvm/gems/ruby-1.9.3-p194/gems/net-http-persistent-2.7/lib/net/http/persistent/ssl_reuse.rb:70:in `connect'
    from /Users/amosng/.rvm/gems/ruby-1.9.3-p194/gems/net-http-persistent-2.7/lib/net/http
```