mechanize

Using mechanize to login to a webpage

坚强是说给别人听的谎言 submitted on 2019-12-18 12:37:48

Question: This is my first experience programming in Python, and I'm trying to log in to this webpage. After searching around I found that many people suggested using mechanize. Just to be sure I had set things up correctly before getting to the code, I downloaded the mechanize zip from the website and put my Python script in the unzipped mechanize folder. I have this code so far, pieced together from different examples I've found:

    import mechanize
    theurl = 'http://voyager.umeres.maine.edu/Login'
    mech = mechanize …
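The snippet above is cut off, but the usual mechanize login flow looks roughly like this. A minimal sketch, assuming the login form is the first form on the page and that its fields are named 'username' and 'password' (both are assumptions about this particular page; inspect the HTML to confirm):

    import mechanize

    theurl = 'http://voyager.umeres.maine.edu/Login'
    br = mechanize.Browser()
    br.set_handle_robots(False)          # skip robots.txt handling
    br.open(theurl)

    br.select_form(nr=0)                 # assumption: the login form is form 0
    br['username'] = 'myuser'            # hypothetical field name
    br['password'] = 'mypassword'        # hypothetical field name
    response = br.submit()
    print(response.read()[:200])

As a setup note, installing the package (pip install mechanize) is more reliable than running your script out of the unzipped source folder.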

Scrape the absolute URL instead of a relative path in python

可紊 submitted on 2019-12-18 05:43:40

Question: I'm trying to get all the hrefs from an HTML page and store them in a list for future processing, such as this. Example URL: www.example-page-xl.com

    <body>
     <section>
      <a href="/helloworld/index.php"> Hello World </a>
     </section>
    </body>

I'm using the following code to list the hrefs:

    import bs4 as bs
    import urllib.request

    sauce = urllib.request.urlopen('https://www.example-page-xl.com').read()
    soup = bs.BeautifulSoup(sauce, 'lxml')
    section = soup.section

    for url in section.find_all('a'):
        print(url …
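The usual answer here is urllib.parse.urljoin, which resolves each relative href against the page's base URL. A sketch built on the question's own code (the domain is the question's placeholder):

    import urllib.request
    from urllib.parse import urljoin

    import bs4 as bs

    base = 'https://www.example-page-xl.com'
    sauce = urllib.request.urlopen(base).read()
    soup = bs.BeautifulSoup(sauce, 'lxml')

    absolute_links = []
    for a in soup.section.find_all('a'):
        # '/helloworld/index.php' becomes
        # 'https://www.example-page-xl.com/helloworld/index.php'
        absolute_links.append(urljoin(base, a.get('href')))

    print(absolute_links)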

Python Mechanize won't open these sites

孤者浪人 submitted on 2019-12-18 04:54:08

Question: I'm working with Python's mechanize module. I've come across three different sites that cannot be opened by mechanize directly:

    en.wikipedia.org/wiki/Dog (new user, can't post more than 2 links T-T)
    https://www.google.com/search?num=100&hl=en&site=&q=dog&oq=dog&aq=f&aqi=g10&aql=1&gs_sm=e&gs_upl=618l914l0l1027l3l2l0l0l0l0l173l173l0.1l1l0
    http://www.cpsc.gov/cpscpub/prerel/prhtml03/03059.html

    import mechanize
    br = mechanize.Browser()
    br.set_handle_robots(False)

Adding the following code allows …
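Sites like these commonly reject the default Python user agent, so the usual fix beyond set_handle_robots(False) is to send browser-like headers. A sketch (the header values are only plausible examples, not anything mandated by those sites):

    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)      # don't fetch or obey robots.txt
    br.set_handle_refresh(False)     # avoid meta-refresh redirect loops
    br.addheaders = [('User-Agent',
                      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/78.0'),
                     ('Accept', '*/*')]

    response = br.open('http://en.wikipedia.org/wiki/Dog')
    print(response.code)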

Python Mechanize log into Facebook cookie error

一笑奈何 submitted on 2019-12-18 02:50:53

Question: For a few days now I have been unable to log into Facebook with my script. The Facebook login page gives the error: "Cookies required. Cookies are not enabled on your browser." I don't know why this error appears, because I do accept cookies in my script. I hope someone can help me out; I have already googled and tried different cookie methods.

    import cookielib
    import urllib2
    import mechanize

    br = mechanize.Browser()
    cookiejar = cookielib.LWPCookieJar()
    br.set_cookiejar(cookiejar)
    br.set_handle_equiv( …
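The code above is truncated; here is a sketch of the full browser setup this kind of script usually needs. The form index and the 'email'/'pass' field names are assumptions about Facebook's login page at the time, so verify them against the live HTML:

    import cookielib
    import mechanize

    br = mechanize.Browser()
    cookiejar = cookielib.LWPCookieJar()
    br.set_cookiejar(cookiejar)
    br.set_handle_equiv(True)        # honour http-equiv <meta> headers
    br.set_handle_redirect(True)     # the login flow relies on 30x redirects
    br.set_handle_referer(True)
    br.set_handle_robots(False)
    # Facebook serves a degraded, cookie-complaining page to unknown agents.
    br.addheaders = [('User-Agent',
                      'Mozilla/5.0 (X11; Linux x86_64) Firefox/60.0')]

    br.open('https://www.facebook.com/login.php')
    br.select_form(nr=0)             # assumption: login form is the first form
    br['email'] = 'me@example.com'   # assumed field names
    br['pass'] = 'secret'
    br.submit()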

Python Mechanize select a form with no name

孤人 submitted on 2019-12-18 01:30:13

Question: I am attempting to have mechanize select a form from a page, but the form in question has no "name" attribute in the HTML. What should I do? When I try to use br.select_form(name="") I get errors that no form is declared with that name, and the function requires a name input. There is only one form on the page; is there some other way I can select that form?

Answer 1: Try br.select_form(nr=0) to select the first form. In the mechanize source:

    def select_form(self, name=None, predicate=None, nr …
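Spelling the answer out: select_form can pick a form by position (nr) or by an arbitrary predicate, so a nameless form is no obstacle. A sketch with a hypothetical URL:

    import mechanize

    br = mechanize.Browser()
    br.open('http://example.com/page-with-one-form')   # hypothetical URL

    # By position: nr=0 selects the first (here, the only) form.
    br.select_form(nr=0)

    # Alternatively, by predicate, e.g. matching on the form's action:
    # br.select_form(predicate=lambda f: 'login' in (f.action or ''))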

Force python mechanize/urllib2 to only use A requests?

女生的网名这么多〃 submitted on 2019-12-17 22:42:39

Question: Here is a related question, but I could not figure out how to apply the answer to mechanize/urllib2: "how to force python httplib library to use only A requests". Basically, given this simple code:

    #!/usr/bin/python
    import urllib2
    print urllib2.urlopen('http://python.org/').read(100)

Wireshark reports the following:

    0.000000 10.102.0.79 -> 8.8.8.8 DNS Standard query A python.org
    0.000023 10.102.0.79 -> 8.8.8.8 DNS Standard query AAAA python.org
    0.005369 8.8.8.8 -> 10.102.0.79 DNS …
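The answer to the linked httplib question carries over: monkey-patch socket.getaddrinfo to force AF_INET, so name resolution only asks for A records. urllib2 (and mechanize, which sits on top of it) picks the patch up automatically. A sketch of that idea; whether the AAAA query actually disappears depends on the platform's resolver:

    import socket
    import urllib2

    _real_getaddrinfo = socket.getaddrinfo

    def _ipv4_only_getaddrinfo(host, port, family=0, *args, **kwargs):
        # Ignore the requested family and always resolve IPv4 only.
        return _real_getaddrinfo(host, port, socket.AF_INET, *args, **kwargs)

    socket.getaddrinfo = _ipv4_only_getaddrinfo

    print urllib2.urlopen('http://python.org/').read(100)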

Screen scraping: getting around “HTTP Error 403: request disallowed by robots.txt”

萝らか妹 submitted on 2019-12-17 21:44:11

Question: Is there a way to get around the following?

    httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt

Is the only way around this to contact the site owner (barnesandnoble.com)? I'm building a site that would bring them more sales; I'm not sure why they would deny access at a certain depth. I'm using mechanize and BeautifulSoup on Python 2.6, hoping for a workaround.

Answer 1: You can try lying about your user agent (e.g., by trying to make it believe you're a human being and not a robot) …
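In mechanize terms, that answer comes down to disabling the robots.txt handler and sending a browser-like User-Agent. A sketch (the header value is just an example string):

    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)   # stop mechanize fetching/obeying robots.txt
    br.addheaders = [('User-Agent',
                      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) Firefox/78.0')]

    response = br.open('http://www.barnesandnoble.com/')
    print(response.code)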

How to click a link that has javascript:__doPostBack in href?

筅森魡賤 submitted on 2019-12-17 15:51:56

Question: I am writing a screen-scraper script in Python with the mechanize module, and I would like to use the mechanize.click_link() method on a link that has javascript:__doPostBack in its href. I believe the page I am trying to parse is using AJAX. Note: mech is the mechanize.Browser().

    >>> next_link.__class__.__name__
    'Link'
    >>> next_link
    Link(base_url='http://www.citius.mj.pt/Portal/consultas/ConsultasDistribuicao.aspx', url="javascript:__doPostBack('ctl00$ContentPlaceHolder1$Pager1$lnkNext','')", text= …
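click_link() can't run the JavaScript, but __doPostBack(target, argument) does nothing more than fill two hidden fields and submit the enclosing ASP.NET form, which mechanize can replicate by hand. A sketch, assuming the page has a single form (typical for ASP.NET):

    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.open('http://www.citius.mj.pt/Portal/consultas/ConsultasDistribuicao.aspx')

    br.select_form(nr=0)               # assumption: one ASP.NET <form> on the page
    br.form.set_all_readonly(False)    # hidden controls are read-only by default

    # Mirror __doPostBack('ctl00$ContentPlaceHolder1$Pager1$lnkNext', '')
    br['__EVENTTARGET'] = 'ctl00$ContentPlaceHolder1$Pager1$lnkNext'
    br['__EVENTARGUMENT'] = ''
    response = br.submit()
    print(response.read()[:200])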

Programmatic Python Browser with JavaScript

空扰寡人 submitted on 2019-12-17 15:35:48

Question: I want to screen-scrape a website that uses JavaScript. There is mechanize, the programmatic web browser for Python; however, it (understandably) doesn't interpret JavaScript. Is there any programmatic browser for Python that does? If not, is there any JavaScript implementation in Python that I could use to attempt to create one?

Answer 1: You might be better off using a tool like Selenium to automate the scraping with a real web browser, so the JS executes and the page renders just like it would …
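A minimal Selenium sketch of that suggestion (the URL is hypothetical, and the matching browser driver, e.g. geckodriver for Firefox, must be on your PATH):

    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument('-headless')                # run without a visible window
    driver = webdriver.Firefox(options=options)

    driver.get('http://example.com/js-heavy-page')   # hypothetical URL
    html = driver.page_source                        # DOM after JavaScript has run
    driver.quit()
    print(html[:200])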

Click on a javascript link within python?

与世无争的帅哥 submitted on 2019-12-17 10:31:41

Question: I am navigating a site using Python's mechanize module and having trouble clicking on a JavaScript link for the next page. I did a bit of reading, and people suggested I need python-spidermonkey and DOMForms. I managed to get them installed, but I am not sure of the syntax to actually click on the link. I can identify the code on the page as:

    <a href="javascript:__doPostBack('ctl00$MainContent$gvSearchResults','Page$2')">2</a>

Does anyone know how to click on it? Or is there perhaps another tool? …
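Two ways around spidermonkey/DOMForms: replicate the postback with mechanize as sketched two questions above (set __EVENTTARGET/__EVENTARGUMENT by hand and submit), or drive a real browser so the JavaScript actually runs. A Selenium sketch of the latter (the URL is hypothetical):

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    driver.get('http://example.com/search-results')   # hypothetical URL

    # Clicking the link executes javascript:__doPostBack(...) in the browser.
    driver.find_element(By.LINK_TEXT, '2').click()

    # Or trigger the postback directly:
    # driver.execute_script(
    #     "__doPostBack('ctl00$MainContent$gvSearchResults','Page$2')")

    html = driver.page_source
    driver.quit()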