mechanize

Using mechanize to login to a webpage

坚强是说给别人听的谎言 submitted on 2019-12-18 12:37:48

Question: This is my first experience programming in Python, and I'm trying to log in to this webpage. After searching around I found that many people suggested using mechanize. Just to be sure I had set things up correctly before getting to the code, I downloaded the mechanize zip from the website and put my Python script in the unzipped mechanize folder. I have this code so far, pieced together from different examples I've found:

    import mechanize
    theurl = 'http://voyager.umeres.maine.edu/Login'
    mech = mechanize …
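The snippet above is cut off, but the usual mechanize login flow looks roughly like this. A minimal sketch, assuming the login form is the first form on the page and that its fields are named 'username' and 'password' (both are assumptions about this particular page; inspect the HTML to confirm):

    import mechanize

    theurl = 'http://voyager.umeres.maine.edu/Login'
    br = mechanize.Browser()
    br.set_handle_robots(False)          # skip robots.txt handling
    br.open(theurl)

    br.select_form(nr=0)                 # assumption: the login form is form 0
    br['username'] = 'myuser'            # hypothetical field name
    br['password'] = 'mypassword'        # hypothetical field name
    response = br.submit()
    print(response.read()[:200])

As a setup note, installing the package (pip install mechanize) is more reliable than running your script out of the unzipped source folder.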

Scrape the absolute URL instead of a relative path in python

可紊 submitted on 2019-12-18 05:43:40

Question: I'm trying to get all the hrefs from an HTML page and store them in a list for future processing, such as this. Example URL: www.example-page-xl.com

    <body>
     <section>
      <a href="/helloworld/index.php"> Hello World </a>
     </section>
    </body>

I'm using the following code to list the hrefs:

    import bs4 as bs
    import urllib.request

    sauce = urllib.request.urlopen('https://www.example-page-xl.com').read()
    soup = bs.BeautifulSoup(sauce, 'lxml')
    section = soup.section

    for url in section.find_all('a'):
        print(url …
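The usual answer here is urllib.parse.urljoin, which resolves each relative href against the page's base URL. A sketch built on the question's own code (the domain is the question's placeholder):

    import urllib.request
    from urllib.parse import urljoin

    import bs4 as bs

    base = 'https://www.example-page-xl.com'
    sauce = urllib.request.urlopen(base).read()
    soup = bs.BeautifulSoup(sauce, 'lxml')

    absolute_links = []
    for a in soup.section.find_all('a'):
        # '/helloworld/index.php' becomes
        # 'https://www.example-page-xl.com/helloworld/index.php'
        absolute_links.append(urljoin(base, a.get('href')))

    print(absolute_links)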

Python Mechanize won't open these sites

孤者浪人 submitted on 2019-12-18 04:54:08

Question: I'm working with Python's mechanize module. I've come across three different sites that cannot be opened by mechanize directly:

    en.wikipedia.org/wiki/Dog (new user, can't post more than 2 links T-T)
    https://www.google.com/search?num=100&hl=en&site=&q=dog&oq=dog&aq=f&aqi=g10&aql=1&gs_sm=e&gs_upl=618l914l0l1027l3l2l0l0l0l0l173l173l0.1l1l0
    http://www.cpsc.gov/cpscpub/prerel/prhtml03/03059.html

    import mechanize
    br = mechanize.Browser()
    br.set_handle_robots(False)

Adding the following code allows …
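Sites like these commonly reject the default Python user agent, so the usual fix beyond set_handle_robots(False) is to send browser-like headers. A sketch (the header values are only plausible examples, not anything mandated by those sites):

    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)      # don't fetch or obey robots.txt
    br.set_handle_refresh(False)     # avoid meta-refresh redirect loops
    br.addheaders = [('User-Agent',
                      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/78.0'),
                     ('Accept', '*/*')]

    response = br.open('http://en.wikipedia.org/wiki/Dog')
    print(response.code)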

Python Mechanize log into Facebook cookie error

一笑奈何 submitted on 2019-12-18 02:50:53

Question: For a few days now I have been unable to log into Facebook with my script. The Facebook login page gives the error: "Cookies required. Cookies are not enabled on your browser." I don't know why this error appears, because I do accept cookies in my script. I hope someone can help me out; I have already googled and tried different cookie methods.

    import cookielib
    import urllib2
    import mechanize

    br = mechanize.Browser()
    cookiejar = cookielib.LWPCookieJar()
    br.set_cookiejar(cookiejar)
    br.set_handle_equiv( …
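The code above is truncated; here is a sketch of the full browser setup this kind of script usually needs. The form index and the 'email'/'pass' field names are assumptions about Facebook's login page at the time, so verify them against the live HTML:

    import cookielib
    import mechanize

    br = mechanize.Browser()
    cookiejar = cookielib.LWPCookieJar()
    br.set_cookiejar(cookiejar)
    br.set_handle_equiv(True)        # honour http-equiv <meta> headers
    br.set_handle_redirect(True)     # the login flow relies on 30x redirects
    br.set_handle_referer(True)
    br.set_handle_robots(False)
    # Facebook serves a degraded, cookie-complaining page to unknown agents.
    br.addheaders = [('User-Agent',
                      'Mozilla/5.0 (X11; Linux x86_64) Firefox/60.0')]

    br.open('https://www.facebook.com/login.php')
    br.select_form(nr=0)             # assumption: login form is the first form
    br['email'] = 'me@example.com'   # assumed field names
    br['pass'] = 'secret'
    br.submit()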

Python Mechanize select a form with no name

孤人 submitted on 2019-12-18 01:30:13

Question: I am attempting to have mechanize select a form from a page, but the form in question has no "name" attribute in the HTML. What should I do? When I try to use br.select_form(name="") I get errors that no form is declared with that name, and the function requires a name input. There is only one form on the page; is there some other way I can select that form?

Answer 1: Try br.select_form(nr=0) to select the first form. In the mechanize source:

    def select_form(self, name=None, predicate=None, nr …
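Spelling the answer out: select_form can pick a form by position (nr) or by an arbitrary predicate, so a nameless form is no obstacle. A sketch with a hypothetical URL:

    import mechanize

    br = mechanize.Browser()
    br.open('http://example.com/page-with-one-form')   # hypothetical URL

    # By position: nr=0 selects the first (here, the only) form.
    br.select_form(nr=0)

    # Alternatively, by predicate, e.g. matching on the form's action:
    # br.select_form(predicate=lambda f: 'login' in (f.action or ''))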

Force python mechanize/urllib2 to only use A requests?

女生的网名这么多〃 submitted on 2019-12-17 22:42:39

Question: Here is a related question, but I could not figure out how to apply the answer to mechanize/urllib2: "how to force python httplib library to use only A requests". Basically, given this simple code:

    #!/usr/bin/python
    import urllib2
    print urllib2.urlopen('http://python.org/').read(100)

Wireshark reports the following:

    0.000000 10.102.0.79 -> 8.8.8.8 DNS Standard query A python.org
    0.000023 10.102.0.79 -> 8.8.8.8 DNS Standard query AAAA python.org
    0.005369 8.8.8.8 -> 10.102.0.79 DNS …
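The answer to the linked httplib question carries over: monkey-patch socket.getaddrinfo to force AF_INET, so name resolution only asks for A records. urllib2 (and mechanize, which sits on top of it) picks the patch up automatically. A sketch of that idea; whether the AAAA query actually disappears depends on the platform's resolver:

    import socket
    import urllib2

    _real_getaddrinfo = socket.getaddrinfo

    def _ipv4_only_getaddrinfo(host, port, family=0, *args, **kwargs):
        # Ignore the requested family and always resolve IPv4 only.
        return _real_getaddrinfo(host, port, socket.AF_INET, *args, **kwargs)

    socket.getaddrinfo = _ipv4_only_getaddrinfo

    print urllib2.urlopen('http://python.org/').read(100)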

Screen scraping: getting around “HTTP Error 403: request disallowed by robots.txt”

萝らか妹 submitted on 2019-12-17 21:44:11

Question: Is there a way to get around the following?

    httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt

Is the only way around this to contact the site owner (barnesandnoble.com)? I'm building a site that would bring them more sales; I'm not sure why they would deny access at a certain depth. I'm using mechanize and BeautifulSoup on Python 2.6, hoping for a workaround.

Answer 1: You can try lying about your user agent (e.g., by trying to make it believe you're a human being and not a robot) …
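In mechanize terms, that answer comes down to disabling the robots.txt handler and sending a browser-like User-Agent. A sketch (the header value is just an example string):

    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)   # stop mechanize fetching/obeying robots.txt
    br.addheaders = [('User-Agent',
                      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) Firefox/78.0')]

    response = br.open('http://www.barnesandnoble.com/')
    print(response.code)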

How to click a link that has javascript:__doPostBack in href?

筅森魡賤 submitted on 2019-12-17 15:51:56

Question: I am writing a screen-scraper script in Python with the mechanize module, and I would like to use the mechanize.click_link() method on a link that has javascript:__doPostBack in its href. I believe the page I am trying to parse is using AJAX. Note: mech is the mechanize.Browser().

    >>> next_link.__class__.__name__
    'Link'
    >>> next_link
    Link(base_url='http://www.citius.mj.pt/Portal/consultas/ConsultasDistribuicao.aspx', url="javascript:__doPostBack('ctl00$ContentPlaceHolder1$Pager1$lnkNext','')", text= …
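click_link() can't run the JavaScript, but __doPostBack(target, argument) does nothing more than fill two hidden fields and submit the enclosing ASP.NET form, which mechanize can replicate by hand. A sketch, assuming the page has a single form (typical for ASP.NET):

    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.open('http://www.citius.mj.pt/Portal/consultas/ConsultasDistribuicao.aspx')

    br.select_form(nr=0)               # assumption: one ASP.NET <form> on the page
    br.form.set_all_readonly(False)    # hidden controls are read-only by default

    # Mirror __doPostBack('ctl00$ContentPlaceHolder1$Pager1$lnkNext', '')
    br['__EVENTTARGET'] = 'ctl00$ContentPlaceHolder1$Pager1$lnkNext'
    br['__EVENTARGUMENT'] = ''
    response = br.submit()
    print(response.read()[:200])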

Programmatic Python Browser with JavaScript

空扰寡人 submitted on 2019-12-17 15:35:48

Question: I want to screen-scrape a website that uses JavaScript. There is mechanize, the programmatic web browser for Python; however, it (understandably) doesn't interpret JavaScript. Is there any programmatic browser for Python that does? If not, is there any JavaScript implementation in Python that I could use to attempt to create one?

Answer 1: You might be better off using a tool like Selenium to automate the scraping with a real web browser, so the JS executes and the page renders just like it would …
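A minimal Selenium sketch of that suggestion (the URL is hypothetical, and the matching browser driver, e.g. geckodriver for Firefox, must be on your PATH):

    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument('-headless')                # run without a visible window
    driver = webdriver.Firefox(options=options)

    driver.get('http://example.com/js-heavy-page')   # hypothetical URL
    html = driver.page_source                        # DOM after JavaScript has run
    driver.quit()
    print(html[:200])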

Click on a javascript link within python?

与世无争的帅哥 submitted on 2019-12-17 10:31:41

Question: I am navigating a site using Python's mechanize module and having trouble clicking on a JavaScript link for the next page. I did a bit of reading, and people suggested I need python-spidermonkey and DOMForms. I managed to get them installed, but I am not sure of the syntax to actually click on the link. I can identify the code on the page as:

    <a href="javascript:__doPostBack('ctl00$MainContent$gvSearchResults','Page$2')">2</a>

Does anyone know how to click on it? Or is there perhaps another tool? …
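Two ways around spidermonkey/DOMForms: replicate the postback with mechanize as sketched two questions above (set __EVENTTARGET/__EVENTARGUMENT by hand and submit), or drive a real browser so the JavaScript actually runs. A Selenium sketch of the latter (the URL is hypothetical):

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    driver.get('http://example.com/search-results')   # hypothetical URL

    # Clicking the link executes javascript:__doPostBack(...) in the browser.
    driver.find_element(By.LINK_TEXT, '2').click()

    # Or trigger the postback directly:
    # driver.execute_script(
    #     "__doPostBack('ctl00$MainContent$gvSearchResults','Page$2')")

    html = driver.page_source
    driver.quit()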