mechanize

How do I parse an HTML table with Nokogiri?

点点圈 submitted on 2019-12-17 08:21:34

Question: I installed Ruby and Mechanize. It seems to me that it is possible to do what I want with Nokogiri, but I do not know how. What about this table? It is just part of the HTML of a vBulletin forum site; I tried to keep the HTML structure but deleted some text and tag attributes. I want to get some details per thread, such as: Title, Author, Date, Time, Replies, and Views. Please note that there are a few tables in the HTML document; I am after one particular table with its tbody, <tbody
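The question asks for Nokogiri (Ruby), where the usual approach is roughly `doc.css('tbody tr').map { |tr| tr.css('td').map(&:text) }`. As a minimal, language-neutral sketch of the same row-extraction idea, here is a version using Python's standard-library `html.parser`; the sample HTML and column values are invented for illustration:

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by table row."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

html = """<table><tbody>
<tr><td>Title A</td><td>Author A</td><td>12</td></tr>
<tr><td>Title B</td><td>Author B</td><td>7</td></tr>
</tbody></table>"""

parser = TableRows()
parser.feed(html)
print(parser.rows)  # [['Title A', 'Author A', '12'], ['Title B', 'Author B', '7']]
```

When several tables share the page, scoping the selection to the one `tbody` of interest (by an id or class in the real markup) keeps the other tables out of the result.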

Using Python and Mechanize to submit form data and authenticate

徘徊边缘 submitted on 2019-12-17 07:20:13

Question: I want to log in to the website Reddit.com, navigate to a particular area of the page, and submit a comment. I don't see what's wrong with this code, but it is not working: no change is reflected on the Reddit site.

```python
import mechanize
import cookielib

def main():
    # Browser
    br = mechanize.Browser()

    # Cookie Jar
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    # Browser options
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True
```
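The snippet above is Python 2 (`cookielib`). On Python 3 the same cookie-jar setup uses the standard-library `http.cookiejar` with `urllib.request`. The sketch below only builds the authenticated request; the form field names and URL are illustrative, not Reddit's real login parameters (Reddit's actual flow needs extra tokens and is better served by its API):

```python
import urllib.parse
import urllib.request
from http.cookiejar import LWPCookieJar

# Cookie jar attached to an opener -- the Python 3 counterpart of
# mechanize's br.set_cookiejar(cookielib.LWPCookieJar()).
jar = LWPCookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Hypothetical form fields and endpoint, for illustration only.
form = {"user": "example_user", "passwd": "example_pass", "api_type": "json"}
body = urllib.parse.urlencode(form).encode("ascii")
request = urllib.request.Request("https://example.com/api/login", data=body)

# A Request carrying a body defaults to the POST method.
print(request.get_method())
print(body)
```

Once `opener.open(request)` succeeds, the jar holds the session cookies and later requests through the same opener stay logged in.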

How to avoid HTTP error 429 (Too Many Requests) in Python

余生颓废 submitted on 2019-12-17 04:45:09

Question: I am trying to use Python to log in to a website and gather information from several webpages, and I get the following error:

```
Traceback (most recent call last):
  File "extract_test.py", line 43, in <module>
    response=br.open(v)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 255, in _mech_open
    raise response
mechanize._response
```
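HTTP 429 means the server is rate-limiting the client, so the standard remedy is to slow down and retry with an increasing delay. A minimal backoff sketch (the `open_url` parameter stands in for something like `br.open`; a fuller version would also honour the `Retry-After` header that often accompanies 429 responses):

```python
import time

def fetch_with_backoff(open_url, url, max_tries=5, base_delay=1.0):
    """Retry a fetch, sleeping exponentially longer after each failure."""
    for attempt in range(max_tries):
        try:
            return open_url(url)
        except Exception:
            if attempt == max_tries - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * (2 ** attempt))

# Demonstrate with a stub that fails twice, then succeeds.
calls = {"n": 0}
def flaky(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("HTTP 429: Too Many Requests")
    return "page body"

print(fetch_with_backoff(flaky, "http://example.com", base_delay=0.01))  # page body
```

Adding a fixed pause between ordinary requests (not just on failure) usually prevents the 429 from being raised in the first place.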

How can I extract a special kind of table from a website in Perl?

家住魔仙堡 submitted on 2019-12-14 03:25:34

Question: I am trying to fetch all tables from the website http://finance.yahoo.com/etf/lists/?bypass=true&mod_id=mediaquotesetf&tab=tab1&scol=imkt&stype=desc&rcnt=50&page=1 using the Perl module HTML::TableExtract, but I can't get the desired table; instead I get only the first two tables, which are useless to me. Here is my code:

```perl
#!/usr/bin/perl
#!perl -w
use DBI;
use strict;
use WWW::Mechanize;
use HTML::TableExtract;

my $mech = WWW::Mechanize->new();
my $url  = 'http://finance.yahoo.com/etf/lists/?bypass
```
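HTML::TableExtract's usual fix for "wrong table" is to select by header text, e.g. `HTML::TableExtract->new(headers => [...])`. The same idea, sketched in Python with the standard-library parser (the sample HTML and header names are invented; the real page's headers would go in `wanted`):

```python
from html.parser import HTMLParser

class AllTables(HTMLParser):
    """Collect rows for every table in the document, in order."""
    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

def table_with_headers(tables, wanted):
    """Mimic HTML::TableExtract's headers => [...] selection."""
    for rows in tables:
        if rows and set(wanted) <= set(rows[0]):
            return rows
    return None

html = ("<table><tr><th>Nav</th></tr></table>"
        "<table><tr><th>Symbol</th><th>Price</th></tr>"
        "<tr><td>SPY</td><td>410.2</td></tr></table>")
p = AllTables()
p.feed(html)
print(table_with_headers(p.tables, ["Symbol", "Price"]))
# [['Symbol', 'Price'], ['SPY', '410.2']]
```

Matching on headers skips the first navigation tables entirely, which is exactly what the question needs.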

Python, Mechanize: open a text file with Mechanize

喜夏-厌秋 submitted on 2019-12-13 18:04:19

Question: I am learning Mechanize. I am trying to open a text file; the link you would click on says "Text (.prn)". One problem I am having is that there is only one form on this page, and the file is not in the form. Another problem is that there are a couple of text files on this page, but they all have the same link text, "Text (.prn)". So I guess I need to get to the first one and open it. One thing that makes the text file I am trying to open unique is that it seems to be named Summary; maybe I can use this to
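Mechanize itself disambiguates identically-labelled links with an index, e.g. `br.find_link(text='Text (.prn)', nr=0)` followed by `br.follow_link(...)`. As a self-contained sketch of the same "pick the nth link with this text" idea using only the standard library (the sample page and hrefs are invented):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Record (text, href) for every <a> tag in the page."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._text is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = self._text = None

def nth_link(links, text, nr=0):
    """Return the href of the nr-th link whose text matches exactly."""
    matches = [href for t, href in links if t == text]
    return matches[nr] if nr < len(matches) else None

html = ('<p>Summary <a href="/files/summary.prn">Text (.prn)</a></p>'
        '<p>Detail <a href="/files/detail.prn">Text (.prn)</a></p>')
c = LinkCollector()
c.feed(html)
print(nth_link(c.links, "Text (.prn)", nr=0))  # /files/summary.prn
```

If the "Summary" label lives in the surrounding text rather than the link itself, matching on the href (here it contains `summary`) is often the more robust handle.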

Python Mechanize to check if a server is available

試著忘記壹切 submitted on 2019-12-13 16:15:55

Question: I'm trying to write a script which will read a file containing some URLs and then open a browser instance using the mechanize module. I'm just wondering what I can do if some URL does not exist or the server is unreachable. For example:

```python
import mechanize
br = mechanize.Browser()
b = br.open('http://192.168.1.30/index.php')
```

What I want to know is how I will get information from mechanize if 192.168.1.30 is unreachable or if HTTP returns a 404 error.

Answer 1:

```python
from mechanize import Browser
browser =
```
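The two failure modes are distinct exceptions: an HTTP status like 404 means the server answered, while an unreachable host raises a connection-level error. A sketch with the standard library (mechanize's `br.open` raises the analogous `HTTPError`/`URLError` pair); the stub opener avoids touching a real network:

```python
import urllib.error
import urllib.request

def check_url(url, opener=urllib.request.urlopen, timeout=5):
    """Return (ok, detail) for a URL. HTTPError must be caught first,
    since it is a subclass of URLError."""
    try:
        opener(url, timeout=timeout)
        return True, "ok"
    except urllib.error.HTTPError as e:   # server answered with 4xx/5xx
        return False, "HTTP %d" % e.code
    except urllib.error.URLError as e:    # no answer at all (DNS, refused, timeout)
        return False, "unreachable: %s" % e.reason

# Exercise the 404 path with a stub opener instead of a real host.
def stub_404(url, timeout=None):
    raise urllib.error.HTTPError(url, 404, "Not Found", {}, None)

print(check_url("http://192.168.1.30/index.php", opener=stub_404))
# (False, 'HTTP 404')
```

For an unreachable box like 192.168.1.30, the real call would land in the `URLError` branch with a reason such as "timed out" or "connection refused".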

Multiprocessing in Python/BeautifulSoup issues

ⅰ亾dé卋堺 submitted on 2019-12-13 11:13:01

Question: Hi guys, I'm fairly new to Python. What I'm trying to do is move my old code to multiprocessing; however, I'm facing some errors that I hope anyone could help me out with. My code is used to check a few thousand links, given in text form, for certain tags. Once found, it outputs them to me. Because I have a few thousand links to check, speed is an issue, hence the need to move to multiprocessing. Update: I'm getting HTTP 503 errors back. Am I sending too
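A 503 while checking thousands of links usually means the parallel workers are overwhelming the server, so the fix is to cap concurrency and pace the requests. The question used `multiprocessing`, but link-checking is I/O-bound, so a thread pool is a natural fit; this sketch uses a stub fetcher in place of the real BeautifulSoup tag check (swap in `ProcessPoolExecutor` if CPU-heavy parsing dominates):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def check_links(urls, fetch, max_workers=4, delay=0.0):
    """Fetch many URLs concurrently, capping the worker count and adding
    an optional per-request pause so the server is not hammered."""
    def worker(url):
        if delay:
            time.sleep(delay)
        return url, fetch(url)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, urls))  # map preserves input order

# Stub fetcher: pretend the tag was found on even-numbered pages.
def fetch(url):
    return "tag-found" if url.endswith(("0", "2", "4")) else "no-tag"

urls = ["http://example.com/page%d" % i for i in range(5)]
print(check_links(urls, fetch, max_workers=2))
```

Tuning `max_workers` down and `delay` up until the 503s stop is the usual empirical approach; combining this with the backoff-on-429/503 retry pattern makes the crawl robust.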

Mechanize: print to PDF [duplicate]

眉间皱痕 submitted on 2019-12-13 07:13:30

Question: This question already has an answer here: Closed 7 years ago. Possible duplicate of: How do I grab a thumbnail screenshot of many websites? I wrote a script using Perl Mechanize to log in and fetch a page. How can I "print" that page to PDF directly from my Perl script? I'd like to save a snapshot of how it looks in the browser. I can get the HTML using $mech->content();

Answer 1: Check out wkhtmltopdf; there are variants for PDF and images (PNG etc.). It's basically a command-line tool wrapping
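Since wkhtmltopdf is a command-line tool, the script just needs to shell out to it; for a page behind a login, the session cookie captured by Mechanize can be handed over with `--cookie`. A hedged sketch that only builds the command line (the cookie name, URL, and output path are illustrative; run it with `subprocess.run(cmd, check=True)` once the wkhtmltopdf binary is installed):

```python
def snapshot_cmd(url, out_pdf, cookie=None):
    """Build a wkhtmltopdf command line. `cookie` is an optional
    (name, value) pair from the authenticated session."""
    cmd = ["wkhtmltopdf"]
    if cookie:
        name, value = cookie
        cmd += ["--cookie", name, value]
    cmd += [url, out_pdf]
    return cmd

cmd = snapshot_cmd("http://example.com/report", "report.pdf",
                   cookie=("session_id", "abc123"))
print(cmd)
```

In the original Perl context the equivalent is a plain `system(...)` call with the same arguments.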

Ruby Mechanize not returning Javascript built page correctly

倖福魔咒の submitted on 2019-12-13 06:57:05

Question: I'm trying to create a script to fill out a multi-page "form" that I have to fill out weekly (an unemployment form, actually). The fourth page ends up giving you a checkbox and two radio buttons, all built by JavaScript. When I navigate to this page using Mechanize, I get HTML back without those three controls, so I can't go any further in the process. Is this a common problem? I'm filling out the form and then just calling page = agent.submit(form, form.buttons.first), and it comes back without those controls
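This is a common problem: Mechanize (Ruby or Python) fetches and parses HTML but never executes JavaScript, so controls the browser builds client-side simply do not exist in the parsed form. A common workaround is to inspect the real browser's network traffic and add the JS-created parameters to the POST payload by hand. A minimal sketch of that merge (all field names here are illustrative; read the real ones from the browser's network inspector):

```python
import urllib.parse

# Fields Mechanize can see in the server-sent HTML.
form_fields = {"week_ending": "2019-12-08", "continue": "Next"}

# Fields the page's JavaScript would normally add before submitting
# (hypothetical names -- take them from the observed browser request).
js_fields = {"worked_this_week": "no", "able_available": "yes"}

payload = {**form_fields, **js_fields}
body = urllib.parse.urlencode(payload)
print(body)
```

The other route is to drive a real browser (e.g. a headless one) that does run JavaScript, at the cost of a heavier dependency.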

Use Ruby Mechanize to scrape all successive pages

≯℡__Kan透↙ submitted on 2019-12-13 06:27:58

Question: I'm looking for assistance on the best way to loop through successive pages on a website while scraping relevant data off each page. For example, I want to go to a specific site (Craigslist in the example below), scrape the data from the first page, go to the next page, scrape all relevant data, and so on, until the very last page. In my script I'm using a while loop, since it seemed to make the most sense to me. However, it doesn't appear to be working properly and is only scraping data from the
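A while loop is the right shape for this; the usual bug is never updating the URL (or the page object) inside the loop, so every iteration rescrapes page one. The skeleton below makes the termination condition explicit: follow the "next" link until it is absent. The `fetch` callable stands in for the real Mechanize fetch-and-parse step, and the stub pages are invented:

```python
def scrape_all_pages(start_url, fetch):
    """Accumulate items across successive pages.

    `fetch(url)` must return (items, next_url); in a real Mechanize
    script next_url comes from the page's 'next >' link, and the loop
    ends when that link is missing (next_url is None)."""
    results, url = [], start_url
    while url:
        items, url = fetch(url)   # crucially, url is updated each pass
        results.extend(items)
    return results

# Stub site: three pages chained together.
pages = {
    "/p1": (["ad1", "ad2"], "/p2"),
    "/p2": (["ad3"], "/p3"),
    "/p3": (["ad4"], None),
}
print(scrape_all_pages("/p1", pages.get))
# ['ad1', 'ad2', 'ad3', 'ad4']
```

In Ruby Mechanize the next URL would come from something like `page.link_with(text: /next/)`, and the loop body would reassign `page` from following that link.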