mechanize

Ruby Mechanize login not working

丶灬走出姿态 Submitted on 2019-12-06 13:00:41
Question: Let me set the stage for what I'm trying to accomplish. In a physics class I'm taking, my teacher always likes to brag about how impossible it is to cheat in her class, because all of her assignments are done through WebAssign. The way WebAssign works is this: everyone gets the same questions, but the numbers used in each question are random variables, so each student has different numbers, and thus a different answer. So I've been writing Ruby scripts to solve the questions for people by just

Python mechanize form submitting doesn't work

删除回忆录丶 Submitted on 2019-12-06 11:59:12
I am trying to write a simple bot that would log in to my account on a page and then comment on other users' images. However, I am not able to get the comment form submission to work correctly. The comment form looks like this: <form id="comment-form" action="#" onsubmit="postComment($(this).serialize(),'image',117885,229227); return false;"> <input class="comment" type="text" size="40" name="comment" id="comment" /> <input type="hidden" name="commentObj" value="9234785" /> <input type="hidden" name="commentMode" value="image" /> <input type="hidden" name="userid" value="12427" /> <input class="submit
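The form's action is "#" and the real submission happens in the JavaScript postComment() call, so mechanize's usual select_form()/submit() flow never sends anything useful to the server. One workaround is to replicate that AJAX request by POSTing the same fields directly to whatever endpoint postComment() targets. A minimal sketch, assuming a hypothetical endpoint (the real path has to be read out of the site's JavaScript or a browser's network inspector):

    import urllib
    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)
    # ... log in first so the session cookie is set ...

    # "/ajax/postComment" is only a placeholder; take the real endpoint
    # from the postComment() function in the site's JavaScript.
    data = urllib.urlencode({
        'comment':     'Nice picture!',
        'commentObj':  '9234785',
        'commentMode': 'image',
        'userid':      '12427',
    })
    response = br.open('http://example.com/ajax/postComment', data)  # POST
    print response.read()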

Is using threads and Ruby Mechanize safe?

孤街浪徒 Submitted on 2019-12-06 11:32:18
Does anyone ever see a lot of errors like this when using threads and Mechanize: Exception `Net::HTTPBadResponse' at /usr/lib/ruby/1.8/net/http.rb:2022 - wrong status line: _SOME HTML CODE HERE_ I'm relatively certain that this is some bad interaction between threads and the net/http library, but does anyone have any advice on the upper limit of threads you want to run at once when using mechanize/nethttp? And how can I capture this kind of exception, since rescue Net::HTTPBadResponse doesn't work? This could be something non-thread-safe in Mechanize, but I can think of other bugs that

Set Ruby logger.progname on a per-thread basis

我们两清 Submitted on 2019-12-06 11:04:29
Question: I have this: class Stress def initialize(user, pass) @user = user @pass = pass @agent = Mechanize.new do |a| a.user_agent_alias = 'Windows Mozilla' a.history.max_size = 0 a.log = my_log a.log.progname = @user end end def browse @agent.log.progname = @user # open/close page end end my_log = Logger.new('dump.log') my_log.level = Logger::DEBUG atom = Mutex.new for i in (Attempts_start..Attempts_end) threads << Thread.new(Creden_base + i.to_s) do |user| stress = Stress.new(user, user) for j in

What pure Python library should I use to scrape a website?

╄→尐↘猪︶ㄣ Submitted on 2019-12-06 09:49:12
Question: I currently have some Ruby code used to scrape some websites. I was using Ruby because at the time I was using Ruby on Rails for a site, and it just made sense. Now I'm trying to port this over to Google App Engine, and I keep getting stuck. I've ported Python Mechanize to work with Google App Engine, but it doesn't support DOM inspection with XPath. I've tried the built-in ElementTree, but it choked on the first HTML blob I gave it when it ran into '&mdash'. Do I keep trying to hack
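For what it's worth, a forgiving pure-Python parser such as BeautifulSoup usually copes with markup that makes ElementTree choke (bare entities like &mdash, unclosed tags), and since it is pure Python it can typically be bundled with an App Engine app. It doesn't speak XPath, but find_all() and CSS selectors cover most of the same ground. A minimal sketch with a placeholder URL:

    import urllib2
    from bs4 import BeautifulSoup  # pure Python, can ship inside the app

    # Placeholder URL -- substitute the page actually being scraped
    html = urllib2.urlopen('http://example.com/listing.html').read()
    soup = BeautifulSoup(html, 'html.parser')  # tolerant of broken markup

    for a in soup.find_all('a', href=True):
        print a['href'], a.get_text(strip=True)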

How to extract text from a <script> tag using nokogiri and mechanize?

落花浮王杯 Submitted on 2019-12-06 09:27:38
This is part of the source code of a bookings web site: <script> booking.ensureNamespaceExists('env'); booking.env.b_map_center_latitude = 53.36480155016638; booking.env.b_map_center_longitude = -2.2752803564071655; booking.env.b_hotel_id = '35523'; booking.env.b_query_params_no_ext = '?label=gen173nr-17CAEoggJCAlhYSDNiBW5vcmVmaFCIAQGYAS64AQTIAQTYAQHoAQH4AQs;sid=e1c9e4c7a000518d8a3725b9bb6e5306;dcid=1'; </script> I want to extract booking.env.b_hotel_id so that I get the value '35523'. How do I achieve this with nokogiri and mechanize? Hope somebody can help! Thanks! :) Jason

Downloading PDF files using mechanize and urllib

戏子无情 Submitted on 2019-12-06 08:41:14
I am new to Python, and my current task is to write a web crawler that looks for PDF files on certain web pages and downloads them. Here's my current approach (just for one sample URL): import mechanize import urllib import sys mech = mechanize.Browser() mech.set_handle_robots(False) url = "http://www.xyz.com" try: mech.open(url, timeout = 30.0) except HTTPError, e: sys.exit("%d: %s" % (e.code, e.msg)) links = mech.links() for l in links: #Some are relative links path = str(l.base_url[:-1])+str(l.url) if path.find(".pdf") > 0: urllib.urlretrieve(path) The program runs without any errors, but I am
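Two likely culprits here, for what it's worth: gluing l.base_url and l.url together by hand mangles relative links, and urllib.urlretrieve() called without a filename writes to an unnamed temporary file, so the PDFs may be downloading but not where you expect. A sketch of both fixes, using the same sample URL as above:

    import os
    import urllib
    import urlparse
    import mechanize

    mech = mechanize.Browser()
    mech.set_handle_robots(False)
    mech.open("http://www.xyz.com", timeout=30.0)

    for l in mech.links():
        # Resolve relative links against the page URL instead of concatenating
        full_url = urlparse.urljoin(l.base_url, l.url)
        if full_url.lower().endswith(".pdf"):
            filename = os.path.basename(full_url)
            urllib.urlretrieve(full_url, filename)  # save under an explicit name
            print "saved", filename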

Web Scraper: Limit to Requests Per Minute/Hour on Single Domain?

送分小仙女□ Submitted on 2019-12-06 08:38:12
Question: I'm working with a librarian to restructure his organization's digital photography archive. I've built a Python robot with Mechanize and BeautifulSoup to pull about 7000 poorly structured and mildly incorrect/incomplete documents from a collection. The data will be formatted for a spreadsheet he can use to correct it. Right now I'm guesstimating 7500 HTTP requests total to build the search dictionary and then harvest the data, not counting mistakes and do-overs in my code, and then many more
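There is no universal number, but a common courtesy toward a small institutional site is to stay well under one request per second and to slow down further if responses get sluggish. The simplest enforcement is to sleep between requests; a minimal throttle sketch (the 5-second delay is an arbitrary assumption, not a known policy of the target server):

    import time

    class Throttle(object):
        """Ensure successive fetches are at least `delay` seconds apart."""
        def __init__(self, delay=5.0):   # 5 s -> roughly 12 requests per minute
            self.delay = delay
            self.last = 0.0

        def wait(self):
            elapsed = time.time() - self.last
            if elapsed < self.delay:
                time.sleep(self.delay - elapsed)
            self.last = time.time()

    # Usage with the Mechanize robot (browser and url list assumed to exist):
    # throttle = Throttle(5.0)
    # for url in urls:
    #     throttle.wait()
    #     page = browser.open(url)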

UnicodeDecodeError problem with mechanize [duplicate]

微笑、不失礼 Submitted on 2019-12-06 08:22:01
Question: This question already has answers here: How to determine the encoding of text? (9 answers). Closed 2 years ago. I receive the following string from one website via mechanize: 'We\x92ve'. I know that \x92 stands for the ’ character. I'm trying to convert that string to Unicode: >> unicode('We\x92ve','utf-8') UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 2: unexpected code byte. What am I doing wrong? Edit: The reason I tried 'utf-8' was this: >> response = browser.response() >
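The short answer: byte 0x92 is not valid UTF-8 at all; it is the Windows-1252 (cp1252) encoding of the right single quotation mark, which is why the utf-8 codec rejects it. Decoding with the page's actual encoding (usually declared in the Content-Type response header or a <meta> tag) works; a quick check in the interpreter:

    s = 'We\x92ve'
    print repr(unicode(s, 'windows-1252'))  # u'We\u2019ve', i.e. a curly apostrophe
    # equivalently: s.decode('cp1252')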

Python Mechanize Browser: HTTP Error 460

跟風遠走 Submitted on 2019-12-06 07:58:06
I am trying to log into a site using a mechanize browser and am getting an HTTP 460 error, which appears to be a made-up status code, so I'm not sure what to make of it. Here's the code: # Browser br = mechanize.Browser() # Cookie Jar cj = cookielib.LWPCookieJar() br.set_cookiejar(cj) # Browser options br.set_handle_equiv(True) br.set_handle_redirect(True) br.set_handle_referer(True) br.set_handle_robots(False) br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1) br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9
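For what it's worth, 460 is not a standard HTTP status code (it tends to come from load balancers or application servers signalling things like failed logins or client timeouts), so the response body usually says more than the number itself. One way to see what the server actually returned is to catch the error mechanize raises and read it; a minimal sketch (the login URL is a placeholder):

    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)

    try:
        br.open("https://example.com/login")   # placeholder URL
    except mechanize.HTTPError, e:
        print "status:", e.code   # e.g. 460
        print e.read()            # the body often explains the rejection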