screen-scraping

Advanced HTML Agility Pack usage

安稳与你 submitted on 2019-12-11 01:20:48
Question: I am pretty new to the HTML Agility Pack, so I need some help with where to go next. I can do simple things like pull a value from an href (knowing the URL string I was looking for), and I can pull the value in a span based on a specific class. But I do not understand how to use the HTML Agility Pack in a situation where there are a ton of repeated tags and there is not one solid anchor to tie to. Here is an actual chunk of code I am scraping through. I placed dummy
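When no unique id or class exists to anchor on, positional XPath predicates can select by document structure instead, and Html Agility Pack accepts XPath in `SelectSingleNode`, so the same predicate style applies there. The sketch below illustrates the idea with Python's standard library against made-up markup, since the question's actual HTML chunk is not shown:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: identical <td> cells with no ids or classes to anchor on.
html = """<table>
  <tr><td>Name</td><td>Alice</td></tr>
  <tr><td>City</td><td>Berlin</td></tr>
</table>"""

root = ET.fromstring(html)
# Positional predicates select by structure: second <td> of the second <tr>.
city = root.find("./tr[2]/td[2]").text
print(city)  # Berlin
```

In Html Agility Pack the equivalent would be something like `doc.DocumentNode.SelectSingleNode("//table/tr[2]/td[2]")` (an untested sketch of the same predicate).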

screen scraping using coldfusion

那年仲夏 submitted on 2019-12-10 23:56:53
Question: I am trying to screen scrape another application using the below code in ColdFusion. <cfhttp url="https://intra.att.com/itscmetrics/EM2/LTMR.cfm" method="get" username="uvwxyz" password="abcdef"> <cfhttpparam type="url" name="LTMX" value="Andre Fuetsch / Shelly K Lazzaro"> </cfhttp> <cfset myDocument = cfhttp.fileContent> <cfoutput> #myDocument# </cfoutput> Now when I run my .cfm page, I am able to access the destination page with the above code. The destination page looks like below. A part
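For comparison, the same request can be sketched with Python's standard library. The URL, username, and password below are the question's own placeholders, and cfhttp's username/password attributes are assumed to map to HTTP Basic authentication; the actual fetch is left commented out:

```python
import base64
import urllib.parse
import urllib.request

# Rough Python equivalent of the <cfhttp> call above (not run against the
# real URL here; host and credentials are the question's placeholders).
base_url = "https://intra.att.com/itscmetrics/EM2/LTMR.cfm"
params = urllib.parse.urlencode({"LTMX": "Andre Fuetsch / Shelly K Lazzaro"})
req = urllib.request.Request(base_url + "?" + params)

# cfhttp's username/password attributes become a Basic auth header.
token = base64.b64encode(b"uvwxyz:abcdef").decode("ascii")
req.add_header("Authorization", "Basic " + token)

print(req.get_header("Authorization"))
# html = urllib.request.urlopen(req).read()  # the cfhttp.fileContent step
```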

Selenium: How to select nth button using the same class name

本秂侑毒 submitted on 2019-12-10 23:44:07
Question: I am trying to select the 3rd button using the CSS class "btnProceed": <input type="button" class="btnProceed" value=" " onclick="SecuritySubmit(false,'https://somewebsite.com/key=xxyyzz');return false;"> My code is as follows: WebElement query_enquirymode = driver.findElement(By.className("btnProceed")); query_enquirymode.click(); I can only select the 1st element using "btnProceed". Is there a way to select the 3rd button? Answer 1: Like this: List<WebElement> buttons = driver.findElements(By
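The answer's approach collects every match into a list and indexes it; in Selenium's Python binding that is roughly `driver.find_elements(By.CLASS_NAME, "btnProceed")[2].click()`. The collect-then-index idea itself can be sketched with only the standard library (the onclick values here are hypothetical):

```python
from html.parser import HTMLParser

# Gather every element whose class attribute is btnProceed, then take the 3rd.
class ButtonCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.buttons = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("class") == "btnProceed":
            self.buttons.append(attrs.get("onclick"))

html = """
<input type="button" class="btnProceed" onclick="go(1)">
<input type="button" class="btnProceed" onclick="go(2)">
<input type="button" class="btnProceed" onclick="go(3)">
"""
p = ButtonCollector()
p.feed(html)
print(p.buttons[2])  # third match, zero-based index 2 -> go(3)
```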

XPath not working for screen scraping

走远了吗. submitted on 2019-12-10 23:36:37
Question: I am using Scrapy for a screen-scraping project and am having problems with an XPath. I am trying to get the 94,218 from the image below, but the XPaths and CSS selectors I have used are not working. It's from this page: https://fancy.com/things/280558613/I%27m-Fine-T-Shirt I have tried multiple XPaths and CSS selectors with Scrapy, but everything returns blank. Here are some examples: response.xpath('/html/body/div[1]/div[1]/div[1]/aside/div[1]/div/div/a[2]/text()').extract() response.xpath('//*[@id="sidebar
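One common reason every XPath comes back blank is that the number is injected by JavaScript after page load, so it never appears in the raw HTML that Scrapy downloads. A quick diagnostic, simulated here with a stub body rather than a live fancy.com response, is to search the unrendered HTML for the literal value:

```python
# If the value is rendered client-side, it will not be in the raw HTML at
# all, and no XPath can match it. Stub response body for illustration only.
raw_body = "<html><body><aside><a id='cnt'></a></aside></body></html>"

if "94,218" not in raw_body:
    print("value absent from raw HTML; XPath cannot match it")
```

In a Scrapy shell, `view(response)` opens the page exactly as downloaded, which makes missing JavaScript-rendered content obvious.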

urllib2 returns a different page than the browser does?

̄綄美尐妖づ submitted on 2019-12-10 23:35:17
Question: I'm trying to scrape a page (my router's admin page), but the device seems to be serving a different page to urllib2 than to my browser. Has anyone encountered this before? How can I get around it? This is the code I'm using: >>> from BeautifulSoup import BeautifulSoup >>> import urllib2 >>> page = urllib2.urlopen("http://192.168.1.254/index.cgi?active_page=9133&active_page_str=page_bt_home&req_mode=0&mimic_button_field=btn_tab_goto:+9133..&request_id=36590071&button_value=9133") >>> soup =
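A frequent cause is that the device branches on request headers (especially User-Agent) and on session cookies the browser already holds. A hedged sketch with the modern standard library (`urllib.request` is urllib2's Python 3 successor); the fetch itself is left commented out:

```python
import urllib.request
from http.cookiejar import CookieJar

# Send a browser-like User-Agent and keep cookies across requests, since
# admin pages often serve a login or error page to unknown clients.
jar = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
opener.addheaders = [("User-Agent",
                      "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36")]

# page = opener.open("http://192.168.1.254/index.cgi?...").read()
print(opener.addheaders[0][0])  # User-Agent
```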

Screen scraping: Automating a vim script

筅森魡賤 submitted on 2019-12-10 22:54:15
Question: In vim, I loaded a series of web pages (one at a time) into a vim buffer (using the vim netrw plugin) and then parsed the HTML (using the vim elinks plugin). All good. I then wrote a series of vim scripts using regexes, with a final result of a few thousand lines where each line was formatted correctly (CSV) for uploading into a database. To do that I had to use vim's marking functionality so that I could loop over specific points of the document and reassemble it back together into
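The same regex-and-reassemble pass can be batch-scripted outside vim; below is a sketch with a made-up record pattern, not the question's actual pages:

```python
import re

# Extract (name, price) pairs from table rows and emit CSV lines,
# replacing the manual mark-and-loop passes done inside vim.
html = """
<tr><td>Widget</td><td>9.99</td></tr>
<tr><td>Gadget</td><td>12.50</td></tr>
"""
rows = re.findall(r"<td>([^<]+)</td><td>([^<]+)</td>", html)

for name, price in rows:
    print(f"{name},{price}")
# Widget,9.99
# Gadget,12.50
```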

Scrapy: how to separate text within an HTML tag element

久未见 submitted on 2019-12-10 20:14:08
Question: Code containing my data: <div id="content"><!-- InstanceBeginEditable name="EditRegion3" --> <div id="content_div"> <div class="title" id="content_title_div"><img src="img/banner_outlets.jpg" width="920" height="157" alt="Outlets" /></div> <div id="menu_list"> <table border="0" cellpadding="5" cellspacing="5" width="100%"> <tbody> <tr> <td valign="top"> <p> <span class="foodTitle">Century Square</span><br /> 2 Tampines Central 5<br /> #01-44-47 Century Square<br /> Singapore 529509</p> <p>
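Since the address lines inside each <p> are separated only by <br /> tags, one approach is to split on the <br> variants and strip the leftover markup. A standard-library sketch using the snippet above (in Scrapy itself, `response.xpath('...//text()').getall()` would return the text nodes directly):

```python
import re

# One outlet's <p> block from the markup above.
p = ('<p><span class="foodTitle">Century Square</span><br />'
     '2 Tampines Central 5<br />#01-44-47 Century Square<br />'
     'Singapore 529509</p>')

# Split on <br>, <br/>, or <br />, then remove any remaining tags.
parts = re.split(r"<br\s*/?>", p)
lines = [re.sub(r"<[^>]+>", "", part).strip() for part in parts]
print(lines)
```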

Does httplib2 support http proxy at all? Socks proxy works but not http

99封情书 submitted on 2019-12-10 19:44:40
Question: Here is my code. I cannot get any HTTP proxy to work. SOCKS proxies (socks4/5) work fine, though. Any ideas why? urllib2 works fine with proxies. I am confused. Thanks. Code:

import socks
import httplib2
import BeautifulSoup

httplib2.debuglevel = 4

http = httplib2.Http(proxy_info = httplib2.ProxyInfo(3, '213.30.160.160', 80))

main_url = 'http://cuil.com'

response, content = http.request(main_url, 'GET')

#html_content = BeautifulSoup(content)

print
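httplib2's plain-HTTP proxy support has historically been unreliable (it delegates all proxy types to the external socks module). As a hedged fallback, the standard library can route requests through the same proxy; the address below is the question's example, and the actual fetch is left commented out:

```python
import urllib.request

# Standard-library alternative to httplib2.ProxyInfo: an explicit HTTP proxy.
proxy = urllib.request.ProxyHandler({"http": "http://213.30.160.160:80"})
opener = urllib.request.build_opener(proxy)

print(proxy.proxies["http"])
# content = opener.open("http://cuil.com").read()  # fetch through the proxy
```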

How do I send an arrow key in Perl using the Net::Telnet module?

醉酒当歌 submitted on 2019-12-10 19:34:55
Question: Using the Perl module Net::Telnet, how do you send an arrow key to a telnet session so that it is the same as a user pressing the down key on the keyboard?

use Net::Telnet;
my $t = new Net::Telnet();
my $down_key = ?; # How do you send a down key in a telnet session?
$t->print($down_key);

This list of VT102 codes says that the cursor keycodes are the following:

Up:    Esc [ A    033 133 101
Down:  Esc [ B    033 133 102
Right: Esc [ C    033 133 103
Left:  Esc [ D    033 133 104

How would I send these in
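The down arrow is just the three-byte VT102 escape sequence ESC [ B, sent as literal bytes. In Perl the likely call is `$t->put("\e[B")`, since put (unlike print) should not append the output record separator; the byte values themselves are illustrated here in Python:

```python
# VT102 down-arrow: ESC '[' 'B'  (octal 033 133 102 from the table above).
DOWN = "\x1b[B"

print([hex(ord(c)) for c in DOWN])  # ['0x1b', '0x5b', '0x42']
```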

How to scrape websites such as Hype Machine?

那年仲夏 submitted on 2019-12-10 18:47:01
Question: I'm curious about website scraping (i.e. how it's done, etc.), and specifically I'd like to write a script to perform the task for the site Hype Machine. I'm actually a Software Engineering undergraduate (4th year); however, we don't really cover any web programming, so my understanding of JavaScript/RESTful APIs/all things web is pretty limited, as we're mainly focused on theory and client-side applications. Any help or directions greatly appreciated. Answer 1: The first thing to look for is