screen-scraping

Advanced HTML Agility Pack usage

安稳与你 submitted on 2019-12-11 01:20:48
Question: I am pretty new to the HTML Agility Pack, so I need some help with where to go next. I can do simple things like pull a value from an href (knowing the URL string I was looking for), and I can pull the value in a span based on a specific class. But I do not understand how to use the HTML Agility Pack in a situation where there are a ton of repeated tags and there is not one solid anchor to tie to. Here is an actual chunk of code I am scraping through. I placed dummy
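When no unique id or class exists to anchor on, positional XPath predicates can select by document structure instead, and Html Agility Pack accepts XPath in `SelectSingleNode`, so the same predicate style applies there. The sketch below illustrates the idea with Python's standard library against made-up markup, since the question's actual HTML chunk is not shown:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: identical <td> cells with no ids or classes to anchor on.
html = """<table>
  <tr><td>Name</td><td>Alice</td></tr>
  <tr><td>City</td><td>Berlin</td></tr>
</table>"""

root = ET.fromstring(html)
# Positional predicates select by structure: second <td> of the second <tr>.
city = root.find("./tr[2]/td[2]").text
print(city)  # Berlin
```

In Html Agility Pack the equivalent would be something like `doc.DocumentNode.SelectSingleNode("//table/tr[2]/td[2]")` (an untested sketch of the same predicate).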

screen scraping using coldfusion

那年仲夏 submitted on 2019-12-10 23:56:53
Question: I am trying to screen scrape another application using the below code in ColdFusion. <cfhttp url="https://intra.att.com/itscmetrics/EM2/LTMR.cfm" method="get" username="uvwxyz" password="abcdef"> <cfhttpparam type="url" name="LTMX" value="Andre Fuetsch / Shelly K Lazzaro"> </cfhttp> <cfset myDocument = cfhttp.fileContent> <cfoutput> #myDocument# </cfoutput> Now when I run my .cfm page, I am able to access the destination page with the above code. The destination page looks like below. A part
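For comparison, the same request can be sketched with Python's standard library. The URL, username, and password below are the question's own placeholders, and cfhttp's username/password attributes are assumed to map to HTTP Basic authentication; the actual fetch is left commented out:

```python
import base64
import urllib.parse
import urllib.request

# Rough Python equivalent of the <cfhttp> call above (not run against the
# real URL here; host and credentials are the question's placeholders).
base_url = "https://intra.att.com/itscmetrics/EM2/LTMR.cfm"
params = urllib.parse.urlencode({"LTMX": "Andre Fuetsch / Shelly K Lazzaro"})
req = urllib.request.Request(base_url + "?" + params)

# cfhttp's username/password attributes become a Basic auth header.
token = base64.b64encode(b"uvwxyz:abcdef").decode("ascii")
req.add_header("Authorization", "Basic " + token)

print(req.get_header("Authorization"))
# html = urllib.request.urlopen(req).read()  # the cfhttp.fileContent step
```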

Selenium: How to select nth button using the same class name

本秂侑毒 submitted on 2019-12-10 23:44:07
Question: I am trying to select the 3rd button using the CSS class "btnProceed": <input type="button" class="btnProceed" value=" " onclick="SecuritySubmit(false,'https://somewebsite.com/key=xxyyzz');return false;"> My code is as follows: WebElement query_enquirymode = driver.findElement(By.className("btnProceed")); query_enquirymode.click(); I can only select the 1st element using "btnProceed". Is there a way to select the 3rd button? Answer 1: Like this: List<WebElement> buttons = driver.findElements(By
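The answer's approach collects every match into a list and indexes it; in Selenium's Python binding that is roughly `driver.find_elements(By.CLASS_NAME, "btnProceed")[2].click()`. The collect-then-index idea itself can be sketched with only the standard library (the onclick values here are hypothetical):

```python
from html.parser import HTMLParser

# Gather every element whose class attribute is btnProceed, then take the 3rd.
class ButtonCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.buttons = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("class") == "btnProceed":
            self.buttons.append(attrs.get("onclick"))

html = """
<input type="button" class="btnProceed" onclick="go(1)">
<input type="button" class="btnProceed" onclick="go(2)">
<input type="button" class="btnProceed" onclick="go(3)">
"""
p = ButtonCollector()
p.feed(html)
print(p.buttons[2])  # third match, zero-based index 2 -> go(3)
```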

XPath not working for screen scraping

走远了吗. submitted on 2019-12-10 23:36:37
Question: I am using Scrapy for a screen-scraping project and am having problems with an XPath. I am trying to get the 94,218 from the image below, but the XPaths and CSS selectors I have used are not working. It's from this page: https://fancy.com/things/280558613/I%27m-Fine-T-Shirt I have tried multiple XPaths and CSS selectors with Scrapy, but everything returns blank. Here are some examples: response.xpath('/html/body/div[1]/div[1]/div[1]/aside/div[1]/div/div/a[2]/text()').extract() response.xpath('//*[@id="sidebar
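One common reason every XPath comes back blank is that the number is injected by JavaScript after page load, so it never appears in the raw HTML that Scrapy downloads. A quick diagnostic, simulated here with a stub body rather than a live fancy.com response, is to search the unrendered HTML for the literal value:

```python
# If the value is rendered client-side, it will not be in the raw HTML at
# all, and no XPath can match it. Stub response body for illustration only.
raw_body = "<html><body><aside><a id='cnt'></a></aside></body></html>"

if "94,218" not in raw_body:
    print("value absent from raw HTML; XPath cannot match it")
```

In a Scrapy shell, `view(response)` opens the page exactly as downloaded, which makes missing JavaScript-rendered content obvious.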

urllib2 returns a different page than the browser does?

̄綄美尐妖づ submitted on 2019-12-10 23:35:17
Question: I'm trying to scrape a page (my router's admin page), but the device seems to be serving a different page to urllib2 than to my browser. Has anyone encountered this before? How can I get around it? This is the code I'm using: >>> from BeautifulSoup import BeautifulSoup >>> import urllib2 >>> page = urllib2.urlopen("http://192.168.1.254/index.cgi?active_page=9133&active_page_str=page_bt_home&req_mode=0&mimic_button_field=btn_tab_goto:+9133..&request_id=36590071&button_value=9133") >>> soup =
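A frequent cause is that the device branches on request headers (especially User-Agent) and on session cookies the browser already holds. A hedged sketch with the modern standard library (`urllib.request` is urllib2's Python 3 successor); the fetch itself is left commented out:

```python
import urllib.request
from http.cookiejar import CookieJar

# Send a browser-like User-Agent and keep cookies across requests, since
# admin pages often serve a login or error page to unknown clients.
jar = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
opener.addheaders = [("User-Agent",
                      "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36")]

# page = opener.open("http://192.168.1.254/index.cgi?...").read()
print(opener.addheaders[0][0])  # User-Agent
```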

Screen scraping: Automating a vim script

筅森魡賤 submitted on 2019-12-10 22:54:15
Question: In vim, I loaded a series of web pages (one at a time) into a vim buffer (using the vim netrw plugin) and then parsed the HTML (using the vim elinks plugin). All good. I then wrote a series of vim scripts using regexes, with a final result of a few thousand lines where each line was formatted correctly (CSV) for uploading into a database. To do that I had to use vim's marking functionality so that I could loop over specific points of the document and reassemble it back together into
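The same regex-and-reassemble pass can be batch-scripted outside vim; below is a sketch with a made-up record pattern, not the question's actual pages:

```python
import re

# Extract (name, price) pairs from table rows and emit CSV lines,
# replacing the manual mark-and-loop passes done inside vim.
html = """
<tr><td>Widget</td><td>9.99</td></tr>
<tr><td>Gadget</td><td>12.50</td></tr>
"""
rows = re.findall(r"<td>([^<]+)</td><td>([^<]+)</td>", html)

for name, price in rows:
    print(f"{name},{price}")
# Widget,9.99
# Gadget,12.50
```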

Scrapy: how to separate text within an HTML tag element

久未见 submitted on 2019-12-10 20:14:08
Question: Code containing my data: <div id="content"><!-- InstanceBeginEditable name="EditRegion3" --> <div id="content_div"> <div class="title" id="content_title_div"><img src="img/banner_outlets.jpg" width="920" height="157" alt="Outlets" /></div> <div id="menu_list"> <table border="0" cellpadding="5" cellspacing="5" width="100%"> <tbody> <tr> <td valign="top"> <p> <span class="foodTitle">Century Square</span><br /> 2 Tampines Central 5<br /> #01-44-47 Century Square<br /> Singapore 529509</p> <p>
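Since the address lines inside each <p> are separated only by <br /> tags, one approach is to split on the <br> variants and strip the leftover markup. A standard-library sketch using the snippet above (in Scrapy itself, `response.xpath('...//text()').getall()` would return the text nodes directly):

```python
import re

# One outlet's <p> block from the markup above.
p = ('<p><span class="foodTitle">Century Square</span><br />'
     '2 Tampines Central 5<br />#01-44-47 Century Square<br />'
     'Singapore 529509</p>')

# Split on <br>, <br/>, or <br />, then remove any remaining tags.
parts = re.split(r"<br\s*/?>", p)
lines = [re.sub(r"<[^>]+>", "", part).strip() for part in parts]
print(lines)
```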

Does httplib2 support http proxy at all? Socks proxy works but not http

99封情书 submitted on 2019-12-10 19:44:40
Question: Here is my code. I cannot get any HTTP proxy to work. SOCKS proxies (socks4/5) work fine, though. Any ideas why? urllib2 works fine with proxies. I am confused. Thanks. Code:

import socks
import httplib2
import BeautifulSoup

httplib2.debuglevel = 4

http = httplib2.Http(proxy_info = httplib2.ProxyInfo(3, '213.30.160.160', 80))

main_url = 'http://cuil.com'

response, content = http.request(main_url, 'GET')

#html_content = BeautifulSoup(content)

print
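httplib2's plain-HTTP proxy support has historically been unreliable (it delegates all proxy types to the external socks module). As a hedged fallback, the standard library can route requests through the same proxy; the address below is the question's example, and the actual fetch is left commented out:

```python
import urllib.request

# Standard-library alternative to httplib2.ProxyInfo: an explicit HTTP proxy.
proxy = urllib.request.ProxyHandler({"http": "http://213.30.160.160:80"})
opener = urllib.request.build_opener(proxy)

print(proxy.proxies["http"])
# content = opener.open("http://cuil.com").read()  # fetch through the proxy
```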

How do I send an arrow key in Perl using the Net::Telnet module?

醉酒当歌 submitted on 2019-12-10 19:34:55
Question: Using the Perl module Net::Telnet, how do you send an arrow key to a telnet session so that it is the same as a user pressing the down key on the keyboard?

use Net::Telnet;
my $t = new Net::Telnet();
my $down_key = ?; # How do you send a down key in a telnet session?
$t->print($down_key);

This list of VT102 codes says that the cursor keycodes are the following:

Up:    Esc [ A    033 133 101
Down:  Esc [ B    033 133 102
Right: Esc [ C    033 133 103
Left:  Esc [ D    033 133 104

How would I send these in
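The down arrow is just the three-byte VT102 escape sequence ESC [ B, sent as literal bytes. In Perl the likely call is `$t->put("\e[B")`, since put (unlike print) should not append the output record separator; the byte values themselves are illustrated here in Python:

```python
# VT102 down-arrow: ESC '[' 'B'  (octal 033 133 102 from the table above).
DOWN = "\x1b[B"

print([hex(ord(c)) for c in DOWN])  # ['0x1b', '0x5b', '0x42']
```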

How to scrape websites such as Hype Machine?

那年仲夏 submitted on 2019-12-10 18:47:01
Question: I'm curious about website scraping (i.e. how it's done, etc.), and specifically I'd like to write a script to perform the task for the site Hype Machine. I'm actually a Software Engineering undergraduate (4th year); however, we don't really cover any web programming, so my understanding of JavaScript/RESTful APIs/all things web is pretty limited, as we're mainly focused on theory and client-side applications. Any help or directions greatly appreciated. Answer 1: The first thing to look for is