screen-scraping

Scraping flash websites

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-20 02:13:41
Question: I want to write a script that extracts information from a website built in Flash. I was about to start coding an application that does something like this: move the mouse to position x,y; click; wait x msec; get the data. My question is: is there a better way to do this? Is there a library for it? Thanks for reading!
Answer 1: Use Selenium.
Answer 2: There isn't a simple solution for Flash websites. Selenium, WatiN, or any other tool can't access the Flash content unless you have access to the Flash source code

Cross platform solution for automating ncurses-type telnet sessions

你离开我真会死。 submitted on 2019-12-20 01:13:19
Question: Background: Part of my work in networking and telco involves automating telnet sessions when legacy hardware doesn't offer easy solutions in other interfaces. Many older pieces of equipment can only be accessed via craft ports (RS-232 serial ports), SNMP, or telnet. Sometimes telnet is the only way to access specific information; however, telnet is designed as a human interface and thus requires screen scraping. In addition, there is also the issue of scraping screens where only portions are
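As a small illustration of why telnet output needs scraping, ncurses-style screens interleave the visible text with ANSI control sequences (clear screen, cursor moves, colors). The sketch below only strips those sequences from text that has already been captured; `strip_ansi` is a hypothetical helper name, and it does not rebuild the screen layout, which would take a terminal emulator.

```python
import re

# CSI sequences have the form: ESC [ <parameter bytes> <intermediate bytes> <final byte>
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(raw: str) -> str:
    """Remove ANSI CSI control sequences, keeping only the printable text."""
    return ANSI_CSI.sub("", raw)


if __name__ == "__main__":
    # Assumed sample: clear screen, move cursor home, then a bold status field.
    captured = "\x1b[2J\x1b[1;1HPort 1: \x1b[1mUP\x1b[0m"
    print(strip_ansi(captured))
```

Because cursor-positioning sequences are discarded, this works best for screens that are drawn top to bottom; partial screen updates would still need position tracking.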

Getting HTML from a page behind a login

落花浮王杯 submitted on 2019-12-19 11:52:50
Question: This question is a follow-up to my previous question about getting the HTML from an ASPX page. I decided to try using the WebClient object, but the problem is that I get the login page's HTML because login is required. I tried "logging in" using the WebClient object:
    WebClient ww = new WebClient();
    ww.DownloadString("Login.aspx?UserName=&Password=");
    string html = ww.DownloadString("Internal.aspx");
But I still get the login page all the time. I know that the username info is not stored in a
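One likely reason the attempt in the question keeps returning the login page is that the two requests do not share a session cookie. The following is a minimal Python sketch of the general idea, not a fix for the asker's C# code: post the credentials once, keep the cookies the server sets, and send them with the next request. The form field names are copied from the query string in the question, and the function names and base URL parameter are hypothetical; a real ASP.NET login form usually expects additional hidden fields.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies and resends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password):
    """Build a POST request carrying the credentials as form fields.

    "UserName" and "Password" are placeholders taken from the question;
    the real form may use other names and extra hidden fields.
    """
    form = urllib.parse.urlencode({"UserName": username, "Password": password})
    return urllib.request.Request(login_url, data=form.encode("utf-8"), method="POST")


def fetch_behind_login(base_url, username, password):
    """Log in, then fetch the protected page with the same cookie jar."""
    opener, _jar = make_session()
    # The login response is discarded; only the cookies it sets matter here.
    opener.open(build_login_request(base_url + "Login.aspx", username, password)).close()
    with opener.open(base_url + "Internal.aspx") as response:
        return response.read().decode("utf-8", errors="replace")
```

The key design point is that both requests go through the same opener, so any session cookie set by the login response is attached to the request for the internal page.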

How can I query rankings for the users in my DB, but only consider the latest entry for each user?

自闭症网瘾萝莉.ら submitted on 2019-12-19 09:23:45
Question: Let's say I have a database table called "Scrape", possibly set up like: UserID (int), UserName (varchar), Wins (int), Losses (int), ScrapeDate (datetime). I'm trying to rank my users based on their win/loss ratio. However, each week I'll be scraping for new data on the users and making another entry in the Scrape table. How can I query a list of users sorted by wins/losses, but only taking into consideration the most recent entry (ScrapeDate)? Also, do you think it matters that people
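One common way to approach this kind of query is to join the table against a subquery that picks each user's latest ScrapeDate. The sketch below tries it against an in-memory SQLite database from Python; the column names come from the question, while the sample rows, the `rank_users` helper, and the choice to treat zero losses as one loss (to avoid dividing by zero) are assumptions made for illustration.

```python
import sqlite3

# Keep only each user's most recent row, then sort by win/loss ratio.
LATEST_RANKING_SQL = """
SELECT s.UserID,
       s.UserName,
       s.Wins * 1.0 / CASE WHEN s.Losses = 0 THEN 1 ELSE s.Losses END AS Ratio
FROM Scrape AS s
JOIN (
    SELECT UserID, MAX(ScrapeDate) AS LatestDate
    FROM Scrape
    GROUP BY UserID
) AS latest
  ON latest.UserID = s.UserID
 AND latest.LatestDate = s.ScrapeDate
ORDER BY Ratio DESC
"""


def rank_users(conn):
    """Return (UserID, UserName, Ratio) rows, best ratio first."""
    return conn.execute(LATEST_RANKING_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Scrape (UserID INTEGER, UserName TEXT, "
        "Wins INTEGER, Losses INTEGER, ScrapeDate TEXT)"
    )
    conn.executemany(
        "INSERT INTO Scrape VALUES (?, ?, ?, ?, ?)",
        [
            (1, "alice", 10, 1, "2019-12-01 00:00:00"),   # older entry, ignored
            (1, "alice", 10, 10, "2019-12-08 00:00:00"),  # latest entry for alice
            (2, "bob", 3, 1, "2019-12-08 00:00:00"),
        ],
    )
    for row in rank_users(conn):
        print(row)
```

If two rows for the same user share the same ScrapeDate, both would appear in the result, so a unique key on (UserID, ScrapeDate) is worth considering.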

How do I save a web page programmatically?

旧巷老猫 submitted on 2019-12-19 07:11:16
Question: I would like to save a web page programmatically. I don't mean merely saving the HTML. I would also like to automatically store all associated files (images, CSS files, maybe embedded SWF, etc.), and hopefully rewrite the links for local browsing. The intended usage is a personal bookmarks application, in which link content is cached in case the original copy is taken down.
Answer 1: Take a look at wget, specifically the -p flag: -p, --page-requisites: This option causes Wget to download all the files
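For a bookmarks application that prefers to stay in-process rather than shell out to wget, the first step is finding the files a page depends on. The sketch below shows only that step in Python, under the assumption that images, scripts, embeds, and stylesheet links are the requisites of interest; `RequisiteCollector` and `page_requisites` are hypothetical names, and downloading the files and rewriting links are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RequisiteCollector(HTMLParser):
    """Collect absolute URLs of resources a page needs in order to render."""

    # Tag name -> attribute that holds the resource URL.
    URL_ATTRS = {"img": "src", "script": "src", "embed": "src", "link": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_name = self.URL_ATTRS.get(tag)
        if attr_name is None:
            return
        attrs = dict(attrs)
        # Only stylesheet links are requisites; skip canonical, alternate, etc.
        if tag == "link" and "stylesheet" not in (attrs.get("rel") or "").lower():
            return
        value = attrs.get(attr_name)
        if value:
            # Resolve relative references against the page's own URL.
            self.urls.append(urljoin(self.base_url, value))


def page_requisites(html, base_url):
    """Return the resource URLs referenced by the given HTML, in document order."""
    collector = RequisiteCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.urls
```

Resources referenced from inside CSS files (such as background images) are not found by this approach and would need a separate pass.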

C# library similar to HtmlUnit

試著忘記壹切 submitted on 2019-12-19 04:24:30
Question: I need to write a standalone application that will "browse" an external resource. Is there a library in C# that automatically handles cookies and supports JavaScript (though JS is not required, I believe)? The main goal is to keep the session alive and submit forms so I can get through a multistep registration process or "browse" a website after login. I reviewed Html Agility Pack, but it looks like it doesn't contain the functionality I need: form submitting or cookie support. Thanks, Artem.
Answer 1: Look at Data

struggling to click on link within htmlunit

百般思念 submitted on 2019-12-19 03:56:09
Question: I am having a problem clicking on a link within htmlunit. I went through the API on the site (which I didn't really understand well) and looked at all the sample code I could find, and I am still having a problem with clicking on links. Here's the top of the error message (it's pretty large; if you want, I can submit it all): "page2 = link2.click() Exception class=[net.sourceforge.htmlunit.corejs.javascript.JavaScriptException] com.gargoylesoftware.htmlunit.ScriptException: Sys

Embedding part of a web site

倾然丶 夕夏残阳落幕 submitted on 2019-12-18 16:57:20
Question: Suppose I want to embed the latest comic strip of one of my favorite webcomics into my site as a kind of promotion for it. The webcomic has the strip inside a div with an id, so I figured I could just embed the div in my site, except that I couldn't find any code examples for how to do it (they all show how to embed Flash or a whole website). Can someone please show me (or tell me) how it's done? PS: I'd rather not use server-side scripting or external services (which is what is often
