htmlunit

Screen scraping with Python

你说的曾经没有我的故事 posted on 2019-11-30 02:04:13
Does Python have screen scraping libraries that offer JavaScript support? I've been using pycurl for simple HTML requests, and Java's HtmlUnit for more complicated requests requiring JavaScript support. Ideally I would like to be able to do everything from Python, but I haven't come across any libraries that would allow me to do it. Do they exist?

hoju

There are many options when dealing with static HTML, which the other responses cover. However, if you need JavaScript support and want to stay in Python, I recommend using WebKit to render the webpage (including the JavaScript) and then examine

How to make 2 HtmlUnit's WebClients use same cookies?

眉间皱痕 posted on 2019-11-29 16:04:39
If I create 2 WebClients in different threads, how do I make them use the same cookies?

You can use the code below, which gives both clients the same CookieManager instance:

    CookieManager cookieManager = new CookieManager();
    webClient1.setCookieManager(cookieManager);
    webClient2.setCookieManager(cookieManager);

Source: https://stackoverflow.com/questions/3043745/how-to-make-2-htmlunits-webclients-use-same-cookies

How can I add cookies to HtmlUnit request header?

南笙酒味 posted on 2019-11-29 15:40:21
Question: I'm trying to access a site and I'm having trouble adding the collected "Cookie" to the outgoing POST request header. I've been able to verify that the cookies are present in the CookieManager. Any alternatives to HtmlUnit would also be appreciated.

    public static void main(String[] args) {
        // Turn off logging to prevent polluting the output.
        Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(Level.OFF);
        try {
            final WebClient webClient = new WebClient(BrowserVersion.CHROME);
            webClient

Htmlunit on Android application

落花浮王杯 posted on 2019-11-29 13:26:33
Has anybody gotten HtmlUnit (or HtmlUnitDriver) to work in Android apps?

This is the problem: I am getting the following error message:

    11-26 16:27:26.617: E/AndroidRuntime(1265): java.lang.NoClassDefFoundError: org/w3c/dom/css/CSSRule

This is what I did: I tried adding references to the jars listed at the following link (under both Project Dependencies and Project Transitive Dependencies, compile only, excluding test jars): http://htmlunit.sourceforge.net/dependencies.html

However, Eclipse kept crashing. Then I found a few questions saying some jars are already contained in the

How do I perform Web Scraping in Android? [closed]

。_饼干妹妹 posted on 2019-11-29 13:02:41
I want to scrape my website and then use the data from it to populate elements in my app. My website has login pages, and certain pages only open after logging in. I started working with HtmlUnit because it is a headless browser, and I completed the custom API in a Java IDE. Later I tried to use the jar I generated from the Java IDE and found that there are incompatibility issues between HtmlUnit and Android. Can anyone propose a solution to this problem?

Edit: Since no one actually answered this question, I am currently going with a workaround using Android's native WebView,

HtmlUnit forbid external requests

非 Y 不嫁゛ posted on 2019-11-29 12:05:58
I use HtmlUnit for automated tests of my site. My site uses the Google Maps API, and requests to the external site take a lot of time (I have a few hundred tests and a few thousand page loads). I need some way to tell HtmlUnit to load only local pages (served by IIS Express) and to forbid loading external resources, so that my tests run more quickly.

You can prevent HtmlUnit from accessing certain URLs by using a WebConnectionWrapper:

    browser.setWebConnection(new WebConnectionWrapper(browser) {
        @Override
        public WebResponse getResponse(final WebRequest request) throws IOException {
            if (<
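
The condition in the snippet above is cut off in this excerpt. As a minimal sketch, and assuming that "local" means the hosts localhost and 127.0.0.1, a check along the following lines could decide whether a URL points at the local server. The class name LocalOnlyFilter, the method isLocal, and the host list are hypothetical; they are not part of HtmlUnit or of the original answer, and the sketch uses only the JDK.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

final class LocalOnlyFilter {
    // Hypothetical allow-list: adjust to the hosts your tests are served from.
    private static final Set<String> LOCAL_HOSTS = Set.of("localhost", "127.0.0.1");

    // Returns true when the URL's host is on the allow-list.
    // URI.create throws IllegalArgumentException for syntactically invalid input.
    static boolean isLocal(String url) {
        String host = URI.create(url).getHost();
        return host != null && LOCAL_HOSTS.contains(host.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isLocal("http://localhost:8080/index.html"));
        System.out.println(isLocal("https://maps.googleapis.com/maps/api/js"));
    }
}
```

Inside getResponse, such a check could be applied to request.getUrl().toString(). What to return for a blocked request is not shown in this excerpt, so it is left out here.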

How do I click a javascript button with htmlunit?

给你一囗甜甜゛ posted on 2019-11-29 11:54:28
I'm working on an application that will automatically click a button on a webpage using HtmlUnit in Java. The only problem is that the button is a JavaScript button, so the standard getInputByName() won't work. Any suggestions for dealing with this? The code for the button is included below.

    <a class="vote_1" id="1537385" href="/javascript%3Avoid%280%29/index"><img src="/images/parts/btn-vote.gif" alt="Btn-vote" /></a>

In addition, here's the other code for voting.

    <div id="content"><script type="text/javascript" src="/js/scriptFeeds/voteArticle.js"></script>

Which leads to the following

HtmlUnit button click

假装没事ソ posted on 2019-11-29 11:25:48
I'm trying to send a message on www.meetme.com but can't figure out how to do it. I can type the message into the comment area, but clicking the Send button doesn't do anything. What am I doing wrong? When I log in and press the Login button, the page does change and everything is fine. Does anyone have any ideas or clues?

    HtmlPage htmlPage = null;
    HtmlElement htmlElement;
    WebClient webClient = null;
    HtmlButton htmlButton;
    HtmlForm htmlForm;
    try {
        // Create and initialize WebClient object
        webClient = new WebClient(BrowserVersion.FIREFOX_17);
        webClient.setCssEnabled(false);
        webClient

How to use HtmlUnit in Java?

折月煮酒 posted on 2019-11-29 08:19:28
Question: I'm trying to use HtmlUnit in Java to log into a website. First I enter the user name, then the password. After that I need to select an option from a dropdown box. Entering the user name and password seems to have worked, but when I try to select the item from the dropdown box I get errors. Can anyone help me fix this? My code is as follows:

    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlElement;
    import com.gargoylesoftware.htmlunit.html.HtmlOption;

HTMLUnit : super slow execution?

北城余情 posted on 2019-11-29 08:04:53
Question: I have been using HtmlUnit. It suits my requirements well, but it seems to be extremely slow. For example, I have automated the following scenario using HtmlUnit:

1. Go to the Google page
2. Enter some text
3. Click on the search button
4. Get the title of the results page
5. Click on the first result

Code:

    long t1 = System.currentTimeMillis();
    Logger logger = Logger.getLogger("");
    logger.setLevel(Level.OFF);
    WebClient webClient = createWebClient();
    WebRequest webReq = new WebRequest(new URL("http://google.lk"));