htmlunit

Using HtmlUnit to Fetch Web Page Data Dynamically

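一个人想着一个人 submitted on 2019-12-05 03:04:44
1. HtmlUnit is a headless (GUI-less) browser written in Java. It models HTML documents and provides an API for loading pages, filling in forms, clicking links, and so on, just as a normal browser would. It is typically used for testing and for scraping information from web pages. HtmlUnit covers the functionality of both HttpClient and soup, but it is relatively slow; turning off its CSS and JS processing, which is enabled by default, speeds it up.
2. HtmlUnit is chosen here for crawling mainly in order to obtain the page's JS and CSS.
3. The main code is as follows: package com.los; import com.gargoylesoftware.htmlunit.WebClient; import com.gargoylesoftware.htmlunit.html.DomElement; import com.gargoylesoftware.htmlunit.html.DomNodeList; import com.gargoylesoftware.htmlunit.html.HtmlPage; import com.los.util.DownlandPic; import java.io.IOException; import java.util.regex.Pattern; public class HtmlUnitTest { public static void main(String[] args) throws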

HtmlUnit accessing an element without id or Name

混江龙づ霸主 submitted on 2019-12-05 01:02:14
Question: How can I access this element: <input type="submit" value="Save as XML" onclick="some code goes here"> More info: I have to programmatically access a web page and simulate clicking a button on it, which will then generate an XML file that I hope to save on the local machine. I am trying to do this with the HtmlUnit libraries, but all the examples I could find use the getElementById() or getElementByName() methods. Unfortunately, this exact element doesn't have a name or ID, so I failed
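One common approach to a question like this is XPath: HtmlUnit's page objects expose `getByXPath` and `getFirstByXPath`, so an element can be located by its attribute values rather than by `id` or `name`. The sketch below only checks the expression itself, using the JDK's built-in XPath engine on a well-formed stand-in fragment. The class name `SubmitButtonXPath`, the sample markup and the `save()` handler are assumptions for illustration; no real page is loaded.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original question.
final class SubmitButtonXPath {
    // Matches an <input> by its type and value attributes; no id or name needed.
    static final String SAVE_AS_XML =
            "//input[@type='submit' and @value='Save as XML']";

    private SubmitButtonXPath() {}

    /** Returns the first node matching the expression, or null if nothing matches. */
    static Node find(String wellFormedMarkup, String expression)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        InputSource source = new InputSource(new StringReader(wellFormedMarkup));
        return (Node) xpath.evaluate(expression, source, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in markup (assumption); it must be well-formed XML for the JDK parser.
        String markup = "<form>"
                + "<input type=\"text\" value=\"ignored\"/>"
                + "<input type=\"submit\" value=\"Save as XML\" onclick=\"save()\"/>"
                + "</form>";
        Node button = find(markup, SAVE_AS_XML);
        System.out.println(button.getAttributes().getNamedItem("onclick").getNodeValue());
    }
}
```

With HtmlUnit, the same expression string would be passed to `getFirstByXPath` on the loaded page, and the returned element could then be clicked.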

How to ignore HTMLUnit warnings/errors related to jQuery?

我的梦境 submitted on 2019-12-04 23:22:59
Is it possible to teach HTMLUnit to ignore certain JavaScript scripts/files on a web page? Some of them are simply out of my control (like jQuery) and I can't do anything about them. The warnings are annoying, for example: [WARN] com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument: getElementById(script1299254732492) did a getElementByName for Internet Explorer Actually I'm using JSFUnit, and HTMLUnit runs underneath it. Answer (MrSmith42): If you want to avoid exceptions caused by JavaScript errors: webClient.setThrowExceptionOnScriptError(false); Well I am yet to find a way for that but I have
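The `[WARN]` lines themselves come from the logging layer rather than from exceptions. HtmlUnit logs through Apache Commons Logging, so one option is to lower the level of the `com.gargoylesoftware.htmlunit` logger. The sketch below assumes Commons Logging is routed to `java.util.logging` (for instance, no Log4j on the classpath); with another backend the equivalent setting belongs in that backend's configuration. The class name `QuietHtmlUnitLogs` is hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; assumes HtmlUnit's log output ends up in java.util.logging.
final class QuietHtmlUnitLogs {
    // Keep a strong reference: java.util.logging holds loggers weakly, so a
    // level set on an otherwise unreferenced logger can be lost after GC.
    private static final Logger HTMLUNIT_LOGGER =
            Logger.getLogger("com.gargoylesoftware.htmlunit");

    private QuietHtmlUnitLogs() {}

    /** Turns off all records from HtmlUnit loggers that inherit this level. */
    static void silence() {
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        silence();
        Logger child = Logger.getLogger(
                "com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument");
        // Child loggers without their own level inherit OFF from the parent.
        System.out.println(child.isLoggable(Level.WARNING));
    }
}
```

Calling `QuietHtmlUnitLogs.silence()` before creating the client hides the warnings but does not stop the scripts from running; script errors are still governed by the exception setting mentioned in the answer.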

Restricting Selenium/Webdriver/HtmlUnit to a certain domain

江枫思渺然 submitted on 2019-12-04 22:44:54
Question: While using Selenium/WebDriver for web scraping, I realized the target site has a Google Analytics script running. Is there a way to restrict Selenium/WebDriver/HtmlUnit so that it avoids certain URLs/domains? Thanks. Answer 1: I think it is impossible, because Selenium is actually an adapter for several implementations, so it cannot stop Firefox or Chrome from loading particular scripts. Perhaps you can check the driver API (Firefox profile, HtmlUnit configuration file) to accomplish this. Source: https://stackoverflow.com

HtmlUnit ScriptException errors

早过忘川 submitted on 2019-12-04 17:55:32
I am using HtmlUnitDriver, and here is my code: HtmlUnitDriver driver = new HtmlUnitDriver(true); driver.get("some url here"); I am getting the following exception: Caused by: com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: SyntaxError: missing ; before statement (http://sales.liveperson.net/hcp/html/mTag.js?site=7824460#1(eval)#1) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:595) at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:537) at net.sourceforge.htmlunit

.switchTo().frame(<'frameId'>); not working with HtmlUnit Driver

让人想犯罪 __ submitted on 2019-12-04 17:01:27
I am fairly new to HtmlUnit and am having trouble reaching a "Setup" menu item that is located inside a frame. The code below works perfectly with the Firefox driver but fails with HtmlUnitDriver: HtmlUnitDriver driver = new HtmlUnitDriver(); driver.get(fleetWorkURL); WebElement usernameElement = driver.findElement(By.name("j_username")); usernameElement.sendKeys(username); WebElement passwordElement = driver.findElement(By.name("j_password")); passwordElement.sendKeys(password); WebElement logInButton = driver.findElement(By.className("button_acunia")); logInButton.click(); driver.switchTo()

HtmlUnit WebClient Timeout

三世轮回 submitted on 2019-12-04 14:42:59
In my previous questions about HtmlUnit, "Skip particular Javascript execution in HTML unit" and "Fetch Page source using HtmlUnit : URL got stuck", I mentioned that loading the URL gets stuck. I also found that it gets stuck because one of the methods (parse) in the HtmlUnit library never returns. I did further work on this and wrote code that abandons the method call if it takes longer than a specified timeout, in seconds, to complete. import java.io.IOException; import java.net.MalformedURLException; import java.util.Date; import java.util.concurrent.ExecutorService; import java.util
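The general shape of such a timeout guard can be sketched with the JDK alone. This is not the poster's code, which is cut off above; it is a minimal sketch that assumes the slow call can be wrapped in a `Callable`. The class name `TimeLimitedCall`, the fallback value and the sleeping stand-in task are all hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not taken from the original post.
final class TimeLimitedCall {
    private TimeLimitedCall() {}

    /** Runs the task on a worker thread; returns fallback if it does not finish in time. */
    static <T> T callWithTimeout(Callable<T> task, long timeout, TimeUnit unit, T fallback)
            throws ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread worker = new Thread(runnable, "time-limited-call");
            worker.setDaemon(true); // a stuck task must not keep the JVM alive
            return worker;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupts the worker; this only stops tasks that respond to interruption.
            future.cancel(true);
            return fallback;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a page load that never returns in time (assumption).
        String result = callWithTimeout(() -> {
            Thread.sleep(5_000);
            return "page source";
        }, 200, TimeUnit.MILLISECONDS, "timed out");
        System.out.println(result);
    }
}
```

The caller gets control back after the timeout, but a task that ignores interruption keeps running on its daemon thread until it finishes or the JVM exits, so this bounds the waiting time rather than the work itself.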

HtmlUnit Android problem with WebClient

你。 submitted on 2019-12-04 12:52:55
HtmlUnit is amazing, in Java at least I have had no problems with it. Unfortunately when switching the code over to the Android platform, it is giving me errors when I try to create a web-client. import android.app.Activity; import android.os.Bundle; import com.gargoylesoftware.htmlunit.WebClient; public class AndroidTestActivity extends Activity { /** Called when the activity is first created. */ @Override public void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); setContentView(R.layout.main); final WebClient webClient = new WebClient(); } } Warnings it gives me

Running HtmlUnit with Jython - issue with startup on command line

﹥>﹥吖頭↗ submitted on 2019-12-04 12:16:50
I tried to run HtmlUnit with Jython following this tutorial: http://blog.databigbang.com/web-scraping-ajax-and-javascript-sites/ but it does not work for me. I am unable to import the com.gargoylesoftware packages; there are only some HTML files in the HtmlUnit folder. Do I need to import them somehow? The tutorial says to run the Python script like this: /opt/jython/jython -J-classpath "htmlunit-2.8/lib/*" gartner.py and I try to run: java -jar /Users/adam/jython/jython.jar -J-classpath "htmlunit-2.8/lib/*" gartner.py My problem is that I am getting an "Unknown option: J-classpath". But there is not even