htmlunit | 易学教程

Using HTMLUnit on a webpage generated by a servlet

Question: How can I use HtmlUnit to get data from a web page generated by a Java servlet? I keep getting an error when I try to read the page. /getSurvey is the servlet that creates the page, but how can I access the HTML it generates? final WebClient webClient = new WebClient(); final HtmlPage page = webClient.getPage("http://survey-creator.appspot.com/getSurvey"); Answer 1: HtmlUnit is not really "just" an HTML parser. It's a kind of programmatic web browser. It's intended to surf

Parsing web page containing dynamic javascript objects

Question: Currently I'm using Python with urllib2 and urllib to retrieve a simple static web page. Everything went smoothly until the page's developers added JavaScript. Now the most interesting information is hidden behind the scripts: <a href="javascript://" class="event-more-view" id="view-moreid-12311" onclick="Markets.applyView(this);return false;" treeid="1291266" eventstate ="false" > add table </a> The browser preloads the data and shows it when the "a href" link is clicked. The results of my short

How to limit HtmlUnit's history size?

Question: I'm using HtmlUnit for a parsing job, and I've discovered that memory is wasted because the WebClient holds the history for each WebWindow. I don't use the history at all, and I'd like to disable its management or at least limit its size to 1 or 2 entries. Is that possible? Answer 1: The following code will set ignoreNewPages_ to true: try { final WebClient webClient = getWebClient(); final List<WebWindow> webWindows = webClient.getWebWindows(); History window = webWindows.get(0).getHistory(); Field
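
The answer excerpt stops at the reflection step. As a rough illustration of the general technique it appears to rely on (setting a private field by name through java.lang.reflect.Field), here is a minimal sketch. HistoryLike is a hypothetical stand-in class written for this example, not HtmlUnit's History; the real field may have a different type, and the helper name setPrivateBoolean is likewise an assumption.

```java
import java.lang.reflect.Field;

public class PrivateFieldSketch {

    // Hypothetical stand-in for a library class that hides a flag.
    // This is NOT HtmlUnit's History class; the real field may differ in type.
    public static class HistoryLike {
        private boolean ignoreNewPages_ = false;

        public boolean ignoresNewPages() {
            return ignoreNewPages_;
        }
    }

    // Sets a private boolean field on the target object, looked up by name.
    public static void setPrivateBoolean(Object target, String fieldName, boolean value) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // bypass the private modifier
            field.setBoolean(target, value);
        } catch (ReflectiveOperationException e) {
            // Field missing or not accessible: surface it as an unchecked error.
            throw new IllegalStateException("Cannot set field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        HistoryLike history = new HistoryLike();
        setPrivateBoolean(history, "ignoreNewPages_", true);
        System.out.println(history.ignoresNewPages()); // prints "true"
    }
}
```

Reflection of this kind depends on a private field name, so it can break silently when the library is upgraded; checking the library's public API for a supported setting first is usually the safer route.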

.switchTo().frame(<'frameId'>); not working with HtmlUnit Driver

Question: I am fairly new to HtmlUnit and am having trouble reaching a "Setup" menu item that is located in a frame. The code below works fine with the Firefox driver but fails with HtmlUnitDriver: HtmlUnitDriver driver = new HtmlUnitDriver(); driver.get(fleetWorkURL); WebElement usernameElement = driver.findElement(By.name("j_username")); usernameElement.sendKeys(username); WebElement passwordElement = driver.findElement(By.name("j_password")); passwordElement.sendKeys(password); WebElement

Skip particular Javascript execution in HTML unit

Question: I have a URL, and I want to fetch its page source after the JavaScript has executed. Earlier question: "Fetch Page source using HtmlUnit : URL got stuck". Initially I suspected that the URL was getting stuck because of system resources and high CPU usage. Then I tried running it on HtmlUnit 2.9 and 2.11, and it got stuck on both while parsing. Refer to the question above for the HtmlUnit scraping code that gets stuck. Now I suspect that this might be due to the JavaScript execution going into an infinite loop. I want to

Selenium HtmlUnitDriver clicking on checkbox

Question: I'm trying to get my checkbox clicked in a Selenium test. The test runs without issue when I use chromedriver, but when I switch to HtmlUnitDriver, it throws an error when it reaches the checkbox click. The error thrown is org.openqa.selenium.ElementNotVisibleException: You may only interact with visible elements. I've tried multiple methods, such as: driver.findElement(By.xpath("//*[@id=\"chkConfirm\"]")).sendKeys(Keys.SPACE); driver.findElement(By.xpath("//*[@id=\

Can I access HTML5 storages using HTMLUnit

Question: I have a requirement to identify whether a page stores data in, or reads data from, HTML5 data stores. I am using HtmlUnit to scrape web pages. I checked the SourceForge listing, and it says that support for HTML5 storage has been built. Does HtmlUnit actually create objects for localStorage, sessionStorage, etc.? If so, how can I access them? I've also thought of scraping all the scripts on the page and searching for the keywords, but is there a better method than that? Answer 1: a simple test
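
The fallback idea mentioned in the question, searching script text for storage keywords, could look roughly like the sketch below. It uses only the JDK and assumes the script source has already been extracted as a string; the class and method names (StorageKeywordScan, findStorageApis) are made up for this illustration and are not part of HtmlUnit.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StorageKeywordScan {

    // Matches the identifiers localStorage or sessionStorage as whole words.
    private static final Pattern STORAGE_API =
            Pattern.compile("\\b(localStorage|sessionStorage)\\b");

    // Returns the storage API names mentioned in the given script source,
    // in order of first appearance.
    public static Set<String> findStorageApis(String scriptSource) {
        Set<String> found = new LinkedHashSet<>();
        Matcher matcher = STORAGE_API.matcher(scriptSource);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String script = "var n = localStorage.getItem('visits');"
                + " sessionStorage.setItem('seen', '1');";
        System.out.println(findStorageApis(script)); // prints "[localStorage, sessionStorage]"
    }
}
```

A plain text match like this also fires on comments and string literals, and it misses indirect access such as window['local' + 'Storage'], so it is only a first-pass filter.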

HtmlUnit selenium python Errno 111

Question: I'm trying to use Selenium with HtmlUnit in my Django app. This is my procedure: I start the server in the background: java -jar selenium-server-standalone-2.27.0.jar bg Then I use this code: from selenium.webdriver.common.desired_capabilities import DesiredCapabilities from selenium.webdriver.remote.webdriver import WebDriver url = "www.google.com" driver = WebDriver("http://127.0.0.1:4444/wd/hub", DesiredCapabilities.HTMLUNITWITHJS) driver.get(url) text = driver.page_source ... My problem is that I always get

JavaScript Exception in HtmlUnit when clicking at google result page

Question: I want to use HtmlUnit (v2.21) to get some search result pages from Google. This requires me to click on the "people also looked for" link when searching for a person (right side, see example link), which triggers some JavaScript and changes the content of the current page. But this gives me a JavaScript Wrapper Exception (see below). Clickable example link: https://www.google.de/search?ie=UTF-8&safe=off&q=nicki+minaj Simple test case with errors: String url = "https://www.google.de/search?ie=UTF

instantiating a webclient object in jython giving strange results

Question: I am trying to use Java's WebClient JAR within a Jython script. I am running the Jython script like so: jython -Dpython.path=/home/tipu/Dropbox/dev/proj/lib/* test.py The contents of test.py: import com.gargoylesoftware.htmlunit.WebClient as WebClient def main(): webclient = WebClient() # creating a new webclient object. if __name__ == '__main__': main() The error I get is: Traceback (innermost last): File "scraper.py", line 1, in ? ImportError: no module named gargoylesoftware This is the