HtmlUnit + Selenium within Production

长发绾君心 2021-01-04 10:27

I am currently using HtmlUnit, driven through Selenium (WebDriver), in my production code.

I am scraping and interacting with various websites programmatically wi

3 answers
  •  余生分开走
    2021-01-04 10:55

    I'm using HtmlUnit for something similar in production and have had quite a few issues, mostly performance-related. I have since switched to a snapshot version of HtmlUnit 2.10, which includes some performance improvements that matter to me (e.g. replacing ArrayList.contains() with HashSet.contains() in DomNode.addDomChangeListener()).
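The ArrayList-to-HashSet change is a general Java pattern, so it may help to see it in isolation. The sketch below is hypothetical and is not HtmlUnit's actual DomNode code: it registers listeners while skipping duplicates, once with a List (where contains() is a linear scan) and once with a LinkedHashSet (a hash lookup that also preserves insertion order, which is a variant I chose rather than the plain HashSet mentioned above). The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not HtmlUnit's DomNode code.
public class ListenerRegistration {

    // List-backed registry: contains() walks the whole list, so each call is
    // O(n) and registering n distinct listeners costs O(n^2) comparisons.
    public static <T> void addToList(List<T> listeners, T listener) {
        if (!listeners.contains(listener)) {
            listeners.add(listener);
        }
    }

    // Set-backed registry: add() is a hash lookup, O(1) on average, and it
    // silently ignores duplicates. LinkedHashSet keeps insertion order.
    public static <T> void addToSet(Set<T> listeners, T listener) {
        listeners.add(listener);
    }

    public static void main(String[] args) {
        int n = 10_000;
        List<Integer> list = new ArrayList<>();
        Set<Integer> set = new LinkedHashSet<>();

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            addToList(list, i);
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            addToSet(set, i);
        }
        long t2 = System.nanoTime();

        // Timings vary by machine; the list version grows quadratically.
        System.out.printf("list: %d listeners in %d ms%n", list.size(), (t1 - t0) / 1_000_000);
        System.out.printf("set:  %d listeners in %d ms%n", set.size(), (t2 - t1) / 1_000_000);
    }
}
```

Both versions end up with the same listeners in the same order; only the cost of the duplicate check differs, which is what matters when a page registers many listeners.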

    Still, the CPU load is quite high on JavaScript-heavy pages. Typically, I can't run more than 10 of them simultaneously on a dual-core Linux box. I believe HtmlUnit runs Rhino (its JavaScript engine) in interpreter mode only, which is fairly slow. You also need to be careful to release all resources used by HtmlUnit to avoid memory leaks.
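The two practical points above (only about ten JavaScript-heavy pages at once, and releasing every resource) can be combined in a sketch like the following. It uses only the JDK: BrowserSession is a hypothetical stand-in for an HtmlUnit WebClient or a Selenium WebDriver, its load() and close() are placeholders for the real page load and cleanup, and the limit of 10 in main() simply mirrors the figure from this answer rather than any recommended value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: cap how many browser sessions run at once, and always release them.
public class BoundedScraper {

    // Hypothetical stand-in for an HtmlUnit WebClient or Selenium WebDriver.
    // The counters only exist so the sketch can report open and peak sessions.
    public static class BrowserSession implements AutoCloseable {
        public static final AtomicInteger open = new AtomicInteger();
        public static final AtomicInteger peak = new AtomicInteger();

        public BrowserSession() {
            int now = open.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
        }

        // Placeholder for loading and scraping a real page.
        public String load(String url) {
            return "content of " + url;
        }

        // A real implementation would release the wrapped client here,
        // e.g. by calling WebDriver.quit().
        @Override
        public void close() {
            open.decrementAndGet();
        }
    }

    // Scrapes every URL with at most maxConcurrent sessions alive at a time.
    // Results come back in the same order as the input URLs.
    public static List<String> scrapeAll(List<String> urls, int maxConcurrent) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(pool.submit(() -> {
                    // try-with-resources calls close() even if load() throws.
                    try (BrowserSession session = new BrowserSession()) {
                        return session.load(url);
                    }
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            urls.add("https://example.com/page/" + i);
        }
        List<String> pages = scrapeAll(urls, 10);
        System.out.println(pages.size() + " pages, peak sessions: "
                + BrowserSession.peak.get() + ", still open: " + BrowserSession.open.get());
    }
}
```

A fixed-size thread pool both bounds concurrency and reuses threads; a Semaphore around session creation would be an equally valid way to enforce the cap.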

    All in all, it is certainly noticeable that HtmlUnit was designed to run relatively short-lived test cases, not long-running server applications. It's possible to tweak it enough to make it manageable, but it could certainly be better.

    Another approach I found promising is PhantomJS, a headless version of the WebKit browser. As a native app, it is much faster at running JavaScript.
