Alternative to HtmlUnit

Posted by 给你一囗甜甜゛ on 2019-12-03 04:55:57

Question


I have been researching the headless browsers available to date and found that HtmlUnit is used quite extensively. Is there an alternative to HtmlUnit that offers any advantage over it?

Thanks, Nayn


Answer 1:


As far as I know, HtmlUnit is the most powerful headless browser.

What are your issues with it?




Answer 2:


There are several other libraries you can use for this, depending on what you need.

  • If you need to scrape XML-based data, use JTidy.
  • If you need to scrape specific data from HTML, use jsoup.

I use jsoup myself; in my experience it is considerably faster than the other APIs.
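
To make the parser-only approach concrete, here is a minimal sketch of scraping specific data from static HTML without any browser. It is not jsoup (a Java library); it swaps in Python's standard-library html.parser as a stand-in, and the names LinkCollector, extract_links and SAMPLE_HTML are hypothetical, chosen only for illustration. It also shows the main trade-off against HtmlUnit: a plain parser does not run JavaScript, so markup generated by scripts is never seen.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a href=...> in a document."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a link that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Hypothetical helper: parse a string of HTML and return its links."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample page; the <script> would add a third link in a real browser.
SAMPLE_HTML = """
<html><body>
  <h1>Headless browsers</h1>
  <a href="/htmlunit">HtmlUnit</a>
  <a href="/jsoup">jsoup <b>parser</b></a>
  <script>document.write('<a href="/generated">generated</a>');</script>
</body></html>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, text)
```

The link written by the script is not reported, because the script body is treated as raw text and never executed. If the data you need is produced by JavaScript, a parser alone is not enough and you need something that runs scripts, such as HtmlUnit or a real browser.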




Answer 3:


WebDriver with a virtual framebuffer is the only real alternative. The advantage is that it uses a real browser; the disadvantage is that it's more of a pain to set up, and the API is much poorer.




Answer 4:


I am going to use Selenium for my use case, since it lets me drive a real browser, so the result does not deviate from what would be rendered in the real world, as it can with HtmlUnit. I am planning to use Selenium 2, which integrates WebDriver and offers a great API and useful fixes. Thanks, Nayn




Answer 5:


I use WebKit as a headless browser, through Qt's Python bindings: http://www.riverbankcomputing.co.uk/static/Docs/PyQt4/html/qtwebkit.html

WebKit is the rendering engine used by Safari (and by Chrome until it moved to Blink, a WebKit fork, in 2013), and it is very flexible.

One of my reasons for choosing it over HtmlUnit was how easy it is to set up:

sudo apt-get install python-qt4



Answer 6:


I would also recommend Selenium. Its best feature is that you can create a client that opens a visible browser window, so you can see what is happening at each step. Creating macros for automated tests is another good feature. However, if you need to scrape information from a web page, HtmlUnit is better than Selenium.



Source: https://stackoverflow.com/questions/4253670/alternative-to-htmlunit
