JavaScript (and HTML rendering) engine without a GUI for automation?

南楼画角 submitted on 2019-12-03 08:53:16

PhantomJS and PyPhantomJS are what I use for tasks like these.

It is a headless, WebKit-based browser that is fully controllable via JavaScript. There's a C++ implementation (PhantomJS) and a Python one (PyPhantomJS). I prefer the Python one, though, because its plugin system lets you add functionality to the core without modifying any of the core code, unlike the C++ one. :)
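As a rough sketch of what "controllable via JavaScript" looks like in practice, here is one way to drive the `phantomjs` binary from Python. It assumes a `phantomjs` executable is on the PATH; the helper names `build_command` and `fetch_title` are made up for this example and are not part of PhantomJS or PyPhantomJS.

```python
import os
import shutil
import subprocess
import tempfile

# PhantomJS script: open the URL passed on the command line,
# read document.title inside the page context, and print it.
PHANTOM_SCRIPT = """
var page = require('webpage').create();
var url = require('system').args[1];
page.open(url, function (status) {
    if (status !== 'success') {
        console.log('FAILED');
        phantom.exit(1);
    } else {
        var title = page.evaluate(function () { return document.title; });
        console.log(title);
        phantom.exit(0);
    }
});
"""


def build_command(script_path, url, binary="phantomjs"):
    """Return the argv list for running a PhantomJS script against a URL."""
    return [binary, script_path, url]


def fetch_title(url, binary="phantomjs"):
    """Return the page title, or None if the binary is missing or the load fails."""
    if shutil.which(binary) is None:
        return None  # assumption: PhantomJS must be installed separately
    # Write the script to a temporary file so the binary can read it.
    with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as fh:
        fh.write(PHANTOM_SCRIPT)
        script_path = fh.name
    try:
        result = subprocess.run(
            build_command(script_path, url, binary),
            capture_output=True,
            text=True,
            timeout=60,
        )
    finally:
        os.unlink(script_path)
    if result.returncode != 0:
        return None
    return result.stdout.strip()
```

Calling `fetch_title("http://example.com")` should print nothing itself and simply return the title string when PhantomJS is available; the interesting part is the embedded script, where `page.evaluate` runs a function inside the loaded page.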

There is an absolute ton of free software technology now available: take your pick at http://wiki.python.org/moin/WebBrowserProgramming. If you have specific questions, join pyjamas-dev on Google Groups and I'll be happy to give further details there. Brief answer: you can run pywebkitgtk "headless", or you can use xulrunner (via python-hulahop), again using pygtk without actually calling "browserwidget.show()", and there's also pykhtml. You could also use Python COM to connect to MSHTML.DLL.

These are all "cheat" methods: they use Python bindings to a graphical web browser engine without actually firing up the graphical bit. If you really wanted to put some serious hard-core programming in, you could create a "port" of WebKit that is not connected to a GUI toolkit. As an experienced WebKit programmer, I'd put it at around 2 weeks of full-time effort to make such a "headless" version of WebKit.

l.

Looks like http://watin.sourceforge.net/ might be a good way to go.

If you don't have to go pure Python, you could use IronPython, since WatiN is a C# project.

Take a look at this little doozy on Ajaxian:

http://ajaxian.com/archives/server-side-rendering-with-yui-on-node-js

It also talks about Aptana Jaxer, which I think runs on a headless Firefox, so it is basically the Mozilla browser engine in all its glory.

There is Kapow. It's pure Java and costs money:

http://kapowtech.com/

And there is Lixto: it's Eclipse-based and uses Mozilla Gecko as its rendering engine (unless they have already switched to WebKit, as they said years ago they would). It's very nice and also costs money:

http://www.lixto.com/?page_id=50

They are both graphical tools where you define the site navigation and what should be extracted by point and click. But you can also write XPath expressions, regular expressions, and even JavaScript that runs in the site's context.
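To give an idea of what a hand-written XPath plus regular-expression rule does, here is a minimal sketch using only the Python standard library. It has nothing to do with how Kapow or Lixto work internally; the sample markup, the `extract_items` helper, and the price pattern are all invented for illustration, and `xml.etree.ElementTree` only supports a limited subset of XPath and needs well-formed markup.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (real-world HTML usually needs a
# more forgiving parser than ElementTree).
PAGE = """<html><body>
  <div class="item"><span class="name">Widget</span><span class="price">EUR 12.50</span></div>
  <div class="item"><span class="name">Gadget</span><span class="price">EUR 7.00</span></div>
</body></html>"""

# The regular expression pulls the numeric part out of the price text.
PRICE_RE = re.compile(r"(\d+\.\d{2})")


def extract_items(markup):
    """Return (name, price) tuples for every div with class 'item'."""
    root = ET.fromstring(markup)
    items = []
    # XPath-style selection: every <div class="item"> anywhere in the tree.
    for node in root.findall(".//div[@class='item']"):
        name = node.find("span[@class='name']").text
        price_text = node.find("span[@class='price']").text
        match = PRICE_RE.search(price_text)
        items.append((name, float(match.group(1)) if match else None))
    return items
```

The XPath expression picks the nodes, and the regular expression cleans up the text inside them, which is roughly the division of labour you end up with in such extraction rules.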

I used them both in the lectures "Web Data Extraction" and "Applied Web Data Extraction" at the Vienna University of Technology (Lixto was written by the professor who held the lectures).

HtmlUnit in Java is very good. I think it's only the Java implementations of headless browsers that manage to provide JavaScript support.

MaxQ, I read about here, sounds like it might be interesting: "written in Java, generates Jython scripts"

Try HtmlUnit !!!
