How to scrape ajax loaded content with jsoup [closed]

℡╲_俬逩灬. 提交于 2019-12-01 21:17:18
Hemerson Varela

You can use a headless browser as PhatomJS.

PhantomJS is a headless WebKit scriptable with a JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.

In order to ease your work, You could use CapserJS

CasperJS is a companion for PhatomJS which brings a greatly improved API to ease the creation of scraping and automation workflows.

These tools are very useful when you have to scrape a websites with dynamic content, for instance, websites where the content is displayed after it ran process in Javascript (sometimes including ajax calls).

You can see a example about how casper works here:
CasperJs and Jquery with chained Selects

You can't do it directly with JSoup. You'll need a headless browser, which is a much more complex thing. There are headless versions of Firefox, Safari, and others. Searches for "headless X" (where X is the browser engine you want to use) should turn up some useful projects.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!