How to scrape ajax loaded content with jsoup [closed]

巧了我就是萌 submitted on 2019-12-02 00:47:05

Question


I have used jsoup for scraping, and it works perfectly as long as AJAX and JavaScript play no part in rendering the page content.

Does anyone have a clue how to scrape content that is displayed via AJAX or JavaScript after the page has finished loading?

Thanks in advance!
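To see why jsoup stops working here, compare what a static parser sees with what the browser eventually shows. The sketch below is written in Python so it runs with only the standard library (jsoup itself is a Java library); like jsoup, Python's `html.parser` reads the markup but never executes `<script>` contents. The page, the `#results` list and the `/api/results` endpoint are hypothetical. In jsoup, the equivalent selection would be `Jsoup.parse(html).select("#results li")`.

```python
from html.parser import HTMLParser

# Hypothetical page as the server first sends it: the results list is empty,
# and an inline script (never executed by a static parser) would fill it via AJAX.
RAW_HTML = """
<html>
  <body>
    <h1>Search results</h1>
    <ul id="results"></ul>
    <script>
      fetch('/api/results')
        .then(r => r.json())
        .then(items => { /* the script would append list items to #results */ });
    </script>
  </body>
</html>
"""

# Hypothetical DOM as a browser would serialize it after the script has run.
RENDERED_HTML = RAW_HTML.replace(
    '<ul id="results"></ul>',
    '<ul id="results"><li>First result</li><li>Second result</li></ul>',
)


class ListItemCollector(HTMLParser):
    """Collects the text of <li> elements using only the markup it is given;
    script contents are treated as plain text and never run."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_li = False

    def handle_data(self, data):
        if self._in_li:
            self.items[-1] += data


def list_items(html):
    """Return the stripped text of every <li> found in `html`."""
    parser = ListItemCollector()
    parser.feed(html)
    parser.close()
    return [item.strip() for item in parser.items]


print(list_items(RAW_HTML))       # → []
print(list_items(RENDERED_HTML))  # → ['First result', 'Second result']
```

The parsing code is identical in both calls; only the input differs. A static fetch gives you the first string, so the data a user sees in the browser never reaches the parser.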


Answer 1:


You can use a headless browser such as PhantomJS.

PhantomJS is a headless WebKit scriptable with a JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.

To make your work easier, you could use CasperJS.

CasperJS is a companion for PhantomJS that brings a greatly improved API to ease the creation of scraping and automation workflows.

These tools are very useful when you have to scrape websites with dynamic content, for instance, websites where the content is displayed only after JavaScript has run (sometimes including AJAX calls).

You can see an example of how CasperJS works here:
CasperJs and Jquery with chained Selects
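One thing these tools make easy is waiting for content that arrives late: CasperJS, for example, offers `waitForSelector()`, which holds off the next step until a selector matches. The Python sketch below shows only that polling idea in isolation, using the standard library. `FakePage` and `query_selector_text` are hypothetical stand-ins for a real headless-browser page, not CasperJS or PhantomJS APIs.

```python
import time


def wait_for(condition, timeout=5.0, interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` seconds
    pass. Returns that value, or raises TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


class FakePage:
    """Hypothetical stand-in for a headless-browser page: the element's text
    only appears after a few polls, simulating a slow AJAX response."""

    def __init__(self, ready_after):
        self.polls = 0
        self._ready_after = ready_after

    def query_selector_text(self, selector):
        self.polls += 1
        if self.polls > self._ready_after:  # the simulated AJAX call has finished
            return f"content of {selector}"
        return None  # element not rendered yet


page = FakePage(ready_after=3)
text = wait_for(lambda: page.query_selector_text("#results"), timeout=2.0, interval=0.05)
print(text)  # → content of #results
```

A fixed `sleep()` before scraping either wastes time or fails on a slow response; polling with a deadline returns as soon as the content exists and fails loudly if it never appears.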




Answer 2:


You can't do it directly with jsoup, because jsoup only parses HTML and does not execute JavaScript. You'll need a headless browser, which is a much more complex thing. There are headless versions of Firefox, Safari, and others. Searching for "headless X" (where X is the browser engine you want to use) should turn up some useful projects.



Source: https://stackoverflow.com/questions/16852660/how-to-scrape-ajax-loaded-content-with-jsoup
