Fetch contents (loaded through an AJAX call) of a web page

-上瘾入骨i 2020-11-28 13:23

I am new to web crawling. I need to fetch the posts and comments from a link, and I want to automate this process. I considered using webcrawler and jsoup for t

2 answers
  • 2020-11-28 13:34

    Jsoup does not execute JavaScript or handle Ajax requests, so you need a tool such as HtmlUnit or Selenium to load the page. Once the page has been loaded that way, you can pass the resulting HTML to jsoup for the rest of the task.

  • 2020-11-28 13:52

    Jsoup is only an HTML parser. Unfortunately, it cannot parse content generated by JavaScript or Ajax, because jsoup cannot execute scripts.

    The solution is to use a library that can execute scripts.

    Here are some examples I know of:

    • HtmlUnit
    • Java Script Engine
    • Apache Commons BSF
    • Rhino

    If such a library doesn't support parsing or selectors, you can at least use it to obtain the HTML that the scripts produce, which can then be parsed by jsoup.
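As a rough illustration of getting HTML out of a script, here is a minimal sketch that uses the standard javax.script API. The class name ScriptHtmlSketch, the method renderHtml, and the inline script are all hypothetical and only stand in for a real page script. It also assumes a JavaScript engine is registered with the JVM, which is not guaranteed (recent JDKs do not bundle one), so the code falls back to an empty result in that case.

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

class ScriptHtmlSketch {

    // Hypothetical stand-in for a page script that builds markup at runtime.
    static final String SCRIPT =
        "var posts = ['first', 'second'];"
        + "var html = '<ul>';"
        + "for (var i = 0; i < posts.length; i++) {"
        + "  html += '<li>' + posts[i] + '</li>';"
        + "}"
        + "html += '</ul>';"
        + "html;";

    // Evaluates the script and returns the value of its last expression as text.
    // Returns Optional.empty() when no JavaScript engine is registered.
    static Optional<String> renderHtml(String script) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("javascript");
        if (engine == null) {
            return Optional.empty();
        }
        try {
            Object result = engine.eval(script);
            return Optional.ofNullable(result).map(String::valueOf);
        } catch (ScriptException e) {
            throw new IllegalStateException("Script failed to run", e);
        }
    }

    public static void main(String[] args) {
        Optional<String> html = renderHtml(SCRIPT);
        if (html.isPresent()) {
            // This string is what you would hand to jsoup for parsing.
            System.out.println(html.get());
        } else {
            System.out.println("No JavaScript engine available on this JVM");
        }
    }
}
```

A bare script engine like this has no DOM and performs no network requests, so it only helps when the markup is assembled by plain script logic. For pages that fetch their content through Ajax, a browser-like tool such as HtmlUnit is the closer fit.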
