Using Java to pull data from a webpage?

猫巷女王i 2020-11-28 21:29

I'm attempting to make my first program in Java. The goal is to write a program that browses to a website and downloads a file for me. However, I don't know how to use Jav

3 Answers
  •  渐次进展
    2020-11-28 21:48

    The Basics

    Look at these to build a solution more or less from scratch:

    • Start from the basics: The Java Tutorial's chapter on Networking, including Working With URLs
    • Make things easier for yourself: Apache HttpComponents (including HttpClient)
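
    As a rough illustration of the "from scratch" route, here is a minimal sketch that uses only the JDK's java.net.URL to copy whatever a URL serves into a local file. The class name UrlDownload and the download helper are made up for this example, and the URL and file name are supplied by you on the command line. It does no retries, authentication, or redirect handling beyond what the JDK does by default.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class UrlDownload {

    /**
     * Copies the bytes served at sourceUrl into target, overwriting any
     * existing file, and returns the number of bytes written.
     * (Hypothetical helper, not part of any library.)
     */
    static long download(URL sourceUrl, Path target) throws IOException {
        try (InputStream in = sourceUrl.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java UrlDownload <url> <output-file>");
            return;
        }
        URL url = URI.create(args[0]).toURL();
        long bytes = download(url, Paths.get(args[1]));
        System.out.println("Saved " + bytes + " bytes to " + args[1]);
    }
}
```

    The same helper works for any URL scheme the JDK supports (http, https, file), which makes it easy to try out locally before pointing it at a real site.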

    The Easily Glued-Up and Stitched-Up Stuff

    You always have the option of calling external tools from Java using Runtime.exec() and similar methods. For instance, you could use wget or cURL.
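
    A small sketch of that approach, using ProcessBuilder (the more flexible relative of exec()) to run curl. It assumes curl is installed and on the PATH; the class name ExternalDownload and both helper methods are invented for this example.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class ExternalDownload {

    /**
     * Builds a curl command line that follows redirects, fails on HTTP
     * errors, and writes the response body to outputFile.
     * (Hypothetical helper; assumes curl is available on the PATH.)
     */
    static List<String> curlCommand(String url, String outputFile) {
        return Arrays.asList("curl", "--fail", "--location", "--output", outputFile, url);
    }

    /** Runs the command, forwarding its output to this process, and returns its exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java ExternalDownload <url> <output-file>");
            return;
        }
        int exitCode = run(curlCommand(args[0], args[1]));
        System.out.println("curl exited with code " + exitCode);
    }
}
```

    Passing the arguments as a list, rather than as one concatenated string, avoids quoting problems when the URL or file name contains spaces or shell metacharacters.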

    The Hardcore Stuff

    Then, if you want something more fully-fledged, the need for automated web testing has thankfully given us very practical tools for this. Look at:

    • HtmlUnit (powerful and simple)
    • Selenium, Selenium-RC
    • WebDriver/Selenium2 (still in the works)
    • JBehave with JBehave Web

    Some other libraries are written specifically with web scraping in mind:

    • JSoup
    • Jaunt

    Some Workarounds

    Java is a language, but also a platform, with many other languages running on it. Some of them offer great syntactic sugar or libraries that make it easy to build scrapers.

    Check out:

    • Groovy (and its XmlSlurper)
    • or Scala (with great XML support as presented here and here)

    If you know of a great library for Ruby (JRuby, with an article on scraping with JRuby and HtmlUnit) or Python (Jython), or if you simply prefer these languages, then give their JVM ports a chance.

    Some Supplements

    Some other similar questions:

    • Scrape data from HTML using Java
    • Options for HTML Scraping
