Java Web Crawler Libraries

栀梦 2020-12-13 04:58

I want to build a Java-based web crawler for an experiment. I have heard that writing a web crawler in Java is a good way to start if this is your first time. However, I have two impo

12 answers
  •  悲哀的现实
    2020-12-13 05:34

    This is how your program can 'visit' or 'connect' to a web page:

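        import java.io.BufferedReader;
        import java.io.IOException;
        import java.io.InputStreamReader;
        import java.net.MalformedURLException;
        import java.net.URL;
        import java.nio.charset.StandardCharsets;

        // PageFetcher is only an illustrative class name.
        public class PageFetcher {
            public static void main(String[] args) {
                try {
                    URL url = new URL("http://stackoverflow.com/");
                    // openStream() throws an IOException if the connection fails.
                    // try-with-resources closes the stream even when reading fails,
                    // and avoids calling close() on a stream that was never opened.
                    // Decoding as UTF-8 is an assumption about the page's encoding.
                    try (BufferedReader reader = new BufferedReader(
                            new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                        String line;
                        // BufferedReader.readLine() replaces the deprecated
                        // DataInputStream.readLine().
                        while ((line = reader.readLine()) != null) {
                            System.out.println(line);
                        }
                    }
                } catch (MalformedURLException mue) {
                    mue.printStackTrace();
                } catch (IOException ioe) {
                    ioe.printStackTrace();
                }
            }
        }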
    

    This downloads the HTML source of the page.

    For HTML parsing see this

    Also take a look at jSpider and jsoup.
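
    Once the page source is downloaded, a crawler usually needs the links in it so that it knows where to go next. The following is a rough sketch of that step using only the standard library. The class name LinkExtractor and the method extractLinks are hypothetical names chosen for illustration, and the regular expression is only an approximation: it misses unquoted attributes and unusual markup, which is why a real HTML parser such as jsoup is normally the better choice.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class LinkExtractor {

    // Matches href="..." or href='...' inside an <a> tag.
    // This is an approximation, not a full HTML parser.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns every link found in html, resolved against baseUrl
    // so that relative links become absolute.
    static List<String> extractLinks(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(2).trim();
            try {
                links.add(base.resolve(href).toString());
            } catch (IllegalArgumentException e) {
                // Skip hrefs that are not syntactically valid URIs.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // A small inline sample stands in for a downloaded page.
        String html = "<a href=\"/questions\">Questions</a> "
                + "<a href='http://example.com/page'>Example</a>";
        for (String link : extractLinks(html, "http://stackoverflow.com/")) {
            System.out.println(link);
        }
    }
}
```

    In a crawler, the returned links would typically go into a queue of pages still to visit, together with a set of already visited URLs so that the same page is not fetched twice.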
