Get html file Java

夕颜 2021-01-23 21:39

Duplicate:

How do you Programmatically Download a Webpage in Java?

How to fetch html in Java

I'm developing an application th

5 answers
  •  遇见更好的自我
    2021-01-23 22:35

    This code downloads data from a URL, treating it as binary content:

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URL;

    public class Download {

      // Streams the resource at the given URL into the output file, byte for byte.
      private static void download(URL input, File output)
          throws IOException {
        // try-with-resources closes both streams, even if the copy fails.
        try (InputStream in = input.openStream();
             OutputStream out = new FileOutputStream(output)) {
          copy(in, out);
        }
      }

      // Copies everything from in to out through a fixed-size buffer.
      private static void copy(InputStream in, OutputStream out)
          throws IOException {
        byte[] buffer = new byte[1024];
        while (true) {
          int readCount = in.read(buffer);
          if (readCount == -1) {
            break; // end of stream
          }
          out.write(buffer, 0, readCount);
        }
      }

      public static void main(String[] args) {
        try {
          URL url = new URL("http://stackoverflow.com");
          File file = new File("data");
          download(url, file);
        } catch (IOException e) {
          e.printStackTrace();
        }
      }

    }
    

    The downside of this approach is that it ignores metadata such as the Content-Type header, which you would get by using HttpURLConnection (or a more sophisticated API, such as Apache HttpClient).
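
    As a rough illustration of reading that metadata, here is a minimal sketch using HttpURLConnection. The class name ContentTypeDemo and the helper charsetOf are hypothetical names made up for this example, and falling back to UTF-8 when no charset is declared is an assumption, not something the HTTP response guarantees.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentTypeDemo {

  // Hypothetical helper: pulls the charset parameter out of a Content-Type
  // value such as "text/html; charset=ISO-8859-1". Falls back to UTF-8
  // (an assumption) when the header is missing or names no usable charset.
  public static Charset charsetOf(String contentType) {
    if (contentType != null) {
      for (String part : contentType.split(";")) {
        String p = part.trim();
        if (p.regionMatches(true, 0, "charset=", 0, 8)) {
          String name = p.substring(8).trim().replace("\"", "");
          try {
            return Charset.forName(name);
          } catch (IllegalArgumentException e) {
            break; // unknown or malformed charset name
          }
        }
      }
    }
    return StandardCharsets.UTF_8;
  }

  public static void main(String[] args) throws IOException {
    URL url = new URL("http://stackoverflow.com");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    try {
      // These calls trigger the request and expose the response metadata.
      System.out.println("Status:       " + conn.getResponseCode());
      System.out.println("Content-Type: " + conn.getContentType());
      System.out.println("Charset:      " + charsetOf(conn.getContentType()));
    } finally {
      conn.disconnect();
    }
  }
}
```

    Knowing the charset matters once you stop treating the response as raw bytes: it tells you how to decode the stream into text before handing it to a parser.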

    To parse the HTML, you'll need either a specialized HTML parser that can handle poorly formed markup, or to tidy the markup first and then parse it with an XML parser.
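
    To make the first option concrete, here is a small sketch that uses the lenient HTML parser bundled with the JDK (javax.swing.text.html) to collect link targets from markup with unclosed tags. This is only one possible choice, picked because it needs no extra dependency; it targets an old HTML dialect, and a dedicated third-party parser may well suit real pages better. The class name LinkExtractor and the method extractLinks are hypothetical names for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkExtractor {

  // Returns the href value of every <a> start tag the parser reports.
  public static List<String> extractLinks(String html) {
    final List<String> links = new ArrayList<>();
    HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
      @Override
      public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.A) {
          Object href = attrs.getAttribute(HTML.Attribute.HREF);
          if (href != null) {
            links.add(href.toString());
          }
        }
      }
    };
    try {
      // The last argument tells the parser to ignore charset declarations
      // in the markup, since the input is already decoded text.
      new ParserDelegator().parse(new StringReader(html), callback, true);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return links;
  }

  public static void main(String[] args) {
    // Deliberately sloppy markup: the <p> element is never closed.
    String messy = "<html><body><p>Unclosed paragraph "
        + "<a href=\"/questions\">Questions</a></body>";
    System.out.println(extractLinks(messy));
  }
}
```

    The callback style means the parser never has to build a full document tree; it simply reports tags as it recovers them from the input.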
