How do you Programmatically Download a Webpage in Java

无人共我 2020-11-22 11:20

I would like to be able to fetch a web page's HTML and save it to a String, so I can do some processing on it. Also, how could I handle various types of compr

11 answers
  •  眼角桃花
    2020-11-22 11:45

    You will most likely need to fetch the source of a secure web page (HTTPS protocol). The following example saves the HTML to c:\temp\filename.html. Enjoy!

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    
    import javax.net.ssl.HttpsURLConnection;
    
    /**
     * Gets the HTML source from a secure (HTTPS) URL and saves it to a file.
     */
    public class HttpsClientUtil {
        public static void main(String[] args) throws Exception {
            String httpsURL = "https://stackoverflow.com";
            String FILENAME = "c:\\temp\\filename.html";
            URL myurl = new URL(httpsURL);
            HttpsURLConnection con = (HttpsURLConnection) myurl.openConnection();
            // Send a browser-like User-Agent; some servers refuse requests without one
            con.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0");
    
            // try-with-resources closes the reader and writer even if an exception is thrown.
            // UTF-8 is an assumption; use the charset from the Content-Type header if it differs.
            try (BufferedReader in = new BufferedReader(
                         new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8));
                 BufferedWriter bw = Files.newBufferedWriter(Paths.get(FILENAME), StandardCharsets.UTF_8)) {
                String inputLine;
                // Echo each line to the console and write it to the file
                while ((inputLine = in.readLine()) != null) {
                    System.out.println(inputLine);
                    bw.write(inputLine);
                    bw.newLine(); // readLine() strips the line terminator, so restore it
                }
            }
        }
    }
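
If you want the page in a String rather than in a file, something along these lines may work. It is only a minimal sketch: the class name `PageFetcher` and the helpers `decodeBody` and `fetch` are made up for illustration, the response is assumed to be UTF-8, and only the `gzip` and zlib-wrapped `deflate` content encodings are handled. It asks the server for a compressed response and then unwraps the stream according to the `Content-Encoding` header.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class, not part of any library.
public class PageFetcher {

    /**
     * Reads the whole stream into a String, decompressing it first when
     * contentEncoding is "gzip" or "deflate". Any other value (including
     * null) is treated as an uncompressed body.
     */
    public static String decodeBody(InputStream raw, String contentEncoding, Charset charset)
            throws IOException {
        InputStream in = raw;
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            in = new GZIPInputStream(raw);
        } else if ("deflate".equalsIgnoreCase(contentEncoding)) {
            // Expects the zlib-wrapped form of deflate
            in = new InflaterInputStream(raw);
        }
        try (InputStream body = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = body.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), charset);
        }
    }

    /** Downloads the page at the given address and returns its body as a String. */
    public static String fetch(String address) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(address).openConnection();
        // Tell the server which compressed encodings we can decode
        con.setRequestProperty("Accept-Encoding", "gzip, deflate");
        try {
            // UTF-8 is assumed here; a fuller version would parse the Content-Type header
            return decodeBody(con.getInputStream(), con.getContentEncoding(), StandardCharsets.UTF_8);
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PageFetcher <url>");
            return;
        }
        String html = fetch(args[0]);
        System.out.println(html.length() + " characters downloaded");
    }
}
```

Run it as `java PageFetcher https://stackoverflow.com`. Keeping the decoding in its own method means it can be exercised with an in-memory stream, without touching the network.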
    
