Open a connection with Jsoup, get status code and parse document

春和景丽 2021-01-02 19:26

I'm creating a class using jsoup that will do the following:

  1. The constructor opens a connection to a URL.
  2. I have a method that will check the status
4 Answers
  • 2021-01-02 19:48

    You should be able to call parse() on your response object.

    Document doc = response.parse();
    
  • 2021-01-02 19:59

    It sounds like you want to open a connection with jsoup, check the status code, and then parse the document (or do something else) depending on that code.

    To do that, first execute the request and check the status code of the response, rather than fetching the document straight away.

    Response response = Jsoup.connect("Your URL").followRedirects(false).execute();
    System.out.println(response.statusCode() + " : " + response.url());
    

    response.statusCode() returns the HTTP status code.

    After that you can open the connection and fetch the document:

    if (200 == response.statusCode()) {
        Document doc = Jsoup.connect("Your URL").get();
        Elements elements = doc.select("a[href]"); // anchors that carry an href attribute
        /* whatever you want to do */
    }
    

    Your class will look like this

    package com.demo.soup.core;
    
    import java.io.IOException;
    
    import org.jsoup.Connection.Response;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    
    /**
     * The Class DemoConnectionWithJsoup.
     *
     * @author Ankit Sood Apr 21, 2017
     */
    public class DemoConnectionWithJsoup {
    
        /**
         * The main method.
         *
         * @param args
         *            the arguments
         */
        public static void main(String[] args) {
            Response response;
            try {
                response = Jsoup.connect("Your URL").followRedirects(false).execute();
    
                /* response.statusCode() returns the HTTP status code */
                if (200 == response.statusCode()) {
                    Document doc = Jsoup.connect("Your URL").get();
    
                    /* whatever you want to do */
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    
    }
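
    One caveat with the approach above: by default jsoup throws an HttpStatusException for 4xx and 5xx responses, so a status check placed after execute() never sees those codes unless ignoreHttpErrors(true) is set. The following is a minimal sketch, not the answer author's code, that sends a single request, exposes the status code, and parses the already-downloaded body only when the status is 200. The class name StatusAwareFetcher and its method names are hypothetical, and the 10-second timeout is an arbitrary choice.

```java
import java.io.IOException;
import java.util.Optional;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

/** Hypothetical helper: one HTTP request, then status check and parse. */
public class StatusAwareFetcher {

    private final Connection.Response response;

    /** Executes the request once and keeps the response for later use. */
    public StatusAwareFetcher(String url) throws IOException {
        this.response = Jsoup.connect(url)
                .ignoreHttpErrors(true) // return 4xx/5xx responses instead of throwing
                .timeout(10_000)        // milliseconds; arbitrary value for this sketch
                .execute();
    }

    /** HTTP status code of the stored response. */
    public int statusCode() {
        return response.statusCode();
    }

    /** Parses the stored body only for a 200 response; no second request is made. */
    public Optional<Document> documentIfOk() throws IOException {
        if (response.statusCode() != 200) {
            return Optional.empty();
        }
        return Optional.of(response.parse());
    }
}
```

    Compared with calling Jsoup.connect(...).get() a second time, this reuses the body that execute() already downloaded, so the page is fetched only once.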
    
  • 2021-01-02 20:01

    If you don't need to log in, use:

    Document doc = Jsoup.connect("url").get();
    

    If you do need to log in, I'd advise using:

    // Response and Method are nested types: org.jsoup.Connection.Response, org.jsoup.Connection.Method
    Response res = Jsoup.connect("url")
        .data("loginField", "yourUser", "passwordField", "yourPassword")
        .method(Method.POST)
        .execute();
    Document doc = res.parse();
    
    // If you need to stay logged in, keep the session cookies
    Map<String, String> cookies = res.cookies();
    
    // and send them with every subsequent request
    Document pageWhenAlreadyLoggedIn = Jsoup.connect("url").cookies(cookies).get();
    

    To get the URLs in your case, I'd probably try:

    Elements elems = doc.select("a[href]");
    for (Element elem : elems) {
        String link = elem.attr("href");
    }
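
    The href values collected this way can be relative (for example "/about"). As a small sketch of one way to get absolute links instead, the helper below parses HTML together with a base URI and resolves each link with absUrl("href"). The class name LinkExtractor, the method name extractLinks, and the example.com base URI are hypothetical; with a live request you would use the Document returned by get() or parse(), which already carries the page URL as its base URI.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

/** Hypothetical helper that collects absolute link targets from HTML. */
public class LinkExtractor {

    /** Returns the absolute URL of every anchor that has an href attribute. */
    public static List<String> extractLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> links = new ArrayList<>();
        for (Element anchor : doc.select("a[href]")) {
            // absUrl resolves the attribute against the document's base URI;
            // it returns an empty string when no absolute URL can be formed
            String url = anchor.absUrl("href");
            if (!url.isEmpty()) {
                links.add(url);
            }
        }
        return links;
    }
}
```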
    

    That's about it. Keep up the good work!

  • 2021-01-02 20:09

    As stated in the jsoup documentation for the Connection.Response type, there is a parse() method that parses the response's body as a Document and returns it. Once you have that, you can do whatever you want with it.

    For example, see the implementation of getUrls():

    public class ParsePage {
       // NOTE: path must be assigned before connecting; how it is built from langLocale is not shown here
       private String path;
       Connection.Response response = null;
    
       private ParsePage(String langLocale){
          try {
             response = Jsoup.connect(path)
                .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
                .timeout(10000)
                .execute();
          } catch (IOException e) {
             System.out.println("io - "+e);
          }
       }
    
       public int getSitemapStatus() {
          return response.statusCode();
       }
    
       // parse() can throw IOException, so the method has to declare it
       public ArrayList<String> getUrls() throws IOException {
          ArrayList<String> urls = new ArrayList<String>();
          // parses the body of the response that was already fetched; no new request is sent
          Document doc = response.parse();
          // do whatever you want, for example retrieving the <url> from the sitemap
          for (Element url : doc.select("url")) {
             urls.add(url.select("loc").text());
          }
          return urls;
       }
    }
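
    Since a sitemap is XML rather than HTML, it may be safer to parse it with jsoup's XML parser. The sketch below is an assumption-laden illustration, not part of the answer's class: it takes the sitemap text as a string and returns the content of each <loc> inside a <url>. The names SitemapUrls and extractLocs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

/** Hypothetical helper that reads <loc> entries from sitemap XML. */
public class SitemapUrls {

    /** Returns the text of every <loc> that is a direct child of a <url>. */
    public static List<String> extractLocs(String xml) {
        // Parser.xmlParser() keeps the input as XML instead of normalising it as HTML
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        List<String> urls = new ArrayList<>();
        for (Element loc : doc.select("url > loc")) {
            urls.add(loc.text());
        }
        return urls;
    }
}
```

    With a live request, the same parser can be selected on the connection through Connection.parser(Parser.xmlParser()) before calling execute(), so that response.parse() uses it.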
    