Query a search widget using jsoup

Submitted by 生来就可爱ヽ(ⅴ<●) on 2019-12-12 02:53:48

Question


I want to query the site below and save all the results to a CSV file:

http://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget

I already have a program for this (written by a previous programmer; I am still trying to understand the code, as I am a beginner in jsoup and web crawling), but the site has since been updated and the query no longer works. I think I need to update the URL. Below is the URL string I am currently using:

private final static String URL = "http://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget?"
        + "client=default"
        + "&proxystylesheet=default"
        + "&output=xml_no_dtd"
        + "&Process=continue"
        + "&FLAT_TYPE=%s"
        + "&NME_NEWTOWN=%s"
        + "&NME_STREET="
        + "&NUM_BLK_FROM="
        + "&NUM_BLK_TO="
        + "&AMT_RESALE_PRICE_FROM="
        + "&AMT_RESALE_PRICE_TO="
        + "&DTE_APPROVAL_FROM=%s"
        + "&DTE_APPROVAL_TO=%s";

And then I connect like this:

Document doc = Jsoup.connect(url).get();

I want to update it to use the new URL. I checked the page source but could not find it. Can anybody please help me find the URL that I need to pass here?


Answer 1:


To figure out how a site works, you can open Firebug or Chrome Developer Tools and inspect the network traffic. There you can see what is sent over the wire (data, GET or POST, cookies, ...).

For this site you will need to POST the data, but you will also need a couple of cookies set, or else the site won't accept your POST request. You can get them by sending a GET request first and reading the cookies from the response:

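import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

// Fetch the search page first so the server sets its cookies.
Connection.Response res = Jsoup
    .connect("http://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget")
    .timeout(10000) // edit: set timeout to 10 seconds
    .method(Connection.Method.GET)
    .execute();

Map<String,String> cookies = res.cookies();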

Now you can send your POST request using the cookies:

Document doc = Jsoup
   .connect("http://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget")
   .timeout(10000) // edit: set timeout to 10 seconds
   .data("FLAT_TYPE", "02")
   .data("NME_NEWTOWN", "BD      Bedok")
   .data("NME_STREET", "")
   .data("NUM_BLK_FROM", "")
   .data("NUM_BLK_TO", "")
   .data("dteRange", "12")
   .data("DTE_APPROVAL_FROM", "Apr 2015")
   .data("DTE_APPROVAL_TO", "Apr 2016")
   .data("AMT_RESALE_PRICE_FROM", "")
   .data("AMT_RESALE_PRICE_TO", "")
   .data("Process", "continue")
   .cookies(cookies)
   .post();

Then use doc to scrape the search results.
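Since the goal in the question is a CSV file, the last step after scraping is turning each result row into a CSV line. The sketch below covers only that formatting step and uses plain JDK classes. `CsvSketch`, `escape`, and `toCsvLine` are hypothetical names, not part of jsoup. The cell texts are assumed to have been extracted from doc already, for example with something like `row.select("td")` and `Element.text()`; the actual selector depends on the result page's markup, which is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of jsoup): formats already-extracted cell texts as CSV.
final class CsvSketch {

    // Wrap a field in quotes if it contains a comma, a quote, or a line break,
    // doubling any embedded quotes.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Join the escaped cells of one table row into a single CSV line.
    static String toCsvLine(List<String> cells) {
        List<String> escaped = new ArrayList<>();
        for (String cell : cells) {
            escaped.add(escape(cell));
        }
        return String.join(",", escaped);
    }

    public static void main(String[] args) {
        // Placeholder cells for illustration only; real values would come from doc.
        System.out.println(toCsvLine(Arrays.asList("plain", "with, comma", "with \"quote\"")));
    }
}
```

Each line produced this way can be written to the output file, one per scraped row.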

Note: sending a GET request with the URL-encoded data didn't work for me.
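The original program filled in the flat type, town, and approval date range through %s placeholders in the URL. For the POST request, the same four values could be collected into a map of form fields instead. This is a minimal sketch: `SearchFormSketch` and `formFields` are hypothetical names, the field names and the fixed "dteRange" value are copied from the POST example above, and whether "dteRange" must change with the dates is not confirmed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the form fields used in the POST example above.
final class SearchFormSketch {

    static Map<String, String> formFields(String flatType, String newTown,
                                          String approvalFrom, String approvalTo) {
        // LinkedHashMap keeps the fields in the order they are added.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("FLAT_TYPE", flatType);
        fields.put("NME_NEWTOWN", newTown);
        fields.put("NME_STREET", "");
        fields.put("NUM_BLK_FROM", "");
        fields.put("NUM_BLK_TO", "");
        fields.put("dteRange", "12"); // copied verbatim from the example; meaning assumed
        fields.put("DTE_APPROVAL_FROM", approvalFrom);
        fields.put("DTE_APPROVAL_TO", approvalTo);
        fields.put("AMT_RESALE_PRICE_FROM", "");
        fields.put("AMT_RESALE_PRICE_TO", "");
        fields.put("Process", "continue");
        return fields;
    }

    public static void main(String[] args) {
        // Same values as in the POST example above.
        System.out.println(formFields("02", "BD      Bedok", "Apr 2015", "Apr 2016"));
    }
}
```

jsoup's Connection also accepts a whole map through `.data(Map<String, String>)`, so the result of `formFields(...)` could replace the individual `.data(name, value)` calls when looping over several flat types or towns.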



Source: https://stackoverflow.com/questions/36860158/query-a-search-widget-using-jsoup
