Question
I want to query the site below and save all the results to a CSV file:
http://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget
I already have a program for this (it was written by the previous programmer, and I am trying to understand the code because I am a beginner with jsoup and web crawling), but the site has since been updated and the query no longer works. I think I need to update the URL. Below is the URL string I am currently using:
private final static String URL = "http://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget?"
        + "client=default"
        + "&proxystylesheet=default"
        + "&output=xml_no_dtd"
        + "&Process=continue"
        + "&FLAT_TYPE=%s"
        + "&NME_NEWTOWN=%s"
        + "&NME_STREET="
        + "&NUM_BLK_FROM="
        + "&NUM_BLK_TO="
        + "&AMT_RESALE_PRICE_FROM="
        + "&AMT_RESALE_PRICE_TO="
        + "&DTE_APPROVAL_FROM=%s"
        + "&DTE_APPROVAL_TO=%s";
And then I connect like this:
Document doc = Jsoup.connect(url).get();
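The question does not show how the four %s placeholders in the URL template were filled in. The lowercase url passed to Jsoup.connect suggests it was derived from the URL constant, most likely with String.format. The sketch below assumes exactly that. It also URL-encodes each value, which is an added assumption, because a town name such as "BD Bedok" contains a space. The class and method names are hypothetical, and the sample values are taken from the answer's POST example. As the answer points out, this GET-style URL no longer works against the updated site.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class OldSearchUrl {
    // Copy of the question's template; the four %s placeholders are, in order,
    // FLAT_TYPE, NME_NEWTOWN, DTE_APPROVAL_FROM and DTE_APPROVAL_TO.
    static final String URL = "http://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget?"
            + "client=default"
            + "&proxystylesheet=default"
            + "&output=xml_no_dtd"
            + "&Process=continue"
            + "&FLAT_TYPE=%s"
            + "&NME_NEWTOWN=%s"
            + "&NME_STREET="
            + "&NUM_BLK_FROM="
            + "&NUM_BLK_TO="
            + "&AMT_RESALE_PRICE_FROM="
            + "&AMT_RESALE_PRICE_TO="
            + "&DTE_APPROVAL_FROM=%s"
            + "&DTE_APPROVAL_TO=%s";

    // URL-encode one query value (assumption: the original may not have done this).
    // URLEncoder turns a space into '+', so "BD Bedok" becomes "BD+Bedok".
    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: fills the placeholders in template order with encoded values.
    static String build(String flatType, String town, String from, String to) {
        return String.format(URL, enc(flatType), enc(town), enc(from), enc(to));
    }
}
```

For example, OldSearchUrl.build("02", "BD Bedok", "Apr 2015", "Apr 2016") yields a URL containing &NME_NEWTOWN=BD+Bedok& and ending in &DTE_APPROVAL_TO=Apr+2016.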
I want to update it to use the new URL. I checked the page source but could not find it. Can anybody please help me find the URL that I need to pass here?
Answer 1:
To figure out how a site works, open Firebug or Chrome Developer Tools and watch the network traffic while you submit the form. There you can see exactly what is sent over the wire (form data, whether it is a GET or a POST, cookies, ...).
For this site you will need to POST the data, and you also need a couple of cookies set, otherwise the site won't accept your POST request. You can get them by simply sending a GET request first and reading the cookies from the response:
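import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

Connection.Response res = Jsoup
        .connect("http://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget")
        .timeout(10000) // edit: set timeout to 10 seconds
        .method(Connection.Method.GET)
        .execute();
Map<String, String> cookies = res.cookies(); // session cookies set by the server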
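To see what the first GET is for: the server hands out session cookies in Set-Cookie response headers, res.cookies() keeps only their name/value pairs, and passing that map to .cookies(cookies) sends them back in a single Cookie request header. The standard-library sketch below (java.net.HttpCookie, not jsoup) mimics that round trip. The cookie names JSESSIONID and route are hypothetical, since the answer does not say which cookies the site actually sets.

```java
import java.net.HttpCookie;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class SessionCookies {
    // Keep only name -> value from raw Set-Cookie header values, dropping
    // attributes such as Path (conceptually what res.cookies() gives you).
    static Map<String, String> fromSetCookieHeaders(List<String> setCookieValues) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String header : setCookieValues) {
            for (HttpCookie c : HttpCookie.parse(header)) {
                cookies.put(c.getName(), c.getValue());
            }
        }
        return cookies;
    }

    // Build the Cookie request header value a follow-up request would carry.
    static String toCookieHeader(Map<String, String> cookies) {
        return cookies.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("; "));
    }
}
```

For instance, the headers "JSESSIONID=abc123; Path=/webapp" and "route=r1" turn into the Cookie header value "JSESSIONID=abc123; route=r1".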
Now you can send your POST request using the cookies:
Document doc = Jsoup
        .connect("http://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget")
        .timeout(10000) // edit: set timeout to 10 seconds
        .data("FLAT_TYPE", "02")
        .data("NME_NEWTOWN", "BD Bedok")
        .data("NME_STREET", "")
        .data("NUM_BLK_FROM", "")
        .data("NUM_BLK_TO", "")
        .data("dteRange", "12")
        .data("DTE_APPROVAL_FROM", "Apr 2015")
        .data("DTE_APPROVAL_TO", "Apr 2016")
        .data("AMT_RESALE_PRICE_FROM", "")
        .data("AMT_RESALE_PRICE_TO", "")
        .data("Process", "continue")
        .cookies(cookies)
        .post();
Then use doc to scrape the search results.
Note: sending a GET request with the URL-encoded data didn't work for me.
Source: https://stackoverflow.com/questions/36860158/query-a-search-widget-using-jsoup
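The question's end goal is a CSV file. With jsoup, each result row's cell texts could be collected with doc.select(...), then row.select("td") and Element.text(). A selector such as "table tr" is only a guess, though, because the markup of the result page is not shown. The sketch below therefore covers only the CSV step. It uses just the standard library and applies minimal RFC 4180-style quoting. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

final class Csv {
    // Quote a field only when it contains a comma, a double quote, CR or LF;
    // embedded double quotes are doubled, as RFC 4180 describes.
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join one scraped row (for example the texts of its td cells) into a CSV line.
    static String toLine(List<String> cells) {
        return cells.stream().map(Csv::escape).collect(Collectors.joining(","));
    }
}
```

The resulting lines can then be written out with java.nio.file.Files.write(path, lines).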
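A likely reason for that difference is where the form fields travel, though the answer does not confirm it. In the question's URL the fields sit in the query string, while jsoup's .data(...) followed by .post() sends them in the request body as application/x-www-form-urlencoded. The sketch below shows the same POST built with java.net.http (Java 11+). This is a swapped-in client for illustration, not the answer's code. It only builds the request. Actually sending it would also need the session cookies from the first GET, for example by creating the client with HttpClient.newBuilder().cookieHandler(new CookieManager()).build() and issuing the GET through that same client first. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

final class SearchPost {
    static final String ENDPOINT = "http://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget";

    // Encode fields as application/x-www-form-urlencoded: name=value pairs joined by '&'.
    static String formBody(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // The fields go in the body of a POST; the URL itself carries no query string.
    static HttpRequest buildPost(Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(fields)))
                .build();
    }
}
```

Passing a LinkedHashMap keeps the fields in insertion order, so FLAT_TYPE=02 and NME_NEWTOWN=BD Bedok encode to "FLAT_TYPE=02&NME_NEWTOWN=BD+Bedok".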