Why is InputStreamReader returning different content than browser? [closed]

Submitted by 吃可爱长大的小学妹 on 2019-12-12 07:02:25

Question


If you enter this in a browser url:

https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&AUCTIONDATE=07/16/2019

It returns a lot of data. But if I try to capture that data with an InputStreamReader, the only data returned is

{"retHTML":"", "rlist":""}

Here is the program:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

List<Property> scrapePropertyInfo(List<Date> auctionDates) {
    List<Property> properties = new ArrayList<>();
    String urlStr = "https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&AUCTIONDATE=07/16/2019";
    try {
        URL url = new URL(urlStr);
        StringBuilder stringBuilder = new StringBuilder();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
            String str;
            while ((str = in.readLine()) != null) {
                stringBuilder.append(str);
            }
        }
        System.out.println("Url: " + urlStr);
        System.out.println(stringBuilder.toString());
    } catch (MalformedURLException ex) {
        Logger.getLogger(CharlotteCtyFL.class.getName()).log(Level.SEVERE, null, ex);
    } catch (IOException ex) {
        Logger.getLogger(CharlotteCtyFL.class.getName()).log(Level.SEVERE, null, ex);
    }
    return properties;
}

Does anybody know why?

Edit: I'm a little smarter now. Apparently the server requires more than just the URL. Since this is dynamic AJAX data that is only populated when the request comes from the original web page, I need to simulate that in Java.

I discovered how to get that info in the Chrome DevTools (F12). Under Network->XHR->Preview, click on each item until you see the expected data. Then right-click on it and select Copy->Copy Request Headers.

Here is what got copied:

GET /index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&tx=1563231065712&bypassPage=1&test=1&_=1563231065712 HTTP/1.1
Host: charlotte.realforeclose.com
Connection: keep-alive
Accept: application/json, text/javascript, */*; q=0.01
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
Origin: http://evil.com/
Referer: https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE=07/16/2019
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cookie: cfid=6f228aa1-bb7e-4734-92ff-39eabf23ed9b; cftoken=0; CF_CLIENT_CHARLOTTE_REALFORECLOSE_TC=1563229207612; AWSELB=E7779D5F1C1F6ABE3513A5C5B6B0C754520B66675A407900314ABAC5333A52E93FD1A8D7401D89BC8D5E8B98059C8AAC5507D12A2C6ED07F7E7CB77311BD7FB09B738DB945; _ga=GA1.2.1823487290.1563231012; _gid=GA1.2.1418453663.1563231012; _gat=1; _gcl_au=1.1.273755450.1563231013; __utma=65865852.1823487290.1563231012.1563231014.1563231014.1; __utmc=65865852; __utmz=65865852.1563231014.1.1.utmcsr=realauction.com|utmccn=(referral)|utmcmd=referral|utmcct=/client-sites; __utmt_UA-51657054-1=1; __utmb=65865852.2.10.1563231014; testcookiesenabled=enabled; CF_CLIENT_CHARLOTTE_REALFORECLOSE_LV=1563231067363; CF_CLIENT_CHARLOTTE_REALFORECLOSE_HC=73

Now how do I get that into the request from Java? I know how to do it in JavaScript but not in Java.


Answer 1:


Actually, I opened your URL in the browser and got

{"retHTML":"", "rlist":""}

Then I wrote my own code similar to yours and got the same string in response. So for me, the browser and the Java code fetched the same content. But it is easy to explain why that doesn't have to be the case. A server can check whether the client sending the request is a browser, which browser it is, and where the request came from. Based on those details, the server can send back a customized response.




Answer 2:


Try running this – it will fetch that url and display the output:

curl "https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&AUCTIONDATE=07/16/2019"

So the behavior you're seeing isn't something Java is (or isn't) doing.

I suspect that the remote server is looking at the inbound HTTP request and deciding what to return. In your Java code, as with this simple curl example, there are no browser headers, user agent, etc., so the server is probably giving a generic answer because of that.

As another test, you could try changing your Java code to fetch a different URL:

String urlStr = "http://duckduckgo.com";
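
If missing headers are indeed the cause, the headers copied in the question's edit can be attached to the request from Java. Below is a minimal sketch using HttpURLConnection and setRequestProperty. The class name BrowserLikeFetch and its two helper methods are made up for illustration, and which headers the server actually checks is an assumption, so you may need to experiment. Accept-Encoding is left out on purpose because HttpURLConnection does not decompress gzip responses for you, and Host and Connection are left for HttpURLConnection to manage.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; the names are illustrative, not from the original post.
class BrowserLikeFetch {

    // Builds a subset of the headers Chrome sent in the question.
    // Assumption: the server keys its response off some of these.
    static Map<String, String> browserHeaders(String referer, String cookie) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Accept", "application/json, text/javascript, */*; q=0.01");
        headers.put("X-Requested-With", "XMLHttpRequest");
        headers.put("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                + "AppleWebKit/537.36 (KHTML, like Gecko) "
                + "Chrome/75.0.3770.100 Safari/537.36");
        headers.put("Referer", referer);
        headers.put("Accept-Language", "en-US,en;q=0.9");
        // Only send a Cookie header when the caller supplies one.
        if (cookie != null && !cookie.isEmpty()) {
            headers.put("Cookie", cookie);
        }
        return headers;
    }

    // Sends a GET request with the given headers and returns the response body.
    static String fetch(String urlStr, Map<String, String> headers) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(urlStr).openConnection();
        conn.setRequestMethod("GET");
        for (Map.Entry<String, String> header : headers.entrySet()) {
            conn.setRequestProperty(header.getKey(), header.getValue());
        }
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: BrowserLikeFetch <url> <referer> [cookie]");
            return;
        }
        String cookie = args.length > 2 ? args[2] : null;
        System.out.println(fetch(args[0], browserHeaders(args[1], cookie)));
    }
}
```

Pass the request URL and the Referer value from the copied headers as the first two arguments. Note that the URL in the copied request carries extra query parameters (tx, bypassPage, test, _) that the URL in the question's code does not, and the cookie values shown belong to that particular browser session, so a fresh Cookie value is probably needed if the server requires one.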


Source: https://stackoverflow.com/questions/57046770/why-is-inputstreamreader-returning-different-content-than-browser
