Question
jsoup does not follow the redirect (or at least does not get the entire page content). How can I solve that?
I presume there are no client-side redirects, since there is no
<meta http-equiv ...
in what I download with this:
Document doc1 = Jsoup.connect("http://e-uprava.gov.si/e-uprava/oglasnadeska.htm")
        .header("Accept-Encoding", "gzip, deflate")
        .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
        .ignoreContentType(true)
        .ignoreHttpErrors(true)
        .followRedirects(true)
        .timeout(600000)
        .maxBodySize(0) /* unlimited body size */
        .get();
Checking the response status:
String url = "http://e-uprava.gov.si/e-uprava/oglasnadeska.htm";
final Connection connection = Jsoup.connect(url).timeout(10000);
final Response response = connection.execute();
final int status = response.statusCode();
System.out.println(status);
The status is 200.
That is, the element
div class="subpage-container ...
is not filled with the content that I see in the browser. Checking for meta and JavaScript redirects gave no usable results.
Answer 1:
explanation:
The redirect is not the problem; jsoup loads the page correctly.
The problem is that the page uses JavaScript to load the content you are looking for dynamically. Because jsoup is just an HTML parser, you cannot expect it to execute JavaScript and fetch that data.
solution:
If you open this page in a browser and look in the developer tools at all the requests the page makes, you will find this one:
http://e-uprava.gov.si/si/e-uprava/oglasnadeska/content/singleton.html?&type=-&rijs=-1&offset=155&sentinel_type=ok&sentinel_status=ok&is_ajax=1
It contains all the data you want.
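As a rough sketch of that idea, the endpoint can be requested directly instead of the outer page. The example below is only an illustration under a few assumptions: the class name AjaxFragmentFetcher and the helpers buildAjaxUrl and fetchFragment are hypothetical, the offset parameter is assumed to control paging, and the JDK's own java.net.http client (Java 11+) is used so the code runs without extra dependencies. The returned string could then be handed to Jsoup.parse(html, baseUri) for the usual selector work.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class AjaxFragmentFetcher {

    // Endpoint copied from the request seen in the browser's developer tools.
    static final String BASE =
            "http://e-uprava.gov.si/si/e-uprava/oglasnadeska/content/singleton.html";

    // Rebuilds the query string from the answer. Only `offset` is variable here,
    // on the (unverified) assumption that it controls paging.
    static String buildAjaxUrl(int offset) {
        return BASE + "?&type=-&rijs=-1&offset=" + offset
                + "&sentinel_type=ok&sentinel_status=ok&is_ajax=1";
    }

    // Downloads the HTML fragment that the page's JavaScript would normally request.
    static String fetchFragment(int offset) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(buildAjaxUrl(offset)))
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // 155 is the offset that appears in the captured request.
        String html = fetchFragment(155);
        System.out.println(html.length() + " characters received");
    }
}
```

Whether the server still answers this request, and whether it needs extra headers or cookies, has to be checked against the live site.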
This approach is not ideal: any change to the page can break it. It would be much better to use a browser emulator such as Selenium or HtmlUnit, which can execute the page's JavaScript.
Source: https://stackoverflow.com/questions/35600919/jsoup-followredirectstrue-does-not-work