JSoup.connect throws 403 error while apache.httpclient is able to fetch the content

Submitted by 女生的网名这么多〃 on 2019-12-30 07:54:06

Question


I am trying to parse the HTML of an arbitrary page. I have used HTML Parser and also tried Jsoup.

Jsoup has the functions I need, but I get a 403 error when calling Document doc = Jsoup.connect(url).get();

I then fetched the same URL with Apache Commons HttpClient, and it returned the HTML successfully.

Why does Jsoup get a 403 for a URL that HttpClient can fetch? Am I doing something wrong? Any thoughts?


Answer 1:


The working solution is as follows (thanks to Angelo Neuschitzer for the reminder to post it as an answer):

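import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Set the User-Agent explicitly; url is the page address from the question.
Document doc = Jsoup.connect(url).userAgent("Mozilla").get();
// "cite" is the tag name that HTML.Tag.CITE.toString() returns.
Elements links = doc.getElementsByTag("cite");
for (Element link : links) {
    System.out.println(link.text());
}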

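So setting userAgent does the trick :) Presumably the server rejects the User-Agent header that Jsoup sent by default, while accepting the one sent by HttpClient.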
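To see why the User-Agent header can decide between 403 and 200, here is a minimal sketch that uses only the JDK, so it runs without Jsoup or network access. Everything in it is an assumption made for illustration: the class name UserAgentDemo, the local test server, and in particular the server's rule of rejecting agents that start with "Java/". The real site's filtering rule is not known.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class UserAgentDemo {

    // Hypothetical rule: answer 403 when the User-Agent is missing or
    // starts with "Java/", and 200 otherwise. Real sites use their own rules.
    static HttpServer startServer() throws IOException {
        // Port 0 asks the OS for any free port on the loopback interface.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean allowed = ua != null && !ua.startsWith("Java/");
            byte[] body = (allowed ? "<cite>ok</cite>" : "Forbidden")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(allowed ? 200 : 403, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    // Sends a GET request with the given User-Agent and returns the status code.
    static int fetchStatus(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            System.out.println(fetchStatus(url, "Java/1.8")); // prints 403
            System.out.println(fetchStatus(url, "Mozilla"));  // prints 200
        } finally {
            server.stop(0);
        }
    }
}
```

The same request to the same URL gets a different status purely because of one header, which is consistent with two HTTP libraries getting different results for one page.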



Source: https://stackoverflow.com/questions/10120849/jsoup-connect-throws-403-error-while-apache-httpclient-is-able-to-fetch-the-cont
