jsoup

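Parsing HTML web pages with Jsoup

心已入冬 submitted on 2019-12-26 23:57:39
I. Introduction to jsoup
In the past, when we used Java to process or parse HTML documents or fragments, we usually relied on the open-source htmlparser library (http://htmlparser.sourceforge.net/). Now that we have jsoup, it alone is enough for handling HTML content: it is updated more frequently and has a more convenient API. jsoup is a Java HTML parser that can directly parse a URL or a string of HTML text. It provides a very convenient API for extracting and manipulating data through DOM methods, CSS selectors, and jQuery-like operations, so it can be thought of as a Java version of jQuery. The main features of jsoup are: parsing HTML from a URL, a file, or a string; finding and extracting data with DOM traversal or CSS selectors; and manipulating HTML elements, attributes, and text. jsoup is released under the MIT license and can safely be used in commercial projects. Official website: http://jsoup.org/
II. Parsing and traversing HTML documents
jsoup processes an HTML document by parsing the user's input into a Document object. It generally supports the following input sources: parsing an HTML string; parsing a body fragment; loading a Document from a URL; loading a Document from a file. (1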
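The four input sources listed above map onto jsoup entry points roughly as follows. This is a minimal sketch assuming a recent jsoup release on the classpath; the class and method names (`JsoupSources`, `fromString`, and so on) are hypothetical, and the URL and file variants are defined but not called in `main` because they need network or disk access.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.File;
import java.io.IOException;

// Hypothetical helper class: one method per input source listed above.
class JsoupSources {

    // (1) Parse a complete HTML string.
    static Document fromString(String html) {
        return Jsoup.parse(html);
    }

    // (2) Parse a body fragment; jsoup wraps it in an html/head/body skeleton.
    static Document fromBodyFragment(String fragment) {
        return Jsoup.parseBodyFragment(fragment);
    }

    // (3) Fetch and parse a URL (performs a blocking HTTP GET).
    static Document fromUrl(String url) throws IOException {
        return Jsoup.connect(url).get();
    }

    // (4) Parse a local file; baseUri is used to resolve relative links.
    static Document fromFile(File file, String baseUri) throws IOException {
        return Jsoup.parse(file, "UTF-8", baseUri);
    }

    public static void main(String[] args) {
        Document doc = fromString(
            "<html><head><title>Demo</title></head>"
            + "<body><p class='intro'>Hello</p></body></html>");
        System.out.println(doc.title());                  // Demo
        System.out.println(doc.select("p.intro").text()); // Hello

        Document frag = fromBodyFragment("<b>bold</b>");
        System.out.println(frag.select("body > b").text()); // bold
    }
}
```

Once a `Document` exists, the same `select(...)` calls work regardless of which source produced it.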

Android - Jsoup: How to get RESULT from Jsoup.connect(“url”).get() from AsyncTask

青春壹個敷衍的年華 submitted on 2019-12-25 18:42:52
Question: I want to get the Document returned by Jsoup.connect("url").get() from an AsyncTask. Can I return it? I just want to create a class that helps me get data from some URL. Activity class:
@Override
protected void onCreate(Bundle savedInstanceState){
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);
    lv = (ListView) findViewById(R.id.listView1);
    try {
        doc = new GetDataFromUrl(this).execute(functions_list).get();
    } catch (InterruptedException e) {
        // TODO Auto
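Calling `.get()` on the task, as the question does, makes `onCreate` wait for the download, which defeats the purpose of running it in the background. A common alternative is to hand the result to a callback. The sketch below swaps `AsyncTask` (deprecated since Android API level 30) for plain `java.util.concurrent.CompletableFuture` so that it runs outside Android; `AsyncLoader`, `loadAsync`, and `fromUrl` are hypothetical names. On Android, the callback would still need to post back to the UI thread (for example with `runOnUiThread`) before touching views.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper: load a Document on a background executor and deliver it via a callback.
class AsyncLoader {

    // Runs `loader` on `background`, then passes the Document to onResult,
    // or any failure (wrapped by CompletableFuture) to onError.
    static CompletableFuture<Void> loadAsync(Supplier<Document> loader, Executor background,
                                             Consumer<Document> onResult, Consumer<Throwable> onError) {
        return CompletableFuture.supplyAsync(loader, background)
                .thenAccept(onResult)
                .exceptionally(t -> {
                    onError.accept(t);
                    return null;
                });
    }

    // Loader for a real URL; wraps the checked IOException so it fits Supplier.
    static Supplier<Document> fromUrl(String url) {
        return () -> {
            try {
                return Jsoup.connect(url).get();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Offline loader for demonstration; use fromUrl("https://...") for a real page.
        loadAsync(() -> Jsoup.parse("<title>Offline</title>"), pool,
                doc -> System.out.println("Loaded: " + doc.title()),
                Throwable::printStackTrace)
            .join(); // only so this demo waits before shutdown; never block a UI thread like this
        pool.shutdown();
    }
}
```

The design choice is that the caller never holds the `Document` as a return value; it only ever sees it inside the callback, once it is ready.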

Jsoup Parsing: Parsing dynamic values as key-value pairs

风格不统一 submitted on 2019-12-25 17:26:31
Question: I am extracting this data from a URL (http://www.gmdu.net/corp-902113.html). Can anyone help me extract the data in the format below? Please guide me.
<div class="content">
<div class="label">Company Name: </div> Haycolour (Pvt) Ltd <br/>
<div class="label">Business Owner: </div> Hayleys (Group of Companies <br/>
<div class="label">Employees: </div> 50 <br/>
<div class="label">Main markets: </div> Asia and south Asia <br/>
<div class="label">Business Type: </div> Manufacturer and Importers <br/>
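In this markup each value is not inside its own element; it is the text node that follows a `div.label` and runs up to the next `<br>`. A minimal sketch of that idea walks the sibling nodes after each label. It assumes the page really has the `div.content > div.label` structure shown above; `LabelValueParser` is a hypothetical name.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turn "<div class=label>Key: </div> value <br/>" runs into key/value pairs.
class LabelValueParser {

    static Map<String, String> parse(Document doc) {
        Map<String, String> pairs = new LinkedHashMap<>(); // keeps page order
        for (Element label : doc.select("div.content > div.label")) {
            // "Company Name: " -> "Company Name"
            String key = label.text().replaceAll(":\\s*$", "").trim();
            StringBuilder value = new StringBuilder();
            // The value is the sibling content after the label, up to the next <br> or label.
            for (Node n = label.nextSibling(); n != null; n = n.nextSibling()) {
                if (n.nodeName().equals("br")
                        || (n instanceof Element && ((Element) n).hasClass("label"))) {
                    break;
                }
                if (n instanceof TextNode) {
                    value.append(((TextNode) n).text());
                } else if (n instanceof Element) {
                    value.append(((Element) n).text());
                }
            }
            pairs.put(key, value.toString().trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        String html = "<div class=\"content\">"
            + "<div class=\"label\">Company Name: </div> Haycolour (Pvt) Ltd <br/>"
            + "<div class=\"label\">Employees: </div> 50 <br/>"
            + "</div>";
        Map<String, String> pairs = parse(Jsoup.parse(html));
        pairs.forEach((k, v) -> System.out.println(k + " = " + v));
        // Company Name = Haycolour (Pvt) Ltd
        // Employees = 50
    }
}
```

For the live page, `Jsoup.connect("http://www.gmdu.net/corp-902113.html").get()` would supply the `Document` instead of the inline string, provided the markup matches the excerpt.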

How to solve java.security.AccessControlException

﹥>﹥吖頭↗ submitted on 2019-12-25 17:15:16
Question: I need suggestions about the java.security.AccessControlException I get when executing the following code. (I have consulted similar questions here but did not manage to make it work.) Here is my server code:
public class GetPageInfos extends UnicastRemoteObject implements RemoteGetInfo{
    private static final String url="http://www.lemonde.fr/";
    public

JSoup Login and Cookie

独自空忆成欢 submitted on 2019-12-25 16:39:48
Question: I'm trying to log in to a site using JSoup, but I'm having trouble getting a good cookie back. I'm not sure whether the URL or the login data is incorrect. Any help would be much appreciated. The login page is here. I'm currently trying the following code:
Connection.Response res = Jsoup.connect("https://go.sfu.ca/psp/goprd/?cmd=login&languageCd=ENG")
        .data("user", "myUserID", "pwd", "myPassword")
        .method(Connection.Method.POST)
        .execute();
I do not get the same amount of cookie information if I
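One common reason for missing cookies is posting the credentials without the session cookies and hidden form fields that the login page hands out first. The sketch below assumes that pattern applies here: it GETs the login page, copies its cookies and any hidden `<input>` values, adds the `user`/`pwd` fields from the question, and POSTs everything back. `LoginSketch`, `hiddenFields`, and `login` are hypothetical names, and the real form may post to a different `action` URL or use different field names, so inspecting the page's `<form>` first is worthwhile.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical login helper: GET the form, then POST credentials with the session cookies.
class LoginSketch {

    // Name/value pairs of hidden inputs inside forms (often session or anti-CSRF fields).
    static Map<String, String> hiddenFields(Document loginPage) {
        Map<String, String> fields = new HashMap<>();
        for (Element input : loginPage.select("form input[type=hidden]")) {
            fields.put(input.attr("name"), input.attr("value"));
        }
        return fields;
    }

    // Returns the cookies accumulated from both requests.
    static Map<String, String> login(String loginUrl, String user, String pwd) throws IOException {
        // Step 1: GET the login page to receive the initial cookies and hidden fields.
        Connection.Response page = Jsoup.connect(loginUrl)
                .method(Connection.Method.GET)
                .execute();
        Map<String, String> cookies = new HashMap<>(page.cookies());
        Map<String, String> form = hiddenFields(page.parse());
        form.put("user", user); // field names taken from the question
        form.put("pwd", pwd);

        // Step 2: POST the credentials together with the cookies from step 1.
        Connection.Response res = Jsoup.connect(loginUrl) // assumption: form posts back to the same URL
                .cookies(cookies)
                .data(form)
                .method(Connection.Method.POST)
                .execute();
        cookies.putAll(res.cookies());
        return cookies;
    }
}
```

Subsequent requests can then pass the returned map to `Jsoup.connect(...).cookies(cookies)` so the server sees the logged-in session.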

Jsoup 1.8.3 missing element

坚强是说给别人听的谎言 submitted on 2019-12-25 13:16:03
Question: I have been using Jsoup 1.7.2 for about a year and thought I would update to version 1.8.3. I have stumbled across an issue; please check it out. Here is the document:
<dl class="a"> <dt> Total transfers </dt> <dd> 2 </dd> <dt> Gameweek transfers </dt> <dd> 2 </dd> </dl>
<dl class="b"> <dt> Team value </dt> <dd> £99.2m </dd> <dt> In the bank </dt> <dd> £0.3m </dd> </dl>
Now when I select "dd" with doc.select("dd"), it returns Elements of size 3: <dd> 2 </dd> <dd> £99.2m </dd> <dd>
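To see whether the parser or the selector is losing an element, it can help to print the raw `dd` count alongside each `<dt>` paired with the element that follows it. This is a diagnostic sketch, not a fix for any specific version difference; `DefinitionListPairs` is a hypothetical name, and with the markup from the question one would expect four pairs.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: map each <dt> label to the text of the <dd> that follows it.
class DefinitionListPairs {

    static Map<String, String> pairs(Document doc) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Element dt : doc.select("dl > dt")) {
            Element next = dt.nextElementSibling();
            // Record an empty value when no <dd> follows, so a dropped element is visible.
            String value = (next != null && next.tagName().equals("dd")) ? next.text() : "";
            result.put(dt.text(), value);
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<dl class=\"a\"><dt> Total transfers </dt><dd> 2 </dd>"
            + "<dt> Gameweek transfers </dt><dd> 2 </dd></dl>"
            + "<dl class=\"b\"><dt> Team value </dt><dd> £99.2m </dd>"
            + "<dt> In the bank </dt><dd> £0.3m </dd></dl>";
        Document doc = Jsoup.parse(html);
        System.out.println("dd count: " + doc.select("dd").size());
        pairs(doc).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

Running the same check against both jsoup versions shows directly which `<dt>` loses its value.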

Jsoup links extraction

一世执手 submitted on 2019-12-25 12:41:52
Question: Hello guys, I am trying to extract all the anchor links from AOL, but it is not working. The same code works with Yahoo and Bing. What could be the problem?
Document document5 = Jsoup.connect("www.aol.com").get();
Elements links5 = document5.select("a");
for (Element link5 : links5) {
    out.println(link5.attr("href"));
}
Answer 1: As per the comments on your previous question: "even after im specifying the protocol...only google and aol are not working, same is working with yahoo, bing"
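Two things worth ruling out: `Jsoup.connect` needs a full URL including the scheme (`https://www.aol.com/`, not `www.aol.com`), and jsoup does not execute JavaScript, so links that a page inserts via script will not appear in the parsed `Document`. Some sites also vary their response by `User-Agent`. The sketch below sets a browser-like user agent (an arbitrary choice, not a requirement) and resolves each `href` to an absolute URL; `LinkExtractor` is a hypothetical name.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for collecting anchor targets as absolute URLs.
class LinkExtractor {

    // Fetch a page; the URL must include its scheme, e.g. "https://www.aol.com/".
    static Document fetch(String url) throws IOException {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0") // assumption: any browser-like string
                .timeout(10_000)          // milliseconds
                .get();
    }

    // Collect the href of every <a> that has one, resolved against the document's base URI.
    static List<String> links(Document doc) {
        List<String> result = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            result.add(a.absUrl("href"));
        }
        return result;
    }
}
```

Usage would be `LinkExtractor.links(LinkExtractor.fetch("https://www.aol.com/"))`. Using `a[href]` instead of `a` skips anchors without a target, and `absUrl` turns relative paths such as `/news` into full URLs.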

Jsoup and gzipped html content (Android)

我是研究僧i submitted on 2019-12-25 12:25:11
Question: I've been trying all day to make this work, but it's still not right. I've checked so many posts here and tested so many different implementations that I don't know where to look now... Here is my situation: I have a small PHP test file (gz.php) on my server, which looks like this:
header("Content-Encoding: gzip");
print("\x1f\x8b\x08\x00\x00\x00\x00\x00");
$contents = gzcompress("Is it working?", 9);
print($contents);
This is the simplest I could do, and it works fine with any
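One detail worth checking in the PHP test: `gzcompress()` produces zlib-format data (RFC 1950), not a gzip stream (RFC 1952); PHP's `gzencode()` is the function that emits gzip, header and trailer included. A quick way to see whether a response body is valid gzip is to decode it with `java.util.zip.GZIPInputStream` before handing it to jsoup, which is what the sketch below does. `GzipParse` is a hypothetical name, and the round trip in `main` uses locally generated gzip bytes rather than the server's output.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for checking and parsing gzip-encoded HTML bodies.
class GzipParse {

    // Decompress a gzip body and parse it; GZIPInputStream throws a ZipException
    // (an IOException) if the bytes do not start with a valid gzip header.
    static Document parseGzipped(byte[] body, String baseUri) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return Jsoup.parse(in, "UTF-8", baseUri);
        }
    }

    // Produce a well-formed gzip stream for comparison or testing.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = gzip("<p>Is it working?</p>");
        System.out.println(parseGzipped(body, "http://localhost/").select("p").text());
        // prints: Is it working?
    }
}
```

Feeding the raw bytes returned by gz.php into `parseGzipped` would show whether the server output is a gzip stream Java accepts.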