Parsing/extracting an HTML table from a website in Java

自闭症网瘾萝莉.ら submitted on 2020-01-10 19:56:46

Question


I want to parse the contents of this HTML table:

Here is the full website with source code:

http://www.kantschule-falkensee.de/uploads/dmiadgspahw/klassen/A_Klasse_11.htm

I want to parse the data in each cell, for example all 5 cells under "Montag" (Monday). I have tried several ways of parsing this website with jsoup, but I haven't had any success. My main goal is to show the contents in a ListView in an Android app; for now I am just trying to print them to a Java console. Both languages are accepted :). Any help is appreciated.


Answer 1:


Here are the steps you would need to follow:

1) Use any of the following Java libraries for HTML scraping:

  • Tag Soup
  • HtmlUnit
  • Web-Harvest
  • jARVEST
  • jsoup
  • Jericho HTML Parser

2) Use XPath Helper to work out an XPath query for the cells you want:

Eg 1: Entering "//tr[1]//td[1]" as the query returns all table elements at position (1,1).

Eg 2: "/html/body[@class='tt']/center/table[1]/tbody/tr[4]/td[3]/table/tbody/tr/td" returns all 15 values under Montag.

Eg 3: "/html/body[@class='tt']/center/table[1]/tbody/tr/td/table/tbody/tr/td" returns all 380 entries of the table.
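
To try a query like Eg 1 from Java code rather than in the browser, here is a minimal sketch using the XPath API that ships with the JDK (javax.xml.xpath). The SAMPLE table and the XPathTableDemo class are made up for illustration and are not the real timetable. The JDK parser also only accepts well-formed XML, so a real HTML page would usually need to go through a tolerant parser (such as one of the libraries listed above) first.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathTableDemo {
    // Hypothetical, well-formed stand-in for the timetable; not the real page.
    public static final String SAMPLE =
        "<table>"
      + "<tr><td>Zeit</td><td>Montag</td><td>Dienstag</td></tr>"
      + "<tr><td>1</td><td>Mathe</td><td>Deutsch</td></tr>"
      + "<tr><td>2</td><td>Physik</td><td>Englisch</td></tr>"
      + "</table>";

    // Returns the trimmed text of every node matched by the XPath expression.
    public static List<String> select(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // First cell of the first row.
        System.out.println(select(SAMPLE, "//tr[1]//td[1]"));   // prints [Zeit]
        // Second cell of every row, i.e. the "Montag" column of the sample.
        System.out.println(select(SAMPLE, "//tr/td[2]"));       // prints [Montag, Mathe, Physik]
    }
}
```

On the sample, "//tr[1]//td[1]" matches a single cell because there is only one table; on a page with nested tables, like the one described in Eg 2, the same query can match one cell per table.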

OR

Example using jsoup:

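import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {
        // Fetch the timetable page and parse it into a DOM.
        Document doc = Jsoup.connect("http://www.kantschule-falkensee.de/uploads/dmiadgspahw/klassen/A_Klasse_11.htm").get();

        // Walk every table row on the page, including rows of nested tables.
        Elements rows = doc.select("tr");
        for (Element row : rows) {
            // select("td") matches every cell that is a descendant of this row.
            Elements columns = row.select("td");
            for (Element column : columns) {
                // Separate cells with a tab so adjacent values do not run together.
                System.out.print(column.text() + "\t");
            }
            System.out.println();
        }
    }
}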
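
Since the goal in the question is to show one day's cells in a list, here is a rough sketch of the step that would follow the row/column loop above: once the cell text has been collected into rows, pick out the column under a given header. ColumnPicker and the sample rows are hypothetical, and the sketch assumes a flat table whose first row holds the headers. The real page appears to use nested tables (see the XPath in Eg 2), so the indexing would need adapting.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnPicker {
    // Returns the cells below the given header, skipping rows that are too short.
    public static List<String> column(List<List<String>> rows, String header) {
        List<String> result = new ArrayList<>();
        if (rows.isEmpty()) {
            return result;
        }
        // Assumption: the header row is row 0.
        int index = rows.get(0).indexOf(header);
        if (index < 0) {
            return result;   // header not found
        }
        for (List<String> row : rows.subList(1, rows.size())) {
            if (index < row.size()) {
                result.add(row.get(index));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical rows, shaped like what the td loop would collect.
        List<List<String>> rows = List.of(
            List.of("Zeit", "Montag", "Dienstag"),
            List.of("1", "Mathe", "Deutsch"),
            List.of("2", "Physik", "Englisch"));
        System.out.println(column(rows, "Montag"));   // prints [Mathe, Physik]
    }
}
```

In an Android app, the resulting List<String> is the kind of data that could be handed to an adapter backing the ListView.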


Source: https://stackoverflow.com/questions/31360275/parsing-extracting-a-html-table-website-in-java
