Parsing a table with rowspan and colspan

轮回少年 2020-12-14 04:07

I have a table that I need to parse: a school schedule with 4 blocks of time and 5 blocks of days for every week. I've attempted to parse it, but honest

2 answers
  •  旧时难觅i
    2020-12-14 04:21

    Update: There is a bug in this answer (which is based on reclosedev's solution).

    See "How to parse table with rowspan and colspan".

    Old:

    For those who want a Python 3 and BeautifulSoup solution:

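    from bs4 import BeautifulSoup


    def table_to_2d(table_tag):
        # Calling a Tag is shorthand for find_all().
        rows = table_tag("tr")
        # Width is the number of cells in the first row; a colspan there is not counted.
        cols = rows[0](["td", "th"])
        table = [[None] * len(cols) for _ in range(len(rows))]
        for row_i, row in enumerate(rows):
            for col_i, col in enumerate(row(["td", "th"])):
                insert(table, row_i, col_i, col)
        return table


    def insert(table, row, col, element):
        # Cells that fall outside the grid are silently dropped.
        if row >= len(table) or col >= len(table[row]):
            return
        if table[row][col] is None:
            value = element.get_text()
            table[row][col] = value
            # Copy the value to the right for colspan and downward for rowspan.
            # A cell with both spans gets one row and one column filled, not the full block.
            if element.has_attr("colspan"):
                span = int(element["colspan"])
                for i in range(1, span):
                    table[row][col+i] = value
            if element.has_attr("rowspan"):
                span = int(element["rowspan"])
                for i in range(1, span):
                    table[row+i][col] = value
        else:
            # Slot already taken by an earlier span: try the next column.
            insert(table, row, col + 1, element)
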

    Usage:

    html = '...'  # table markup not preserved here; cell texts were 1, 2, 5 / 3, 4 / 6, 7
    soup = BeautifulSoup(html, 'html.parser')
    print(table_to_2d(soup.table))

    This is NOT optimized. I wrote this for my one-time script.
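
    For what it's worth, here is a rough sketch of one way around the limits of the code above (width taken from the first row's cell count, and only one row and one column filled when a cell has both spans). It is not the fix from the linked question, just one possible approach: record every slot a cell covers in a dict, then build the rectangle at the end. The names expand_spans and table_to_grid are made up for this sketch, and the span values in the sample rows are assumed, since only the cell texts 1, 2, 5 / 3, 4 / 6, 7 are known. It also assumes span attributes are plain positive integers.

```python
def expand_spans(rows):
    """Lay out rows of (text, rowspan, colspan) cells on a rectangular grid."""
    grid = {}  # (row, col) -> text of the cell covering that slot
    for r, cells in enumerate(rows):
        c = 0
        for text, rowspan, colspan in cells:
            # Skip slots already claimed by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            # Clip the rowspan so it cannot run past the last row.
            rowspan = min(rowspan, len(rows) - r)
            # Fill the whole rowspan x colspan block, not just one row and one column.
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_cols = max(col for _, col in grid) + 1
    # Slots that no cell covers stay None.
    return [[grid.get((r, c)) for c in range(n_cols)] for r in range(len(rows))]


def table_to_grid(table_tag):
    """Adapter for a BeautifulSoup <table> Tag (nested tables are not handled)."""
    rows = [
        [
            (cell.get_text(strip=True),
             int(cell.get("rowspan", 1)),
             int(cell.get("colspan", 1)))
            for cell in tr.find_all(["td", "th"], recursive=False)
        ]
        for tr in table_tag.find_all("tr")
    ]
    return expand_spans(rows)


# Hypothetical layout: cell texts as in the usage above, span values assumed.
rows = [
    [("1", 1, 1), ("2", 1, 2), ("5", 1, 1)],
    [("3", 2, 1), ("4", 1, 2)],
    [("6", 1, 1), ("7", 1, 1)],
]
grid = expand_spans(rows)
for line in grid:
    print(line)
```

    With BeautifulSoup you would call table_to_grid(soup.table) instead of building the tuples by hand. Keeping the layout step separate from the parsing means it can be checked without any HTML at all.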
