HTML table to pandas table: Info inside html tags

轮回少年 2021-01-04 21:02

I have a large table from the web, accessed via requests and parsed with BeautifulSoup. Part of it looks something like this:

  •  一向 (OP)
    2021-01-04 21:49

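    You could use a regular expression to rewrite the HTML before handing it to pandas, replacing each <a href="URL">TEXT</a> tag with "URL TEXT" so that the link target ends up in the cell text: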

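        import re, pandas as pd
        from io import StringIO
        # Sample markup (assumed), consistent with the output below
        tbl = """<table>
        <tr><td>265</td><td><a href="/j/jones03.shtml">Jones</a>Blue</td><td>29</td></tr>
        <tr><td>266</td><td><a href="/s/smith01.shtml">Smith</a></td><td>34</td></tr>
        </table>"""
        # Turn each <a href="URL">TEXT</a> into "URL TEXT" so the href survives parsing
        tbl = re.sub('<a href="(.*?)">(.*?)</a>', '\\1 \\2', tbl)
        pd.read_html(StringIO(tbl))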

    which gives you:

        [     0                           1   2
         0  265  /j/jones03.shtml JonesBlue  29
         1  266      /s/smith01.shtml Smith  34]
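If you would rather keep the link and the visible text in separate columns, newer pandas versions can pull the href out for you: pandas.read_html accepts an extract_links argument (added in pandas 1.5) that turns each cell into a (text, href) tuple. The following is a minimal sketch, assuming pandas >= 1.5 with lxml (or BeautifulSoup plus html5lib) installed; the table markup and the column names id, name, url and value are illustrative assumptions, not taken from the original question.

```python
from io import StringIO

import pandas as pd

# Illustrative markup (assumed), modelled on the table in the question:
# the name cell wraps a link whose href we want to keep.
html = """
<table>
<tr><td>265</td><td><a href="/j/jones03.shtml">Jones</a>Blue</td><td>29</td></tr>
<tr><td>266</td><td><a href="/s/smith01.shtml">Smith</a></td><td>34</td></tr>
</table>
"""

# extract_links="body" (pandas >= 1.5) returns every body cell as a
# (text, href) tuple; href is None when the cell has no <a href=...>.
raw = pd.read_html(StringIO(html), extract_links="body")[0]

# Unpack the tuples into ordinary columns. These column names are
# hypothetical; the original table's headers are not known.
df = pd.DataFrame({
    "id": [int(text) for text, _ in raw[0]],
    "name": [text for text, _ in raw[1]],
    "url": [href for _, href in raw[1]],
    "value": [int(text) for text, _ in raw[2]],
})
print(df)
```

Cells without a link come back as (text, None), which is why the numeric columns are unpacked with int(text) and their href is simply discarded. Compared with the regex approach, this avoids string surgery on the HTML and keeps the URL out of the name column.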
