What should I do when a table row has a rowspan

忘掉有多难 2020-12-18 06:17

If a row contains a cell with a rowspan attribute, how can I make each parsed row line up with the table as it is displayed on the Wikipedia page?

from bs4 import BeautifulSoup
import urllib2
from lxm         


        
4 answers
  •  温柔的废话
    2020-12-18 06:57

    pandas >= 0.24.0 understands colspan and rowspan attributes, as documented in the release notes. To extract the Wikipedia table that was giving you issues previously, the following works.

    import pandas as pd
    
    
    # Extract all tables from the wikipage
    dfs = pd.read_html("http://en.wikipedia.org/wiki/List_of_England_Test_cricket_records")
    # The table referenced above is the 7th on the wikipage
    df = dfs[6]
    # The last row is just the date of the last update
    df = df.iloc[:-1]
    

    Out:

       Rank  Victories    Opposition                                 Most recent venue              Date
    0     1          6  South Africa                           Lord's, London, England      21 June 1951
    1    =2          4         India                   Wankhede Stadium, Mumbai, India  23 November 2012
    2    =2          4   West Indies                           Lord's, London, England        6 May 2009
    3     4          3     Australia          Sydney Cricket Ground, Sydney, Australia   2 December 1932
    4     5          2      Pakistan                 Trent Bridge, Nottingham, England    10 August 1967
    5     6          1     Sri Lanka  Old Trafford Cricket Ground, Manchester, England      13 June 2002
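
If an older pandas is all that is available, or manual parsing as in the question is preferred, the same row alignment can be reproduced by copying each rowspan cell down into the rows it covers. The following is a minimal sketch using only the standard library's html.parser. The names TableGridParser, expand_table, and SAMPLE_HTML are hypothetical and are not part of the original answer, and the sample table is a hand-written excerpt rather than the live Wikipedia markup. It assumes explicitly closed td/th/tr tags and no nested tables.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collect table cells into rows, expanding rowspan and colspan."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows, each a list of cell texts
        self._row = None     # row currently being filled
        self._pending = {}   # column index -> (rows still to fill, text)
        self._cell = None    # (text parts, rowspan, colspan) of the open cell

    def _fill_pending(self):
        # Copy down cells from earlier rows whose rowspan reaches this row.
        col = len(self._row)
        while col in self._pending:
            left, text = self._pending.pop(col)
            self._row.append(text)
            if left > 1:
                self._pending[col] = (left - 1, text)
            col += 1

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            a = dict(attrs)
            self._cell = ([], int(a.get("rowspan") or 1), int(a.get("colspan") or 1))

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            text = "".join(parts).strip()
            self._cell = None
            self._fill_pending()
            for _ in range(colspan):
                col = len(self._row)
                self._row.append(text)
                if rowspan > 1:
                    # Remember this cell for the next rowspan - 1 rows.
                    self._pending[col] = (rowspan - 1, text)
        elif tag == "tr" and self._row is not None:
            self._fill_pending()
            self.rows.append(self._row)
            self._row = None


def expand_table(html):
    """Return the table in `html` as a list of equal-width rows."""
    parser = TableGridParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hand-written sample: the "=2" rank and its victory count span two rows.
SAMPLE_HTML = """
<table>
<tr><th>Rank</th><th>Victories</th><th>Opposition</th></tr>
<tr><td>1</td><td>6</td><td>South Africa</td></tr>
<tr><td rowspan="2">=2</td><td rowspan="2">4</td><td>India</td></tr>
<tr><td>West Indies</td></tr>
<tr><td>4</td><td>3</td><td>Australia</td></tr>
</table>
"""

rows = expand_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

The West Indies row contains only one td in the markup, but the sketch emits it as ['=2', '4', 'West Indies'], which matches how pandas fills the repeated Rank and Victories values in the output above.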
    
