If a row contains a cell with a rowspan attribute, how can I make the parsed rows correspond to the table as it appears on the Wikipedia page?
from bs4 import BeautifulSoup
import urllib2
from lxm
pandas >= 0.24.0 understands colspan and rowspan attributes, as documented in the release notes. To extract the Wikipedia table that was giving you trouble, the following works:
import pandas as pd
# Extract all tables from the wikipage
dfs = pd.read_html("http://en.wikipedia.org/wiki/List_of_England_Test_cricket_records")
# The table referenced above is the 7th on the wikipage
df = dfs[6]
# The last row is just the date of the last update
df = df.iloc[:-1]
Out:
   Rank  Victories  Opposition    Most recent venue                                 Date
0  1     6          South Africa  Lord's, London, England                           21 June 1951
1  =2    4          India         Wankhede Stadium, Mumbai, India                   23 November 2012
2  =2    4          West Indies   Lord's, London, England                           6 May 2009
3  4     3          Australia     Sydney Cricket Ground, Sydney, Australia          2 December 1932
4  5     2          Pakistan      Trent Bridge, Nottingham, England                 10 August 1967
5  6     1          Sri Lanka     Old Trafford Cricket Ground, Manchester, England  13 June 2002
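
If you are stuck on a pandas older than 0.24.0, or just want to see what the rowspan handling amounts to, here is a rough sketch that uses only the standard library. It is not how pandas implements it internally. The names CellCollector, expand_table and SAMPLE are made up for this illustration, and the HTML fragment is a hand-written stand-in loosely modelled on the output above, not the actual Wikipedia markup. It assumes a single, well-formed table with no nested tables.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect (text, rowspan, colspan) for every cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell tuples per <tr>
        self._cell = None   # [text_parts, rowspan, colspan] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            a = dict(attrs)
            self._cell = [[], int(a.get("rowspan", 1)), int(a.get("colspan", 1))]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            self.rows[-1].append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None


def expand_table(html):
    """Return a list of rows in which spanned cells are repeated."""
    parser = CellCollector()
    parser.feed(html)
    grid = []
    pending = {}  # column index -> (text, number of later rows still to fill)
    for cells in parser.rows:
        row, col, queue = [], 0, list(cells)
        while queue or col in pending:
            if col in pending:
                # This column is covered by a rowspan from an earlier row.
                text, left = pending.pop(col)
                row.append(text)
                if left > 1:
                    pending[col] = (text, left - 1)
                col += 1
            else:
                text, rowspan, colspan = queue.pop(0)
                for _ in range(colspan):
                    row.append(text)
                    if rowspan > 1:
                        pending[col] = (text, rowspan - 1)
                    col += 1
        grid.append(row)
    return grid


# Hypothetical fragment: the "=2" and "4" cells each span two rows.
SAMPLE = """
<table>
<tr><th>Rank</th><th>Victories</th><th>Opposition</th></tr>
<tr><td>1</td><td>6</td><td>South Africa</td></tr>
<tr><td rowspan="2">=2</td><td rowspan="2">4</td><td>India</td></tr>
<tr><td>West Indies</td></tr>
</table>
"""

for row in expand_table(SAMPLE):
    print(row)
```

The last row comes out with the Rank and Victories values copied down from the row above, which is the same shape pd.read_html produces for the real page. If needed, the resulting list of rows can be passed to pd.DataFrame(grid[1:], columns=grid[0]).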