Detecting the header of an HTML table with BeautifulSoup / lxml when the table lacks a thead element
I'd like to detect the header of an HTML table when that table has no <thead> element. (MediaWiki, which drives Wikipedia, does not support <thead> elements.) I'd like to do this in Python with both BeautifulSoup and lxml.

Suppose I already have a table object; from it I'd like to get a thead object, a tbody object, and a tfoot object.

Currently, parse_thead does the following when the <thead> tag is present:

In BeautifulSoup, I get table objects with doc.find_all('table') and
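One possible approach is a minimal sketch built on a heuristic, which is an assumption rather than anything standard: a row whose own cells are all <th> counts as a header row (MediaWiki's `!` cell syntax renders as <th>). The leading run of such rows becomes the thead, a trailing run after at least one ordinary row becomes the tfoot, and everything in between is the tbody. The helper names split_rows, split_table_bs4 and split_table_lxml are hypothetical. Rows are copied into new elements, so the original table is left unchanged, and rows belonging to nested tables are skipped.

```python
import copy

import lxml.html
from bs4 import BeautifulSoup


def split_rows(rows, cell_tags):
    """Split rows (in document order) into (head, body, foot) lists.

    Heuristic (an assumption): a row whose own cells are all <th> is a
    header row. The leading run of such rows is the head; a trailing run
    after at least one ordinary row is the foot.
    cell_tags(row) must return the tag names of the row's own cells.
    """
    is_header = [bool(tags) and all(t == "th" for t in tags)
                 for tags in map(cell_tags, rows)]
    start = 0
    while start < len(rows) and is_header[start]:
        start += 1
    end = len(rows)
    while end > start and is_header[end - 1]:
        end -= 1
    return rows[:start], rows[start:end], rows[end:]


def split_table_bs4(soup, table):
    """Return new (thead, tbody, tfoot) Tags built from copies of table's rows."""
    # Keep only rows whose nearest enclosing <table> is this one (skips nested tables).
    rows = [tr for tr in table.find_all("tr") if tr.find_parent("table") is table]
    head, body, foot = split_rows(
        rows, lambda r: [c.name for c in r.find_all(["th", "td"], recursive=False)])
    sections = []
    for name, group in (("thead", head), ("tbody", body), ("tfoot", foot)):
        section = soup.new_tag(name)
        for row in group:
            section.append(copy.copy(row))  # copy.copy on a Tag copies its subtree
        sections.append(section)
    return tuple(sections)


def split_table_lxml(table):
    """Return new (thead, tbody, tfoot) elements built from copies of table's rows."""
    # Direct rows, or rows inside an existing thead/tbody/tfoot; never nested tables.
    rows = table.xpath("./tr | ./thead/tr | ./tbody/tr | ./tfoot/tr")
    head, body, foot = split_rows(rows, lambda r: [c.tag for c in r.xpath("./th | ./td")])
    sections = []
    for name, group in (("thead", head), ("tbody", body), ("tfoot", foot)):
        section = table.makeelement(name, {})
        for row in group:
            section.append(copy.deepcopy(row))  # appending the original would move it
        sections.append(section)
    return tuple(sections)


SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Ann</td><td>31</td></tr>
  <tr><th>Bob</th><td>42</td></tr>
  <tr><td colspan="2"><table><tr><th>nested</th></tr></table></td></tr>
  <tr><th>Name</th><th>Age</th></tr>
</table>
"""

if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    for part in split_table_bs4(soup, soup.find("table")):
        print(part.name, len(part.find_all("tr", recursive=False)))
    table = lxml.html.fromstring(SAMPLE_HTML).xpath("//table")[0]
    for part in split_table_lxml(table):
        print(part.tag, len(part))
```

A row such as `<th>Bob</th><td>42</td>` (a row header followed by data cells) stays in the body, because only rows made entirely of <th> cells are treated as header rows. If the table already has a real <thead>, the existing parse_thead path can be used instead of this heuristic.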