BeautifulSoup missing some HTML table tags

Submitted by 老子叫甜甜 on 2019-12-23 05:17:45


I'm trying to extract data from a website, using Beautiful Soup to parse the HTML. Specifically, I'm trying to get the table data from the following webpage:

link to webpage

I want to get the data from the table. First I save the page as an HTML file on my computer (this part works fine; I checked that the file contains all the information), but when I try to parse it with the following code:

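from bs4 import BeautifulSoup

# fh is the saved HTML file, opened for reading
soup = BeautifulSoup(fh, 'html.parser')
tables = soup.find_all('table')
rows = tables[0].find_all('tr')   # all <tr> rows of the first table
cells = rows[1].find_all('td')    # IndexError here: fewer than two rows were found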

I don't get any results (specifically, it crashes with an error saying there is no element at index 1). Any idea where this could come from?



OK, it was actually an issue in the HTML file: in the first row of the table, the cells were opened with th but closed with td. I don't know much about HTML, but replacing th with td solved the issue.

<tr class="listeEtablenTete">
<th title="Rubrique IC">Rubri. IC</td>
<th title="Alin&eacute;a">Ali.&nbsp;</td>
<th title="Date d'autorisation">Date auto.</td>
<th >Etat d'activit&eacute;</td>
<th title="R&eacute;gime">R&eacute;g.</td>
<th >Activit&eacute;</td>
<th >Volume</td>
<th >Unit&eacute;</td>
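The fix described above can also be detected and applied automatically. The following is a minimal, stdlib-only sketch (it does not use BeautifulSoup). `CellTagChecker` and `find_mismatched_cells` are hypothetical helpers that use `html.parser.HTMLParser` to report cells whose closing tag does not match the opening one. `th_closed_by_td_to_td` is a hypothetical regex-based helper that applies the same "replace th by td" change while keeping the attributes. It assumes cells are not nested and that a cell's content contains no other th/td tags. A regex like this is a rough heuristic for this particular defect, not a general HTML repair tool.

```python
import re
from html.parser import HTMLParser


class CellTagChecker(HTMLParser):
    """Hypothetical helper: records table cells whose closing tag does not
    match the opening tag, e.g. <th ...>...</td>."""

    def __init__(self):
        super().__init__()
        self.open_cell = None  # "th" or "td" while inside a cell, else None
        self.problems = []     # (line, opened_tag, closing_tag) tuples

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        if tag in ("th", "td"):
            self.open_cell = tag

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            if self.open_cell is not None and tag != self.open_cell:
                line, _ = self.getpos()  # line of the offending end tag
                self.problems.append((line, self.open_cell, tag))
            self.open_cell = None


def find_mismatched_cells(html):
    """Return a list of (line, opened_tag, closing_tag) for mismatched cells."""
    checker = CellTagChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


# Matches <th ...>content</td>, where the content may not contain another
# opening or closing th/td tag, so a correct <th>...</th> is never touched.
TH_CLOSED_BY_TD = re.compile(
    r"<th\b([^>]*)>((?:(?!</?t[hd]\b).)*?)</td>",
    re.DOTALL | re.IGNORECASE,
)


def th_closed_by_td_to_td(html):
    """Hypothetical helper: rewrite <th ...>...</td> as <td ...>...</td>,
    keeping the original attributes."""
    return TH_CLOSED_BY_TD.sub(r"<td\1>\2</td>", html)


# Shortened version of the header row from the question
broken = """<tr class="listeEtablenTete">
<th title="Rubrique IC">Rubri. IC</td>
<th >Volume</td>
</tr>"""

print(find_mismatched_cells(broken))
print(th_closed_by_td_to_td(broken))
```

The repaired string can then be passed to `BeautifulSoup(..., 'html.parser')` as in the question. Another option, not tested here, is to try a different parser such as `'lxml'` or `'html5lib'` (both installed separately), since BeautifulSoup's parsers differ in how they recover from invalid HTML.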

Thanks!

