Getting the nth element using BeautifulSoup

生来不讨喜 2021-01-31 04:09

From a large table I want to read rows 5, 10, 15, 20 ... using BeautifulSoup. How do I do this? Is using findNextSibling with an incrementing counter the way to go?

5 Answers
  •  Happy的楠姐
    2021-01-31 04:35

    As a general solution, you can convert the table to a nested list and iterate over it, skipping every row whose position is not a multiple of 5...

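    # Written for Python 2 and BeautifulSoup 3 (the "BeautifulSoup" module)
    import BeautifulSoup
    
    def listify(table):
      """Convert an html table to a nested list"""
      result = []
      rows = table.findAll('tr')
      for row in rows:
        result.append([])
        cols = row.findAll('td')
        for col in cols:
          # Collect every text node in the cell and join them into one string
          strings = [_string.encode('utf8') for _string in col.findAll(text=True)]
          text = ''.join(strings)
          result[-1].append(text)
      return result
    
    if __name__=="__main__":
        # Build a small table with one column and ten rows, then parse into a list
        htstring = """
    <table>
    <tr><td>foo1</td></tr>
    <tr><td>foo2</td></tr>
    <tr><td>foo3</td></tr>
    <tr><td>foo4</td></tr>
    <tr><td>foo5</td></tr>
    <tr><td>foo6</td></tr>
    <tr><td>foo7</td></tr>
    <tr><td>foo8</td></tr>
    <tr><td>foo9</td></tr>
    <tr><td>foo10</td></tr>
    </table>
    """
        soup = BeautifulSoup.BeautifulSoup(htstring)
        for idx, ii in enumerate(listify(soup)):
            # Keep only rows 5, 10, 15, ... (idx is zero-based)
            if ((idx+1)%5>0):
                continue
            print ii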

    Running that...

    [mpenning@Bucksnort ~]$ python testme.py
    ['foo5']
    ['foo10']
    [mpenning@Bucksnort ~]$
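
For readers on Python 3, here is a minimal sketch of the same idea using BeautifulSoup 4 (the `bs4` package). That is an assumption on my part, since the answer above targets the older BeautifulSoup 3 module. The names `every_nth_row` and `SAMPLE_HTML` are hypothetical and only for illustration. Instead of a counter or `findNextSibling`, it slices the list of rows with a step.

```python
from bs4 import BeautifulSoup  # BeautifulSoup 4: pip install beautifulsoup4


def every_nth_row(html, n):
    """Return the cell texts of rows n, 2n, 3n, ... (1-based) of the first table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")
    # rows[n - 1::n] starts at the nth row and steps by n, so no counter is needed.
    return [
        [cell.get_text(strip=True) for cell in row.find_all("td")]
        for row in rows[n - 1::n]
    ]


# A one-column, ten-row table like the one described in the answer.
SAMPLE_HTML = "<table>" + "".join(
    "<tr><td>foo{}</td></tr>".format(i) for i in range(1, 11)
) + "</table>"

for row in every_nth_row(SAMPLE_HTML, 5):
    print(row)
```

With the sample table this prints `['foo5']` and `['foo10']`. Note that `find_all("tr")` also descends into nested tables, so a page with tables inside tables may need a narrower selection.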
    
