How to convert an HTML table to an array in python

前端未结

关注

 3  1875

故里飘歌 2020-12-24 03:23

I have an html document, and I want to pull the tables out of this document and return them as arrays. I\'m picturing 2 functions, one that finds all the html tables in a d

3条回答

孤城傲影 (楼主)

2020-12-24 04:01
Use BeautifulSoup (I recommend 3.0.8). Finding all tables is trivial:
```
import BeautifulSoup

def get_tables(htmldoc):
    soup = BeautifulSoup.BeautifulSoup(htmldoc)
    return soup.findAll('table')
```
However, in Python, an array is 1-dimensional and constrained to pretty elementary types as items (integers, floats, that elementary). So there's no way to squeeze an HTML table in a Python array.

Maybe you mean a Python list instead? That's also 1-dimensional, but anything can be an item, so you could have a list of lists (one sublist per tr tag, I imagine, containing one item per td tag).

That would give:
```
def makelist(table):
  result = []
  allrows = table.findAll('tr')
  for row in allrows:
    result.append([])
    allcols = row.findAll('td')
    for col in allcols:
      thestrings = [unicode(s) for s in col.findAll(text=True)]
      thetext = ''.join(thestrings)
      result[-1].append(thetext)
  return result
```
This may not yet be quite what you want (doesn't skip HTML comments, the items of the sublists are unicode strings and not byte strings, etc) but it should be easy to adjust.
0 讨论(0)

查看其它3个回答
发布评论:

提交评论
- 加载中...