How do I use xml namespaces with find/findall in lxml?

星月不相逢 2020-12-05 02:50

I'm trying to parse content in an OpenOffice ODS spreadsheet. The ODS format is essentially just a zip file containing a number of documents. The content of the spreadsheet is sto

4 answers

不思量自难忘° (original poster)

2020-12-05 03:10
Maybe the first thing to notice is that the namespaces are defined at Element level, not Document level.

Most often, though, all namespaces are declared on the document's root element (office:document-content here), which saves us from walking the whole tree to collect inner xmlns scopes.
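To make the element-level scoping concrete, here is a minimal sketch. The tiny document and its `urn:example:*` URIs are invented for illustration and are not part of the ODS format. It compares the `nsmap` of the root with that of a nested element that declares its own prefix, then walks the tree to collect every declaration, which is the work you avoid when everything sits on the root.

```
from lxml import etree

# Hypothetical miniature document: prefix "a" is declared on the root,
# prefix "b" only on an inner element.
xml = b"""<root xmlns:a="urn:example:a">
  <a:outer>
    <b:inner xmlns:b="urn:example:b"/>
  </a:outer>
</root>"""

root = etree.fromstring(xml)
inner = root[0][0]

# The root only knows the prefix declared on it.
print(root.nsmap)

# The inner element sees the inherited prefix plus its own.
print(inner.nsmap)

# Walking the tree is only needed when declarations are scattered.
all_ns = {}
for el in root.iter():
    all_ns.update(el.nsmap)
print(all_ns)
```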

An element's nsmap then includes:
- a default namespace under the None prefix (not always present)
- all namespaces of its ancestors, unless overridden.

If, as ChrisR mentioned, the default namespace is not supported, you can use a dict comprehension to filter it out in a more compact expression.
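As a rough sketch of that filtering step, the snippet below uses an invented document (the `urn:example:*` URIs and element names are assumptions, not ODS names) whose root declares both a default namespace and a prefixed one. It drops the `None` entry and, as an optional extra that is not part of the original answer, rebinds the default namespace to an arbitrary prefix `d` so that elements in it can still be addressed.

```
from lxml import etree

# Hypothetical document with a default namespace and one prefixed namespace.
xml = b"""<doc xmlns="urn:example:default" xmlns:t="urn:example:table">
  <t:table><t:row/><t:row/></t:table>
</doc>"""
root = etree.fromstring(xml)

# The default namespace is stored under the key None.
has_default = None in root.nsmap

# Keep only the entries that carry a real prefix.
prefixed = {prefix: uri for prefix, uri in root.nsmap.items() if prefix}

# Optional: bind the default namespace to a prefix of our own choosing
# ("d" is arbitrary) so its elements can be selected too.
with_default = dict(prefixed, d=root.nsmap[None])

rows = root.findall('.//t:row', prefixed)
docs = root.xpath('/d:doc', namespaces=with_default)
print(len(rows), docs[0].tag)
```

The prefixes used in a query only have to match the keys of the dict you pass in; they do not need to match the prefixes written in the file.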

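The syntax differs slightly between xpath and ElementPath (find/findall): xpath takes the map through the namespaces keyword argument, while find and findall accept it as the second positional argument.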

So here's the code you could use to get all the rows of your first table (tested with lxml 3.4.2):
```
import zipfile
from lxml import etree

# Open and parse the document
zf = zipfile.ZipFile('spreadsheet.ods')
tree = etree.parse(zf.open('content.xml'))

# Get the root element
root = tree.getroot()

# get its namespace map, excluding the default namespace (key None)
nsmap = {k: v for k, v in root.nsmap.items() if k}

# use defined prefixes to access elements
table = tree.find('.//table:table', nsmap)
rows = table.findall('table:table-row', nsmap)

# or, if xpath is needed:
table = tree.xpath('//table:table', namespaces=nsmap)[0]
rows = table.xpath('table:table-row', namespaces=nsmap)
```
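If you want to try these calls without a real spreadsheet at hand, the following sketch builds a stand-in archive in memory. The `content.xml` here is a heavily trimmed, hand-written assumption of what an ODS file contains (a real one has many more namespaces and nodes), and the sheet name is made up; only the two namespace URIs are the ones the OpenDocument format uses for the office and table prefixes.

```
import io
import zipfile
from lxml import etree

# Hypothetical, heavily trimmed content.xml for illustration only.
CONTENT = b"""<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0">
  <office:body><office:spreadsheet>
    <table:table table:name="Sheet1">
      <table:table-row/><table:table-row/>
    </table:table>
  </office:spreadsheet></office:body>
</office:document-content>"""

# Write the stand-in archive to memory instead of disk.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('content.xml', CONTENT)
buf.seek(0)

# Same steps as above, with context managers closing the handles.
with zipfile.ZipFile(buf) as zf:
    with zf.open('content.xml') as fh:
        tree = etree.parse(fh)

root = tree.getroot()
nsmap = {k: v for k, v in root.nsmap.items() if k}

# ElementPath: namespace map passed positionally.
table = tree.find('.//table:table', nsmap)
rows_find = table.findall('table:table-row', nsmap)

# XPath: namespace map passed through the namespaces keyword.
table_x = tree.xpath('//table:table', namespaces=nsmap)[0]
rows_xpath = table_x.xpath('table:table-row', namespaces=nsmap)

print(len(rows_find), len(rows_xpath))
```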