I'm trying to do a find all from a Word document for
with namespace xmlns:v="urn:schemas-microsoft-
With ElementTree in Python 3.8 and later, you can simply use a wildcard ({*}) for the namespace:
import xml.etree.ElementTree as ET
results = ET.fromstring(xml).findall(".//{*}imagedata")
Note the .// part, which means that the whole document (all descendants) is searched.
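As a rough, self-contained illustration of the wildcard search, the sketch below runs it against a small made-up fragment. SAMPLE_XML and find_imagedata_any_namespace are hypothetical names used only for this example; only the namespace URI is taken from the thread, and the {*} syntax needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the Word markup; not the asker's real document.
SAMPLE_XML = """\
<root xmlns:v="urn:schemas-microsoft-com:vml">
  <v:shape>
    <v:imagedata src="a.png"/>
  </v:shape>
  <v:imagedata src="b.png"/>
</root>
"""


def find_imagedata_any_namespace(xml_text):
    """Return every imagedata element at any depth, whatever its namespace."""
    # ".//" searches all descendants; "{*}" matches any namespace (Python 3.8+).
    return ET.fromstring(xml_text).findall(".//{*}imagedata")


wildcard_sources = [el.get("src") for el in find_imagedata_any_namespace(SAMPLE_XML)]
print(wildcard_sources)
```

Both the nested element and the top-level one should come back, in document order.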
I'm going to leave the question open, but the workaround I'm currently using is BeautifulSoup, which happily accepts the v: syntax.
from bs4 import BeautifulSoup
soup = BeautifulSoup(xml, "lxml")
results = soup.find_all("v:imagedata")
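ET.findall() vs BS4.find_all():
By default, ET.findall() only looks at direct children of the element it is called on. Prefixing the match argument (tag or path) with ".//" will search for that node anywhere in the tree, since findall() supports a subset of XPath. ElementTree.iter(), by contrast, searches all descendants without any prefix. Using the 'working with namespaces' example in the docs: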
>>> for char in root.iter('{http://characters.example.com}character'):
...     print(' |-->', char.text)
...
|--> Lancelot
|--> Archie Leach
|--> Sir Robin
|--> Gunther
|--> Commander Clement
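The same idea can be applied to the VML namespace from the question. This is only a sketch: NESTED_XML is an invented fragment, and VML_NS and iter_sources are hypothetical names. iter() takes the fully qualified tag in {uri}name form, so no prefix mapping is involved.

```python
import xml.etree.ElementTree as ET

# Namespace URI quoted in the thread; everything else here is illustrative.
VML_NS = "urn:schemas-microsoft-com:vml"

# Hypothetical fragment with imagedata elements at two different depths.
NESTED_XML = """\
<root xmlns:v="urn:schemas-microsoft-com:vml">
  <v:group>
    <v:shape>
      <v:imagedata src="deep.png"/>
    </v:shape>
  </v:group>
  <v:imagedata src="top.png"/>
</root>
"""

root = ET.fromstring(NESTED_XML)

# Build the qualified tag "{urn:schemas-microsoft-com:vml}imagedata".
qualified_tag = f"{{{VML_NS}}}imagedata"

# iter() walks the whole subtree in document order, so depth does not matter.
iter_sources = [el.get("src") for el in root.iter(qualified_tag)]
print(iter_sources)
```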
Apart from how '' in the tags is treated with respect to the namespace, and the fact that one returns a list while the other returns an iterator, I can't say there's a meaningful difference between ET.findall and ET.iterfind.
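A quick way to check the list-versus-iterator point is to run both calls on the same path and compare. The fragment and the variable names below are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; only the namespace URI comes from the thread.
doc = ET.fromstring(
    '<doc xmlns:v="urn:schemas-microsoft-com:vml">'
    '<v:imagedata src="one.png"/>'
    '<v:group><v:imagedata src="two.png"/></v:group>'
    '</doc>'
)
ns = {"v": "urn:schemas-microsoft-com:vml"}

# findall() builds the complete list up front.
as_list = doc.findall(".//v:imagedata", ns)

# iterfind() yields the same elements lazily, one at a time.
as_iter = doc.iterfind(".//v:imagedata", ns)
from_iter = list(as_iter)

print(type(as_list).__name__, len(as_list))
print(len(from_iter), from_iter == as_list)
```

In this sketch the two calls yield the same element objects in the same order; the practical difference is whether the results are materialised immediately.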
With ET.findall(), prefixing ".//" makes it search the entire tree (it matches nodes at any depth).
When you use namespaces with ET, you still need the namespace prefix on the tag. The results line should be:
namespace = {'v': "urn:schemas-microsoft-com:vml"}
results = ET.fromstring(xml).findall("v:imagedata", namespace) # note the 'v:'
Also, the 'v' doesn't need to be a 'v'; you could change it to something more meaningful if needed:
namespace = {'image': "urn:schemas-microsoft-com:vml"}
results = ET.fromstring(xml).findall("image:imagedata", namespace)
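To see why the prefix name is arbitrary, it may help to look at how ElementTree stores the tag: the prefix is resolved to the URI at parse time. The sketch below uses a made-up one-element fragment, and FLAT_XML, with_v and with_image are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with a single direct-child imagedata element.
FLAT_XML = (
    '<root xmlns:v="urn:schemas-microsoft-com:vml">'
    '<v:imagedata src="x.png"/>'
    '</root>'
)
root = ET.fromstring(FLAT_XML)

# Two different prefixes mapped to the same namespace URI.
with_v = root.findall("v:imagedata", {"v": "urn:schemas-microsoft-com:vml"})
with_image = root.findall("image:imagedata", {"image": "urn:schemas-microsoft-com:vml"})

# The parsed tag carries the URI, not the prefix used in the document.
stored_tag = with_v[0].tag
print(stored_tag)
print(with_v == with_image)
```

Since only the URI is stored, any prefix in the mapping finds the same element as long as it points at that URI.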
Of course, this still won't get you all the imagedata elements if some of them aren't direct children of the root. One way to handle that is to write a recursive function; see this answer on SO for how. Note that while that answer does a recursive search, you are likely to hit Python's recursion limit if the descendant depth is too...deep.
To get all the imagedata elements anywhere in the tree, use the ".//" prefix:
results = ET.fromstring(xml).findall(".//v:imagedata", namespace)
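For comparison, here is a small sketch that runs the bare path and the ".//" path side by side on an invented fragment with one direct child and one nested element. MIXED_XML, direct and anywhere are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one imagedata under the root, one nested deeper.
MIXED_XML = """\
<root xmlns:v="urn:schemas-microsoft-com:vml">
  <v:imagedata src="child.png"/>
  <v:shape>
    <v:imagedata src="nested.png"/>
  </v:shape>
</root>
"""
namespace = {"v": "urn:schemas-microsoft-com:vml"}
root = ET.fromstring(MIXED_XML)

# Bare path: only direct children of the element findall() is called on.
direct = [el.get("src") for el in root.findall("v:imagedata", namespace)]

# ".//" path: every matching descendant, at any depth.
anywhere = [el.get("src") for el in root.findall(".//v:imagedata", namespace)]

print(direct)
print(anywhere)
```

The nested element only shows up in the second result, which is the behaviour the ".//" prefix is there to provide.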