I'm trying to do a find-all on a Word document for elements with the namespace xmlns:v="urn:schemas-microsoft-
ET.findall() vs BS4.find_all():
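Unlike BS4's find_all(), which searches all descendants by default, ET.findall() only matches direct children unless you prefix the match argument (tag or path) with ".//". That prefix searches for the node anywhere in the tree, since findall() supports a limited subset of XPath. ElementTree.iter(), on the other hand, always searches all descendants. Using the 'working with namespaces' example in the docs: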
>>> for char in root.iter('{http://characters.example.com}character'):
...     print(' |-->', char.text)
...
|--> Lancelot
|--> Archie Leach
|--> Sir Robin
|--> Gunther
|--> Commander Clement
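To make the direct-children vs. all-descendants distinction concrete, here is a small sketch. The XML string and its nesting are hypothetical (loosely modeled on the docs' actors example); the calls themselves are standard ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one <character> is a direct child of the root,
# the other two are nested one and two levels deeper.
XML = """<actors xmlns:fictional="http://characters.example.com">
  <fictional:character>Lancelot</fictional:character>
  <group>
    <fictional:character>Archie Leach</fictional:character>
    <subgroup>
      <fictional:character>Sir Robin</fictional:character>
    </subgroup>
  </group>
</actors>"""

root = ET.fromstring(XML)
ns = {"fictional": "http://characters.example.com"}

# Plain path: only direct children of root are matched.
direct = [c.text for c in root.findall("fictional:character", ns)]

# ".//" prefix: matches at any depth below root.
anywhere = [c.text for c in root.findall(".//fictional:character", ns)]

# iter() always walks every descendant; it takes the "{uri}tag" form
# rather than a prefixed name, and has no namespaces argument.
via_iter = [c.text for c in root.iter("{http://characters.example.com}character")]

print(direct)    # ['Lancelot']
print(anywhere)  # ['Lancelot', 'Archie Leach', 'Sir Robin']
print(via_iter)  # ['Lancelot', 'Archie Leach', 'Sir Robin']
```

Note that iter() needs the full URI in braces, whereas findall() can resolve a prefix through the mapping you pass in.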
Apart from how '' in the tags is treated with respect to the namespace, and the fact that findall() returns a list while iterfind() returns an iterator, I can't say there's a meaningful difference between ET.findall and ET.iterfind.
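A quick way to see the list-vs-iterator difference. The snippet is hypothetical and assumes the imagedata elements are direct children of the root.

```python
import xml.etree.ElementTree as ET

XML = """<root xmlns:v="urn:schemas-microsoft-com:vml">
  <v:imagedata id="a"/>
  <v:imagedata id="b"/>
</root>"""

root = ET.fromstring(XML)
ns = {"v": "urn:schemas-microsoft-com:vml"}

as_list = root.findall("v:imagedata", ns)   # materialized list
as_iter = root.iterfind("v:imagedata", ns)  # lazy, single-use iterator

list_ids = [e.get("id") for e in as_list]
iter_ids = [e.get("id") for e in as_iter]

print(type(as_list).__name__, list_ids)  # list ['a', 'b']
print(iter_ids)                          # ['a', 'b']
print(list(as_iter))                     # [] (the iterator is already exhausted)
```

As far as I can tell from CPython's ElementPath module, findall() is essentially list(iterfind(...)), so the choice mostly comes down to whether you want all matches in memory at once or can consume them one at a time.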
With ET.findall(), prefixing ".//" makes it search the entire tree (matching any node). When you use namespaces with ET, you still need the namespace prefix on the tag. The results line should be:
import xml.etree.ElementTree as ET
namespace = {'v': "urn:schemas-microsoft-com:vml"}
results = ET.fromstring(xml).findall("v:imagedata", namespace) # note the 'v:'
Also, the prefix doesn't need to be 'v'; you can change it to something more meaningful if needed:
namespace = {'image': "urn:schemas-microsoft-com:vml"}
results = ET.fromstring(xml).findall("image:imagedata", namespace)
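To show that only the URI matters, here is a sketch that runs the same lookup three ways: with 'v', with 'image', and with no mapping at all using the "{uri}tag" (Clark) notation. The XML snippet is hypothetical, with the imagedata elements placed as direct children of the root so the plain path finds them.

```python
import xml.etree.ElementTree as ET

VML = "urn:schemas-microsoft-com:vml"

# Hypothetical snippet with the imagedata elements as direct children of the root.
xml = f"""<root xmlns:v="{VML}">
  <v:imagedata id="rId1"/>
  <v:imagedata id="rId2"/>
</root>"""
root = ET.fromstring(xml)

# The mapping key is only a local alias; what has to match is the URI.
by_v = root.findall("v:imagedata", {"v": VML})
by_image = root.findall("image:imagedata", {"image": VML})

# No mapping at all: spell the URI out in "{uri}tag" (Clark) notation.
by_clark = root.findall(f"{{{VML}}}imagedata")


def ids(elements):
    """Return the id attribute of each element, in order."""
    return [e.get("id") for e in elements]


print(ids(by_v), ids(by_image), ids(by_clark))
# ['rId1', 'rId2'] ['rId1', 'rId2'] ['rId1', 'rId2']
```

The prefix used in your Python code is independent of the prefix used inside the XML file; ElementTree expands both to the same "{uri}imagedata" tag before comparing.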
Of course, this still won't necessarily get you all the imagedata elements if they aren't direct children of the root. One option is a recursive function that walks the tree for you; see this answer on SO for how. Note that while that answer does a recursive search, you could hit Python's recursion limit if the descendant depth is too... deep.
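If you do want to write the traversal by hand, the recursion-limit concern goes away with an explicit stack. This is a minimal sketch of that swapped-in technique (iterative depth-first search), not the approach from the linked answer; `find_descendants` is a hypothetical helper name. CPython's default recursion limit is 1000, so the 3000-level tree below would defeat a naive one-call-per-level recursive walk.

```python
import xml.etree.ElementTree as ET

VML = "urn:schemas-microsoft-com:vml"


def find_descendants(root, tag):
    """Return every element below `root` whose tag equals `tag`, in document order.

    Uses an explicit stack instead of recursion, so tree depth is limited
    by memory rather than by sys.getrecursionlimit().
    """
    found = []
    # Reverse the children so that pop() hands them back left to right.
    stack = list(reversed(list(root)))
    while stack:
        elem = stack.pop()
        if elem.tag == tag:
            found.append(elem)
        # Push this element's children so they are visited before its later siblings.
        stack.extend(reversed(list(elem)))
    return found


# Hypothetical tree: one imagedata buried under 3000 nested <div>s,
# well past the default recursion limit, plus one direct child of root.
root = ET.Element("root")
node = root
for _ in range(3000):
    node = ET.SubElement(node, "div")
ET.SubElement(node, f"{{{VML}}}imagedata", id="deep")
ET.SubElement(root, f"{{{VML}}}imagedata", id="shallow")

hits = find_descendants(root, f"{{{VML}}}imagedata")
print([e.get("id") for e in hits])  # ['deep', 'shallow']
```

Since ElementTree.iter() already searches all descendants, this is mainly useful when you need custom pruning or matching logic during the walk.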
To get all the imagedata elements anywhere in the tree, use the ".//" prefix:
results = ET.fromstring(xml).findall(".//v:imagedata", namespace)
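Putting it together for an actual Word file: a .docx is a zip archive, and the body lives in word/document.xml. The sketch below assumes legacy VML pictures, where v:imagedata typically sits inside w:pict/v:shape and points at the image through an r:id attribute. The document.xml content and the `imagedata_rel_ids` helper are hypothetical, and the in-memory zip contains only that one part, so it is not a complete .docx.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "v": "urn:schemas-microsoft-com:vml",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
}

# Hypothetical, stripped-down document.xml: the imagedata element is nested
# several levels deep (w:body/w:p/w:r/w:pict/v:shape/v:imagedata).
DOCUMENT_XML = f"""<w:document xmlns:w="{NS['w']}" xmlns:v="{NS['v']}" xmlns:r="{NS['r']}">
  <w:body><w:p><w:r><w:pict>
    <v:shape><v:imagedata r:id="rId7"/></v:shape>
  </w:pict></w:r></w:p></w:body>
</w:document>"""


def imagedata_rel_ids(docx_file):
    """Return the r:id of every v:imagedata in word/document.xml."""
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    rid = f"{{{NS['r']}}}id"
    # Without ".//" this would only look at direct children of <w:document>.
    return [el.get(rid) for el in root.findall(".//v:imagedata", NS)]


# Build an in-memory zip containing only word/document.xml (not a complete .docx).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", DOCUMENT_XML)
buf.seek(0)

print(imagedata_rel_ids(buf))  # ['rId7']
```

With a real file you would pass its path instead of `buf`; the r:id values can then be looked up in word/_rels/document.xml.rels to find the image file.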