Question
I have some XML that has multiple elements with the same name, each in a different language. For example:
<Title xml:lang="FR" type="main">Les Tudors</Title>
<Title xml:lang="DE" type="main">Die Tudors</Title>
<Title xml:lang="IT" type="main">The Tudors</Title>
Normally, I'd retrieve an element using its attributes as follows:
titlex = info.find('.//xmlns:Title[@someattribute=attributevalue]', namespaces=nsmap)
If I try this with [@xml:lang="FR"] (for example), I get the following traceback:
File "D:/Python code/RBM CRID, Title, Genre/CRID, Title, Genre, Age rating, Episode Number, Descriptions V1.py", line 29, in <module>
titlex = info.find('.//xmlns:Title[@xml:lang=PL]', namespaces=nsmap)
File "lxml.etree.pyx", line 1457, in lxml.etree._Element.find (src\lxml\lxml.etree.c:51435)
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 282, in find
it = iterfind(elem, path, namespaces)
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 272, in iterfind
selector = _build_path_iterator(path, namespaces)
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 256, in _build_path_iterator
selector.append(ops[token[0]](_next, token))
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 134, in prepare_predicate
token = next()
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 80, in xpath_tokenizer
raise SyntaxError("prefix %r not found in prefix map" % prefix)
SyntaxError: prefix 'xml' not found in prefix map
I'm not surprised by this, but I'd like suggestions on how to get around the issue.
Thanks!
As requested, here is a cut-down but complete version of the code (it works as expected if I remove the bits in square brackets):
import codecs
from lxml import etree

file_name = input('Enter the file name, excluding .xml extension: ') + '.xml'  # User inputs file name
print('Parsing ' + file_name)

#----- Sets up import and namespace
parser = etree.XMLParser()
tree = etree.parse(file_name, parser)  # Name of file to test goes here
root = tree.getroot()
nsmap = {'xmlns': 'urn:tva:metadata:2012',
         'mpeg7': 'urn:tva:mpeg7:2008'}

#----- This code writes the output to a file
with codecs.open(file_name + '.log', mode='w', encoding='utf-8') as f:  # Name the output file
    f.write(u'CRID|Title|Genre|Rating|Short Synopsis|Medium Synopsis|Long Synopsis\n')
    for info in root.xpath('//xmlns:ProgramInformation', namespaces=nsmap):
        # Retrieve the title; this is the lookup that raises the SyntaxError above
        titlex = info.find('.//xmlns:Title[@xml:lang="PL"]', namespaces=nsmap)
        title = titlex.text if titlex is not None else 'Missing'  # If there isn't a title, print an alternative word
        f.write(u'{}\n'.format(title))  # Write all the retrieved values to the same line with bar separators and a new line
Answer 1:
The xml prefix in xml:lang does not need to be declared in an XML document, but if you want to use xml:lang in XPath lookups, you have to define a prefix mapping in the Python code.
The xml prefix is reserved (as opposed to "normal" namespace prefixes, which are arbitrary) and is defined to be bound to http://www.w3.org/XML/1998/namespace. See the Namespaces in XML 1.0 W3C recommendation.
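One way to see this binding directly is to read the attribute by its fully qualified name, which needs no prefix map at all. The following is a minimal sketch, assuming the sample titles sit under a hypothetical root element; the names XML_NS, LANG_ATTR, and titles_by_lang are illustrative and not part of lxml. The import falls back to the standard library's ElementTree, which stores the attribute under the same name, when lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # stdlib fallback; attribute naming is the same
    import xml.etree.ElementTree as etree

# The namespace URI that the reserved xml prefix is bound to.
XML_NS = "http://www.w3.org/XML/1998/namespace"
# Parsed attributes are keyed as "{namespace-uri}localname".
LANG_ATTR = "{%s}lang" % XML_NS

SAMPLE = """
<root>
  <Title xml:lang="FR" type="main">Les Tudors</Title>
  <Title xml:lang="DE" type="main">Die Tudors</Title>
  <Title xml:lang="IT" type="main">The Tudors</Title>
</root>"""

doc = etree.fromstring(SAMPLE)

# Map each language code to its title text, with no prefix map involved.
titles_by_lang = {t.get(LANG_ATTR): t.text for t in doc.findall("Title")}
print(titles_by_lang["FR"])
```

This can be handy when every language variant is needed at once, since it walks the Title elements a single time instead of issuing one find call per language.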
Example:
from lxml import etree
# Required mapping
nsmap = {"xml": "http://www.w3.org/XML/1998/namespace"}
XML = """
<root>
<Title xml:lang="FR" type="main">Les Tudors</Title>
<Title xml:lang="DE" type="main">Die Tudors</Title>
<Title xml:lang="IT" type="main">The Tudors</Title>
</root>"""
doc = etree.fromstring(XML)
title_FR = doc.find('Title[@xml:lang="FR"]', namespaces=nsmap)
print(title_FR.text)
Output:
Les Tudors
If there is no mapping for the xml prefix, you get the "prefix 'xml' not found in prefix map" error. If the URI mapped to the xml prefix is not http://www.w3.org/XML/1998/namespace, the find method in the code snippet above does not return anything.
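The three outcomes described here can be checked side by side. This is a small sketch rather than a definitive test: the one-element document and the URI urn:example:not-the-xml-namespace are made up for illustration, and the import falls back to the standard library's ElementTree (whose find handles prefixes the same way) when lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # stdlib fallback; find() resolves prefixes the same way
    import xml.etree.ElementTree as etree

doc = etree.fromstring(
    '<root><Title xml:lang="FR" type="main">Les Tudors</Title></root>'
)
path = 'Title[@xml:lang="FR"]'

# 1. Correct URI for the xml prefix: the element is found.
good = doc.find(path, namespaces={"xml": "http://www.w3.org/XML/1998/namespace"})

# 2. Wrong URI (hypothetical value): the path is valid but matches nothing.
wrong = doc.find(path, namespaces={"xml": "urn:example:not-the-xml-namespace"})

# 3. No mapping at all: the path cannot be compiled and SyntaxError is raised.
try:
    doc.find(path)
    error = None
except SyntaxError as exc:
    error = str(exc)

print(good.text)
print(wrong)
print(error)
```

The second case is the easier one to miss in practice, because a mistyped URI fails silently with None instead of raising an error.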
Answer 2:
If you have control over the XML file, you should change the xml:lang attribute to lang.
If you do not have that control, I would suggest adding xml to the nsmap, like this:
nsmap = {'xmlns': 'urn:tva:metadata:2012',
         'mpeg7': 'urn:tva:mpeg7:2008',
         'xml': '<namespace>'}
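Putting the pieces together, a sketch of the extended map applied to a namespaced document might look like the following. The URI used for the xml prefix is the reserved one, http://www.w3.org/XML/1998/namespace. The TVAMain wrapper, the sample content, and the helper find_title are assumptions made for illustration; only the two urn:tva URIs and the ProgramInformation and Title element names come from the question. The import falls back to the standard library's ElementTree when lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # stdlib fallback; find() accepts the same prefix map
    import xml.etree.ElementTree as etree

nsmap = {
    "xmlns": "urn:tva:metadata:2012",
    "mpeg7": "urn:tva:mpeg7:2008",
    "xml": "http://www.w3.org/XML/1998/namespace",  # reserved xml namespace
}

# Hypothetical document shaped like the one the question describes.
SAMPLE = """
<TVAMain xmlns="urn:tva:metadata:2012">
  <ProgramInformation>
    <Title xml:lang="FR" type="main">Les Tudors</Title>
    <Title xml:lang="DE" type="main">Die Tudors</Title>
  </ProgramInformation>
</TVAMain>"""


def find_title(info, lang, default="Missing"):
    """Return the text of the Title in the given language, or a default."""
    path = './/xmlns:Title[@xml:lang="{}"]'.format(lang)
    element = info.find(path, namespaces=nsmap)
    # Compare against None explicitly rather than relying on truthiness.
    return element.text if element is not None else default


root = etree.fromstring(SAMPLE)
info = root.find(".//xmlns:ProgramInformation", namespaces=nsmap)

print(find_title(info, "FR"))
print(find_title(info, "PL"))
```

The explicit "is not None" check matters here because an element's truth value is not a reliable test for whether find returned a match.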
Source: https://stackoverflow.com/questions/31250641/python-lxml-using-the-xmllang-attribute-to-retrieve-an-element