I have an XML structure that looks like the following, but on a much larger scale:
Your authortext is of type 1 (ELEMENT_NODE); normally you need a TEXT_NODE to get a string. This will work:
a.childNodes[0].nodeValue
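To see the difference between the two node types, here is a minimal sketch. The one-element document is a made-up stand-in (the XML from the question is not shown here), and the variable names are only for illustration.

```python
from xml.dom import minidom

# Hypothetical sample; not the XML from the question.
doc = minidom.parseString("<author>Bob</author>")
author = doc.documentElement

# The element itself is an ELEMENT_NODE (type 1) and carries no nodeValue.
is_element = author.nodeType == author.ELEMENT_NODE
element_value = author.nodeValue

# Its first child is the TEXT_NODE that actually holds the string.
text_node = author.childNodes[0]
is_text = text_node.nodeType == text_node.TEXT_NODE
text_value = text_node.nodeValue

print(is_element, element_value, is_text, text_value)
```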
I played around with it a bit, and here's what I got to work:
# ...
authortext = a.childNodes[0].nodeValue
print(authortext)
leading to output of:
C:\temp\py>xml2.py
1
Bob
Nigel
2
Alice
Mary
I can't tell you exactly why you have to access the childNode to get the inner text, but at least that's what you were looking for.
Element nodes don't have a nodeValue. You have to look at the Text nodes inside them. If you know there's always one text node inside, you can say element.firstChild.data (data is the same as nodeValue for text nodes).
Be careful: if there is no text content there will be no child Text nodes, and element.firstChild will be None, so the .data access fails with an AttributeError.
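One way to guard against the empty-element case is a small helper along these lines. This is only a sketch: first_text and its default parameter are names made up for this example, not part of minidom, and the sample XML is hypothetical.

```python
from xml.dom import minidom

def first_text(element, default=""):
    """Return the data of the first child if it is a Text node, else default."""
    child = element.firstChild
    # firstChild is None for an empty element such as <author/>.
    if child is None or child.nodeType != child.TEXT_NODE:
        return default
    return child.data

# Hypothetical sample with one empty author element.
doc = minidom.parseString("<authors><author>Bob</author><author/></authors>")
names = [first_text(a) for a in doc.getElementsByTagName("author")]
print(names)
```

Returning a default keeps the calling loop simple; if a missing name should be treated as an error instead, raising there would be the other obvious choice.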
Quick way to get the content of direct child text nodes:
text = ''.join(child.data for child in element.childNodes if child.nodeType == child.TEXT_NODE)
In DOM Level 3 Core you get the textContent property, which you can use to get text from inside an Element recursively, but minidom doesn't support this (some other Python DOM implementations do).
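If you need the recursive behaviour with minidom, you can approximate it yourself. The following is a rough sketch, not a full implementation of the DOM textContent rules; text_content is a made-up helper name and the sample markup is hypothetical.

```python
from xml.dom import minidom

def text_content(node):
    """Concatenate the text of all descendant Text and CDATA nodes."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            # Descend into nested elements; comments and PIs are skipped.
            parts.append(text_content(child))
    return "".join(parts)

# Hypothetical sample with nested markup inside the element.
doc = minidom.parseString("<author>Bob <b>the <i>builder</i></b>!</author>")
result = text_content(doc.documentElement)
print(result)
```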
Quick access:
node.getElementsByTagName('author')[0].childNodes[0].nodeValue
Since you always have one text data value per author you can use element.firstChild.data
from xml.dom.minidom import parseString

# document is the XML string from the question
dom = parseString(document)
conferences = dom.getElementsByTagName("conference")

# Each conference here is a node
for conference in conferences:
    conference_name = conference.getAttribute("name")
    print("")
    print(conference_name.upper() + " - ")
    authors = conference.getElementsByTagName("author")
    for author in authors:
        print("  " + author.firstChild.data)
    # end of the author loop
    print("")
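For anyone who wants to try this end to end, here is a self-contained sketch of the same loop that collects the names instead of printing them. The XML string is a hypothetical stand-in, since the document from the question is not reproduced here, and authors_by_conference is a name invented for this example.

```python
from xml.dom.minidom import parseString

# Hypothetical stand-in for the question's XML.
document = """<conferences>
  <conference name="1">
    <author>Bob</author>
    <author>Nigel</author>
  </conference>
  <conference name="2">
    <author>Alice</author>
    <author>Mary</author>
  </conference>
</conferences>"""

def authors_by_conference(xml_text):
    """Map each conference's name attribute to a list of its author names."""
    dom = parseString(xml_text)
    result = {}
    for conference in dom.getElementsByTagName("conference"):
        name = conference.getAttribute("name")
        result[name] = [
            author.firstChild.data
            for author in conference.getElementsByTagName("author")
            if author.firstChild is not None  # skip empty <author/> elements
        ]
    return result

grouped = authors_by_conference(document)
print(grouped)
```

Note that getElementsByTagName searches all descendants, so this assumes author elements only ever appear inside the conference they belong to.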