Question
This is a follow-up to the previous question: Write xml with a path and value. I now want to add two things: 1) attributes and 2) multiple items under the same parent node. Here is the list of paths I have:
[
{'Path': 'Item/Info/Name', 'Value': 'Body HD'},
{'Path': 'Item/Info/Synopsis', 'Value': 'A great movie'},
{'Path': 'Item/Locales/Locale[@Country="US"][@Language="ES"]/Name', 'Value': 'El Grecco'},
{'Path': 'Item/Genres/Genre', 'Value': 'Action'},
{'Path': 'Item/Genres/Genre', 'Value': 'Drama'},
{'Path': 'Item/Purchases/Purchase[@Country="US"]/HDPrice', 'Value': '10.99'},
{'Path': 'Item/Purchases/Purchase[@Country="US"]/SDPrice', 'Value': '9.99'},
{'Path': 'Item/Purchases/Purchase[@Country="CA"]/SDPrice', 'Value': '4.99'},
]
The XML it should generate is:
<Item>
  <Info>
    <Name>Body HD</Name>
    <Synopsis>A great movie</Synopsis>
  </Info>
  <Locales>
    <Locale Country="US" Language="ES">
      <Name>El Grecco</Name>
    </Locale>
  </Locales>
  <Genres>
    <Genre>Action</Genre>
    <Genre>Drama</Genre>
  </Genres>
  <Purchases>
    <Purchase Country="US">
      <HDPrice>10.99</HDPrice>
      <SDPrice>9.99</SDPrice>
    </Purchase>
    <Purchase Country="CA">
      <SDPrice>4.99</SDPrice>
    </Purchase>
  </Purchases>
</Item>
How would I build this out?
Answer 1:
To build an XML tree from XPaths and values, I use regular expressions and lxml:
import re
from lxml import etree
The entries are:
entries = [
{'Path': 'Item/Info/Name', 'Value': 'Body HD'},
{'Path': 'Item/Info/Synopsis', 'Value': 'A great movie'},
{'Path': 'Item/Locales/Locale[@Country="US"][@Language="ES"]/Name', 'Value': 'El Grecco'},
{'Path': 'Item/Genres/Genre', 'Value': 'Action'},
{'Path': 'Item/Genres/Genre', 'Value': 'Drama'},
{'Path': 'Item/Purchases/Purchase[@Country="US"]/HDPrice', 'Value': '10.99'},
{'Path': 'Item/Purchases/Purchase[@Country="US"]/SDPrice', 'Value': '9.99'},
{'Path': 'Item/Purchases/Purchase[@Country="CA"]/SDPrice', 'Value': '4.99'},
]
To parse each XPath step, I use the following (very simple) regular expressions:
TAG_REGEX = r"(?P<tag>\w+)"
CONDITION_REGEX = r"(?P<condition>(?:\[.*?\])*)"
STEP_REGEX = TAG_REGEX + CONDITION_REGEX
ATTR_REGEX = r"@(?P<key>\w+)=\"(?P<value>.*?)\""
search_step = re.compile(STEP_REGEX, flags=re.DOTALL).search
findall_attr = re.compile(ATTR_REGEX, flags=re.DOTALL).findall
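def parse_step(step):
    # Split one step such as 'Purchase[@Country="US"]' into its tag name
    # and a dict of the attributes tested in its [...] predicates.
    mo = search_step(step)
    if mo:
        tag = mo.group("tag")
        condition = mo.group("condition")
        return tag, dict(findall_attr(condition))
    raise ValueError(step)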
The parse_step function returns a tag name and an attributes dictionary.
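As a quick check of what parse_step produces, the snippet below repeats the answer's regular expressions and parse_step (unchanged except that the error reports the step argument) and applies it to each step of one of the question's paths. It uses only the standard re module. Splitting on "/" mirrors the tree-building loop, so it assumes no attribute value contains a slash.

```python
import re

# The answer's patterns: a tag name followed by zero or more [...] predicates,
# each of which may contain @key="value".
TAG_REGEX = r"(?P<tag>\w+)"
CONDITION_REGEX = r"(?P<condition>(?:\[.*?\])*)"
STEP_REGEX = TAG_REGEX + CONDITION_REGEX
ATTR_REGEX = r"@(?P<key>\w+)=\"(?P<value>.*?)\""

search_step = re.compile(STEP_REGEX, flags=re.DOTALL).search
findall_attr = re.compile(ATTR_REGEX, flags=re.DOTALL).findall


def parse_step(step):
    """Return (tag, attrs) for one XPath step such as 'Purchase[@Country="US"]'."""
    mo = search_step(step)
    if mo:
        return mo.group("tag"), dict(findall_attr(mo.group("condition")))
    raise ValueError(step)


path = 'Item/Locales/Locale[@Country="US"][@Language="ES"]/Name'
for step in path.split("/"):
    print(parse_step(step))
# ('Item', {})
# ('Locales', {})
# ('Locale', {'Country': 'US', 'Language': 'ES'})
# ('Name', {})
```

The leading "/" that the tree-building loop adds to the first step is skipped by search, so "/Item" also parses to ('Item', {}).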
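Then I build the XML tree the same way as before, with two additions: each new element gets its attributes, and a leaf that already holds a value gets a new sibling instead of being overwritten: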
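root = None
for entry in entries:
    path = entry["Path"]
    parts = path.split("/")
    # Prefix the first step with "/" so it is matched against the document root.
    xpath_list = ["/" + parts[0]] + parts[1:]
    curr = root
    for xpath in xpath_list:
        tag_name, attrs = parse_step(xpath)
        if curr is None:
            # First entry: create the root element.
            root = curr = etree.Element(tag_name, **attrs)
        else:
            # Reuse a matching element if one exists, otherwise create it.
            nodes = curr.xpath(xpath)
            if nodes:
                curr = nodes[0]
            else:
                curr = etree.SubElement(curr, tag_name, **attrs)
    # The leaf already has a value (e.g. a second Genre): add a sibling
    # with the same tag and attributes instead of overwriting it.
    if curr.text:
        curr = etree.SubElement(curr.getparent(), curr.tag, **curr.attrib)
    curr.text = entry["Value"]

# encoding="unicode" returns a str instead of bytes.
print(etree.tostring(root, pretty_print=True, encoding="unicode"))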
The result is:
<Item>
  <Info>
    <Name>Body HD</Name>
    <Synopsis>A great movie</Synopsis>
  </Info>
  <Locales>
    <Locale Country="US" Language="ES">
      <Name>El Grecco</Name>
    </Locale>
  </Locales>
  <Genres>
    <Genre>Action</Genre>
    <Genre>Drama</Genre>
  </Genres>
  <Purchases>
    <Purchase Country="US">
      <HDPrice>10.99</HDPrice>
      <SDPrice>9.99</SDPrice>
    </Purchase>
    <Purchase Country="CA">
      <SDPrice>4.99</SDPrice>
    </Purchase>
  </Purchases>
</Item>
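If lxml is not available, roughly the same get-or-create algorithm can be sketched with the standard library's xml.etree.ElementTree. This is a substitute, not the answer's code: ElementTree elements have no getparent(), so the sketch tracks the parent while walking, and instead of curr.xpath(...) it uses a hypothetical helper, find_child, that returns the first child with the given tag whose attributes include the requested ones. The name build_tree is also introduced here for illustration, and ET.indent assumes Python 3.9 or later.

```python
import re
import xml.etree.ElementTree as ET

# Same idea as the answer's regexes: a tag name followed by optional
# [@key="value"] predicates.
STEP_RE = re.compile(r"(\w+)((?:\[.*?\])*)")
ATTR_RE = re.compile(r'@(\w+)="(.*?)"')


def parse_step(step):
    mo = STEP_RE.search(step)
    if not mo:
        raise ValueError(step)
    return mo.group(1), dict(ATTR_RE.findall(mo.group(2)))


def find_child(parent, tag, attrs):
    # Hypothetical helper: first child with this tag whose attributes include
    # every key/value in attrs (stands in for curr.xpath(...) in the lxml code).
    for child in parent:
        if child.tag == tag and all(child.get(k) == v for k, v in attrs.items()):
            return child
    return None


def build_tree(entries):
    root = None
    for entry in entries:
        steps = [parse_step(s) for s in entry["Path"].split("/")]
        root_tag, root_attrs = steps[0]
        if root is None:
            root = ET.Element(root_tag, root_attrs)
        elif root.tag != root_tag:
            raise ValueError("every path must start with the same root element")
        parent, curr = None, root
        for tag, attrs in steps[1:]:
            child = find_child(curr, tag, attrs)
            if child is None:
                child = ET.SubElement(curr, tag, attrs)
            parent, curr = curr, child
        # Leaf already has a value (e.g. a second Genre): append a sibling with
        # the same tag and attributes. A single-step path has no parent, so its
        # text is simply overwritten.
        if curr.text and parent is not None:
            curr = ET.SubElement(parent, curr.tag, dict(curr.attrib))
        curr.text = entry["Value"]
    return root


entries = [
    {'Path': 'Item/Info/Name', 'Value': 'Body HD'},
    {'Path': 'Item/Info/Synopsis', 'Value': 'A great movie'},
    {'Path': 'Item/Locales/Locale[@Country="US"][@Language="ES"]/Name', 'Value': 'El Grecco'},
    {'Path': 'Item/Genres/Genre', 'Value': 'Action'},
    {'Path': 'Item/Genres/Genre', 'Value': 'Drama'},
    {'Path': 'Item/Purchases/Purchase[@Country="US"]/HDPrice', 'Value': '10.99'},
    {'Path': 'Item/Purchases/Purchase[@Country="US"]/SDPrice', 'Value': '9.99'},
    {'Path': 'Item/Purchases/Purchase[@Country="CA"]/SDPrice', 'Value': '4.99'},
]

root = build_tree(entries)
ET.indent(root)  # adds indentation whitespace in place (Python 3.9+)
print(ET.tostring(root, encoding="unicode"))
```

As in the lxml version, the extra sibling is appended at the end of its parent, which gives the expected order here because the repeated Genre entries are the only children of Genres.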
Source: https://stackoverflow.com/questions/38984272/write-xml-from-list-of-path-values