Getting data from broken xml in Python

Submitted by 陌路散爱 on 2019-12-13 02:55:38

Question


I would like to get data from XML, but its structure seems to be broken.

I have this example URL: https://b2b.snapoutdoor.pl/rest/V1/extendvariantstocart/73478, which returns XML with data about the product.

import requests
import json
from xml.etree import ElementTree
from pprint import pprint

response = requests.get(
    "https://b2b.snapoutdoor.pl/rest/V1/extendvariantstocart/86559",
    headers={"Accept": "application/xml"},
)

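# The response is XML whose root element holds a JSON document as its text
node = ElementTree.fromstring(response.content)

# Decode that JSON text into a Python dict
data = json.loads(node.text)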

This returns a dict with four keys:

{'jsonChildsConfig': '{"70259":{"id":"70259","name":"Ski Ultra Merino E - '
                     'black\\/orange","sku":"610306139887","availableQty":6,"regularPrice":69.2367,"finalPrice":69.2367,"promo":false,"discount":0,"bestDiscount":false,"addToCartUrl":"https:\\/\\/b2b.snapoutdoor.pl\\/checkout\\/cart\\/add\\/uenc\\/aHR0cHM6Ly9iMmIuc25hcG91dGRvb3IucGwvcmVzdC9WMS9leHRlbmR2YXJpYW50c3RvY2FydC84NjU1OQ%2C%2C\\/product\\/86559\\/","formKey":"7OWS6VbWucoSg2zg","superAttributes":"36-39 '
                     '","salable":true},"70260":{"id":"70260","name":"Ski '
                     'Ultra Merino E - '
                     'black\\/orange","sku":"610306139894","availableQty":7,"regularPrice":69.2367,"finalPrice":69.2367,"promo":false,"discount":0,"bestDiscount":false,"addToCartUrl":"https:\\/\\/b2b.snapoutdoor.pl\\/checkout\\/cart\\/add\\/uenc\\/aHR0cHM6Ly9iMmIuc25hcG91dGRvb3IucGwvcmVzdC9WMS9leHRlbmR2YXJpYW50c3RvY2FydC84NjU1OQ%2C%2C\\/product\\/86559\\/","formKey":"7OWS6VbWucoSg2zg","superAttributes":"40-43 '
                     '","salable":true},"70261":{"id":"70261","name":"Ski '
                     'Ultra Merino E - '
                     'black\\/orange","sku":"610306139900","availableQty":6,"regularPrice":69.2367,"finalPrice":69.2367,"promo":false,"discount":0,"bestDiscount":false,"addToCartUrl":"https:\\/\\/b2b.snapoutdoor.pl\\/checkout\\/cart\\/add\\/uenc\\/aHR0cHM6Ly9iMmIuc25hcG91dGRvb3IucGwvcmVzdC9WMS9leHRlbmR2YXJpYW50c3RvY2FydC84NjU1OQ%2C%2C\\/product\\/86559\\/","formKey":"7OWS6VbWucoSg2zg","superAttributes":"44-47 '
                     '","salable":true},"99060":{"id":"99060","name":"Ski '
                     'Ultra Merino E - '
                     'black\\/orange","sku":"610306139917","availableQty":3,"regularPrice":69.24,"finalPrice":69.24,"promo":false,"discount":0,"bestDiscount":false,"addToCartUrl":"https:\\/\\/b2b.snapoutdoor.pl\\/checkout\\/cart\\/add\\/uenc\\/aHR0cHM6Ly9iMmIuc25hcG91dGRvb3IucGwvcmVzdC9WMS9leHRlbmR2YXJpYW50c3RvY2FydC84NjU1OQ%2C%2C\\/product\\/86559\\/","formKey":"7OWS6VbWucoSg2zg","superAttributes":"48+ '
                     '","salable":true}}',
 'jsonConfig': 'some data',
 'jsonDefaultPlaceholder': 'https://b2b.snapoutdoor.pl/pub/media/catalog/product/placeholder/',
 'jsonSwatchConfig': 'some data'
}

I'm interested in the value of jsonChildsConfig, but when I try to access keys inside it I get TypeError: string indices must be integers, because the value of jsonChildsConfig is a string.

I would like to get all the SKU and stock values (the sku and availableQty fields), but because the value is a string it is not possible to get them through

data['jsonChildsConfig']['70259']['sku']

or

data['jsonChildsConfig']['70259']['availableQty'].

I also tried to convert this string to JSON with json.loads(), but it didn't work.

Could you please help me with it? 🙏🙂
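The error comes from the payload being encoded twice: the outer JSON decodes to a dict, but the value of jsonChildsConfig is itself a JSON-encoded string that needs a second json.loads call. A minimal offline reproduction, assuming a trimmed, hypothetical stand-in for the data dict above (only one variant kept):

```python
import json

# Hypothetical, trimmed stand-in for `data` from the question:
# the value of "jsonChildsConfig" is a JSON *string*, not a dict.
data = {
    "jsonChildsConfig": '{"70259": {"sku": "610306139887", "availableQty": 6}}',
}

# Indexing a str with another str is what raises the TypeError.
raised_type_error = False
try:
    data["jsonChildsConfig"]["70259"]["sku"]
except TypeError as exc:
    raised_type_error = True
    print(f"TypeError: {exc}")

# Second decoding pass, applied to the string value itself.
# (json.loads() only accepts str/bytes/bytearray; passing the
# already-decoded `data` dict would raise a different TypeError.)
children = json.loads(data["jsonChildsConfig"])
print(children["70259"]["sku"], children["70259"]["availableQty"])
```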


Answer 1:


To fix your dictionary, you need to apply json.loads to all of its values except 'jsonDefaultPlaceholder', which is not in JSON format:

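# 'jsonDefaultPlaceholder' holds a plain URL, not JSON, so remove it first
del data['jsonDefaultPlaceholder']
# Decode each remaining JSON string; empty values are skipped
new_data = {k: json.loads(v) for k, v in data.items() if v}
new_data['jsonChildsConfig']['70259']['sku']

# output: '610306139887'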
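Instead of deleting the non-JSON key up front, one alternative is to try decoding every string value and keep the raw string whenever it is not valid JSON. A sketch under that assumption; decode_json_values is a hypothetical helper name, and the sample dict is trimmed from the question's output:

```python
import json


def decode_json_values(data):
    """Decode every string value that parses as JSON; keep the rest unchanged.

    Caveat: strings that happen to be valid JSON scalars (e.g. "123" or
    "true") are decoded too.
    """
    decoded = {}
    for key, value in data.items():
        if isinstance(value, str):
            try:
                decoded[key] = json.loads(value)
                continue
            except json.JSONDecodeError:
                pass  # not JSON (e.g. a plain URL or ""): keep the raw string
        decoded[key] = value
    return decoded


# Trimmed sample shaped like the question's `data` dict.
sample = {
    "jsonChildsConfig": '{"70259": {"sku": "610306139887", "availableQty": 6}}',
    "jsonDefaultPlaceholder": "https://b2b.snapoutdoor.pl/pub/media/catalog/product/placeholder/",
}

result = decode_json_values(sample)
print(result["jsonChildsConfig"]["70259"]["sku"])  # 610306139887
print(result["jsonDefaultPlaceholder"])            # unchanged URL string
```

This builds a new dict rather than mutating the original, so running it twice (or after another snippet) does not hit a KeyError from a missing 'jsonDefaultPlaceholder'.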

Or, if you want to convert the numeric keys that interest you into integers:

del data['jsonDefaultPlaceholder']
new_data2 = {
    k: {(int(key) if key.isdigit() else key): val
        for key, val in json.loads(v).items()}
    for k, v in data.items() if v
}
new_data2['jsonChildsConfig'][70259]['sku']

# output: '610306139887'
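The comprehension above converts only the first level of keys in each decoded value. If numeric keys should become integers at every nesting level, json.loads also accepts an object_pairs_hook that is called for each decoded JSON object. A sketch using a hypothetical hook name, digit_keys_to_int, on a trimmed sample string:

```python
import json


def digit_keys_to_int(pairs):
    # json.loads calls this for every JSON object, at any depth.
    # isdecimal() is used so that int() cannot fail on the key.
    return {(int(k) if k.isdecimal() else k): v for k, v in pairs}


children_json = (
    '{"70259": {"sku": "610306139887", "availableQty": 6},'
    ' "99060": {"sku": "610306139917", "availableQty": 3}}'
)

children = json.loads(children_json, object_pairs_hook=digit_keys_to_int)
print(children[70259]["sku"])  # 610306139887
```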



Answer 2:


Converting the value of data['jsonChildsConfig'] to a dict with json.loads should work:

>>> childConfigDetails = json.loads(data['jsonChildsConfig'])
>>> childConfigDetails['70259']['sku']
'610306139887'
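Once the child config is decoded, collecting every SKU with its stock level (the goal stated in the question) is a single dict comprehension over the variants. A sketch, assuming every variant has both sku and availableQty keys; the sample string is trimmed from the question's output, and in real code it would be data['jsonChildsConfig']:

```python
import json

# Trimmed sample of the jsonChildsConfig string from the question.
children_json = (
    '{"70259": {"sku": "610306139887", "availableQty": 6},'
    ' "70260": {"sku": "610306139894", "availableQty": 7},'
    ' "99060": {"sku": "610306139917", "availableQty": 3}}'
)

child_config = json.loads(children_json)

# Map each variant's SKU to its available quantity.
stock_by_sku = {v["sku"]: v["availableQty"] for v in child_config.values()}
print(stock_by_sku)
# {'610306139887': 6, '610306139894': 7, '610306139917': 3}
```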


Source: https://stackoverflow.com/questions/58183799/getting-data-from-broken-xml-in-python
