For the last couple of days I have been trying to open and read a certain XML file (in DATEXII format), but I have not succeeded so far. It is about traffic data from the NDW Open
Consider transforming your nested XML input into a flatter structure using XSLT, the special-purpose language designed to transform XML files into other XML, HTML, or even text (CSV/TAB). The XSLT below transforms the original XML into comma-separated values in tabular format for import into pandas with read_csv():
XSLT (save as .xsl file, a special xml file)
publicationTime,country,nationalIdentifier,msmtSiteTableRef_targetClass,msmtSiteTableRef_version,msmtSiteTableRef_id,
msmtSiteRef_targetClass,msmtSiteRef_version,msmtSiteRef_id,measurementTimeDefault,
measuredValue_index,basicData_type,vehicleFlowRate,averageVehicleSpeed_numberOfInputValues,averageVehicleSpeed_value
Python
from io import StringIO
import lxml.etree as et
import pandas as pd
# LOAD XML AND XSL FILES
doc = et.parse('/path/to/Input.xml')
xsl = et.parse('/path/to/XSLT.xsl')
# INITIALIZE AND RUN TRANSFORMATION
transform = et.XSLT(xsl)
# CONVERT RESULT TO STRING
result = str(transform(doc))
# IMPORT INTO DATAFRAME
df = pd.read_csv(StringIO(result))
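Since the result depends entirely on the stylesheet, here is a minimal, self-contained sketch of the same pipeline on a toy document. The XML and the stylesheet below are hypothetical: the tag names (publication, site, measuredValue, flow, speed) are invented for illustration and are not the real DATEX II schema. Real DATEX II files typically declare namespaces, which a real stylesheet would have to bind to a prefix in its XPath expressions.

```python
from io import StringIO
import csv
import lxml.etree as et

# ASSUMPTION: toy input with invented tag names, NOT the real DATEX II structure
XML = """<publication>
  <publicationTime>2017-10-30T05:00:40Z</publicationTime>
  <site id="SITE_A">
    <measuredValue index="1"><flow>60</flow></measuredValue>
    <measuredValue index="2"><speed>38</speed></measuredValue>
  </site>
</publication>"""

# ASSUMPTION: hypothetical stylesheet written for the toy input above
XSL = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <!-- Emit the CSV header once, then one line per measuredValue -->
  <xsl:template match="/publication">
    <xsl:text>publicationTime,site_id,index,flow,speed&#10;</xsl:text>
    <xsl:apply-templates select="site/measuredValue"/>
  </xsl:template>

  <!-- Parent values are looked up again for every child row -->
  <xsl:template match="measuredValue">
    <xsl:value-of select="concat(/publication/publicationTime, ',',
                                 ../@id, ',', @index, ',', flow, ',', speed)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>"""

# PARSE BOTH DOCUMENTS FROM STRINGS AND RUN THE TRANSFORMATION
doc = et.fromstring(XML)
transform = et.XSLT(et.fromstring(XSL))
result = str(transform(doc))

# READ THE CSV TEXT BACK (stdlib csv here; pd.read_csv(StringIO(result)) works the same way)
rows = list(csv.DictReader(StringIO(result)))
for row in rows:
    print(row)
```

A child element that is missing for a given row (for example speed on a flow measurement) simply yields an empty field, which read_csv() would load as NaN.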
Output (parent node values are repeated on every row as indicators, while the numeric data differs from row to row)
print(df)
# publicationTime country nationalIdentifier msmtSiteTableRef_targetClass msmtSiteTableRef_version msmtSiteTableRef_id msmtSiteRef_targetClass msmtSiteRef_version msmtSiteRef_id measurementTimeDefault measuredValue_index basicData_type vehicleFlowRate averageVehicleSpeed_numberOfInputValues averageVehicleSpeed_value
# 0 20171030T05:00:40.007Z nl NLNDW MeasurementSiteTable 955 NDW01_MT MeasurementSiteRecord 1 PZH01_MST_0690_00 20171030T04:59:00Z 1 TrafficFlow 60.0 NaN NaN
# 1 20171030T05:00:40.007Z nl NLNDW MeasurementSiteTable 955 NDW01_MT MeasurementSiteRecord 1 PZH01_MST_0690_00 20171030T04:59:00Z 2 TrafficFlow 0.0 NaN NaN
# 2 20171030T05:00:40.007Z nl NLNDW MeasurementSiteTable 955 NDW01_MT MeasurementSiteRecord 1 PZH01_MST_0690_00 20171030T04:59:00Z 3 TrafficFlow 0.0 NaN NaN
# 3 20171030T05:00:40.007Z nl NLNDW MeasurementSiteTable 955 NDW01_MT MeasurementSiteRecord 1 PZH01_MST_0690_00 20171030T04:59:00Z 4 TrafficFlow 60.0 NaN NaN
# 4 20171030T05:00:40.007Z nl NLNDW MeasurementSiteTable 955 NDW01_MT MeasurementSiteRecord 1 PZH01_MST_0690_00 20171030T04:59:00Z 5 TrafficSpeed NaN 1.0 38.0
# 5 20171030T05:00:40.007Z nl NLNDW MeasurementSiteTable 955 NDW01_MT MeasurementSiteRecord 1 PZH01_MST_0690_00 20171030T04:59:00Z 6 TrafficSpeed NaN 0.0 1.0
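Because TrafficFlow and TrafficSpeed rows share one frame, each type leaves the other's columns as NaN. One possible follow-up, sketched here on a subset of the columns from the output above (the variable names flow and speed are only illustrative), is to split the frame by basicData_type and drop the columns that are entirely empty for that type:

```python
from io import StringIO
import pandas as pd

# Subset of the output shown above, re-entered as CSV for a self-contained example
csv_text = """measuredValue_index,basicData_type,vehicleFlowRate,averageVehicleSpeed_numberOfInputValues,averageVehicleSpeed_value
1,TrafficFlow,60.0,,
2,TrafficFlow,0.0,,
3,TrafficFlow,0.0,,
4,TrafficFlow,60.0,,
5,TrafficSpeed,,1.0,38.0
6,TrafficSpeed,,0.0,1.0
"""
df = pd.read_csv(StringIO(csv_text))

# SPLIT BY MEASUREMENT TYPE AND DROP ALL-NaN COLUMNS
flow = df[df["basicData_type"] == "TrafficFlow"].dropna(axis=1, how="all")
speed = df[df["basicData_type"] == "TrafficSpeed"].dropna(axis=1, how="all")

print(flow)
print(speed)
```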