Extracting tag attributes from xml using sparkxml

不羁岁月 提交于 2019-12-02 08:27:57

First print schema of the dataframe, in my case it is printed like below with spark xml version 0.3.3

|-- Sale: struct (nullable = true)
|    |-- DepartmentID: string (nullable = true)
|    |-- Tax: struct (nullable = true)
|    |    |-- #VALUE: string (nullable = true)
|    |    |-- @TaxExempt: boolean (nullable = true)
|    |    |-- @TaxRate: double (nullable = true)

Then use the below query to select xml attributes, after registering the temptable

sqlContext.sql("select Sale.Tax['@TaxRate'] as TaxRate from temptable").show();

Below is the Result

| TaxRate|

+-----+

|10.25|

Starting from 0.4.1, i think the attributes by default starts with underscore(_), in this case just use _ instead of @ while querying attributes.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!