How to Import XML data into Hive using attributes as columns

Posted by 谁都会走 on 2019-12-11 09:47:28

Question


I am pretty new to HiveQL and I am kind of stuck.

I have data stored in XML format and I want to extract fields from this XML file into a Hive table with the columns (string Titles_2, string Artists_2, string Albums_2).

A sample of the XML data:

<?xml version="1.0" encoding="UTF-8"?><MC><SC><S uid="2" gen="" yr="2011" art="Samsung" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Samsung/Music" alb="Samsung" ttl="Over the horizon"/><S uid="37" gen="" yr="2010" art="Jason Derulo" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Music/Jason Derulo/Jason Derulo" alb="Jason Derulo" ttl="Whatcha Say"/><S uid="38" gen="" yr="2010" art="Jason Derulo" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Music/Jason Derulo/Jason Derulo" alb="Jason Derulo" ttl="In My Head"/><S uid="39" gen="" yr="2011" art="Alexandra Stan" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Music/Alexandra Stan/Mr_ Saxobeat - Single" alb="Mr. Saxobeat - Single" ttl="Mr. Saxobeat (Extended Version)"/><S uid="40" gen="" yr="2011" art="Bushido" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Music/Bushido/Jenseits von Gut und Böse (Premium Edition)" alb="Jenseits von Gut und Böse (Premium Edition)" ttl="Wie ein Löwe"/><S uid="41" gen="" yr="2011" art="Bushido" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Music/Bushido/Jenseits von Gut und Böse (Premium Edition)" alb="Jenseits von Gut und Böse (Premium Edition)" ttl="Verreckt"/><S uid="42" gen="" yr="2011" art="Lucenzo" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Music/Lucenzo/Danza Kuduro (feat_ Don Omar) [From _Fast &amp; Furious 5_] - Single" alb="Danza Kuduro (feat. Don Omar) [From &quot;Fast &amp; Furious 5&quot;] - Single" ttl="Danza Kuduro (feat. Don Omar) [From &quot;Fast &amp; Furious 5&quot;]"/><S uid="121" gen="" yr="701" art="Michael Jackson" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/external_sd/Music/Michael Jackson/Bad [Bonus Tracks]" alb="Bad [Bonus Tracks]" ttl="Voice-Over Intro/Quincy Jones Interview #1 [*]"/></SC><PC/></MC>

This data is stored in a table called xmlout_2(line).

I ran the following xpath_string queries to build the HiveQL view Stores, but it only picks up the first song of each line. Any idea why it behaves like that?

    CREATE VIEW xmlout_2(line) AS SELECT * FROM hivetesttable;

    CREATE VIEW Stores(Titles_2, Artists_2, Albums_2) AS
    SELECT
        xpath_string(line, '/MC/SC/*/@ttl'),
        xpath_string(line, '/MC/SC/*/@art'),
        xpath_string(line, '/MC/SC/*/@alb')
    FROM xmlout_2;

If I use xpath instead of xpath_string, I get an array of strings per column instead of a single string:

    CREATE VIEW xmlout_2(line) AS SELECT * FROM hivetesttable;

    CREATE VIEW Stores(Titles_2, Artists_2, Albums_2) AS
    SELECT
        xpath(line, '/MC/SC/*/@ttl'),
        xpath(line, '/MC/SC/*/@art'),
        xpath(line, '/MC/SC/*/@alb')
    FROM xmlout_2;

I am thinking of exploding the columns after that, but explode can only be applied to a single column at a time.
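One possible way around the single-column limit, offered as an untested sketch rather than a confirmed fix: xpath_string is documented to return the text of the first matching node only, which would explain why just one song per line shows up. posexplode (available from Hive 0.13, so this assumes a recent enough version) emits a position next to each exploded value, and that position can be used to index the other two arrays. The sketch assumes every S element carries ttl, art and alb, so the three arrays have the same length and order. The aliases x, t, pos, title, titles, artists and albums are made up for illustration.

```sql
-- Sketch only: assumes Hive 0.13+ (posexplode) and that every <S> element
-- has ttl, art and alb attributes, so the three arrays line up by position.
SELECT
    t.title           AS Titles_2,
    x.artists[t.pos]  AS Artists_2,  -- same index as the exploded title
    x.albums[t.pos]   AS Albums_2
FROM (
    -- xpath returns one array<string> per attribute for each input line
    SELECT
        xpath(line, '/MC/SC/S/@ttl') AS titles,
        xpath(line, '/MC/SC/S/@art') AS artists,
        xpath(line, '/MC/SC/S/@alb') AS albums
    FROM xmlout_2
) x
-- explode only the titles array; pos is the 0-based index of each title
LATERAL VIEW posexplode(x.titles) t AS pos, title;
```

If some S elements are missing one of the attributes, the arrays would no longer line up and this approach would pair the wrong values, so it is worth checking the array sizes first.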

Source: https://stackoverflow.com/questions/15578992/how-to-import-xml-data-into-hive-using-attributes-as-columns
