Question
I am fairly new to HiveQL and I am stuck.
I have data stored in XML format and I want to extract fields from it into a Hive table with the columns (string Titles_2, string Artists_2, string Albums_2).
A sample of the XML data:
<?xml version="1.0" encoding="UTF-8"?><MC><SC><S uid="2" gen="" yr="2011" art="Samsung" cmp="<unknown>" fld="/mnt/sdcard/Samsung/Music" alb="Samsung" ttl="Over the horizon"/><S uid="37" gen="" yr="2010" art="Jason Derulo" cmp="<unknown>" fld="/mnt/sdcard/Music/Jason Derulo/Jason Derulo" alb="Jason Derulo" ttl="Whatcha Say"/><S uid="38" gen="" yr="2010" art="Jason Derulo" cmp="<unknown>" fld="/mnt/sdcard/Music/Jason Derulo/Jason Derulo" alb="Jason Derulo" ttl="In My Head"/><S uid="39" gen="" yr="2011" art="Alexandra Stan" cmp="<unknown>" fld="/mnt/sdcard/Music/Alexandra Stan/Mr_ Saxobeat - Single" alb="Mr. Saxobeat - Single" ttl="Mr. Saxobeat (Extended Version)"/><S uid="40" gen="" yr="2011" art="Bushido" cmp="<unknown>" fld="/mnt/sdcard/Music/Bushido/Jenseits von Gut und Böse (Premium Edition)" alb="Jenseits von Gut und Böse (Premium Edition)" ttl="Wie ein Löwe"/><S uid="41" gen="" yr="2011" art="Bushido" cmp="<unknown>" fld="/mnt/sdcard/Music/Bushido/Jenseits von Gut und Böse (Premium Edition)" alb="Jenseits von Gut und Böse (Premium Edition)" ttl="Verreckt"/><S uid="42" gen="" yr="2011" art="Lucenzo" cmp="<unknown>" fld="/mnt/sdcard/Music/Lucenzo/Danza Kuduro (feat_ Don Omar) [From _Fast & Furious 5_] - Single" alb="Danza Kuduro (feat. Don Omar) [From "Fast & Furious 5"] - Single" ttl="Danza Kuduro (feat. Don Omar) [From "Fast & Furious 5"]"/><S uid="121" gen="" yr="701" art="Michael Jackson" cmp="<unknown>" fld="/mnt/sdcard/external_sd/Music/Michael Jackson/Bad [Bonus Tracks]" alb="Bad [Bonus Tracks]" ttl="Voice-Over Intro/Quincy Jones Interview #1 [*]"/></SC><PC/></MC>
This data is stored in a table called xmlout_2(line).
I ran the following xpath commands to build the Hive view Stores, but it only picks up the first song of each line. Any idea why it behaves like that?
CREATE VIEW xmlout_2(line) AS SELECT * FROM hivetesttable;
CREATE VIEW Stores(Titles_2, Artists_2, Albums_2) AS
SELECT
  xpath_string(line, '/MC/SC/*/@ttl'),
  xpath_string(line, 'MC/SC/*/@art'),
  xpath_string(line, '/MC/SC/*/@alb')
FROM xmlout_2;
If I use xpath instead of xpath_string, I get an array of strings in each column instead of a single string.
CREATE VIEW xmlout_2(line) AS SELECT * FROM hivetesttable;
CREATE VIEW Stores(Titles_2, Artists_2, Albums_2) AS
SELECT
  xpath(line, '/MC/SC/*/@ttl'),
  xpath(line, 'MC/SC/*/@art'),
  xpath(line, '/MC/SC/*/@alb')
FROM xmlout_2;
I am thinking of exploding the columns after that, but explode can only be applied to a single column at a time.
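One possible way around the single-column limit is sketched below; it is an untested illustration, not a confirmed solution. It assumes Hive 0.13 or later (for posexplode) and that every S element carries all three attributes, so the three arrays line up by position. The aliases x, t, pos, ttl, ttls, arts, and albs are made up for this sketch. The idea is to explode only the titles array together with its position, then use that position to index the other two arrays.

```sql
-- Sketch only: one output row per song.
-- Assumes ttls, arts, and albs have the same length and order for each line.
CREATE VIEW Stores(Titles_2, Artists_2, Albums_2) AS
SELECT
  t.ttl,          -- title produced by posexplode
  x.arts[t.pos],  -- artist at the same position
  x.albs[t.pos]   -- album at the same position
FROM (
  SELECT
    xpath(line, '/MC/SC/*/@ttl') AS ttls,
    xpath(line, '/MC/SC/*/@art') AS arts,
    xpath(line, '/MC/SC/*/@alb') AS albs
  FROM xmlout_2
) x
LATERAL VIEW posexplode(x.ttls) t AS pos, ttl;
```

If some songs are missing an attribute, the arrays would no longer align and this approach would pair values from different songs, so it is worth checking the array lengths first. On the xpath_string behaviour: XPath converts a node-set to a string by taking the first node in document order, which would explain why only the first song of each line shows up.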
Source: https://stackoverflow.com/questions/15578992/how-to-import-xml-data-into-hive-using-attributes-as-columns