Question
I am new to Hadoop and need to parse a small XML file with a MapReduce program written in Java. I am using Hadoop 1.0.4. Say my XML file is:
<configuration>
<property>
<name>adv</name>
<value>a</value>
<dup>school</dup>
</property>
<property>
<name>aghy</name>
<value>a</value>
<dup>bk</dup>
</property>
</configuration>
I need output like this, with each text value followed by the number of times it occurs: adv 1 a 2 aghy 1 school 1 bk 1
How can I edit the code at https://github.com/studhadoop/xmlparsing-hadoop/blob/master/XmlParser11.java to do this? Any working idea would help.
Answer 1:
You'll need a couple of things:
- An input format for processing XML files; I suggest you look at Mahout's XmlInputFormat
- A parser (SAX or DOM) for the XML string passed to the mapper; you could also define some JAXB objects to bind to
And some useful links:
- Parsing XmlInputFormat element larger than hdfs block size
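The parsing step can be sketched without any Hadoop dependency. The following is a minimal sketch, not the code from the linked repository: it assumes the mapper receives an XML fragment (for example one <property>...</property> block) as a string, and it uses the JDK's StAX reader instead of SAX or DOM. The class name TextValueCounter and the method countTextValues are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; it is not part of Hadoop or Mahout.
class TextValueCounter {

    // Counts every non-blank text value in the given XML fragment.
    // Insertion order is preserved so the result is easy to read.
    public static Map<String, Integer> countTextValues(String xml) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Merge adjacent character events so each value arrives as one string.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.CHARACTERS) {
                        // Skip the whitespace between tags; keep real values.
                        String text = reader.getText().trim();
                        if (!text.isEmpty()) {
                            Integer seen = counts.get(text);
                            counts.put(text, seen == null ? 1 : seen + 1);
                        }
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML fragment", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>adv</name><value>a</value><dup>school</dup></property>"
                + "<property><name>aghy</name><value>a</value><dup>bk</dup></property>"
                + "</configuration>";
        System.out.println(countTextValues(xml));
    }
}
```

Run on the sample file from the question, main prints {adv=1, a=2, school=1, aghy=1, bk=1}. In a real job the mapper would more likely emit each text value with a count of 1 and leave the summing to a reducer, as in the standard word count example. The local map here only keeps the sketch runnable on its own.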
Source: https://stackoverflow.com/questions/15497213/count-each-text-value-in-xml-using-hadoop-mapreduce-pgm