Count each text value in XML using a Hadoop MapReduce program

Question

I am new to Hadoop. I need to parse a small XML file using a MapReduce program in Java. I am using Hadoop 1.0.4. Say my XML file is:

<configuration>
<property>
 <name>adv</name>
 <value>a</value>
 <dup>school</dup>
</property>
<property>
 <name>aghy</name>
 <value>a</value>
 <dup>bk</dup>
</property>
</configuration>

I need output like this:

adv 1
a 2
aghy 1
school 1
bk 1

How can I edit the code at https://github.com/studhadoop/xmlparsing-hadoop/blob/master/XmlParser11.java to do this? Any working ideas would be appreciated.

Answer 1:

You'll need a couple of things:

An input format for processing XML files; I suggest looking at Mahout's XmlInputFormat.
A parser (SAX or DOM) for the XML string passed to the mapper; you could also define some JAXB objects to bind to.
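To make the second point concrete, here is a minimal sketch of the parse-and-count logic in plain Java, with no Hadoop dependencies so it can be run on its own. It assumes the input format hands the mapper one <property>...</property> block at a time as a string. The class and method names (XmlTextCount, textValues, countValues) are hypothetical, and StAX is used here as a lightweight alternative to SAX or DOM. In a real job, textValues would be called from the mapper's map method, emitting each value with a count of 1, and the summing done in countValues would move into the reducer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical stand-alone sketch; not part of the linked XmlParser11.java.
class XmlTextCount {

    // "Map" step: collect every non-blank text value in one XML record.
    // A Hadoop mapper would write each of these out as (value, 1).
    static List<String> textValues(String xml) {
        List<String> values = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Deliver each element's text as a single CHARACTERS event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()) {
                    values.add(reader.getText().trim());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML record", e);
        }
        return values;
    }

    // "Reduce" step: sum the occurrences of each value across all records.
    // A TreeMap keeps keys sorted, similar to how reducer output is ordered by key.
    static Map<String, Integer> countValues(List<String> records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String record : records) {
            for (String value : textValues(record)) {
                Integer old = counts.get(value);
                counts.put(value, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        records.add("<property><name>adv</name><value>a</value><dup>school</dup></property>");
        records.add("<property><name>aghy</name><value>a</value><dup>bk</dup></property>");
        for (Map.Entry<String, Integer> entry : countValues(records).entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

Note that the counts come out sorted by key (a, adv, aghy, bk, school) rather than in document order, which is also what a reducer would typically produce. When wiring this into a job, Mahout's XmlInputFormat reads its record delimiters from the xmlinput.start and xmlinput.end configuration keys, so setting them to <property> and </property> should give the mapper one record per property block; check this against the version you use.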

And some useful links:

Parsing XmlInputFormat element larger than hdfs block size

Source: https://stackoverflow.com/questions/15497213/count-each-text-value-in-xml-using-hadoop-mapreduce-pgm

Tags

xml

Hadoop

MapReduce
