Count each text value in XML using a Hadoop MapReduce program

Submitted by 南笙酒味 on 2019-12-07 21:40:10

Question


I am new to Hadoop. I need to parse a small XML file with a MapReduce program written in Java. I am using Hadoop 1.0.4. Say my XML file is:

<configuration>
<property>
 <name>adv</name>
 <value>a</value>
 <dup>school</dup>
</property>
<property>
 <name>aghy</name>
 <value>a</value>
 <dup>bk</dup>
</property>
</configuration>

I need output like this, with each text value followed by the number of times it occurs:

adv 1
a 2
aghy 1
school 1
bk 1

How can I edit the code at https://github.com/studhadoop/xmlparsing-hadoop/blob/master/XmlParser11.java to do this? Any working idea would help.


Answer 1:


You'll need a couple of things:

  • An input format for processing XML files; I suggest you look at Mahout's XmlInputFormat
  • A parser (SAX or DOM) for the XML string passed to the mapper; you could also define some JAXB objects to bind to

And a useful link:

  • Parsing XmlInputFormat element larger than hdfs block size
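
As a rough illustration of the parsing and counting step, here is a minimal sketch in plain Java with no Hadoop dependency. It assumes the input format hands the mapper one complete <property>...</property> block at a time as a string. The class name TextValueCounter and its methods are hypothetical, and the in-memory map stands in for what the job would do with emitted (text, 1) pairs summed in the reducer. It uses Map.merge, so it assumes Java 8 or later.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: counts every non-blank text value in XML fragments.
public class TextValueCounter {

    // Parses one XML fragment (the "map" step) and adds 1 to the count of
    // each text value found (the summing stands in for the "reduce" step).
    static void countTextValues(String xml, Map<String, Integer> counts) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            collect(doc.getDocumentElement(), counts);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML fragment", e);
        }
    }

    // Walks the DOM tree and records every text node that is not just whitespace.
    private static void collect(Node node, Map<String, Integer> counts) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                counts.merge(text, 1, Integer::sum);
            }
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), counts);
        }
    }

    public static void main(String[] args) {
        // The two records from the question, as the mapper might receive them.
        String[] records = {
            "<property><name>adv</name><value>a</value><dup>school</dup></property>",
            "<property><name>aghy</name><value>a</value><dup>bk</dup></property>"
        };
        Map<String, Integer> counts = new TreeMap<>();
        for (String record : records) {
            countTextValues(record, counts);
        }
        System.out.println(counts); // prints {a=2, adv=1, aghy=1, bk=1, school=1}
    }
}
```

In an actual job, the body of collect would call the mapper's output collector with the text as the key and 1 as the value instead of updating a map, and the reducer would sum the values for each key.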


Source: https://stackoverflow.com/questions/15497213/count-each-text-value-in-xml-using-hadoop-mapreduce-pgm
