Question
I am able to process two nodes from an XML file, and I am getting the output below:
bin/hadoop fs -text /user/root/t-output1/part-r-00000
name:ST17925 currentgrade 1.02
name:ST17926 currentgrade 3.0
name:ST17927 currentgrade 3.0
But I need the output to look like this, with the column names as the first row:
studentid currentgrade
ST17925 1.02
ST17926 3.00
ST17927 3.00
How can I achieve this?
My complete source code: https://github.com/studhadoop/xml/blob/master/XmlParser11.java
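EDIT: Solution (added to the Reducer class)
@Override
protected void setup(Context context) throws IOException, InterruptedException {
    // setup() runs once per reducer task, before any reduce() call, so the header is written first.
    context.write(new Text("studentid"), new Text("currentgrade"));
}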
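The header alone does not change the rows themselves, which still carry the "name:" label and an unpadded grade such as 3.0. A minimal sketch of the row formatting follows, in plain Java with no Hadoop dependency. The class RowFormat and its helpers cleanId and formatGrade are hypothetical names, not part of the linked source, and the sketch assumes the grade arrives as a decimal string.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class RowFormat {
    // Hypothetical helper: drops a leading "name:" label, if present.
    static String cleanId(String rawKey) {
        String prefix = "name:";
        return rawKey.startsWith(prefix) ? rawKey.substring(prefix.length()) : rawKey;
    }

    // Hypothetical helper: renders the grade with exactly two decimal places.
    static String formatGrade(String rawGrade) {
        return new BigDecimal(rawGrade.trim())
                .setScale(2, RoundingMode.HALF_UP)
                .toPlainString();
    }

    public static void main(String[] args) {
        // Tab-separated, matching the key/value layout of a text output file.
        System.out.println(cleanId("name:ST17926") + "\t" + formatGrade("3.0"));
    }
}
```

In the reducer, the two results would be wrapped in Text objects and passed to context.write in the same way as the header.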
Answer 1:
I think it is difficult to do this within your MapReduce code, for two reasons:
- The header values are text, so they may not have the same data types as the job's output key and value.
- If the types are the same, you can write the header from the setup() method of the Reducer, but there is no guarantee that it will appear as the first row of the output.
The best you can do is create a separate HDFS or local file containing the headers, written from your map code the first time it encounters the column qualifiers. You need to use the appropriate file operations API to create this file. Once the job is complete, you can use these headers in other programs, or merge them with the job output into a single file.
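The merge step could look roughly like the sketch below. It assumes the part files have already been copied to the local file system, and it uses only java.nio rather than the Hadoop file API. The class HeaderMerge, the method merge, and the file name grades.tsv are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HeaderMerge {
    // Writes the header line first, then the lines of each part file in the order given.
    static void merge(String header, List<Path> parts, Path target) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add(header);
        for (Path part : parts) {
            lines.addAll(Files.readAllLines(part, StandardCharsets.UTF_8));
        }
        Files.write(target, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Demo data standing in for a real job output directory.
        Path dir = Files.createTempDirectory("merge-demo");
        Path part = dir.resolve("part-r-00000");
        Files.write(part, Arrays.asList("ST17925\t1.02", "ST17926\t3.00"), StandardCharsets.UTF_8);

        Path merged = dir.resolve("grades.tsv");
        merge("studentid\tcurrentgrade", Arrays.asList(part), merged);
        Files.readAllLines(merged, StandardCharsets.UTF_8).forEach(System.out::println);
    }
}
```

Because the header is added after the job finishes, it is always the first row, regardless of how many reducers ran. The sketch reads every part file into memory, so it only suits small outputs.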
Source: https://stackoverflow.com/questions/16330413/how-to-output-first-row-as-column-qualifier-names