Skipping the first line of a .csv file in MapReduce (Java)

Posted by 核能气质少年 on 2019-12-30 05:02:18

Question


Since the mapper function runs for every line, how can I skip the first line? Some files start with a column header, which I want to ignore.


Answer 1:


With the default TextInputFormat, the mapper receives each line as a key-value pair: the key is the byte offset at which the line starts in the file, and the value is the text of the line. For the first line of a file the offset is always zero, so in the mapper function do the following:

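    // Inside a Mapper subclass whose input key/value types are LongWritable/Text.
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key is the byte offset of this line in the file; 0 means the first line.
        if (key.get() == 0 && value.toString().contains("header") /* some condition identifying the header */) {
            return; // skip the header line
        }
        // The rest of the data is processed here.
    }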
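
To see why the offset check matches only the first line, here is a minimal plain-Java sketch that mimics the (byte offset, line) pairs a mapper would receive. It does not use Hadoop at all: the class `OffsetHeaderSkipDemo` and its methods are hypothetical names introduced for illustration, and the sketch assumes UTF-8 text with `\n` line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class (not part of Hadoop): simulates offset-keyed line input.
class OffsetHeaderSkipDemo {

    // Mirrors the mapper's check: the line starting at byte offset 0 is the first line.
    static boolean isFirstLine(long byteOffset) {
        return byteOffset == 0;
    }

    // Walks the content line by line, tracking each line's starting byte offset,
    // and keeps every line except the one that starts at offset 0.
    static List<String> dataLines(String fileContent) {
        List<String> kept = new ArrayList<>();
        long offset = 0;
        for (String line : fileContent.split("\n")) {
            if (!isFirstLine(offset)) {
                kept.add(line);
            }
            // +1 accounts for the newline that terminated this line (assumed "\n").
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample CSV content invented for this demo.
        System.out.println(dataLines("id,name\n1,alice\n2,bob\n"));
    }
}
```

Because the key is an offset within the file, the first line of every input file has offset zero, so in a job with several input files this check would skip one line per file.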



Answer 2:


Because the file can be stored across multiple nodes, we can't say which machine holds the header or which mapper will process that part of the file. Instead, we can filter out the header in the mapper itself by looking at the content of each line. For this you have to know the header names in advance. For example:

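    // line is the current record as a String, e.g. value.toString() inside map().
    String[] cols = line.split(",", -1);
    if (cols[0].equals("header")) {
        // skip: this line is the header
    } else {
        // emit
    }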
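
The content-based check can be tried outside Hadoop as well. The sketch below is a plain-Java illustration under stated assumptions: the class `ContentHeaderSkipDemo`, its methods, and the sample column name `id` are all hypothetical, and the CSV is assumed to be simple comma-separated text without quoted fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class (not part of Hadoop): filters a header line by its content.
class ContentHeaderSkipDemo {

    // Returns true when the first column of the line equals the known header name.
    static boolean isHeaderLine(String line, String expectedFirstColumn) {
        // limit -1 keeps trailing empty columns, so cols always has at least one element.
        String[] cols = line.split(",", -1);
        return cols[0].trim().equals(expectedFirstColumn);
    }

    // Keeps every line that is not recognised as the header, wherever it appears.
    static List<String> dataLines(List<String> lines, String expectedFirstColumn) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!isHeaderLine(line, expectedFirstColumn)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample rows invented for this demo; the header is deliberately not first.
        List<String> lines = List.of("1,alice", "id,name", "2,bob");
        System.out.println(dataLines(lines, "id"));
    }
}
```

Unlike the offset check, this does not depend on the position of the line, but it would also drop a genuine data row whose first column happens to equal the header name.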


Source: https://stackoverflow.com/questions/37541109/skipping-the-first-line-of-the-csv-in-map-reduce-java
