Skipping the first line of a .csv file in MapReduce (Java)

天命终不由人 2021-01-02 19:59

As the mapper function runs for every line, how can I skip the first line? Some of my files start with a column header that I want to ignore.

2 Answers
  • 2021-01-02 20:36

    Because the file can be split across multiple nodes, we can't say which machine holds the header or which mapper will process that part of the file. Instead, filter out the header in the Mapper itself. For this you have to know what the header looks like, for example the name of its first column:

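        // Inside map(LongWritable key, Text value, Context context):
        String[] cols = value.toString().split(",");
        // "header" stands for the actual name of your first column
        if (cols[0].equals("header")) {
            return; // header row: skip it
        }
        // data row: process cols and emit with context.write(...)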
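A minimal plain-Java sketch of the same idea, runnable without Hadoop so the filtering logic can be checked on its own. It assumes comma-separated lines and a header whose first column has a known name; the class `HeaderFilter`, its methods, the column name `id` and the sample rows are all hypothetical. Each "split" list stands in for the lines one mapper would receive, which illustrates why the check works no matter which mapper ends up with the header.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of the "know your header" filter (no Hadoop).
// Assumes comma-separated lines and a known first header column name.
class HeaderFilter {
    private final String firstHeaderColumn;

    HeaderFilter(String firstHeaderColumn) {
        this.firstHeaderColumn = firstHeaderColumn;
    }

    // True if the line's first column matches the expected header name.
    boolean isHeader(String line) {
        String[] cols = line.split(",", -1);
        return cols[0].trim().equals(firstHeaderColumn);
    }

    // What each mapper would do with its own lines: keep only data rows.
    List<String> dataRows(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!isHeader(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        HeaderFilter filter = new HeaderFilter("id");
        // Two hypothetical "splits" of one file: only the first holds the header.
        List<String> split1 = List.of("id,name,score", "1,alice,90");
        List<String> split2 = List.of("2,bob,85", "3,carol,77");
        System.out.println(filter.dataRows(split1)); // [1,alice,90]
        System.out.println(filter.dataRows(split2)); // [2,bob,85, 3,carol,77]
    }
}
```

Because the test looks only at the first column, a data row whose first field happens to equal the header name would also be dropped; comparing the whole line against the known header is stricter if that is a risk.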
    
  • 2021-01-02 20:38

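    When the mapper reads the file with the default TextInputFormat, each record arrives as a key-value pair: the key is the byte offset at which the current line starts within the file, and the value is the line itself. For the first line of the file the key is always zero, so only the mapper whose split begins at the start of the file ever sees key 0. In the map function, do the following: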

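        import java.io.IOException;

        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;

        // Output types (Text, Text) are placeholders; use whatever your job emits.
        public class CsvMapper extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // key == 0 only for the very first line of the file
                if (key.get() == 0 && value.toString().contains("header") /* some condition identifying the header */) {
                    return; // skip the header line
                }
                // For the rest of the data: parse value and call context.write(...).
                // Exceptions propagate so Hadoop fails the task instead of hiding errors.
            }
        }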
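To see why `key.get() == 0` picks out the first line, here is a simplified plain-Java model (no Hadoop) of the keys a mapper receives. It assumes `\n` line endings and that a line belongs to the split in which its first byte falls; Hadoop's real `LineRecordReader` has extra rules for lines that straddle split boundaries, which this sketch does not reproduce. `OffsetKeys`, `lineOffsets`, `keysForSplit` and `shouldSkip` are hypothetical names for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of TextInputFormat keys: each key is the byte
// offset of a line's first byte within the whole file. Assumes '\n' endings and
// that a line belongs to the split containing its first byte (real Hadoop
// boundary handling differs in detail).
class OffsetKeys {
    // Starting byte offset of every line in the file contents.
    static List<Long> lineOffsets(String contents) {
        byte[] bytes = contents.getBytes(StandardCharsets.UTF_8);
        List<Long> offsets = new ArrayList<>();
        if (bytes.length == 0) {
            return offsets;
        }
        offsets.add(0L);
        for (int i = 0; i < bytes.length - 1; i++) {
            if (bytes[i] == '\n') {
                offsets.add((long) (i + 1)); // next line starts after the newline
            }
        }
        return offsets;
    }

    // Keys a mapper for the byte range [start, end) would see in this model.
    static List<Long> keysForSplit(String contents, long start, long end) {
        List<Long> keys = new ArrayList<>();
        for (long off : lineOffsets(contents)) {
            if (off >= start && off < end) {
                keys.add(off);
            }
        }
        return keys;
    }

    // The answer's rule: skip only if the key is 0 AND the line looks like a header.
    static boolean shouldSkip(long key, String line, String headerMarker) {
        return key == 0 && line.contains(headerMarker);
    }

    public static void main(String[] args) {
        String file = "id,name\n1,alice\n2,bob\n3,carol\n";
        System.out.println(lineOffsets(file));          // [0, 8, 16, 22]
        System.out.println(keysForSplit(file, 0, 16));  // [0, 8]
        System.out.println(keysForSplit(file, 16, 32)); // [16, 22]
    }
}
```

Only the split starting at byte 0 ever produces key 0, so thanks to `&&` short-circuiting the header condition is evaluated at most once per file, and files without a header keep their first line.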
    