Using multiple mapper inputs in one streaming job on hadoop?

醉酒成梦 2020-12-20 01:43

In Java I would use:

MultipleInputs.addInputPath(conf, path, inputFormatClass, mapperClass)

to add multiple inputs, each with a different mapper.

2 answers
  • 2020-12-20 02:22

    I suppose this can help you: https://github.com/hyonaldo/hadoop-multiple-streaming.

    It also lets you run "different mappers for these different input path", for example:

    hadoop jar hadoop-multiple-streaming.jar \
      -input    myInputDirs \
      -multiple "outputDir1|mypackage.Mapper1|mypackage.Reducer1" \
      -multiple "outputDir2|mapper2.sh|reducer2.sh" \
      -multiple "outputDir3|mapper3.py|reducer3.py" \
      -multiple "outputDir4|/bin/cat|/bin/wc" \
      -libjars  "libDir/mypackage.jar" \
      -file     "libDir/mapper2.sh" \
      -file     "libDir/mapper3.py" \
      -file     "libDir/reducer2.sh" \
      -file     "libDir/reducer3.py"
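    The mapper3.py and reducer3.py entries above are ordinary streaming scripts: they read lines on stdin, write tab-separated key/value lines on stdout, and the reducer receives the mapper output already sorted by key. The answer does not show their contents, so the following is only a minimal sketch assuming a word-count task. It is written as one hypothetical module with a "map"/"reduce" argument for brevity; for the command above you would split it into the two files.

```python
#!/usr/bin/env python3
"""Hypothetical word-count pair in the shape mapper3.py / reducer3.py might take."""
import sys
from itertools import groupby


def mapper(lines):
    # Emit "word<TAB>1" for every whitespace-separated token.
    for line in lines:
        for word in line.split():
            yield word + "\t1"


def reducer(lines):
    # Streaming sorts mapper output by key before the reducer sees it,
    # so equal keys arrive as one contiguous run that groupby can collect.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word + "\t" + str(sum(int(count) for _, count in group))


if __name__ == "__main__":
    # Run as "wordcount.py map" or "wordcount.py reduce"; other invocations do nothing.
    steps = {"map": mapper, "reduce": reducer}
    if len(sys.argv) == 2 and sys.argv[1] in steps:
        for out in steps[sys.argv[1]](sys.stdin):
            print(out)
```

    Locally, the pipe `python3 wordcount.py map < input.txt | sort | python3 wordcount.py reduce` approximates what the framework does between the two steps.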
    
  • 2020-12-20 02:25

    You can use multiple -input options to specify multiple input paths:

    hadoop jar hadoop-streaming.jar -input foo.txt -input bar.txt ...
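    Plain streaming still takes a single -mapper for the whole job, so to treat foo.txt and bar.txt differently, that one mapper has to tell the inputs apart itself. A common approach is to look at the path of the current split, which streaming exposes as an environment variable (job properties are exported with dots replaced by underscores, so mapreduce.map.input.file becomes mapreduce_map_input_file; older releases use map.input.file). A minimal sketch, assuming foo.txt is comma-separated and bar.txt is tab-separated; the record formats, the "foo:"/"bar:" value tags, and the function names are all hypothetical:

```python
#!/usr/bin/env python3
"""Hypothetical dispatching mapper: one streaming mapper serving several -input paths."""
import os
import sys


def map_foo(line):
    # Hypothetical format for foo.txt: comma-separated, first field is the key.
    key, _, rest = line.partition(",")
    return key, "foo:" + rest


def map_bar(line):
    # Hypothetical format for bar.txt: already tab-separated key and value.
    key, _, rest = line.partition("\t")
    return key, "bar:" + rest


def pick_mapper(input_file):
    """Return the per-input mapping function for the file this split came from."""
    name = os.path.basename(input_file)
    if name.startswith("foo"):
        return map_foo
    if name.startswith("bar"):
        return map_bar
    raise ValueError("no mapper registered for " + input_file)


def run(lines, input_file):
    """Map every line with the function chosen for input_file; yield key<TAB>value lines."""
    mapper = pick_mapper(input_file)
    for line in lines:
        key, value = mapper(line.rstrip("\n"))
        yield key + "\t" + value + "\n"


if __name__ == "__main__":
    # Streaming exports job properties as environment variables with dots replaced
    # by underscores: mapreduce.map.input.file (newer) or map.input.file (older).
    # Outside a streaming task neither is set and the script does nothing.
    current = os.environ.get("mapreduce_map_input_file") or os.environ.get("map_input_file")
    if current:
        sys.stdout.writelines(run(sys.stdin, current))
```

    On the cluster the script (hypothetically named dispatch_mapper.py) would be passed with -mapper dispatch_mapper.py and shipped with -file dispatch_mapper.py next to the two -input options. Locally, `mapreduce_map_input_file=foo.txt python3 dispatch_mapper.py < foo.txt` mimics what the task environment sets up. Tagging each value with its source also lets a single reducer join or merge the two inputs by key.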
    