Is compression/decompression of gzip data transparent in Hadoop/PIG?

Submitted by 天涯浪子 on 2019-12-11 08:14:38

Question


I read somewhere that Hadoop has built-in support for compression and decompression, but I guess that only applies to mapper output (enabled by setting some properties)?

I wonder if there are any particular PIG load/store functions I can use to read compressed data or to write compressed output?


Answer 1:


PigStorage handles compressed input by examining the file name extension:

  • *.bz2 / *.bz - org.apache.pig.bzip2r.Bzip2TextInputFormat
  • Everything else uses org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat, which extends o.a.h.mapreduce.lib.input.TextInputFormat and can therefore read .gz and Snappy files, provided the codecs are installed
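To make the extension-based behaviour concrete, here is a minimal, JDK-only Java sketch. The class and method names (ExtensionDispatchSketch, inputFormatFor, readLines, gzip) are hypothetical and this is not Pig's or Hadoop's actual code. In Hadoop the equivalent step is a codec lookup by file suffix (CompressionCodecFactory); the sketch imitates it by wrapping the raw bytes in java.util.zip.GZIPInputStream when the name ends in .gz.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: NOT Pig's or Hadoop's code. It only mimics the
// "decide by file extension, then decompress transparently" behaviour.
class ExtensionDispatchSketch {

    // Same rule as the list above: .bz2/.bz -> Pig's bzip2 format, else PigTextInputFormat.
    static String inputFormatFor(String fileName) {
        if (fileName.endsWith(".bz2") || fileName.endsWith(".bz")) {
            return "org.apache.pig.bzip2r.Bzip2TextInputFormat";
        }
        return "org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat";
    }

    // Stand-in for a codec lookup: a ".gz" name gets a gzip decompressor wrapped
    // around the raw bytes, so the caller only ever sees plain text lines.
    static List<String> readLines(String fileName, byte[] raw) {
        try {
            InputStream in = new ByteArrayInputStream(raw);
            if (fileName.endsWith(".gz")) {
                in = new GZIPInputStream(in);
            }
            List<String> lines = new ArrayList<>();
            try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper that produces gzip-compressed test data in memory.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = gzip("a\tb\nc\td\n");
        System.out.println(inputFormatFor("part-00000.gz"));
        System.out.println(readLines("part-00000.gz", data));
    }
}
```

Because the decompressor is chosen from the name alone, a gzip file without a .gz suffix would be handed to the reader as raw compressed bytes; the file extension is what makes the decompression "transparent".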

Compressed output is controlled by two properties:

  • output.compression.enabled - true / false
  • output.compression.codec - the class name of the codec to use (org.apache.hadoop.io.compress.GzipCodec for gzip)
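The way the two properties combine can be pictured with a minimal, JDK-only Java sketch. OutputCompressionSketch, store and looksGzipped are hypothetical names; the class reads output.compression.enabled and output.compression.codec from a java.util.Properties and gzips the output only when compression is enabled and the codec is GzipCodec. It is a stand-in for the real PigStorage/Hadoop output path, not a copy of it. In a Pig script the properties would typically be set with set statements, for example set output.compression.enabled true;

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: mimics how the two properties could drive the output
// stream. Only GzipCodec is handled; this is not PigStorage's real code path.
class OutputCompressionSketch {

    static final String GZIP_CODEC = "org.apache.hadoop.io.compress.GzipCodec";

    // Writes text to an in-memory sink, compressing it only when the properties ask for gzip.
    static byte[] store(Properties props, String text) {
        boolean enabled =
            Boolean.parseBoolean(props.getProperty("output.compression.enabled", "false"));
        String codec = props.getProperty("output.compression.codec", "");
        if (enabled && !GZIP_CODEC.equals(codec)) {
            throw new IllegalArgumentException("codec not covered by this sketch: " + codec);
        }
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = enabled ? new GZIPOutputStream(sink) : sink) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Read the bytes only after close(), which finishes the gzip trailer.
        return sink.toByteArray();
    }

    // gzip streams start with the magic bytes 0x1f 0x8b.
    static boolean looksGzipped(byte[] data) {
        return data.length >= 2 && (data[0] & 0xff) == 0x1f && (data[1] & 0xff) == 0x8b;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("output.compression.enabled", "true");
        props.setProperty("output.compression.codec", GZIP_CODEC);
        byte[] out = store(props, "a\tb\n");
        System.out.println("gzipped: " + looksGzipped(out)); // prints "gzipped: true"
    }
}
```

With output.compression.enabled left unset (or false) the sketch writes plain text, matching the default of uncompressed output.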

If you're feeling up to it, digging through PigStorage.java may be of interest to you.



Source: https://stackoverflow.com/questions/9896584/is-compression-decompression-of-gzip-data-transparent-in-hadoop-pig
