hadoop-lzo

Read uncompressed thrift files in spark

心不动则不痛 submitted on 2019-12-13 13:26:35
Question: I'm trying to get Spark to read uncompressed thrift files from S3, but so far it has not been working. The data is loaded into S3 as uncompressed thrift files; the source is AWS Kinesis Firehose. I have a tool that deserializes the files with no problem, so I know that the thrift serialization/deserialization works. In Spark, using newAPIHadoopFile with elephant-bird's LzoThriftBlockInputFormat, I am able to successfully read LZO-compressed thrift files, but I can't figure out what InputFormat I should use to
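For reference, the working LZO-compressed path the poster describes might look roughly like the sketch below. This is an assumption-laden reconstruction, not the poster's actual code: `MyThriftStruct` is a hypothetical thrift-generated class standing in for whatever struct Firehose wrote, and the exact elephant-bird helper names (e.g. `setClassConf`) and key/value types should be checked against the elephant-bird version in use.

```scala
import com.twitter.elephantbird.mapreduce.input.LzoThriftBlockInputFormat
import com.twitter.elephantbird.mapreduce.io.ThriftWritable
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.LongWritable
import org.apache.spark.SparkContext

// sc: an existing SparkContext.
// MyThriftStruct: hypothetical thrift-generated class for the records.
val conf = new Configuration()
// Tell elephant-bird which thrift class to deserialize into.
LzoThriftBlockInputFormat.setClassConf(classOf[MyThriftStruct], conf)

val records = sc.newAPIHadoopFile(
  "s3://my-bucket/firehose-output/",          // hypothetical path
  classOf[LzoThriftBlockInputFormat[MyThriftStruct]],
  classOf[LongWritable],
  classOf[ThriftWritable[MyThriftStruct]],
  conf
).map { case (_, writable) => writable.get() } // unwrap the thrift object
```

The question then is what to substitute for `LzoThriftBlockInputFormat` when the input is uncompressed rather than LZO-block-encoded, which the truncated post never resolves.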

Class com.hadoop.compression.lzo.LzoCodec not found for Spark on CDH 5?

☆樱花仙子☆ submitted on 2019-11-27 21:33:18
I have been working on this problem for two days and still have not found a solution. Problem: Our Spark installation, from the newest CDH 5, always complains about the missing LzoCodec class, even after I installed HADOOP_LZO through Parcels in Cloudera Manager. We are running MR1 on CDH 5.0.0-1.cdh5.0.0.p0.47. Attempted fixes: The configurations from the official CDH documentation on 'Using the LZO Parcel' have also been added, but the problem is still there. Most of the posts I found via Google give advice similar to the above. I also suspect that Spark is trying to run against YARN, which is not activated there, but I
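A common shape for this class of fix is to make the hadoop-lzo jar and its native libraries visible on Spark's classpath. The fragment below is a sketch of what that might look like in spark-env.sh; the parcel path is the conventional CDH default and is an assumption that must be verified on the actual cluster.

```shell
# Hypothetical spark-env.sh additions. The HADOOP_LZO parcel path is
# the CDH default and may differ on a given installation.
HADOOP_LZO_HOME=/opt/cloudera/parcels/HADOOP_LZO

# Put the hadoop-lzo jars (which contain com.hadoop.compression.lzo.LzoCodec)
# on Spark's classpath, and the native liblzo bindings on the library path.
export SPARK_CLASSPATH="$SPARK_CLASSPATH:$HADOOP_LZO_HOME/lib/hadoop/lib/*"
export SPARK_LIBRARY_PATH="$SPARK_LIBRARY_PATH:$HADOOP_LZO_HOME/lib/hadoop/lib/native"
```

If the codec class still cannot be found, checking whether `io.compression.codecs` in the cluster's core-site.xml lists `com.hadoop.compression.lzo.LzoCodec` is another standard diagnostic step.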
