Read Avro with Spark in Java

Submitted by 喜欢而已 on 2019-12-13 01:28:43

Question


Can somebody share an example of reading Avro with Spark in Java? I found Scala examples but had no luck with Java. Here is a snippet from my code; it fails to compile at the call to ctx.newAPIHadoopFile.

JavaSparkContext ctx = new JavaSparkContext(sparkConf);
Configuration hadoopConf = new Configuration();
JavaRDD<SampleAvro> lines = ctx.newAPIHadoopFile(path, AvroInputFormat.class, AvroKey.class, NullWritable.class, new Configuration());

Regards


Answer 1:


You can use the spark-avro connector library by Databricks.
The recommended way to read or write Avro data from Spark SQL is by using Spark's DataFrame APIs.

The connector enables both reading and writing Avro data from Spark SQL:

import org.apache.spark.sql.*;

// sc is assumed to be an existing JavaSparkContext (or SparkContext)
SQLContext sqlContext = new SQLContext(sc);

// Creates a DataFrame from a specified file
DataFrame df = sqlContext.read().format("com.databricks.spark.avro")
    .load("src/test/resources/episodes.avro");

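// Saves the subset of the Avro records read in.
// In Java the filter condition is passed as a SQL expression string;
// the $"..." column syntax is Scala-only.
df.filter("age > 5").write()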
    .format("com.databricks.spark.avro")
    .save("/tmp/output");

Note that this connector has different versions for Spark 1.2, 1.3, and 1.4+:

Spark version    Connector version
1.2              0.2.0
1.3              1.0.0
1.4+             2.0.1

Using Maven:

<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-avro_2.10</artifactId>
    <version>{AVRO_CONNECTOR_VERSION}</version>
</dependency>
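
When filling in {AVRO_CONNECTOR_VERSION}, the version table above can be expressed as a small lookup. The following is only a sketch: the class name AvroConnectorVersions and the method forSpark are hypothetical helpers, not part of spark-avro, and it assumes "1.4+" means Spark 1.4 and later 1.x releases, since the table says nothing about other versions.

```java
// Hypothetical helper (not part of spark-avro): maps a Spark version string
// to the connector version listed in the table above.
final class AvroConnectorVersions {

    private AvroConnectorVersions() {}

    /** Returns the connector version for a Spark version such as "1.3.1". */
    static String forSpark(String sparkVersion) {
        String[] parts = sparkVersion.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException(
                "Expected major.minor[.patch], got: " + sparkVersion);
        }
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);

        if (major == 1 && minor == 2) return "0.2.0";
        if (major == 1 && minor == 3) return "1.0.0";
        // Assumption: "1.4+" is read as 1.4 and later 1.x releases.
        if (major == 1 && minor >= 4) return "2.0.1";

        // The table does not cover anything else, so refuse to guess.
        throw new IllegalArgumentException(
            "No connector version listed for Spark " + sparkVersion);
    }

    public static void main(String[] args) {
        System.out.println(forSpark("1.4.1")); // prints 2.0.1
    }
}
```

Versions outside the table are rejected with an exception instead of falling back to a default, because a mismatched connector tends to fail in less obvious ways at runtime.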

See further info at: Spark SQL Avro Library




Answer 2:


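Here, assuming K is your key type, V is your value type, and sc is a JavaSparkContext:

....

// Job.getInstance() replaces the deprecated new Job() constructor
Job job = Job.getInstance();

// Java class literals cannot carry type arguments, so the raw class is used
job.setInputFormatClass(AvroKeyValueInputFormat.class);

FileInputFormat.addInputPaths(job, <inputPaths>);
AvroJob.setInputKeySchema(job, <keySchema>);
AvroJob.setInputValueSchema(job, <valueSchema>);

// newAPIHadoopRDD returns a JavaPairRDD, not a JavaRDD. The raw class
// literals produce unchecked warnings, and an explicit cast to the
// parameterized type may be needed depending on the compiler.
JavaPairRDD<AvroKey<K>, AvroValue<V>> avroRDD =
  sc.newAPIHadoopRDD(job.getConfiguration(),
  AvroKeyValueInputFormat.class,
  AvroKey.class,
  AvroValue.class);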


Source: https://stackoverflow.com/questions/34999783/read-avro-with-spark-in-java
