Reading/writing with Avro schemas AND Parquet format in SparkSQL

落爺英雄遲暮 提交于 2019-12-05 11:08:26

I am doing something similar. I use avro schema to write into parquet file however, dont read it as avro. But the same technique should work on read as well. I am not sure if this is the best way to do it, but here it is anyways: I have AvroData.avsc which has the avro schema.

KafkaUtils.createDirectStream[String,Array[Byte],StringDecoder,DefaultDecoder,Tuple2[String, Array[Byte]]](ssc, kafkaProps, fromOffsets, messageHandler)


kafkaArr.foreachRDD  { (rdd,time) 
       => { val schema =  SchemaConverters.toSqlType(AvroData.getClassSchema).dataType.asInstanceOf[StructType] val ardd = rdd.mapPartitions{itr =>
              itr.map { r =>
try {
                    val cr = avroToListWithAudit(r._2, offsetSaved, loadDate, timeNow.toString)
                    Row.fromSeq(cr.toArray)
    } catch{
      case e:Exception => LogHandler.log.error("Exception while converting to Avro" + e.printStackTrace())
      System.exit(-1)
      Row(0)  //This is just to allow compiler to accept. On exception, the application will exit before this point
} 
} 
}


  public static List avroToListWithAudit(byte[] kfkBytes, String kfkOffset, String loaddate, String loadtime ) throws IOException {
        AvroData av = getAvroData(kfkBytes);
        av.setLoaddate(loaddate);
        av.setLoadtime(loadtime);
        av.setKafkaOffset(kfkOffset);
        return avroToList(av);
    }



 public static List avroToList(AvroData a) throws UnsupportedEncodingException{
        List<Object> l = new ArrayList<>();
        for (Schema.Field f : a.getSchema().getFields()) {
            String field = f.name().toString();
            Object value = a.get(f.name());
            if (value == null) {
                //System.out.println("Adding null");
                l.add(""); 
            }
            else {
                switch (f.schema().getType().getName()){
                    case "union"://System.out.println("Adding union");
                        l.add(value.toString());
                        break;

                    default:l.add(value);
                        break;
                }

            }
        }
        return l;
    }

The getAvroData method needs to have code to construct the avro object from raw bytes. I am also trying to figure out a way to do that without having to specifying each attribute setter explicitly, but seems like there isnt one.

public static AvroData getAvroData (bytes)
{
AvroData av = AvroData.newBuilder().build();
        try {
            av.setAttr(String.valueOf("xyz"));
        .....
    }
   } 

Hope it helps

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!