Spark - Can a MultiMap be converted to a DataFrame in JAVA

Submitted by 纵然是瞬间 on 2019-12-06 14:52:17

Alas, the Java parallelize method takes a List<T> (or, for parallelizePairs, a List<Tuple2<K, V>>), so you will need to convert first. Meanwhile, createDataFrame only works on RDDs and Scala Seqs, and needs a schema (either a bean class or a StructType).

To make it even more fun, com.google.common.collect.ImmutableEntry is not serializable, so a Java-ficated version of @Pankaj Arora's solution would not work unless you moved the conversion logic into Java. I.e.:

public class Value implements Serializable {
    public Value(Double a, Float b) {
        this.a = a;
        this.b = b;
    }
    Double a;
    Float b;

    public void setA(Double a) {
        this.a = a;
    }
    public void setB(Float b) {
        this.b = b;
    }
    public Double getA() {
        return a;
    }
    public Float getB() {
        return b;
    }

    @Override
    public String toString() {
        return "[" + a + "," + b + "]";
    }
}


    // Guava multimap: duplicate keys allowed; asMap() groups values per key
    Multimap<Double, Float> data = LinkedListMultimap.create();
    data.put(1d, 1f);
    data.put(1d, 2f);
    data.put(2d, 3f);

    // Flatten each (key, collection) entry into one serializable Value per pair
    List<Value> values = data.asMap().entrySet()
            .stream()
            .flatMap(x -> x.getValue()
                    .stream()
                    .map(y -> new Value(x.getKey(), y)))
            .collect(Collectors.toList());

    // The bean class doubles as the schema for createDataFrame
    sqlContext.createDataFrame(sc.parallelize(values), Value.class).show();

Given your edit, I'd look at creating objects (rather than a multimap) from the off.

case class Output(a: Double, b: Int)
val input = Map(1.50E8 -> List(10, 20), 1.51E8 -> List(-10, -13, -14, -15), 1.52E8 -> List(-10, -11)).toArray
val inputRdd = sc.parallelize(input)
// toDF requires the implicit conversions, e.g. import sqlContext.implicits._
val queryMV = inputRdd.flatMap(x => x._2.map(y => Output(x._1, y))).toDF
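Since the question asks about Java, here is a rough sketch of the same flatten-to-objects step in plain Java, before handing the list to sc.parallelize and createDataFrame. The Output bean and the map contents mirror the Scala snippet above; the class and method names are illustrative, not part of any API:

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MapToRows {
    // Serializable bean mirroring the Scala case class Output(a: Double, b: Int)
    public static class Output implements Serializable {
        private Double a;
        private Integer b;
        public Output(Double a, Integer b) { this.a = a; this.b = b; }
        public Double getA() { return a; }
        public Integer getB() { return b; }
        public void setA(Double a) { this.a = a; }
        public void setB(Integer b) { this.b = b; }
    }

    // Flatten Map<Double, List<Integer>> into one Output per (key, element) pair
    public static List<Output> flatten(Map<Double, List<Integer>> input) {
        return input.entrySet().stream()
                .flatMap(e -> e.getValue().stream().map(v -> new Output(e.getKey(), v)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<Double, List<Integer>> input = Map.of(
                1.50E8, Arrays.asList(10, 20),
                1.51E8, Arrays.asList(-10, -13, -14, -15),
                1.52E8, Arrays.asList(-10, -11));
        List<Output> rows = flatten(input);
        System.out.println(rows.size()); // 8 rows: one per list element
        // With a Spark context available, you would then do:
        // sqlContext.createDataFrame(sc.parallelize(rows), Output.class).show();
    }
}
```

As with the multimap version above, the key point is that the flattening happens in Java over serializable beans, so nothing non-serializable (like Guava's ImmutableEntry) ever reaches the RDD.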