How to resolve the AnalysisException: resolved attribute(s) in Spark

Backend · Open · 12 answers · 898 views

故里飘歌 2020-12-14 07:03
val rdd = sc.parallelize(Seq(("vskp", Array(2.0, 1.0, 2.1, 5.4)), ("hyd", Array(1.5, 0.5, 0.9, 3.7)), ("hyd", Array(1.5, 0.5, 0.9, 3.2)), ("tvm", Array(8.0, 2.9,


        
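The snippet above is cut off, but the "AnalysisException: resolved attribute(s) missing" it asks about typically appears when a DataFrame is joined with another DataFrame derived from it, so both sides of the join carry columns with the same internal expression IDs. A minimal sketch of the usual trigger and the standard alias-based workaround (the class, city, and score names here are illustrative, not taken from the question):

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.StructType;

public class SelfJoinExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("self-join-demo")
                .getOrCreate();

        Dataset<Row> df = spark.createDataFrame(
                Arrays.asList(
                        RowFactory.create("hyd", 1.5),
                        RowFactory.create("vskp", 2.0)),
                new StructType()
                        .add("city", "string")
                        .add("score", "double"));

        // df2 is derived from df, so its columns keep the same attribute IDs;
        // joining df with df2 directly is what commonly triggers the
        // "resolved attribute(s) ... missing" AnalysisException.
        Dataset<Row> df2 = df.filter("score > 1.0");

        // Workaround: give each side its own alias so the analyzer can
        // disambiguate the duplicated attributes.
        Dataset<Row> joined = df.alias("a")
                .join(df2.alias("b"),
                      functions.col("a.city").equalTo(functions.col("b.city")));
        joined.show();

        spark.stop();
    }
}
```

With the sample data above, both rows survive the filter, so the aliased join matches each city to itself and returns two rows.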
12 Answers
  •  暖寄归人
    2020-12-14 07:16

    For Java developers, try calling this method:

    private static Dataset<Row> cloneDataset(Dataset<Row> ds) {
        List<Column> filterColumns = new ArrayList<>();
        List<String> filterColumnsNames = new ArrayList<>();
        scala.collection.Iterator<StructField> it = ds.exprEnc().schema().toIterator();
        while (it.hasNext()) {
            String columnName = it.next().name();
            filterColumns.add(ds.col(columnName));
            filterColumnsNames.add(columnName);
        }
        // Re-selecting every column and rebuilding the DataFrame assigns
        // fresh expression IDs, which removes the duplicate-attribute
        // ambiguity that causes the AnalysisException.
        ds = ds.select(JavaConversions.asScalaBuffer(filterColumns).seq())
               .toDF(scala.collection.JavaConverters.asScalaIteratorConverter(
                       filterColumnsNames.iterator()).asScala().toSeq());
        return ds;
    }
    

    Call it on both datasets just before the join; it clones each dataset into a new one:

    df1 = cloneDataset(df1);
    df2 = cloneDataset(df2);
    Dataset<Row> join = df1.join(df2, col("column_name"));
    // if that didn't work, clone the result of the join instead
    final Dataset<Row> join = cloneDataset(df1.join(df2, columns_seq));
    
