val rdd = sc.parallelize(Seq(("vskp", Array(2.0, 1.0, 2.1, 5.4)), ("hyd", Array(1.5, 0.5, 0.9, 3.7)), ("hyd", Array(1.5, 0.5, 0.9, 3.2)), ("tvm", Array(8.0, 2.9,
If you have df1, and df2 is derived from df1, try renaming the columns in df2 so that no two columns have the same name after the join. The renaming has to happen before the join.
So instead of df1.join(df2...
do:
# Step 1: rename the column names that df2 shares with df1.
df2_renamed = df2.withColumnRenamed('columna', 'column_a_renamed').withColumnRenamed('columnb', 'column_b_renamed')
# Step 2: join against the renamed df2, so that no two columns share a name.
# Pass your join condition as the second argument.
df1.join(df2_renamed)
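If df2 shares many columns with df1, chaining withColumnRenamed by hand gets tedious. Below is a minimal sketch of the same two steps done generically. The helper names `shared_column_renames` and `rename_shared_columns` and the `_renamed` suffix are made up for illustration and are not part of PySpark; the only DataFrame members the sketch relies on are `columns` and `withColumnRenamed`.

```python
def shared_column_renames(left_cols, right_cols, suffix="_renamed"):
    """Map each right-hand column that also exists on the left to a new, unused name.

    Hypothetical helper: works on plain lists of column names.
    """
    taken = set(left_cols) | set(right_cols)
    renames = {}
    for name in right_cols:
        if name in left_cols:
            new_name = name + suffix
            # Keep appending the suffix until the candidate name is unused.
            while new_name in taken:
                new_name += suffix
            taken.add(new_name)
            renames[name] = new_name
    return renames


def rename_shared_columns(df1, df2, suffix="_renamed"):
    """Return df2 with every column name it shares with df1 renamed.

    Hypothetical helper: df1 and df2 are assumed to be PySpark DataFrames.
    """
    renames = shared_column_renames(df1.columns, df2.columns, suffix)
    for old, new in renames.items():
        df2 = df2.withColumnRenamed(old, new)
    return df2


# Usage with real DataFrames (the join condition is an assumption):
# df2_renamed = rename_shared_columns(df1, df2)
# joined = df1.join(df2_renamed, df1["columna"] == df2_renamed["columna_renamed"])
```

Note that this renames every shared column, including the join key, so the join condition has to refer to the key by its new name on the df2 side.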