Explode multiple columns in Spark SQL table

小鲜肉 2020-12-19 08:45

There was a question regarding this issue here:

Explode (transpose?) multiple columns in Spark SQL table

Suppose that we have extra columns as below:

3 Answers
  •  粉色の甜心
    2020-12-19 09:07

    The approach with the zip UDF seems fine, but you need to extend it to handle more collections. Unfortunately, there is no really nice way to zip four Seqs, but this should work:

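    import org.apache.spark.sql.functions.udf

    // Fail fast if the input sequences do not all have the same length.
    def assertSameSize(arrs: Seq[_]*): Unit = {
      assert(arrs.map(_.size).distinct.size == 1, "sizes differ")
    }

    // Zip four array columns element-wise into one array of 4-tuples (structs).
    val zip4 = udf((xa: Seq[Long], xb: Seq[Long], xc: Seq[String], xd: Seq[String]) => {
      assertSameSize(xa, xb, xc, xd)
      xa.indices.map(i => (xa(i), xb(i), xc(i), xd(i)))
    })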
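To show how such a UDF might be combined with `explode`, here is a minimal sketch. The question's sample data is not reproduced on this page, so the column names (`userId`, `varA` to `varD`), the sample row, the object name `Zip4ExplodeSketch`, and the helper name `zipFour` are all assumptions made for illustration. It also uses `require` instead of `assert` for the size check, which is a small deviation from the snippet above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, udf}

object Zip4ExplodeSketch {
  // Plain Scala helper (hypothetical name): zips four equally sized
  // sequences element-wise into a sequence of 4-tuples.
  def zipFour(xa: Seq[Long], xb: Seq[Long], xc: Seq[String], xd: Seq[String]): Seq[(Long, Long, String, String)] = {
    require(Seq(xa, xb, xc, xd).map(_.size).distinct.size == 1, "sizes differ")
    xa.indices.map(i => (xa(i), xb(i), xc(i), xd(i)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("zip4-explode-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input: one row holding four array columns of equal length.
    val df = Seq(
      (1, Seq(10L, 20L), Seq(100L, 200L), Seq("a", "b"), Seq("x", "y"))
    ).toDF("userId", "varA", "varB", "varC", "varD")

    // Wrap the helper as a UDF; the result is an array of structs
    // whose fields are named _1 to _4.
    val zip4 = udf(zipFour _)

    // explode yields one output row per array position (two rows for this
    // sample); the struct fields are then pulled back out into flat columns.
    val exploded = df
      .withColumn("vars", explode(zip4(col("varA"), col("varB"), col("varC"), col("varD"))))
      .select(
        col("userId"),
        col("vars._1").alias("varA"),
        col("vars._2").alias("varB"),
        col("vars._3").alias("varC"),
        col("vars._4").alias("varD")
      )

    exploded.show()
    spark.stop()
  }
}
```

Keeping the zipping logic in a plain function makes it easy to check without a Spark session. On Spark 2.4 and later, the built-in `arrays_zip` function may make the UDF unnecessary, although it does not perform the same size check.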
    
