Dropping multiple columns from a Spark DataFrame by iterating through a Scala List of column names

情歌与酒 2020-12-29 00:35

I have a dataframe with around 400 columns, and I need to drop 100 of them. So I have created a Scala List of the 100 column names, and I want to iterate through that list and drop those columns from the dataframe.

4 Answers
  •  Happy的楠姐
    2020-12-29 01:27

    You can just do,

    import org.apache.spark.sql.DataFrame

    // Fold over dropList, dropping one column per step
    def dropColumns(inputDF: DataFrame, dropList: List[String]): DataFrame =
        dropList.foldLeft(inputDF)((df, col) => df.drop(col))

    It returns the DataFrame with every column named in dropList removed.
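    As a quick usage sketch (assuming a local SparkSession named spark and a small hypothetical dataframe; your real dropList would hold your 100 column names):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").appName("drop-columns-example").getOrCreate()
    import spark.implicits._

    // Small stand-in for your 400-column dataframe
    val inputDF = Seq((1, "a", true), (2, "b", false)).toDF("id", "name", "flag")

    // Columns to remove
    val dropList = List("name", "flag")

    val result = dropColumns(inputDF, dropList)
    result.printSchema()   // only "id" is left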

    As an example (of what's happening behind the scenes), let me put it this way.

    scala> val list = List(0, 1, 2, 3, 4, 5, 6, 7)
    list: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7)
    
    scala> val removeThese = List(0, 2, 3)
    removeThese: List[Int] = List(0, 2, 3)
    
    scala> removeThese.foldLeft(list)((l, r) => l.filterNot(_ == r))
    res2: List[Int] = List(1, 4, 5, 6, 7)
    

    The returned list (in your case, the DataFrame) is the result of the final filter. After each step of the fold, the accumulated result is passed as the first argument to the next application of the (acc, elem) => ... function.
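    Spelled out, that fold is just the hand-written chain of calls, and the DataFrame version unrolls the same way (the column names "a", "b", "c" below are only illustrative):

    // Equivalent to the foldLeft above
    val sameResult = list.filterNot(_ == 0).filterNot(_ == 2).filterNot(_ == 3)
    // sameResult: List(1, 4, 5, 6, 7)

    // Likewise, dropColumns(df, List("a", "b", "c")) unrolls to
    // df.drop("a").drop("b").drop("c")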
