I can't figure it out, but I'm guessing it's simple. I have a Spark DataFrame df. This df has columns "A", "B" and "C". Now let's say I have an Array containing the names of its columns, and I want to select all columns except one (e.g. "B"). How can I do that?
//selectWithout allows you to specify which columns to omit:
df.selectWithout("B")
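selectWithout is not part of the Spark API, so this answer presumably relies on a custom extension method. A minimal sketch of how such a helper might be defined (the implicit-class pattern and the name selectWithout are assumptions):

import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col

// Hypothetical helper: adds selectWithout to DataFrame via an implicit class.
implicit class RichDataFrame(df: DataFrame) {
  def selectWithout(excluded: String*): DataFrame = {
    // Keep every column whose name is not in the excluded set.
    val kept: Array[Column] = df.columns.filterNot(excluded.toSet).map(col)
    df.select(kept: _*)
  }
}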
You were almost there: just map the filtered array to col and unpack the list using : _*

import org.apache.spark.sql.functions.col
df.select(column_names.filter(_ != "B").map(col): _*)
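For context, a self-contained sketch of the whole flow; the sample data and taking column_names from df.columns are assumptions:

import org.apache.spark.sql.functions.col
import spark.implicits._  // in scope automatically in spark-shell

// Made-up DataFrame with columns "A", "B", "C".
val df = Seq((1, 2, 3), (4, 5, 6)).toDF("A", "B", "C")
val column_names: Array[String] = df.columns
// Select every column except "B".
df.select(column_names.filter(_ != "B").map(col): _*).show()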
Since Spark 1.4 you can use the drop method:
Scala:
case class Point(x: Int, y: Int)
val df = sqlContext.createDataFrame(Point(0, 0) :: Point(1, 2) :: Nil)
df.drop("y")
Python:
df = sc.parallelize([(0, 0), (1, 2)]).toDF(["x", "y"])
df.drop("y")
## DataFrame[x: bigint]
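If I remember correctly, from Spark 2.0 onward drop also accepts several column names at once (varargs in Scala, *cols in Python); a quick Scala sketch with assumed data:

// Assumed DataFrame with columns "x", "y", "z" (requires spark.implicits._).
val df3 = Seq((1, 2, 3)).toDF("x", "y", "z")
df3.drop("y", "z")  // DataFrame with only column "x" remaining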
For Spark v1.4 and higher, use drop(*cols):
Returns a new DataFrame without the specified column(s).
Example:
df.drop('age').collect()
For Spark v2.3 and higher you could also do it using colRegex(colName):
Selects column based on the column name specified as a regex and returns it as Column.
Example:
df = spark.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["Col1", "Col2"])
df.select(df.colRegex("`(Col1)?+.+`")).show()
Here the possessive quantifier (Col1)?+ consumes "Col1" without backtracking, and .+ then requires at least one more character, so the column named exactly Col1 can never match and is excluded from the selection.
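The same trick works from Scala, where colRegex is also available on Dataset since Spark 2.3; a sketch assuming spark.implicits._ is in scope:

val df = Seq(("a", 1), ("b", 2), ("c", 3)).toDF("Col1", "Col2")
// Select everything except Col1 via the possessive-quantifier regex.
df.select(df.colRegex("`(Col1)?+.+`")).show()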
Reference - colRegex, drop
For older versions of Spark, take the list of columns in the DataFrame, remove the columns you want to drop (for example with set operations), and then use select to pick the resulting list, as sketched below.
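A minimal sketch of that approach using set difference (the sample DataFrame and variable names are assumptions):

import org.apache.spark.sql.functions.col

val df = Seq((1, 2, 3)).toDF("A", "B", "C")  // requires spark.implicits._
val toDrop = Set("B")
// Keep the set difference between all columns and the ones to drop.
val kept = df.columns.filterNot(toDrop).map(col)
df.select(kept: _*)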
This should become possible through [SPARK-12139] REGEX Column Specification for Hive Queries:
https://issues.apache.org/jira/browse/SPARK-12139
I had the same problem and solved it this way (oaffdf is a DataFrame):
val dropColNames = Seq("col7","col121")
val featColNames = oaffdf.columns.diff(dropColNames)
val featCols = featColNames.map(cn => org.apache.spark.sql.functions.col(cn))
val featsdf = oaffdf.select(featCols: _*)
https://forums.databricks.com/questions/2808/select-dataframe-columns-from-a-sequence-of-string.html