How to “negative select” columns in Spark's DataFrame

野的像风 2020-12-15 05:35

I can't figure it out, but I guess it's simple. I have a Spark DataFrame df with columns "A", "B" and "C", and an Array containing the names of its columns. How can I select every column except some of them, e.g. everything but "B"?

9 Answers
  • 2020-12-15 05:37

    // selectWithout lets you specify which columns to omit
    // (note: selectWithout is not part of the core Spark DataFrame API):

    df.selectWithout("B")
    
  • 2020-12-15 05:40

    You were almost there: just map the filtered array to col and unpack the sequence with : _*:

    import org.apache.spark.sql.functions.col

    df.select(column_names.filter(_ != "B").map(col): _*)
    
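    The same filter-then-select idea carries over to PySpark. The column-list manipulation is ordinary Python, so it can be sketched on a plain list (the names below are illustrative stand-ins for df.columns):

```python
# Filter-then-select sketched on a plain list of column names.
# With a real DataFrame you would then write: df.select(*keep)
column_names = ["A", "B", "C"]  # stand-in for df.columns
keep = [c for c in column_names if c != "B"]
print(keep)  # ['A', 'C']
```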
  • 2020-12-15 05:46

    Since Spark 1.4 you can use the drop method:

    Scala:

    case class Point(x: Int, y: Int)
    val df = sqlContext.createDataFrame(Point(0, 0) :: Point(1, 2) :: Nil)
    df.drop("y")
    

    Python:

    df = sc.parallelize([(0, 0), (1, 2)]).toDF(["x", "y"])
    df.drop("y")
    ## DataFrame[x: bigint]
    
  • 2020-12-15 05:46

    For Spark v1.4 and higher, use drop(*cols):

        Returns a new DataFrame without the specified column(s).

    Example:

    df.drop('age').collect()
    

    For Spark v2.3 and higher, you can also use colRegex(colName):

    Selects column based on the column name specified as a regex and returns it as Column.

    Example:

    df = spark.createDataFrame([("a", 1), ("b", 2), ("c",  3)], ["Col1", "Col2"])
    df.select(df.colRegex("`(Col1)?+.+`")).show()
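    The `(Col1)?+.+` pattern works via a possessive quantifier (`?+`): once `Col1` is consumed it is never given back, so `.+` has nothing left to match for the name Col1 itself, and that column is excluded. The same exclusion effect can be illustrated in plain Python with a negative lookahead (a different regex technique from the possessive form Spark's example uses):

```python
import re

# Negative-lookahead regex that keeps every name except exactly "Col1".
# This only illustrates the exclusion idea; Spark's example achieves it
# with the possessive form `(Col1)?+.+` (Java regex syntax).
pattern = re.compile(r"(?!Col1$).+")
cols = ["Col1", "Col2", "Col12"]
selected = [c for c in cols if pattern.fullmatch(c)]
print(selected)  # ['Col2', 'Col12']
```

    Note that, like Spark's possessive pattern, this keeps names such as Col12 that merely start with Col1.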
    

    Reference: colRegex, drop


    For older versions of Spark, take the list of columns in the DataFrame, remove the columns you want to drop (for example with set operations), and then use select with the resulting list.

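    The pre-1.4 route described above amounts to ordinary list arithmetic; a minimal sketch (column names are illustrative, and with a real DataFrame you would finish with df.select(*keep_cols)):

```python
# Remove unwanted names from the column list, preserving column order,
# then select the remainder.
all_cols = ["col1", "col7", "col12", "col121"]  # stand-in for df.columns
drop_cols = {"col7", "col121"}                  # columns to omit
keep_cols = [c for c in all_cols if c not in drop_cols]
print(keep_cols)  # ['col1', 'col12']
```

    Using a set for drop_cols keeps the membership test O(1) per column while the list comprehension preserves the DataFrame's column order.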
  • 2020-12-15 05:48

    This will become possible via [SPARK-12139] REGEX Column Specification for Hive Queries:

    https://issues.apache.org/jira/browse/SPARK-12139

  • 2020-12-15 05:49

    I had the same problem and solved it this way (oaffdf is a DataFrame):

    val dropColNames = Seq("col7","col121")
    val featColNames = oaffdf.columns.diff(dropColNames)
    val featCols = featColNames.map(cn => org.apache.spark.sql.functions.col(cn))
    val featsdf = oaffdf.select(featCols: _*)
    

    https://forums.databricks.com/questions/2808/select-dataframe-columns-from-a-sequence-of-string.html
