Removing duplicate columns after a DF join in Spark

小鲜肉 2020-12-24 05:46

When you join two DataFrames that have a column with the same name:

```
df = df1.join(df2, df1['id'] == df2['id'])
```

The join works fine, but afterwards you can't refer to the `id` column because the name is ambiguous.


一向 (OP), 2020-12-24 06:17
Assume `a` is a DataFrame with a column `id`, and `b` is another DataFrame that also has a column `id`.

I use the following two methods to remove duplicates:

Method 1: Use a string join expression instead of a boolean expression. Passing the column name makes Spark perform an equi-join on that column and keep only one `id` column in the result:
```
a.join(b, 'id')
```
Method 2: Rename the column before the join and drop it afterwards:
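```
# DataFrames are immutable, so keep the renamed result
b = b.withColumnRenamed('id', 'b_id')
joinexpr = a['id'] == b['b_id']
# Join on the now-distinct key names, then drop b's copy of the key
a.join(b, joinexpr).drop('b_id')
```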