How to create BinaryType Column using multiple columns of a pySpark Dataframe?

Posted by 血红的双手。 on 2021-01-29 17:53:58

Question


I have recently started working with PySpark, so I am not familiar with many of its details.

I am trying to create a BinaryType column in a DataFrame, but I am struggling to do it.

For example, take a simple df:

df.show(2)

+----+----+
|col1|col2|
+----+----+
| "1"|null|
| "2"|"20"|
+----+----+

Now I want to add a third column, "col3", of BinaryType, like this:

+----+----+--------+
|col1|col2|    col3|
+----+----+--------+
| "1"|null|[1 null]|
| "2"|"20"|  [2 20]|
+----+----+--------+

How should I do it?


Answer 1:


Try this:

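from pyspark.sql import functions as F

# Assumes an existing SparkSession named `spark` (as in the pyspark shell).
a = [('1', None), ('2', '20')]
df = spark.createDataFrame(a, ['col1', 'col2'])
df.show()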

+----+----+
|col1|col2|
+----+----+
|   1|null|
|   2|  20|
+----+----+



df = df.withColumn('col3', F.array(['col1', 'col2']))
df.show()


+----+----+-------+
|col1|col2|   col3|
+----+----+-------+
|   1|null|   [1,]|
|   2|  20|[2, 20]|
+----+----+-------+
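Note that F.array produces an array<string> column, not a BinaryType one. If an actual BinaryType column is needed, one possible approach is a UDF that returns bytes. The sketch below is only an illustration under stated assumptions: the helper names pack_values and add_binary_column are hypothetical, the values are joined with a single space, and None is rendered as the text "null" to mirror the layout in the question. Other byte encodings may suit your data better.

```python
from typing import Optional


def pack_values(*values: Optional[str]) -> bytearray:
    """Join string values with a space and encode them as UTF-8 bytes.

    Assumption: None is rendered as the literal text "null", following
    the desired output shown in the question.
    """
    text = " ".join("null" if v is None else v for v in values)
    return bytearray(text, "utf-8")


def add_binary_column(df, out_col, *in_cols):
    """Return df with a BinaryType column out_col built from in_cols.

    Hypothetical helper, not part of the PySpark API.
    """
    # Imported here so pack_values stays usable without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import BinaryType

    # Declaring BinaryType as the return type makes Spark store the
    # returned bytearray as a binary column.
    pack_udf = F.udf(pack_values, BinaryType())
    return df.withColumn(out_col, pack_udf(*[F.col(c) for c in in_cols]))
```

With the df from above, this could be used as df = add_binary_column(df, 'col3', 'col1', 'col2'), after which df.printSchema() can be used to check the type of col3.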



Source: https://stackoverflow.com/questions/57636127/how-to-create-binarytype-column-using-multiple-columns-of-a-pyspark-dataframe
