Casting a new derived column in a DataFrame from boolean to integer

Submitted by ↘锁芯ラ on 2019-12-08 17:06:22

Question


Suppose I have a DataFrame x with this schema:

from pyspark.sql.types import StructType, StructField, DoubleType

xSchema = StructType([
    StructField("a", DoubleType(), True),
    StructField("b", DoubleType(), True),
    StructField("c", DoubleType(), True)])

I then have the DataFrame:

DataFrame[a: double, b: double, c: double]

I would like to have an integer derived column. I am able to create a boolean column:

x = x.withColumn('y', (x.a-x.b)/x.c > 1)

My new schema is:

DataFrame[a: double, b: double, c: double, y: boolean]

However, I would like column y to contain 0 for False and 1 for True.

The cast function can only operate on a column, not on a DataFrame, and the withColumn function can only operate on a DataFrame. How do I add a new column and cast it to integer at the same time?


Answer 1:


The expression you use evaluates to a Column, so you can cast it directly:

x.withColumn('y', ((x.a-x.b) / x.c > 1).cast('integer')) # Or IntegerType()


Source: https://stackoverflow.com/questions/33354571/casting-a-new-derived-column-in-a-dataframe-from-boolean-to-integer
