In pyspark, how do you add/concat a string to a column?

生来不讨喜 2020-12-17 18:05

I would like to add a string to an existing column. For example, df['col1'] has values '1', '2', '3', etc., and I would like to concatenate the string '000' on the left of col1 so that the result is '0001', '0002', '0003', and so on (in a new column or replacing the old one, either is fine).
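
A minimal reproducible sketch of what I mean (spark here is an assumed SparkSession):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# col1 holds the string values '1', '2', '3'
df = spark.createDataFrame([("1",), ("2",), ("3",)], ["col1"])
df.show()
#+----+
#|col1|
#+----+
#|   1|
#|   2|
#|   3|
#+----+

# desired result: col1 (or a new column) as '0001', '0002', '0003'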

2 Answers
    北荒 2020-12-17 19:02

    Another option here is to use pyspark.sql.functions.format_string(), which lets you apply C printf-style formatting.

    Here's an example where the values in the column are integers.

    import pyspark.sql.functions as f
    df = sqlCtx.createDataFrame([(1,), (2,), (3,), (10,), (100,)], ["col1"])
    df.withColumn("col2", f.format_string("%03d", "col1")).show()
    #+----+----+
    #|col1|col2|
    #+----+----+
    #|   1| 001|
    #|   2| 002|
    #|   3| 003|
    #|  10| 010|
    #| 100| 100|
    #+----+----+
    

    Here the format "%03d" means: format the integer left-padded with zeros to a minimum width of 3 digits. That is why 10 becomes 010 and 100 is unchanged.

    Or if you wanted to add exactly 3 zeros in the front:

    df.withColumn("col2", f.format_string("000%d", "col1")).show()
    #+----+------+
    #|col1|  col2|
    #+----+------+
    #|   1|  0001|
    #|   2|  0002|
    #|   3|  0003|
    #|  10| 00010|
    #| 100|000100|
    #+----+------+
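
    If col1 actually holds strings ('1', '2', '3', ...) as in the question, the same idea should carry over with a %s conversion instead of %d, and the more direct route of concatenating a literal with concat() and lit() gives the same result. A quick sketch, again assuming the sqlCtx session from above:

    df = sqlCtx.createDataFrame([("1",), ("2",), ("3",)], ["col1"])
    # printf-style: prepend the literal 000 via %s
    df = df.withColumn("col2", f.format_string("000%s", "col1"))
    # direct concatenation of a literal column
    df = df.withColumn("col3", f.concat(f.lit("000"), f.col("col1")))
    df.show()
    #+----+----+----+
    #|col1|col2|col3|
    #+----+----+----+
    #|   1|0001|0001|
    #|   2|0002|0002|
    #|   3|0003|0003|
    #+----+----+----+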
    
