I would like to add a string to an existing column. For example, df['col1'] has values '1', '2', '3', etc., and I would like to concatenate a string …
Another option here is to use pyspark.sql.functions.format_string(), which lets you use C printf-style formatting.
Here's an example where the values in the column are integers.
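from pyspark.sql import SparkSession
import pyspark.sql.functions as f
# Assumes a SparkSession; older PySpark shells exposed an equivalent SQLContext as sqlCtx
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1,), (2,), (3,), (10,), (100,)], ["col1"])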
df.withColumn("col2", f.format_string("%03d", "col1")).show()
#+----+----+
#|col1|col2|
#+----+----+
#|   1| 001|
#|   2| 002|
#|   3| 003|
#|  10| 010|
#| 100| 100|
#+----+----+
Here the format "%03d" means: print the integer zero-padded on the left to a minimum width of 3 characters. This is why 1 becomes 001, 10 becomes 010, and 100 does not change at all.
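To see the "minimum width" behavior in isolation, here is a small plain-Python sketch using the % operator, which follows the same printf conventions for a simple %03d conversion. It assumes Spark's format_string behaves the same way for non-negative integers, which matches the output table above. Note that a value wider than 3 digits (1000, which is not in the original data) is left untouched rather than truncated.

```python
# Plain-Python illustration of printf-style "%03d":
# zero-pad on the left to a MINIMUM width of 3; wider values are never truncated.
values = [1, 2, 3, 10, 100, 1000]

padded = ["%03d" % v for v in values]
print(padded)  # ['001', '002', '003', '010', '100', '1000']
```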
Or if you wanted to add exactly 3 zeros in the front:
df.withColumn("col2", f.format_string("000%d", "col1")).show()
#+----+------+
#|col1|  col2|
#+----+------+
#|   1|  0001|
#|   2|  0002|
#|   3|  0003|
#|  10| 00010|
#| 100|000100|
#+----+------+
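The difference between the two formats can be sketched in plain Python (again assuming format_string follows the same printf rules for these simple conversions). In "000%d" the "000" is a literal prefix, so the result length grows with the number; a width-based format such as "%06d" (not used in the answer, shown only for contrast) always produces 6 characters instead.

```python
# Literal prefix vs. fixed-width zero padding, printf-style.
values = [1, 10, 100]

# "000%d": copy "000" verbatim, then the number with no padding
literal_prefix = ["000%d" % v for v in values]

# "%06d": zero-pad on the left to a minimum total width of 6
fixed_width = ["%06d" % v for v in values]

print(literal_prefix)  # ['0001', '00010', '000100']
print(fixed_width)     # ['000001', '000010', '000100']
```

Pick "000%d" when you literally want three extra zeros in front, and a "%0Nd" format when you want every value to end up the same width.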