Using Spark 1.6, I have a Spark DataFrame column (named, let's say, `col1`) with values A, B, C, DS, DNS, E, F, G and H, and I want to create a new column by mapping these values to new ones.
The simplest solution is probably the `DataFrame.replace` method (available since Spark 1.4, so it works on 1.6 as well): http://spark.apache.org/docs/2.4.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.replace
```python
mapping = {'A': '1', 'B': '2'}
df2 = df.replace(to_replace=mapping, subset=['yourColName'])
```