I have an input dataframe (ip_df) whose data looks like this:
id timestamp_value
1 2017-08-01T14:30:00+05:3
You can use the parser and tz modules from the dateutil library.
I assume the column holds strings and that you want a string column back:
from dateutil import parser, tz
from pyspark.sql.types import StringType
from pyspark.sql.functions import col, udf
# Create UTC timezone
utc_zone = tz.gettz('UTC')
# Create a UDF to apply to the column
# It takes the string, parses it to a datetime, converts it to UTC, then formats it as a string again
func = udf(lambda x: parser.parse(x).astimezone(utc_zone).isoformat(), StringType())
# Create the new column in your dataframe (df here is ip_df from the question)
df = df.withColumn("new_timestamp", func(col("timestamp_value")))
This gives the following result:
+---+-------------------------+-------------------------+
|id |timestamp_value |new_timestamp |
+---+-------------------------+-------------------------+
|1 |2017-08-01T14:30:00+05:30|2017-08-01T09:00:00+00:00|
|2 |2017-08-01T14:30:00+06:30|2017-08-01T08:00:00+00:00|
|3 |2017-08-01T14:30:00+07:30|2017-08-01T07:00:00+00:00|
+---+-------------------------+-------------------------+
Finally, you can drop the original column and rename the new one:
df = df.drop("timestamp_value").withColumnRenamed("new_timestamp","timestamp_value")
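To sanity-check the conversion logic outside Spark, here is a minimal sketch that uses only the standard library instead of dateutil. It assumes Python 3.7+ and that every value is an ISO-8601 string with an explicit offset, like the samples above. The function name to_utc_iso is hypothetical, and the None check is an addition of mine so that a null cell would not raise inside a UDF.

```python
from datetime import datetime, timezone
from typing import Optional


def to_utc_iso(value: Optional[str]) -> Optional[str]:
    """Convert an ISO-8601 string with a UTC offset to an ISO-8601 string in UTC."""
    if value is None:
        # Pass nulls through unchanged instead of raising
        return None
    # fromisoformat understands offsets such as +05:30 (Python 3.7+)
    return datetime.fromisoformat(value).astimezone(timezone.utc).isoformat()


print(to_utc_iso("2017-08-01T14:30:00+05:30"))  # 2017-08-01T09:00:00+00:00
print(to_utc_iso("2017-08-01T14:30:00+06:30"))  # 2017-08-01T08:00:00+00:00
```

If this fits your data, it could probably be registered the same way as the lambda, with udf(to_utc_iso, StringType()).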