I am struggling with some PySpark code. In particular, I'd like to call a function on a col object, which is not iterable.
from pyspark.sql.functio
PySpark is just the Python API for Apache Spark. If you want to apply a custom Python function to a column, you have to wrap it in a user-defined function (udf).
Keep your clean_text() function as is (with the translate line commented out) and try the following:
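from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

# translator is assumed to be defined already, as in your question
def translate(c):
    return translator.translate(c, dest='en', src='auto')

# Wrap the Python function as a udf that returns a string column
translateUDF = udf(translate, StringType())

# Apply the built-in cleaning first, then translate the result
clean_text_df = uncleanedText.select(
    translateUDF(clean_text(col("unCleanedCol"))).alias("sentence")
)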
The other functions in your original clean_text (lower and regexp_replace) are built-in Spark functions and operate on a pyspark.sql.Column.
Be aware that using this udf comes with a performance hit, because every row has to be serialized between the JVM and a Python worker process. See: Spark functions vs UDF performance?
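One practical detail of the udf pattern above: Spark passes null column values to a Python udf as None, so the wrapped function should handle that case. The following is only a minimal sketch. The names null_safe, fake_translate, safe_translate and build_udf are hypothetical, and fake_translate is a stand-in for the real translator.translate call so that the example runs without network access.

```python
from typing import Callable, Optional


def null_safe(fn: Callable[[str], str]) -> Callable[[Optional[str]], Optional[str]]:
    """Wrap a str -> str function so that None (a Spark null) passes through."""
    def wrapper(value: Optional[str]) -> Optional[str]:
        return None if value is None else fn(value)
    return wrapper


def fake_translate(text: str) -> str:
    # Hypothetical stand-in for translator.translate(...); it only upper-cases.
    return text.upper()


# Plain Python function that is safe to call on null values.
safe_translate = null_safe(fake_translate)


def build_udf():
    # Imported lazily so the pure-Python part can be tried without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType
    return udf(safe_translate, StringType())
```

With Spark available, build_udf() returns a udf that can be used in place of translateUDF in the select above. The plain function safe_translate can be checked on its own before it is handed to Spark.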