Given this dataframe, I am getting "Column is not iterable" when I try to groupBy and get the max:
linesWithSparkDF
+---+-----+
| id|cycle|
+---+-----+
| 31|
The general technique for avoiding this problem -- which is caused by unfortunate namespace collisions between some Spark SQL function names (max, for example) and Python built-in function names -- is to import the Spark SQL functions module like this:
from pyspark.sql import functions as F
# USAGE: F.col(), F.max(), ...
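To see why the collision matters, here is a minimal, self-contained sketch of the kind of call that raises the error (the rows are made up for illustration, and the OP's exact failing line isn't shown, so treat this as a reconstruction). Without the F. prefix, "max" resolves to Python's built-in max(), which tries to iterate over the Column:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical id/cycle rows standing in for the OP's data
df = spark.createDataFrame([(31, 26), (31, 27)], ["id", "cycle"])

# "max" here is Python's built-in max(), not Spark SQL's aggregate.
# It tries to iterate the Column and raises:
#   TypeError: Column is not iterable
df.groupBy("id").agg(max(df["cycle"]))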
With that import in place, you'd simply prefix the functions with F in the OP's example, like this:
linesWithSparkGDF = linesWithSparkDF.groupBy(F.col("id")) \
.agg(F.max(F.col("cycle")))
This is the general way of avoiding the issue, and the F. prefix is the convention you will see in most PySpark code in practice.
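For completeness, here is a self-contained sketch of the whole fix (again with made-up sample rows) that you can paste into a pyspark shell or a script:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical id/cycle rows standing in for the OP's data
linesWithSparkDF = spark.createDataFrame(
    [(31, 26), (31, 27), (32, 5)], ["id", "cycle"]
)

# F.max is Spark SQL's aggregate max, not Python's built-in max()
linesWithSparkGDF = linesWithSparkDF.groupBy(F.col("id")) \
                                    .agg(F.max(F.col("cycle")))

linesWithSparkGDF.show()
# +---+----------+
# | id|max(cycle)|
# +---+----------+
# | 31|        27|
# | 32|         5|
# +---+----------+
# (row order may vary)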