Converting a Series from pandas to PySpark: need to use “groupby” and “size”, but PySpark raises an error

I am converting some code from pandas to PySpark. In pandas, let's imagine I have the following mock dataframe, df:

And in pandas, I define a certain variable