I have a data set that I'm trying to process in PySpark. The data (on disk as Parquet) contains user IDs, session IDs, and metadata related to each session. I'm adding a n