Can aggregation operations be part of a pipeline (Spark/PySpark)? An example of an aggregation operation is to aggregate a DataFrame to the user ID level and compute various statis