I have a PySpark DataFrame with the following columns:
```
+-----------+---------+----------+-----------+
|     userID|grouping1| grouping2|   features|
+-----------+---------+----------+-----------+
```