I have a Spark DataFrame in the following format:
pid   grouped_ids
------------------------
12    12,13,14,78
6     6,8,12,23
19