Filtering on number of times a value appears in PySpark
Question: I have a file with a column containing IDs. Usually an ID appears only once, but occasionally it is associated with multiple records. I want to count how many times a given ID appears, and then split the data into two separate DataFrames so I can run different operations on each: one where IDs appear only once, and one where IDs appear multiple times. I was able to count the number of instances of each ID by grouping on ID and joining the counts back onto the original DataFrame.
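A minimal sketch of the approach described in the question, assuming a DataFrame `df` with an `id` column (the sample data and column names are illustrative):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data: "id" 2 appears twice, the others once.
df = spark.createDataFrame(
    [(1, "a"), (2, "b"), (2, "c"), (3, "d")],
    ["id", "value"],
)

# Count how many times each id appears.
counts = df.groupBy("id").agg(F.count("*").alias("id_count"))

# Join the counts back onto the original DataFrame.
df_with_counts = df.join(counts, on="id", how="inner")

# Split into two DataFrames based on the count.
singles = df_with_counts.filter(F.col("id_count") == 1)
multiples = df_with_counts.filter(F.col("id_count") > 1)
```

With this split, `singles` and `multiples` can each be processed independently, as the question asks.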