How do I get a SQL row_number equivalent for a Spark RDD?
I need to generate a complete list of row_numbers for a data table with many columns. In SQL, this would look like:

```sql
select
    key_value,
    col1, col2, col3,
    row_number() over (partition by key_value order by col1, col2 desc, col3)
from temp;
```

Now, let's say in Spark I have an RDD of the form (K, V), where V = (col1, col2, col3), so my entries are like:

```
(key1, (1,2,3))
(key1, (1,4,7))
(key1, (2,2,3))
(key2, (5,5,5))
(key2, (5,5,9))
(key2, (7,5,5))
```

I want to order these using commands like sortBy(), sortWith(), sortByKey(), zipWithIndex, etc. and end up with a new RDD in which each record carries its correct row_number within its key:

```
(key1, (1,2,3), 2)
(key1, (1,4,7), 1)
(key1, (2,2,3), 3)
(key2, (5,5,5), 1)
(key2, (5,5,9), 2)
(key2, (7,5,5), 3)
```
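For concreteness, here is a minimal sketch of the naive approach I have in mind: group by key, sort each group's values in memory to match the `order by col1, col2 desc, col3` clause, and number them with zipWithIndex. The object name and the local master setting are just for illustration, and I assume each key's group fits in memory:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RowNumberSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("row-number-sketch").setMaster("local[*]"))

    // The sample data from above: (K, V) pairs with V = (col1, col2, col3).
    val data = sc.parallelize(Seq(
      ("key1", (1, 2, 3)),
      ("key1", (1, 4, 7)),
      ("key1", (2, 2, 3)),
      ("key2", (5, 5, 5)),
      ("key2", (5, 5, 9)),
      ("key2", (7, 5, 5))
    ))

    val numbered = data
      .groupByKey()                          // one group per key_value
      .flatMap { case (key, values) =>
        values.toSeq
          // order by col1 asc, col2 desc, col3 asc (negate col2 to reverse it)
          .sortBy { case (c1, c2, c3) => (c1, -c2, c3) }
          .zipWithIndex                      // 0-based position within the group
          .map { case (v, i) => (key, v, i + 1) } // row_number is 1-based
      }

    numbered.collect().foreach(println)
  }
}
```

My concern with this sketch is that groupByKey materializes all values for a key on a single executor, so it presumably falls over when one key has many rows; that is why I am looking for a better, more scalable way to do this on an RDD.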