As an alternative to @gasparms's solution, I think one can try a filter followed by an rdd.sortBy operation: for each key, you filter the records that match it and then sort the filtered result. The prerequisite is that you need to keep track of all your keys (filter combinations). You can also build that list of keys as you traverse the records.
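Here is a minimal PySpark sketch of the idea, not a definitive implementation. The helper name `split_by_key_and_sort`, the `key_fn`/`sort_fn` parameters, and the sample records are all made up for illustration; only `filter`, `sortBy`, `parallelize` and `collect` are actual RDD/SparkContext calls.

```python
def split_by_key_and_sort(rdd, keys, key_fn, sort_fn):
    """Return {key: RDD of the records with that key, sorted by sort_fn}.

    Hypothetical helper: `keys` is the list of keys you have been tracking,
    `key_fn` extracts a record's key, `sort_fn` extracts the sort value.
    """

    def matches(key):
        # Bind `key` once per call. RDD transformations are lazy, so a bare
        # lambda written inside the loop could end up seeing only the last key.
        return lambda record: key_fn(record) == key

    # One filter + sortBy pipeline per tracked key.
    return {key: rdd.filter(matches(key)).sortBy(sort_fn) for key in keys}


def run_demo():
    # Imported here so the helper above can be loaded without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "filter-then-sortBy")
    try:
        # Assumed record shape for this sketch: (key, value) tuples.
        records = sc.parallelize([("a", 3), ("b", 1), ("a", 1), ("b", 2)])
        keys = ["a", "b"]  # the tracked keys (filter combinations)
        by_key = split_by_key_and_sort(
            records, keys, lambda r: r[0], lambda r: r[1]
        )
        for key, sorted_rdd in by_key.items():
            print(key, sorted_rdd.collect())
    finally:
        sc.stop()
```

Call `run_demo()` to try it on a local Spark install. If you do not know the keys up front, something like `rdd.map(key_fn).distinct().collect()` is one way to build the list. Note that each key triggers its own pass over the source RDD, so caching it with `rdd.cache()` first is probably worthwhile when there are many keys.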