How to apply multiple filters in a for loop for pyspark
Question

I am trying to apply filters on several columns of an RDD. I want to pass in a list of indices as a parameter to specify which columns to filter on, but PySpark only applies the last filter. I've broken the code down into some simple test cases, and the non-looped versions work.

```python
test_input = [('0', '00'), ('1', '1'), ('', '22'), ('', '3')]
rdd = sc.parallelize(test_input, 1)

# Index 0 needs to be longer than length 0
# Index 1 needs to be longer than length 1
for i in [0, 1]:
    rdd =
```
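A likely cause of "only the last filter is applied" (an assumption here, since the snippet above is truncated) is Python's late-binding closures: a `lambda` created inside the loop looks up `i` when the filter actually runs, by which point `i` holds its final value. Because Spark transformations are lazy, every queued filter then sees the same last `i`. The effect can be reproduced without Spark using Python's equally lazy built-in `filter`:

```python
test_input = [('0', '00'), ('1', '1'), ('', '22'), ('', '3')]

# Late binding: each lambda reads i when the chain is finally evaluated,
# so both filters effectively test index 1 with threshold 1.
rows = test_input
for i in [0, 1]:
    rows = filter(lambda x: len(x[i]) > i, rows)
late_bound = list(rows)      # -> [('0', '00'), ('', '22')]

# Fix: bind the current i at definition time via a default argument,
# so each filter keeps its own index and threshold.
rows = test_input
for i in [0, 1]:
    rows = filter(lambda x, i=i: len(x[i]) > i, rows)
early_bound = list(rows)     # -> [('0', '00')]
```

The same `i=i` default-argument trick (or a `functools.partial`) applies to the lambdas passed to `rdd.filter` inside the loop.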