I'm trying to filter a Spark DataFrame based on whether the values in a column equal a list. I would like to do something like this:
filtered_df = df.where(df["a"] == ["list", "of", "stuff"])
You can use a combination of the array, lit, and array_except functions to achieve this. The expression
array(lit("list"), lit("of"), lit("stuff"))
builds an array column equivalent to the Python list ["list", "of", "stuff"].
Note: the array_except function is available from Spark 2.4.0.
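For context, array_except(a, b) returns the elements of a that are not present in b, with duplicates removed. Here is a minimal sketch of its behavior on one of the sample rows used below (assuming an active SparkSession named spark):
from pyspark.sql.functions import array, array_except, lit

demo = spark.createDataFrame([(['list', 'of', 'stuff', 'and', 'foo'],)], ['a'])
# Elements of "a" that do not appear in the target list
demo.select(array_except('a', array(lit("list"), lit("of"), lit("stuff"))).alias('diff')).show()
+----------+
|      diff|
+----------+
|[and, foo]|
+----------+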
Here is the code:
# Import the required functions
from pyspark.sql.functions import array, array_except, lit, size
# Create a sample DataFrame
df = spark.createDataFrame([
    (1, ['list', 'of', 'stuff']),
    (2, ['foo', 'bar']),
    (3, ['foobar']),
    (4, ['list', 'of', 'stuff', 'and', 'foo']),
    (5, ['a', 'list', 'of', 'stuff']),
], ['id', 'a'])
# Keep rows whose array contains no elements outside the target list
df1 = df.filter(size(array_except(df["a"], array(lit("list"), lit("of"), lit("stuff")))) == 0)
# Display result
df1.show()
+---+-----------------+
| id| a|
+---+-----------------+
| 1|[list, of, stuff]|
+---+-----------------+
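Note that size(array_except(...)) == 0 only guarantees that "a" contains no elements outside the target list; a row such as ['list', 'of'] or a reordered ['stuff', 'of', 'list'] would also pass. If you need an exact, order-sensitive match, one sketch is to compare the column directly against an array column:
# Exact equality, including element order
df2 = df.filter(df["a"] == array(lit("list"), lit("of"), lit("stuff")))
df2.show()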
I hope this helps.