How to read a column of csv as dtype list using pandas?


I have a CSV file with 3 columns, where each row of the third column holds a list of values, as you can see from the following table structure:

Col1,Col2,Col3
1,a


                      
5 answers

名媛妹妹, 2020-12-02 23:12
              
            
            
                                                                       
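If you have the option to choose the format when the file is written, you can use DataFrame.to_parquet and pd.read_parquet instead of CSV (this needs a Parquet engine such as pyarrow or fastparquet installed).

Parquet stores the column's type along with the data, so the list column is parsed properly on read instead of coming back as a string.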
                                                                        
                                                        
            
            
              
                
时光说笑, 2020-12-02 23:23
              
            
            
                                                                       
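Adding a replace to Padraic Cunningham's answer, so that the quotes around each element are removed:

# Strip the brackets, drop the single quotes, then split on ", "
df = pd.read_csv("in.csv", converters={"Col3": lambda x: x.strip("[]").replace("'", "").split(", ")})


See also the related question "pandas - convert string into list of strings".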
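
As a rough, self-contained sketch of this strip/replace/split idea: the sample data below is hypothetical (the question's real file is not shown in full), and `parse_list` is a made-up helper name. It also guards against one edge case of the one-liner, where an empty list `[]` would otherwise become `['']`.

```python
import io
import pandas as pd

# Hypothetical sample data; the question's real file is not shown in full.
csv_text = '''Col1,Col2,Col3
1,a,"['Proj1', 'Proj2']"
2,b,"['Proj3']"
3,c,"[]"
'''

def parse_list(cell):
    """Strip brackets and single quotes, then split on ', '."""
    inner = cell.strip("[]").replace("'", "")
    # "".split(", ") would give [''], so return an empty list explicitly
    return inner.split(", ") if inner else []

df = pd.read_csv(io.StringIO(csv_text), converters={"Col3": parse_list})
col3 = df["Col3"].tolist()
```

Note that this only works when items are separated by exactly `", "` and contain no commas or quotes of their own; every item comes back as a string.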
                                                                        
                                                        
            
            
              
                
不要未来只要你来, 2020-12-02 23:32
              
            
            
                                                                       
@Padraic Cunningham's answer will not work if you have to parse lists of strings that do not have quotes. For example, literal_eval will successfully parse "['a', 'b', 'c']", but not "[a, b, c]". To load strings like this, use the PyYAML library.

import io 
import pandas as pd

data = '''
A,B,C
"[1, 2, 3]",True,"[a, b, c]"
"[4, 5, 6]",False,"[d, e, f]"
'''

df = pd.read_csv(io.StringIO(data), sep=',')                                    
df
           A      B          C
0  [1, 2, 3]   True  [a, b, c]
1  [4, 5, 6]  False  [d, e, f]

df['C'].tolist()                                                           
# ['[a, b, c]', '[d, e, f]']




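import yaml
# yaml.safe_load reads "[a, b, c]" as a list of strings and "[1, 2, 3]" as a list of ints.
# In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.
df[['A', 'C']] = df[['A', 'C']].applymap(yaml.safe_load)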

df['C'].tolist()                                                           
# [['a', 'b', 'c'], ['d', 'e', 'f']]


yaml can be installed using pip install pyyaml.
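
If adding a dependency is not an option, the failure described above can be reproduced and worked around with the standard library alone. This is only a sketch of a different technique (try `literal_eval`, fall back to a plain split), not the YAML approach itself, and `parse_cell` is a hypothetical helper name.

```python
from ast import literal_eval

def parse_cell(cell):
    """Try literal_eval first; fall back to a plain split for unquoted items."""
    try:
        return literal_eval(cell)
    except (ValueError, SyntaxError):
        # "[a, b, c]" lands here because bare names are not Python literals
        inner = cell.strip().strip("[]").strip()
        return [item.strip() for item in inner.split(",")] if inner else []

quoted = parse_cell("['a', 'b', 'c']")
unquoted = parse_cell("[a, b, c]")
numbers = parse_cell("[1, 2, 3]")
```

The fallback branch returns every item as a string and does not handle items that themselves contain commas, so it is a narrower tool than a real YAML parser.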
                                                                        
                                                        
            
            
              
                
灰色年华, 2020-12-02 23:34
              
            
            
                                                                       
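You could use literal_eval from the ast module:

from ast import literal_eval

# df is assumed to have been loaded already, e.g. with pd.read_csv("in.csv")
df.Col3 = df.Col3.apply(literal_eval)
print(df.Col3[0][0])
# Proj1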


You can also do it when you create the dataframe from the csv, using converters:

df = pd.read_csv("in.csv",converters={"Col3": literal_eval})


If you are sure the format is the same for all strings, stripping and splitting will be a lot faster:

df = pd.read_csv("in.csv", converters={"Col3": lambda x: x.strip("[]").split(", ")})


But the quote characters stay inside each element, so you will end up with the strings wrapped in quotes.
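
A minimal runnable sketch of the `converters` route with `literal_eval`, using hypothetical in-memory data in place of `in.csv`. The wrapper `to_list` is a made-up name; it is there because a converter receives the raw cell text, and `literal_eval` raises on an empty string.

```python
import io
from ast import literal_eval
import pandas as pd

# Hypothetical sample data standing in for in.csv; the last row has a blank Col3.
csv_text = '''Col1,Col2,Col3
1,a,"['Proj1', 'Proj2']"
2,b,"['Proj3']"
3,c,
'''

def to_list(cell):
    # Converters get the raw cell text; treat a blank cell as an empty list
    return literal_eval(cell) if cell.strip() else []

df = pd.read_csv(io.StringIO(csv_text), converters={"Col3": to_list})
first_item = df.Col3[0][0]
all_lists = df.Col3.tolist()
```

Whether a blank cell should become `[]` or stay missing depends on the data; the choice here is just one reasonable default.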
                                                                        
                                                        
            
            
              
                
悲哀的现实, 2020-12-02 23:34
              
            
            
                                                                       
I have a different approach for this, which can also be used for string representations of other data types besides lists.

You can use the json library and apply json.loads() to the desired column, for example:

import json
df.my_column = df.my_column.apply(json.loads)


For this to work, however, the cell contents must be valid JSON, which means any strings inside the list must be enclosed in double quotes; single-quoted strings are rejected.
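
A small sketch of what that looks like end to end, again with hypothetical in-memory data and the question's `Col3` column name rather than `my_column`. Note how the double quotes inside a quoted CSV field have to be doubled.

```python
import io
import json
import pandas as pd

# Hypothetical data: JSON needs double-quoted strings, and CSV escapes a
# double quote inside a quoted field by doubling it.
csv_text = '''Col1,Col2,Col3
1,a,"[""Proj1"", ""Proj2""]"
2,b,"[1, 2, 3]"
'''

df = pd.read_csv(io.StringIO(csv_text))
df["Col3"] = df["Col3"].apply(json.loads)
parsed = df["Col3"].tolist()

# Single-quoted items are not valid JSON and raise json.JSONDecodeError
try:
    json.loads("['Proj1']")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False
```

If the file was written from Python with `str(list)`, the cells will use single quotes, and `literal_eval` is probably the better fit than `json.loads`.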
                                                                        
                                                        
            
            
              
                