How to convert a mixed-type Matrix to a DataFrame in Julia while recognising the column types

深忆病人 2021-01-26 02:50

One nice feature of DataFrames is that it can store columns of different types and "auto-recognise" them, e.g.:

using DataFrames, DataStructures

df1
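The problem can be reproduced with a small example. This is a minimal sketch assuming DataFrames.jl 1.x; the matrix `mat` is illustrative data, not the asker's. A matrix literal that mixes strings and numbers is stored as a `Matrix{Any}`, and building a DataFrame directly from it keeps every column as `Vector{Any}`:

```julia
using DataFrames

# Illustrative data: a header row followed by rows mixing String and Float64.
# Because the literal mixes types, Julia stores it as a Matrix{Any}.
mat = ["parName"  "value";
       "vol"      3.3055628012;
       "vol"      2.1360975151]

# Naive conversion: each column sliced out of a Matrix{Any} is a Vector{Any},
# so the DataFrame does not "recognise" that `value` holds Float64 numbers.
naive = DataFrame(mat[2:end, :], Symbol.(mat[1, :]))

println(eltype(mat))
println(eltype.(eachcol(naive)))
```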

死守一世寂寞 (original poster) 2021-01-26 03:08

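# Row 1 holds the column names; rows 2:end hold the data. Splatting each column into
# a vector literal, [col...], narrows it from Vector{Any} to its actual element type.
mat2df(mat) = 
    DataFrame([[mat[2:end,i]...] for i in 1:size(mat,2)], Symbol.(mat[1,:]))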
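Why the one-liner recovers the column types, as a minimal sketch (the variables below are illustrative, not from the answer): slicing a `Matrix{Any}` always yields a `Vector{Any}`, but splatting it into a vector literal builds a fresh vector whose element type is the promoted type of the values actually present. `identity.(v)` is shown only for comparison; it also narrows the element type, but it widens to a common supertype instead of converting values.

```julia
# Illustrative column as it comes out of a Matrix{Any}: the values are Float64,
# but the container's element type is Any.
col = Any[3.3055628012, 2.1360975151]

# Splatting into a vector literal promotes the elements to a common type,
# so this is a Vector{Float64}.
splatted = [col...]

# Mixed Int/Float64 values are promoted (converted) to Float64 ...
mixed_splat = [Any[1, 2.5]...]
# ... while broadcasting `identity` keeps the values and joins their types (Real).
mixed_bcast = identity.(Any[1, 2.5])

# String and Float64 have no common promoted type, so the result stays Vector{Any}.
no_common = [Any["vol", 2.5]...]

for v in (col, splatted, mixed_splat, mixed_bcast, no_common)
    println(typeof(v))
end
```

In the answer's `mat2df`, this means the numeric column becomes `Vector{Float64}` and the text columns become `Vector{String}`; a column that genuinely mixes strings and numbers stays `Vector{Any}`.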


It seems to work, and it is faster than @dan-getz's answer, at least for this data matrix (median 21.3 μs vs 81.3 μs, using less than a third of the memory) :)

using DataFrames, BenchmarkTools

dataMatrix = [
    "parName"   "region"    "forType"       "value";
    "vol"       "AL"        "broadL_highF"  3.3055628012;
    "vol"       "AL"        "con_highF"     2.1360975151;
    "vol"       "AQ"        "broadL_highF"  5.81984502;
    "vol"       "AQ"        "con_highF"     8.1462998309;
]

mat2df(mat) = 
    DataFrame([[mat[2:end,i]...] for i in 1:size(mat,2)], Symbol.(mat[1,:]))

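# @dan-getz's approach: write the matrix out as tab-separated text and parse it back.
# `indices` (renamed `axes` in Julia 0.7) and `DataFrames.inlinetable` date from
# pre-1.0 Julia/DataFrames and do not exist in current releases.
function mat2dfDan(mat)
    s = join([join([mat[i,j] for j in indices(mat, 2)], '\t') 
                for i in indices(mat, 1)],'\n')

    DataFrames.inlinetable(s; separator='\t', header=true)
end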



julia> @benchmark mat2df(dataMatrix)

BenchmarkTools.Trial: 
  memory estimate:  5.05 KiB
  allocs estimate:  75
  --------------
  minimum time:     18.601 μs (0.00% GC)
  median time:      21.318 μs (0.00% GC)
  mean time:        31.773 μs (2.50% GC)
  maximum time:     4.287 ms (95.32% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark mat2dfDan(dataMatrix)

BenchmarkTools.Trial: 
  memory estimate:  17.55 KiB
  allocs estimate:  318
  --------------
  minimum time:     69.183 μs (0.00% GC)
  median time:      81.326 μs (0.00% GC)
  mean time:        90.284 μs (2.97% GC)
  maximum time:     5.565 ms (93.72% GC)
  --------------
  samples:          10000
  evals/sample:     1
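For comparison, the text round-trip idea behind `mat2dfDan` can be expressed with current packages. This is a minimal sketch assuming CSV.jl 0.8 or later, where `CSV.read` takes the sink type (here `DataFrame`) as its second argument; `mat2df_csv` is a hypothetical name, not part of either answer, and it was not part of the benchmark above.

```julia
using CSV, DataFrames

# Hypothetical helper: serialise the matrix (header row included) to
# tab-separated text, then let CSV.jl parse it and infer the column types.
function mat2df_csv(mat::AbstractMatrix)
    lines = (join(mat[i, :], '\t') for i in axes(mat, 1))
    text = join(lines, '\n')
    return CSV.read(IOBuffer(text), DataFrame; delim = '\t')
end

dataMatrix = [
    "parName"   "region"    "forType"       "value";
    "vol"       "AL"        "broadL_highF"  3.3055628012;
    "vol"       "AL"        "con_highF"     2.1360975151;
]

df = mat2df_csv(dataMatrix)
println(eltype.(eachcol(df)))
```

Because every value passes through text, CSV.jl re-infers the types from the strings: an entry such as `"001"` would come back as an integer, and with CSV.jl 0.10 short string columns may use InlineStrings types (e.g. `String3`) rather than `String`. The splatting one-liner avoids both effects by keeping the original Julia values.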
