I have a large data set I'm working with in R using some of the big.___()
packages. It's ~10 gigs (roughly 100 million rows × 15 columns) and looks like this:
Price
...
You do not need all the data or all values present in each chunk; you just need all the levels accounted for. That means you can have a chunk like this:
curchunk <- data.frame(
  Price = c(12.45, 33.67),
  Var1  = factor(c(1, 1), levels = 1:3),  # one observed value, three declared levels
  Var2  = factor(1:2,     levels = 1:3)   # two observed values, three declared levels
)
and it will work. Even though Var1 contains only one distinct value and Var2 only two, all three levels are declared in both, so the fit will do the correct thing.
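If you want to convince yourself of this, model.matrix() on that chunk shows that the declared levels, not the observed values, determine the columns of the design matrix (a quick check, not something you need in the actual workflow):

# same five columns for every chunk, even when some levels are absent from the data
model.matrix(Price ~ Var1 + Var2, data = curchunk)
#   (Intercept) Var12 Var13 Var22 Var23
# 1           1     0     0     0     0
# 2           1     0     0     1     0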
Also, biglm does not break the data into chunks for you; it expects you to feed it manageable chunks to work with. Work through the package examples to see this. A common methodology with biglm is to read from a file or database: read in the first n rows (where n is a reasonable subset) and pass them to biglm (possibly after making sure all the factors have all their levels specified), then remove that chunk from memory, read in the next n rows, and pass those to update; continue like this until the end of the file, removing each used chunk as you go (so you have enough memory room for the next one).
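Here is a minimal sketch of that loop, assuming a CSV file named bigdata.csv with columns Price, Var1, and Var2; the file name, chunk size, and read_chunk helper are all illustrative, not part of biglm itself:

library(biglm)

chunk_size <- 100000
all_levels <- 1:3   # every level that occurs anywhere in the file

con <- file("bigdata.csv", open = "r")
header <- strsplit(readLines(con, n = 1), ",")[[1]]

read_chunk <- function(con, n) {
  # read.csv() errors at end of file, so treat that as "no more data"
  chunk <- tryCatch(
    read.csv(con, header = FALSE, col.names = header, nrows = n),
    error = function(e) NULL)
  if (is.null(chunk) || nrow(chunk) == 0) return(NULL)
  # re-declare the full level set so every chunk builds the same design matrix
  chunk$Var1 <- factor(chunk$Var1, levels = all_levels)
  chunk$Var2 <- factor(chunk$Var2, levels = all_levels)
  chunk
}

# the first chunk initializes the model ...
chunk <- read_chunk(con, chunk_size)
fit <- biglm(Price ~ Var1 + Var2, data = chunk)

# ... each later chunk is folded in with update(), then discarded,
# so only one chunk is ever held in memory at a time
repeat {
  chunk <- read_chunk(con, chunk_size)
  if (is.null(chunk)) break
  fit <- update(fit, chunk)
}
close(con)

summary(fit)

The point of re-declaring the factor levels inside read_chunk is exactly the issue above: every chunk must produce the same design matrix columns, or update() will fail on a chunk where some level happens not to appear.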