Merge error: negative length vectors are not allowed

时光取名叫无心 asked 2020-12-06 05:49

I tried to merge two data.frames, which look like this:

   GVKEY YEAR coperol     delta     vega firm_related_wealth
1 001045 1992       1  38.88885 17.86         


        
3 Answers
  • 2020-12-06 06:27

    I had the same issue while performing a task in R similar to VLOOKUP in MS Excel. This error occurs because your key column is not good enough to map rows from one table to the other. Remove the zeros or make the key column unique, as explained by @Assaf Wool. Hope it helps!
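
    To see why a non-unique key inflates the result, here is a minimal sketch (toy data, not the asker's tables): every occurrence of a key on one side pairs with every occurrence on the other, so k matches on each side produce k × k rows.

    ```r
    # Toy illustration: duplicate keys multiply rows in merge().
    a <- data.frame(key = c(1, 1, 2), x = c("a1", "a2", "a3"))
    b <- data.frame(key = c(1, 1, 2), y = c("b1", "b2", "b3"))
    m <- merge(a, b, by = "key")
    nrow(m)  # 5: key 1 matches 2 x 2 = 4 times, key 2 matches 1 x 1 = 1 time
    ```

    With millions of rows and heavily duplicated keys, the same effect can push the result past R's vector length limit.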

  • 2020-12-06 06:35

    I'm not sure how merge is implemented, but there seems to be a big difference when you try to merge by one column or by two, as you can see in the following simulation:

    > df1<-data.frame(a=1:200000,b=2*(1:200000),c=3*(1:200000))
    > df2<-data.frame(a=-df1$a,b=-df1$b,d=4*(1:200000))
    > ss<-sample(200000,10000)
    > df2[ss,1:2]<-df1[ss,1:2]
    > system.time(df3<-merge(x=df1,y=df2,by=c('a','b')))
    user  system elapsed 
    1.25    0.00    1.25
    > system.time(df4<-merge(x=df1,y=df2,by='a'))
    user  system elapsed 
    0.06    0.00    0.06 
    

    Watching system memory, the two-column merge also used far more memory. There's probably a Cartesian product in there somewhere, and I suspect that is what's causing your error.

    What you could do is create a new column concatenating GVKEY and YEAR in each data.frame and merge by that column.

    a$newKey<-paste(a$GVKEY,a$YEAR,sep='_')
    b$newKey<-paste(b$GVKEY,b$YEAR,sep='_')
    c<-merge(a,b,by='newKey')
    

    You would need to clean up the columns in the result, since GVKEY and YEAR would both appear twice, but at least the merge should work.
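
    Before running the merge at all, you can estimate how many rows it would produce by counting key occurrences on each side (a hedged sketch with made-up toy frames; substitute your own data.frames with their GVKEY/YEAR columns):

    ```r
    # Estimate the merge's row count without materializing it.
    a <- data.frame(GVKEY = c("001045", "001045", "001078"), YEAR = c(1992, 1992, 1993))
    b <- data.frame(GVKEY = c("001045", "001078"), YEAR = c(1992, 1993))
    ka <- table(paste(a$GVKEY, a$YEAR, sep = "_"))   # occurrences of each key in a
    kb <- table(paste(b$GVKEY, b$YEAR, sep = "_"))   # occurrences of each key in b
    common <- intersect(names(ka), names(kb))
    est <- sum(as.numeric(ka[common]) * as.numeric(kb[common]))
    est  # 3 here; if this exceeds 2^31 - 1, the merge cannot be stored
    ```

    If the estimate is enormous, you know the key is not unique enough before spending the memory.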

  • 2020-12-06 06:37

    You are getting this error because the data.frame / data.table created by the join has more than 2^31 - 1 rows (2,147,483,647).

    Due to the way vectors are constructed internally by R, the maximum length of any vector is 2^31 - 1 elements (see: https://stackoverflow.com/a/5234293/2341679). Since a data.frame / data.table is really a list() of vectors, this limit also applies to the number of rows.

    As other people have commented and answered, unfortunately you won't be able to construct this data.table, and it's likely there are that many rows because of duplicate matches between your two data.tables (these may or may not be intentional on your part).

    The good news is, if the duplicate matches are not errors, and you still want to perform the join, there is a way around it: you just need to do whatever computation you wanted to do on the resulting data.table in the same call as the join using the data.table[] operator, e.g.:

    dt_left[dt_right, on = .(GVKEY, YEAR), 
            j = .(sum(firm_related_wealth), mean(fracdirafterindep)),
            by = .EACHI]
    

    If you're not familiar with the data.table syntax, you can perform calculations on columns within a data.table as shown above using the j argument. When performing a join using this syntax, computation in j is performed on the data.table created by the join.

    The key here is the by = .EACHI argument. This breaks the join (and subsequent computation in j) down into smaller components: one data.table for each row in dt_right and its matches in dt_left, avoiding the problem of creating a data.table with > 2^31 - 1 rows.
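
    A minimal runnable sketch of the same idea, with toy tables whose column names mirror the question (the values are made up):

    ```r
    library(data.table)

    # Toy left table with two matches for A/1992 and one for B/1993.
    dt_left  <- data.table(GVKEY = c("A", "A", "B"),
                           YEAR  = c(1992L, 1992L, 1993L),
                           firm_related_wealth = c(10, 20, 30))
    dt_right <- data.table(GVKEY = c("A", "B"), YEAR = c(1992L, 1993L))

    # Join and aggregate per matched group, never materializing the full join.
    res <- dt_left[dt_right, on = .(GVKEY, YEAR),
                   j = .(total_wealth = sum(firm_related_wealth)),
                   by = .EACHI]
    # total_wealth is 30 for both A/1992 (10 + 20) and B/1993 (30).
    ```

    Each row of dt_right is processed with only its own matches from dt_left in memory, which is why the 2^31 - 1 limit is never hit.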
