Count every possible pair of values in a column grouped by multiple columns


I have a data frame that looks like this (this is just a subset; the actual dataset has 2,724,098 rows):

> head(dat)

chr   start  end    enhancer motif 
chr10


                      
Answer by 梦毁少年i (2020-12-03 16:54):

...if this isn't what you want, I'm giving up. Obviously it isn't optimized for a large data set. This is just a general algorithm that takes natural advantage of R. There are several improvements possible, e.g. with dplyr and data.table. The latter will greatly speed up the [ and %in% operations here.

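library(plyr)  # provides daply() and .()

# All unordered pairs of distinct motifs, one pair per column
motif_pairs <- combn(as.character(unique(dat$motif)), 2)
colnames(motif_pairs) <- apply(motif_pairs, 2, paste, collapse = " ")

# For each pair, count the groups (by id) that contain both motifs
motif_pair_counts <- apply(motif_pairs, 2, function(motif_pair) {
  sum(daply(dat[dat$motif %in% motif_pair, ], .(id), function(dat_subset) {
    all(motif_pair %in% dat_subset$motif)
  }))
})

# Build the result directly so that count stays numeric rather than character
motif_pair_counts <- data.frame(motif1 = motif_pairs[1, ],
                                motif2 = motif_pairs[2, ],
                                count  = unname(motif_pair_counts),
                                row.names = NULL)
motif_pair_counts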

#   motif1 motif2 count
# 1  GATA6  GATA4     3
# 2  GATA6    SRF     2
# 3  GATA6  MEF2A     2
# 4  GATA4    SRF     2
# 5  GATA4  MEF2A     2
# 6    SRF  MEF2A     3
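
For a large data set, one possible direction is to enumerate the pairs inside each group once, instead of scanning the data once per candidate pair. The following is only a sketch in base R under some assumptions: the toy `dat`, its `id` column, and the motif values are made up for illustration, and unlike the code above it reports only pairs that co-occur in at least one group (zero-count pairs are omitted).

```r
# Toy data; the id column and motif values are illustrative assumptions
dat <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 3),
  motif = c("GATA6", "GATA4", "SRF", "GATA6", "GATA4", "SRF", "MEF2A", "GATA4"),
  stringsAsFactors = FALSE
)

# Distinct motifs per group, sorted so each pair has one canonical order
motifs_by_id <- lapply(split(dat$motif, dat$id), function(m) sort(unique(m)))

# Enumerate the pairs within each group; groups with fewer than
# two distinct motifs contribute no pairs
pairs <- unlist(lapply(motifs_by_id, function(m) {
  if (length(m) < 2) return(character(0))
  combn(m, 2, FUN = paste, collapse = " ")
}))

# Each pair string appears once per group containing both motifs,
# so tabulating the strings gives the per-pair group counts
pair_counts <- as.data.frame(table(pair = pairs), stringsAsFactors = FALSE)
pair_counts
```

The work here scales with the number of motif pairs actually present in each group, which may or may not be cheaper than the all-pairs loop depending on how many distinct motifs each group contains.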




Another old version. PLEASE make sure your question is clear!

This is precisely what plyr was designed to accomplish. Try dlply(dat, .(id), function(x) table(x$motif) ).

But please don't just try to copy and paste this solution without at least reading the documentation. plyr is a very powerful package and it will be very helpful for you to understand it.
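
To make the shape of the result concrete, here is a small sketch of that call on made-up data. It assumes the plyr package is installed; the toy `dat` and its `id` column are illustrative, not taken from the question.

```r
library(plyr)  # assumed to be installed; provides dlply() and .()

# Toy data; id and the motif values are illustrative assumptions
dat <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  motif = c("SRF", "GATA4", "SRF", "MEF2A", "GATA4"),
  stringsAsFactors = FALSE
)

# One frequency table of motifs per id, returned as a list
motif_tables <- dlply(dat, .(id), function(x) table(x$motif))

motif_tables[["1"]]  # motif counts for the group with id == 1
```

The list elements are named after the group values, so each group's table can be looked up by its id.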



Old post answering the wrong question:

Are you looking for disjoint or overlapping pairs?

Here's one solution using the function rollapply from package zoo:

library(zoo)

motif_pairs <- rollapply(dat$motif, 2, c)              # get a matrix of pairs
motif_pairs <- apply(motif_pairs, 1, function(row) {   # for every row...
  paste0(sort(row), collapse = " ")                    #   sort the row, and concatenate it to a single string
                                                       #   (sorting ensures that pairs are not double-counted)
})
table(motif_pairs)                                     # since each pair is now represented by a unique string, just tabulate the string appearances

## if you want disjoint pairs, do `rollapply(dat$motif, 2, c, by = 2)` instead


Take a look at the docs for rollapply if this isn't quite what you need. For grouping by other variables, you can combine this with one of:


base R functions aggregate or by (not recommended), or
the *ply functions from plyr (better)
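
As a rough illustration of the grouping idea, the sketch below counts overlapping adjacent pairs within each group so that no pair spans two groups. To stay dependency-free it swaps in `head`/`tail` for `rollapply` and `split`/`lapply` for the plyr functions; the helper `adjacent_pairs`, the toy `dat`, and its `id` column are hypothetical names introduced only for this example.

```r
# Toy data; id and the motif values are illustrative assumptions
dat <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  motif = c("SRF", "GATA4", "SRF", "MEF2A", "GATA4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: order-independent labels for consecutive
# (overlapping) pairs in a character vector
adjacent_pairs <- function(x) {
  if (length(x) < 2) return(character(0))
  left  <- head(x, -1)
  right <- tail(x, -1)
  # Put the alphabetically smaller motif first so "A B" and "B A" match
  paste(ifelse(left <= right, left, right),
        ifelse(left <= right, right, left))
}

# Apply within each id group, then tabulate across all groups
pairs_by_id <- lapply(split(dat$motif, dat$id), adjacent_pairs)
adjacent_counts <- table(unlist(pairs_by_id))
adjacent_counts
```

Without the split by `id`, the last row of group 1 and the first row of group 2 would be counted as a pair, which is usually not what grouping is meant to produce.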

                                                                        
                                                        
            
            
              
                