How to randomize (or permute) a dataframe rowwise and columnwise?

前端未结

关注

 8  995

I have a dataframe (df1) like this.

     f1   f2   f3   f4   f5
d1   1    0    1    1    1  
d2   1    0    0    1    0
d3   0    0    0    1    1
d4   0


                      
              相关标签:


      
      
        
          8条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  渐次进展        
                
              
                            
                2020-11-28 23:27
              
            
            
                                                                       
Given the R data.frame:

> df1
  a b c
1 1 1 0
2 1 0 0
3 0 1 0
4 0 0 0


Shuffle row-wise:

> df2 <- df1[sample(nrow(df1)),]
> df2
  a b c
3 0 1 0
4 0 0 0
2 1 0 0
1 1 1 0


By default sample() randomly reorders the elements passed as the first argument.  This means that the default size is the size of the passed array.  Passing parameter replace=FALSE (the default) to sample(...) ensures that sampling is done without replacement which accomplishes a row wise shuffle.

Shuffle column-wise:

> df3 <- df1[,sample(ncol(df1))]
> df3
  c a b
1 0 1 1
2 0 1 0
3 0 0 1
4 0 0 0

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  予麋鹿        
                
              
                            
                2020-11-28 23:28
              
            
            
                                                                       
Take a look at permatswap() in the vegan package. Here is an example maintaining both row and column totals, but you can relax that and fix only one of the row or column sums.

mat <- matrix(c(1,1,0,0,0,0,0,1,1,0,0,0,1,1,1,0,1,0,1,1), ncol = 5)
set.seed(4)
out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")


This gives:

R> out$perm[[1]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    1    1    1
[2,]    0    1    0    1    0
[3,]    0    0    0    1    1
[4,]    1    0    0    0    1
R> out$perm[[2]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    0    1    1
[2,]    0    0    0    1    1
[3,]    1    0    0    1    0
[4,]    0    0    1    0    1


To explain the call:

out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")



times is the number of randomised matrices you want, here 99
burnin is the number of swaps made before we start taking random samples. This allows the matrix from which we sample to be quite random before we start taking each of our randomised matrices
thin says only take a random draw every thin swaps
mtype = "prab" says treat the matrix as presence/absence, i.e. binary 0/1 data.


A couple of things to note, this doesn't guarantee that any column or row has been randomised, but if burnin is long enough there should be a good chance of that having happened. Also, you could draw more random matrices than you need and discard ones that don't match all your requirements.

Your requirement to have different numbers of changes per row, also isn't covered here. Again you could sample more matrices than you want and then discard the ones that don't meet this requirement also.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  野性不改        
                
              
                            
                2020-11-28 23:31
              
            
            
                                                                       
You can also "sample" the same number of items in your data frame with something like this:

nr<-dim(M)[1]
random_M = M[sample.int(nr),]

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  梦谈多话        
                
              
                            
                2020-11-28 23:32
              
            
            
                                                                       
If the goal is to randomly shuffle each column, some of the above answers don't work since the columns are shuffled jointly (this preserves inter-column correlations).  Others require installing a package.  Yet a one-liner exist:

df2 = lapply(df1, function(x) { sample(x) })

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  闹比i        
                
              
                            
                2020-11-28 23:34
              
            
            
                                                                       
This is another way to shuffle the data.frame using package dplyr:

row-wise:

df2 <- slice(df1, sample(1:n()))


or

df2 <- sample_frac(df1, 1L)


column-wise:

df2 <- select(df1, one_of(sample(names(df1)))) 

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  猫巷女王i        
                
              
                            
                2020-11-28 23:35
              
            
            
                                                                       
Of course you can sample each row:

sapply (1:4, function (row) df1[row,]<<-sample(df1[row,]))


will shuffle the rows itself, so the number of 1's in each row doesn't change. Small changes and it also works great with columns, but this is a exercise for the reader :-P
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     1
2
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复