How to randomize (or permute) a dataframe rowwise and columnwise?

Happy的楠姐 2020-11-28 22:50

I have a dataframe (df1) like this.

     f1   f2   f3   f4   f5
d1   1    0    1    1    1  
d2   1    0    0    1    0
d3   0    0    0    1    1
d4   0             


        
8 Answers
  • 2020-11-28 23:27

    Given the R data.frame:

    > df1
      a b c
    1 1 1 0
    2 1 0 0
    3 0 1 0
    4 0 0 0
    

    Shuffle row-wise:

    > df2 <- df1[sample(nrow(df1)),]
    > df2
      a b c
    3 0 1 0
    4 0 0 0
    2 1 0 0
    1 1 1 0
    

    Called with a single positive integer n, sample(n) returns a random permutation of 1:n. The default size is the length of the input, and replace=FALSE is the default, so sampling is done without replacement and every row index appears exactly once. Indexing the data frame with that permutation accomplishes the row-wise shuffle.
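
    As a minimal sketch of the point above (the seed value and the names idx and each_once are only illustrative assumptions), the following checks that sample(nrow(df1)) uses every row index exactly once, so the shuffle only reorders rows:

    ```r
    set.seed(1)  # hypothetical seed, used only to make the sketch reproducible
    df1 <- data.frame(a = c(1, 1, 0, 0), b = c(1, 0, 1, 0), c = c(0, 0, 0, 0))

    idx <- sample(nrow(df1))   # a random permutation of 1:nrow(df1)
    df2 <- df1[idx, ]          # rows reordered, each row kept intact

    # Every row index appears exactly once (sampling without replacement)
    each_once <- identical(sort(idx), seq_len(nrow(df1)))

    # Undoing the permutation recovers the original values
    restored <- df2[order(idx), ]
    ```

    The row names of df2 record where each row came from, which is why the shuffled output above shows them out of order.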

    Shuffle column-wise:

    > df3 <- df1[,sample(ncol(df1))]
    > df3
      c a b
    1 0 1 1
    2 0 1 0
    3 0 0 1
    4 0 0 0
    
  • 2020-11-28 23:28

    Take a look at permatswap() in the vegan package. Here is an example maintaining both row and column totals, but you can relax that and fix only one of the row or column sums.

    library(vegan)
    mat <- matrix(c(1,1,0,0,0,0,0,1,1,0,0,0,1,1,1,0,1,0,1,1), ncol = 5)
    set.seed(4)
    out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")
    

    This gives:

    R> out$perm[[1]]
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    0    1    1    1
    [2,]    0    1    0    1    0
    [3,]    0    0    0    1    1
    [4,]    1    0    0    0    1
    R> out$perm[[2]]
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    1    0    1    1
    [2,]    0    0    0    1    1
    [3,]    1    0    0    1    0
    [4,]    0    0    1    0    1
    

    To explain the call:

    out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")
    
    1. times is the number of randomised matrices you want, here 99
    2. burnin is the number of swaps made before we start taking random samples. This allows the matrix from which we sample to be quite random before we start taking each of our randomised matrices
    3. thin says only take a random draw every thin swaps
    4. mtype = "prab" says treat the matrix as presence/absence, i.e. binary 0/1 data.

    A couple of things to note: this doesn't guarantee that any particular column or row has been randomised, but if burnin is long enough there is a good chance that it has. Also, you could draw more random matrices than you need and discard the ones that don't match all your requirements.

    Your requirement to have a different number of changes per row isn't covered here either. Again, you could sample more matrices than you want and then discard the ones that don't meet this requirement.
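
    The "draw more and discard" idea can be sketched in base R, without vegan. The two candidate matrices below are copied from the out$perm[[1]] and out$perm[[2]] output shown above. The helper names keeps_margins and all_rows_changed are hypothetical, and "every row differs from the original" is just one example of a requirement you might filter on:

    ```r
    # Original matrix from the example above
    mat <- matrix(c(1,1,0,0, 0,0,0,1, 1,0,0,0, 1,1,1,0, 1,0,1,1), ncol = 5)

    # The two permuted matrices printed above, entered column by column
    perms <- list(
      matrix(c(1,0,0,1, 0,1,0,0, 1,0,0,0, 1,1,1,0, 1,0,1,1), ncol = 5),
      matrix(c(1,0,1,0, 1,0,0,0, 0,0,0,1, 1,1,1,0, 1,1,0,1), ncol = 5)
    )

    # Hypothetical helper: are both row and column totals preserved?
    keeps_margins <- function(orig, perm) {
      all(rowSums(orig) == rowSums(perm)) && all(colSums(orig) == colSums(perm))
    }

    # Hypothetical helper: does every row differ from the original in some cell?
    all_rows_changed <- function(orig, perm) {
      all(rowSums(orig != perm) > 0)
    }

    margins_ok <- vapply(perms, keeps_margins, logical(1), orig = mat)

    # Keep only the candidates that satisfy the extra requirement
    kept <- Filter(function(p) all_rows_changed(mat, p), perms)
    ```

    Here both candidates preserve the margins, but the first leaves rows 1 and 3 unchanged, so only the second survives the filter.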

  • 2020-11-28 23:31

    You can also sample as many rows as the data frame has, which reorders them, with something like this:

    nr <- nrow(M)                    # M is the data frame (or matrix) to shuffle
    random_M <- M[sample.int(nr), ]
    
  • 2020-11-28 23:32

    If the goal is to randomly shuffle each column independently, some of the above answers don't work, since the columns are shuffled jointly (this preserves inter-column correlations). Others require installing a package. Yet a one-liner exists:

    df2 <- as.data.frame(lapply(df1, sample))  # lapply() returns a list, so convert it back to a data.frame
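
    A minimal sketch of independent column shuffling, using the small data frame from the first answer as assumed input (the seed and the name same_values are illustrative only). Assigning into df2[] keeps the result a data.frame with the original column names:

    ```r
    set.seed(42)  # hypothetical seed, used only to make the sketch reproducible
    df1 <- data.frame(a = c(1, 1, 0, 0), b = c(1, 0, 1, 0), c = c(0, 0, 0, 0))

    df2 <- df1
    df2[] <- lapply(df1, sample)   # each column gets its own random permutation

    # Each column still holds the same values, only in a different order
    same_values <- mapply(function(x, y) identical(sort(x), sort(y)), df1, df2)
    ```

    Because every column is permuted separately, column totals are preserved, while row totals and correlations between columns generally are not.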
    
  • 2020-11-28 23:34

    This is another way to shuffle the data.frame, using the dplyr package:

    Row-wise:

    library(dplyr)
    df2 <- slice(df1, sample(1:n()))
    

    or

    df2 <- sample_frac(df1, 1L)
    

    column-wise:

    df2 <- select(df1, one_of(sample(names(df1)))) 
    
  • 2020-11-28 23:35

    Of course you can sample each row:

    # Shuffle the values within each row, modifying df1 in place
    for (row in seq_len(nrow(df1))) df1[row, ] <- sample(df1[row, ])
    

    This shuffles the values within each row, so the number of 1s in each row doesn't change. With small changes it also works for columns, but that is an exercise for the reader :-P
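
    An alternative sketch of the same within-row shuffle using apply() on a matrix, again with the small data frame from the first answer as assumed input (the seed and the names m, shuffled and rows_ok are illustrative):

    ```r
    set.seed(7)  # hypothetical seed, used only to make the sketch reproducible
    df1 <- data.frame(a = c(1, 1, 0, 0), b = c(1, 0, 1, 0), c = c(0, 0, 0, 0))

    m <- as.matrix(df1)
    # apply() over rows returns one column per input row, so transpose back
    shuffled <- t(apply(m, 1, sample))
    dimnames(shuffled) <- dimnames(m)   # restore the original row/column labels

    # The number of 1s in each row is unchanged
    rows_ok <- all(rowSums(shuffled) == rowSums(m))
    ```

    Using margin 2 instead of 1, and dropping the t(), should give the within-column version.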
