Can an array be grouped more efficiently than sorted?

青春惊慌失措 2020-12-06 06:44

While working on example code for an algorithm question, I came across the situation where I was sorting an input array, even though I only needed to have identical elements

4 answers
  •  無奈伤痛
    2020-12-06 07:46

    How about a counting approach that returns a 2-dimensional array, with one column holding the frequency of each value and the other holding the value itself? We can take advantage of the Boolean data type and indexing: keep two arrays indexed by value, one counting occurrences and one flagging whether the value occurs at all. A single loop over the original array fills both, and reading the flagged indices back in order gives the distinct values already sorted. For positive integers with maximum value m, this takes O(n + m) time and O(m) extra space, so it is O(n) whenever m is O(n). I'm thinking that this approach will translate well to other languages. Observe the following base R code (N.B. there are far more efficient ways in R than the below; I'm simply giving a more general approach).

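    ## Assumes arr.in holds positive integers (values are used as indices)
    GroupArray <- function(arr.in) {
    
        maxVal <- max(arr.in)
    
        arr.out.val <- rep(FALSE, maxVal)  ## F, F, F, F, ... (TRUE once a value is seen)
        arr.out.freq <- rep(0L, maxVal)    ## 0, 0, 0, 0, ... (count per value)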
    
        for (i in arr.in) {
            arr.out.freq[i] <- arr.out.freq[i]+1L
            arr.out.val[i] <- TRUE
        }
    
        myvals <- which(arr.out.val)   ## "which" returns the TRUE indices
    
        array(c(arr.out.freq[myvals],myvals), dim = c(length(myvals), 2), dimnames = list(NULL,c("freq","vals")))
    }
    

    Small example of the above code:

    set.seed(11)
    arr1 <- sample(10, 10, replace = TRUE)
    
    arr1                                    
    [1]  3  1  6  1  1 10  1  3  9  2     ## unsorted array
    
    GroupArray(arr1)    
         freq vals       ## Nicely sorted with the frequency
    [1,]    4    1
    [2,]    1    2
    [3,]    2    3
    [4,]    1    6
    [5,]    1    9
    [6,]    1   10
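
    Since the question asks for identical elements to end up next to each other, the frequency table can be expanded back into a grouped array. A minimal sketch, assuming a `freq`/`vals` matrix shaped like the output shown here (the matrix is typed in by hand so the snippet runs on its own; `grouped` and `flat` are illustrative names):

    ```r
    ## Frequency table copied from the small example output
    grouped <- cbind(freq = c(4L, 1L, 2L, 1L, 1L, 1L),
                     vals = c(1L, 2L, 3L, 6L, 9L, 10L))

    ## Repeat each value as many times as it occurred
    flat <- rep(grouped[, "vals"], times = grouped[, "freq"])
    flat
    ## [1]  1  1  1  1  2  3  3  6  9 10
    ```

    The expansion is linear in the length of the original array, so it does not change the overall cost of the approach.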
    

    Larger example:

    set.seed(101)
    arr2 <- sample(10^6, 10^6, replace = TRUE)
    
    arr2[1:10]       ## First 10 elements of random unsorted array
    [1] 372199  43825 709685 657691 249856 300055 584867 333468 622012 545829
    
    arr2[999990:10^6]     ## Last 11 elements of random unsorted array (indices 999990 to 10^6)
    [1] 999555 468102 851922 244806 192171 188883 821262 603864  63230  29893 664059
    
    t2 <- GroupArray(arr2)
    head(t2)
         freq vals        ## Nicely sorted with the frequency
    [1,]    2    1
    [2,]    2    2
    [3,]    2    3
    [4,]    2    6
    [5,]    2    8
    [6,]    1    9
    
    tail(t2)
              freq    vals 
    [632188,]    3  999989
    [632189,]    1  999991
    [632190,]    1  999994
    [632191,]    2  999997
    [632192,]    2  999999
    [632193,]    2 1000000
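
    The N.B. in this answer says that R has more efficient ways than the explicit loop. One possible sketch, under the same assumption of positive integer input, replaces the loop with base R's `tabulate`. The name `GroupArrayVec` is hypothetical and not part of the original answer.

    ```r
    ## Hypothetical vectorized variant; assumes arr.in holds positive integers
    GroupArrayVec <- function(arr.in) {
        ## counts[k] is the number of times the value k occurs in arr.in
        counts <- tabulate(arr.in, nbins = max(arr.in))

        ## Indices with a nonzero count are the distinct values, in increasing order
        vals <- which(counts > 0L)

        cbind(freq = counts[vals], vals = vals)
    }

    ## Same values as arr1 in the small example
    res <- GroupArrayVec(c(3L, 1L, 6L, 1L, 1L, 10L, 1L, 3L, 9L, 2L))
    res
    ```

    This should produce the same `freq`/`vals` layout as `GroupArray`, and it still needs a count vector as long as the largest value, so the memory cost is unchanged.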
    
