Is there an R function equivalent to Stata's 'order' command?

旧时难觅i 2021-01-13 11:09

'order' in R seems like 'sort' in Stata. Here's a dataset, for example (only variable names listed):

v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15 v16 v17 v
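To illustrate why R's order() behaves like Stata's sort: base R's order() returns the permutation that sorts a vector. You then use that permutation to reorder rows; it never moves columns. A minimal sketch with a made-up data frame (df, x and y are hypothetical):

```r
# Hypothetical toy data frame, used only for illustration
df <- data.frame(x = c(3, 1, 2), y = c("c", "a", "b"),
                 stringsAsFactors = FALSE)

# order() returns the row indices that would sort x: 2 3 1
idx <- order(df$x)

# Indexing rows with that permutation sorts the data (like Stata's sort);
# the column order (x, y) is left unchanged
df_sorted <- df[idx, ]
print(df_sorted)
```

Reordering columns, which is what Stata's order does, needs a different approach, such as indexing with a reordered vector of names.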

6 answers
  •  梦毁少年i
    2021-01-13 11:56

    You could write your own function that does this.

    The following function gives you the new order of your column names, using syntax similar to Stata's order:

    • where is a named list with a single element, in one of 4 forms:

      • list(last = T)
      • list(first = T)
      • list(before = x), where x is the name of the variable in question
      • list(after = x), where x is the name of the variable in question
    • sorted = T sorts var_list lexicographically (like the alphabetic option of the Stata command; unlike its sequential option, "V10" sorts before "V2")
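    Here is a quick check of what lexicographic sorting means for numbered names. R's sort() on character vectors compares strings character by character, so the names do not come out in numeric order. The sketch below contrasts this with one way to get a Stata-like sequential order. Splitting each name into a text prefix and a trailing number is an assumption; it only fits names shaped like V1, V2, V10.

    ```r
    vars <- c("V10", "V2", "V1")

    # Plain sort() compares strings character by character,
    # so "V10" comes before "V2"
    lex <- sort(vars)
    print(lex)

    # One possible "sequential" ordering (an assumption, not part of the answer):
    # order by the non-numeric prefix, then by the trailing number as an integer
    prefix <- sub("[0-9]+$", "", vars)
    num    <- as.integer(sub("^[^0-9]*", "", vars))
    seq_sorted <- vars[order(prefix, num)]
    print(seq_sorted)
    ```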

    The function works on the names only. You pass a data.frame as data, and it returns the column names as a reordered character vector.

    For example:

    stata.order <- function(var_list, where, sorted = FALSE, data) {
        all_names <- names(data)
        # where must be a list with exactly one element named first, last, before or after
        .nwhere <- names(where)
        if (length(where) != 1 || is.null(.nwhere) ||
            !(.nwhere %in% c("last", "first", "before", "after"))) {
            stop("where must be a list of a named element first, last, before or after")
        }
        # check that every variable in var_list exists in data
        if (!all(var_list %in% all_names)) {
            stop("Not all variables in var_list exist within data")
        }
        if (.nwhere %in% c("before", "after")) {
            anchor <- where[[1]]
            if (!(anchor %in% all_names)) {
                stop(.nwhere, " variable not in the data set")
            }
            # the anchor stays put, so it cannot also be one of the moved variables
            if (anchor %in% var_list) {
                stop(.nwhere, " variable must not be part of var_list")
            }
        }
    
        if (sorted) {
            var_list <- sort(var_list)
        }
        # positions of the variables to move, in the order given by var_list
        where_in <- match(var_list, all_names)
        # positions of the remaining variables, in their current order
        others <- seq_along(data)[-where_in]
    
        # index within others after which the moved block is inserted
        do_what <- switch(.nwhere,
            last   = length(others),
            first  = 0,
            before = which(all_names[others] == anchor) - 1,
            after  = which(all_names[others] == anchor))
    
        new_order <- append(others, where_in, after = do_what)
        all_names[new_order]
    }
    
    tmp <- as.data.frame(matrix(1:100, ncol = 10))
    
    stata.order(var_list = c("V2", "V5"), where = list(last = T), data = tmp)
    
    ##  [1] "V1"  "V3"  "V4"  "V6"  "V7"  "V8"  "V9"  "V10" "V2"  "V5" 
    
    stata.order(var_list = c("V2", "V5"), where = list(first = T), data = tmp)
    
    ##  [1] "V2"  "V5"  "V1"  "V3"  "V4"  "V6"  "V7"  "V8"  "V9"  "V10"
    
    stata.order(var_list = c("V2", "V5"), where = list(before = "V6"), data = tmp)
    
    ##  [1] "V1"  "V3"  "V4"  "V2"  "V5"  "V6"  "V7"  "V8"  "V9"  "V10"
    
    stata.order(var_list = c("V2", "V5"), where = list(after = "V4"), data = tmp)
    
    ##  [1] "V1"  "V3"  "V4"  "V2"  "V5"  "V6"  "V7"  "V8"  "V9"  "V10"
    
    # throws an error
    stata.order(var_list = c("V2", "V5"), where = list(before = "v11"), data = tmp)
    
    ## Error: before variable not in the data set
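    Since stata.order() only returns names, in base R you still index the data.frame with them, e.g. tmp[new_names], which makes a reordered copy. The core idea fits in a few lines. move_after() below is a hypothetical helper for illustration only, and it handles just the "after" case. It drops the moved columns, finds the anchor's position among the remaining ones, and splices the block back in with append().

    ```r
    # Hypothetical helper, for illustration only: move `vars` so they sit
    # immediately after the column named `anchor`
    move_after <- function(data, vars, anchor) {
        all_names <- names(data)
        others <- setdiff(all_names, vars)   # remaining columns, original order
        pos <- match(anchor, others)         # anchor's index among the remaining columns
        if (is.na(pos)) stop("anchor must be a column that is not listed in vars")
        new_names <- append(others, vars, after = pos)
        data[new_names]                      # list-style indexing returns a reordered copy
    }

    tmp <- as.data.frame(matrix(1:100, ncol = 10))
    res <- move_after(tmp, c("V2", "V5"), anchor = "V4")
    print(names(res))
    ```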
    

    If you want to reorder the columns in place (by reference, without copying), use data.table's setcolorder():

    library(data.table)
    DT <- data.table(tmp)
    # setcolorder() reorders the columns by reference; no copy of the data is made
    setcolorder(DT, stata.order(var_list = c("V2", "V5"), where = list(after = "V4"),
        data = DT))
    
    DT
    
    ##     V1 V3 V4 V2 V5 V6 V7 V8 V9 V10
    ##  1:  1 21 31 11 41 51 61 71 81  91
    ##  2:  2 22 32 12 42 52 62 72 82  92
    ##  3:  3 23 33 13 43 53 63 73 83  93
    ##  4:  4 24 34 14 44 54 64 74 84  94
    ##  5:  5 25 35 15 45 55 65 75 85  95
    ##  6:  6 26 36 16 46 56 66 76 86  96
    ##  7:  7 27 37 17 47 57 67 77 87  97
    ##  8:  8 28 38 18 48 58 68 78 88  98
    ##  9:  9 29 39 19 49 59 69 79 89  99
    ## 10: 10 30 40 20 50 60 70 80 90 100
    
