Create an empty data.frame

猫巷女王i 2020-11-22 16:06

I'm trying to initialize a data.frame without any rows. Basically, I want to specify the data types for each column and name them, but not have any rows created as a result.

17 Answers
  •  旧时难觅i
    2020-11-22 16:24

    If you want to declare such a data.frame with many columns, typing out every column class by hand will probably be a pain. When the classes can be generated programmatically (for example with rep), the approach below is easy and fast: in the benchmark further down it runs roughly 10-15% faster than read.table, the other solution that generalizes this way.

    If your desired column classes are in a character vector colClasses and the column names are in col.names, you can do the following:

    library(data.table)
    setnames(setDF(lapply(colClasses, function(x) eval(call(x)))), col.names)
    

    lapply will result in a list of desired length, each element of which is simply an empty typed vector like numeric() or integer().

    setDF converts this list by reference to a data.frame.

    setnames adds the desired names by reference.
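
To make the three steps concrete, here is a minimal sketch on a four-column case. The class and column names are made up for illustration, and it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical classes and names, for illustration only
colClasses <- c("character", "numeric", "integer", "logical")
col.names  <- c("id", "score", "count", "flag")

# Step 1: one zero-length typed vector per class, e.g. character(), numeric()
cols <- lapply(colClasses, function(x) eval(call(x)))

# Step 2: turn the list into a data.frame by reference (no copy is made)
setDF(cols)

# Step 3: rename the columns by reference
setnames(cols, col.names)

# Inspect the result: zero rows, typed and named columns
str(cols)
```

Splitting the one-liner into steps changes nothing about the result; it only makes it easier to see which call does what.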

    Speed comparison:

    classes <- c("character", "numeric", "factor",
                 "integer", "logical","raw", "complex")
    
    NN <- 300
    colClasses <- sample(classes, NN, replace = TRUE)
    col.names <- paste0("V", 1:NN)
    
    setDF(lapply(colClasses, function(x) eval(call(x))))
    
    library(microbenchmark)
    microbenchmark(times = 1000,
                   read = read.table(text = "", colClasses = colClasses,
                                     col.names = col.names),
                   DT = setnames(setDF(lapply(colClasses, function(x)
                     eval(call(x)))), col.names))
    # Unit: milliseconds
    #  expr      min       lq     mean   median       uq      max neval cld
    #  read 2.598226 2.707445 3.247340 2.747835 2.800134 22.46545  1000   b
    #    DT 2.257448 2.357754 2.895453 2.401408 2.453778 17.20883  1000  a 
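
For comparison, the read variant timed above needs only base R and can be tried on its own. A small sketch, again with made-up classes and names:

```r
# Hypothetical classes and names, for illustration only
colClasses <- c("character", "numeric", "integer", "logical")
col.names  <- c("id", "score", "count", "flag")

# Reading empty text yields zero rows; colClasses sets each column's type
# and col.names supplies the header that the empty input cannot provide
empty_df <- read.table(text = "", colClasses = colClasses,
                       col.names = col.names)

# Inspect the result
nrow(empty_df)
sapply(empty_df, class)
```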
    

    It's also faster than using structure in a similar way:

    microbenchmark(times = 1000,
                   DT = setnames(setDF(lapply(colClasses, function(x)
                     eval(call(x)))), col.names),
                   struct = eval(parse(text=paste0(
                     "structure(list(", 
                     paste(paste0(col.names, "=", 
                                  colClasses, "()"), collapse = ","),
                     "), class = \"data.frame\")"))))
    # Unit: milliseconds
    #   expr      min       lq     mean   median       uq       max neval cld
    #     DT 2.068121 2.167180 2.821868 2.211214 2.268569 143.70901  1000  a 
    # struct 2.613944 2.723053 3.177748 2.767746 2.831422  21.44862  1000   b
    
