Importing many files at the same time and adding an ID indicator

Submitted by 早过忘川 on 2019-11-29 08:48:36

You could also check out purrr::map_df(), which behaves like lapply() or a for loop but returns a data.frame:

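read_traj <- function(fi) {
    # Skip the 23-line preamble; the data has no header row
    df <- read.table(fi, header = FALSE, skip = 23)
    # Keep columns 1-4 and 15, then give them meaningful names
    df <- df[, c(1:4, 15)]
    colnames(df) <- c("t", "x", "y", "z", "Etot")
    return(df)
}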
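To see what skip = 23 and c(1:4, 15) do, here is a minimal sketch that writes a fake log file and reads it back with read_traj(). The file layout (23 preamble lines followed by rows of 15 numbers) is an assumption inferred from the reader itself, not taken from the real log files.

```r
# Same reader as above: skip the 23-line preamble, keep columns 1-4 and 15
read_traj <- function(fi) {
  df <- read.table(fi, header = FALSE, skip = 23)
  df <- df[, c(1:4, 15)]
  colnames(df) <- c("t", "x", "y", "z", "Etot")
  df
}

# Hypothetical test file: 23 preamble lines, then 3 rows of 15 numbers each
fake_log <- tempfile(fileext = ".log")
preamble <- paste("preamble line", 1:23)
rows <- vapply(1:3, function(i) paste(i * 1:15, collapse = " "), character(1))
writeLines(c(preamble, rows), fake_log)

traj <- read_traj(fake_log)
print(traj)
```

If skip were too small, read.table() would try to parse the preamble text as data and fail or produce garbage columns, so this is a quick way to check the value against a real file.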

library(tidyverse)
files.list <- list.files(pattern = "\\.log$")
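The pattern argument of list.files() is a regular expression, so ".log" matches any name containing some character followed by "log" anywhere in it. A small sketch with a temporary directory and made-up file names shows the difference from an anchored pattern such as "\\.log$":

```r
# Hypothetical directory with made-up file names
dir <- file.path(tempdir(), "logs_demo")
dir.create(dir, showWarnings = FALSE)
invisible(file.create(file.path(dir, c("run1.log", "run2.log", "catalog.txt", "run3.log.bak"))))

# Unanchored: "." matches any character and the match may occur anywhere
loose <- list.files(dir, pattern = ".log")
# Anchored: a literal dot followed by "log" at the very end of the name
strict <- list.files(dir, pattern = "\\.log$")

print(loose)
print(strict)
```

If the log files are not in the working directory, passing the directory as the first argument together with full.names = TRUE returns paths that read.table() can open directly.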

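map_df has a handy argument, .id, that adds a column with that name (here id) recording which input each row came from. When the input is unnamed, as files.list is, the column holds the positions 1...N, stored as character, where N is the number of files.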

map_df(files.list, ~read_traj(.x), .id="id")
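A minimal sketch of what .id produces, using small in-memory data frames instead of log files (the inputs vector and the make_df() helper are made up for illustration; map_df() also needs dplyr installed). With an unnamed input, the id column holds each element's position as a character string:

```r
library(purrr)

# Hypothetical stand-in for read_traj(): builds a tiny data frame from one input
make_df <- function(n) data.frame(t = seq_len(n), x = n * 10)

inputs <- c(2, 1, 3)                 # unnamed, like the output of list.files()
out <- map_df(inputs, make_df, .id = "id")
print(out)
str(out$id)                          # a character vector, not numeric
```

Since purrr 1.0.0, map_df() is documented as superseded in favour of map() followed by list_rbind(), but it still works as shown.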

If you want to store the file name instead, use the id column to index files.list:

map_df(files.list, ~read_traj(.x), .id="id") %>%
  mutate(id = files.list[as.numeric(id)])
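An alternative sketch: name the input vector first, and .id then stores those names directly, so the as.numeric() lookup is not needed. purrr::set_names() called with a single argument names a character vector after its own values. The file names and fake_reader() below are hypothetical placeholders so the example runs without real files:

```r
library(purrr)

# Hypothetical reader standing in for read_traj(): one row per "file"
fake_reader <- function(path) data.frame(t = 0, nchar_path = nchar(path))

files <- c("run1.log", "run2.log")
# set_names(files) gives c(run1.log = "run1.log", run2.log = "run2.log"),
# and map_df() copies those names into the id column
out <- map_df(set_names(files), fake_reader, .id = "id")
print(out)
```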

First of all, you should encapsulate the reading step in a function:

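read_log_file <- function(path) {
  # Skip the 23-line preamble; with no header row, read.table names the columns V1, V2, ...
  trjct <- read.table(path, skip = 23)
  # Keep the first four columns and the 15th, then rename them
  trjct <- trjct[, c("V1", "V2", "V3", "V4", "V15")]
  colnames(trjct) <- c("t", "x", "y", "z", "Etot")
  return(trjct)
}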

Then you can create a list of data.frames using mapply(), a member of the apply family that iterates over several arguments in parallel (here a path and an ID); see the DataCamp article on the apply family if you want to know more.

files.list <- list.files(pattern = "\\.log$")
ids <- seq_along(files.list)

df_list <- mapply(function(path, id) {
    df <- read_log_file(path)
    df$ID <- id  # tag every row with the file's index
    return(df)
}, files.list, ids, SIMPLIFY = FALSE)
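The same mapply() pattern can be tried without any files. In this sketch fake_read() is a hypothetical stand-in for read_log_file(). The IDs come from seq_along() because, unlike 1:length(x), it returns an empty sequence when no file matches:

```r
# Hypothetical stand-in for read_log_file(): a small data frame per path
fake_read <- function(path) data.frame(t = 1:2, x = c(0.5, 1.5))

files.list <- c("a.log", "b.log", "c.log")
ids <- seq_along(files.list)

df_list <- mapply(function(path, id) {
  df <- fake_read(path)
  df$ID <- id        # tag every row with the file's index
  df
}, files.list, ids, SIMPLIFY = FALSE)

str(df_list, max.level = 1)      # a list of 3 data.frames named after the files
print(seq_along(character(0)))   # integer(0)
print(1:length(character(0)))    # 1 0, which would index files that do not exist
```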

Note the SIMPLIFY = FALSE argument: it stops mapply from trying to simplify the results (into a matrix of lists) and makes it return a plain list of data.frames instead.
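To see what SIMPLIFY guards against, here is a small sketch with toy data frames made up for illustration. With the default SIMPLIFY = TRUE, mapply() notices that every result has the same length (two columns) and packs them into a matrix whose cells are list elements, which is awkward to pass to bind_rows():

```r
make <- function(a, b) data.frame(x = a, id = b)

simplified <- mapply(make, 1:3, 4:6)               # default SIMPLIFY = TRUE
kept <- mapply(make, 1:3, 4:6, SIMPLIFY = FALSE)

print(dim(simplified))             # 2 3: rows x and id, one column per call
print(is.list(simplified))         # TRUE: a list-matrix, not a data.frame
print(is.data.frame(kept[[1]]))    # TRUE: each element is still a data.frame
```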

Finally, you can combine all the data.frames into one with bind_rows() from the dplyr package:

df = dplyr::bind_rows(df_list)
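Because mapply() names the elements of df_list after the file names (its first argument is an unnamed character vector and USE.NAMES defaults to TRUE), bind_rows() can also record the source file through its .id argument. A sketch with toy data frames standing in for the parsed logs:

```r
library(dplyr)

# Toy stand-ins for two parsed log files, named as mapply() would name them
df_list <- list(
  "run1.log" = data.frame(t = 1:2, Etot = c(-10.1, -10.2)),
  "run2.log" = data.frame(t = 1:3, Etot = c(-9.8, -9.9, -10.0))
)

# The new first column "file" holds the name of the list element each row came from
df <- bind_rows(df_list, .id = "file")
print(df)
```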

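Note: in R, it is generally more idiomatic to use the *apply family (or purrr's map functions) than a for loop that grows a data.frame with rbind() on every iteration, which copies the accumulated data each time.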
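For comparison, a base-R sketch of the whole pipeline with lapply() and do.call(rbind, ...), which builds every piece first and stacks them once instead of growing a data.frame inside a loop. It writes two small fake log files to a temporary directory (their layout, 23 preamble lines plus rows of 15 numbers, is an assumption based on the readers above) and tags each row with its file name:

```r
# Reader equivalent to read_log_file() above
read_log_file <- function(path) {
  trjct <- read.table(path, skip = 23)
  trjct <- trjct[, c("V1", "V2", "V3", "V4", "V15")]
  colnames(trjct) <- c("t", "x", "y", "z", "Etot")
  trjct
}

# Hypothetical test files: run1.log has 1 data row, run2.log has 2
dir <- file.path(tempdir(), "apply_demo")
dir.create(dir, showWarnings = FALSE)
for (k in 1:2) {
  body <- vapply(1:k, function(i) paste(rep(i, 15), collapse = " "), character(1))
  writeLines(c(paste("preamble", 1:23), body), file.path(dir, sprintf("run%d.log", k)))
}

files.list <- list.files(dir, pattern = "\\.log$", full.names = TRUE)

# One data.frame per file, each tagged with its file name
df_list <- lapply(files.list, function(path) {
  df <- read_log_file(path)
  df$file <- basename(path)
  df
})
df <- do.call(rbind, df_list)   # stack all data.frames in a single call
print(df)
```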
