Read multiple xlsx files with multiple sheets into one R data frame

轮回少年 2020-12-09 00:26

I have been reading up on how to read and combine multiple xlsx files into one R data frame and have come across some very good suggestions like, How to read multiple xlsx f

4 answers
  • 2020-12-09 00:40

    openxlsx solution:

    filename <- "myFilePath"
    
    # Read every sheet of the workbook into a list of data frames, named by sheet
    sheets <- openxlsx::getSheetNames(filename)
    SheetList <- lapply(sheets, openxlsx::read.xlsx, xlsxFile = filename)
    names(SheetList) <- sheets
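
    The snippet above handles one workbook and returns a list. To get from there to a single data frame across several files, one possible extension is sketched below. The helper `read_all_sheets`, the demo folder and the demo workbooks are made up for illustration, and the sketch assumes every sheet has the same columns, since base `rbind` requires that.

    ```r
    library(openxlsx)

    # Build two small demo workbooks in a fresh temporary folder (illustrative data only).
    demo_dir <- tempfile("xlsx_demo")
    dir.create(demo_dir)
    write.xlsx(list(Sheet1 = data.frame(col1 = 1:2, col2 = 3:4),
                    Sheet2 = data.frame(col1 = 5:6, col2 = 7:8)),
               file.path(demo_dir, "demo1.xlsx"))
    write.xlsx(list(Sheet1 = data.frame(col1 = 9L, col2 = 10L)),
               file.path(demo_dir, "demo2.xlsx"))

    # Hypothetical helper: read every sheet of one workbook and label each row
    # with the file and sheet it came from.
    read_all_sheets <- function(path) {
      sheets <- getSheetNames(path)
      per_sheet <- lapply(sheets, function(s) {
        data.frame(file_name = basename(path), sheet_name = s,
                   read.xlsx(path, sheet = s), stringsAsFactors = FALSE)
      })
      do.call(rbind, per_sheet)
    }

    # Apply the helper to every workbook in the folder and stack the results.
    files <- list.files(demo_dir, pattern = "\\.xlsx$", full.names = TRUE)
    combined <- do.call(rbind, lapply(files, read_all_sheets))
    ```

    If the sheets do not share the same columns, `rbind` will stop with an error, so a binding function that tolerates missing columns would be needed instead.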
    
  • 2020-12-09 00:53

    Here's a tidyverse- and readxl-driven option that returns a single data frame, with the file name and sheet name added as columns on every row.

    In this example, not every file has the same sheets or columns; test2.xlsx has only one sheet and test3.xlsx sheet1 does not have col3.

    library(tidyverse)
    library(readxl)
    
    dir_path <- "~/test_dir/"         # target directory path where the xlsx files are located. 
    re_file <- "^test[0-9]\\.xlsx"    # regex pattern to match the file name format, in this case 'test1.xlsx', 'test2.xlsx' etc, but could simply be 'xlsx'.
    
    read_sheets <- function(dir_path, file){
      xlsx_file <- paste0(dir_path, file)
      xlsx_file %>%
        excel_sheets() %>%
        set_names() %>%
        map_df(read_excel, path = xlsx_file, .id = 'sheet_name') %>% 
        mutate(file_name = file) %>% 
        select(file_name, sheet_name, everything())
    }
    
    df <- list.files(dir_path, re_file) %>% 
      map_df(~ read_sheets(dir_path, .))
    
    # A tibble: 15 x 5
       file_name  sheet_name  col1  col2  col3
       <chr>      <chr>      <dbl> <dbl> <dbl>
     1 test1.xlsx Sheet1         1     2     4
     2 test1.xlsx Sheet1         3     2     3
     3 test1.xlsx Sheet1         2     4     4
     4 test1.xlsx Sheet2         3     3     1
     5 test1.xlsx Sheet2         2     2     2
     6 test1.xlsx Sheet2         4     3     4
     7 test2.xlsx Sheet1         1     3     5
     8 test2.xlsx Sheet1         4     4     3
     9 test2.xlsx Sheet1         1     2     2
    10 test3.xlsx Sheet1         3     9    NA
    11 test3.xlsx Sheet1         4     7    NA
    12 test3.xlsx Sheet1         5     3    NA
    13 test3.xlsx Sheet2         1     3     4
    14 test3.xlsx Sheet2         2     5     9
    15 test3.xlsx Sheet2         4     3     1
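
    The `NA` values in `col3` for test3.xlsx Sheet1 come from the row-binding step: `map_df()` combines its results with `dplyr::bind_rows()`, which fills any column missing from one input with `NA`. A minimal sketch of that behaviour, using two made-up tibbles in place of real sheets:

    ```r
    library(dplyr)

    # Stand-ins for two sheets of one workbook; the first one lacks col3.
    sheet_a <- tibble(col1 = c(3, 4), col2 = c(9, 7))
    sheet_b <- tibble(col1 = 1, col2 = 3, col3 = 4)

    # Bind by row; list names become the sheet_name column and
    # col3 is filled with NA for the rows that came from sheet_a.
    combined <- bind_rows(list(Sheet1 = sheet_a, Sheet2 = sheet_b), .id = "sheet_name")
    ```

    This is why files with differing sheets or columns can still be stacked without any extra alignment code.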
    
  • 2020-12-09 00:59

    One more solution, using the rio package:

    library("rio")
    
    # import and rbind all worksheets
    DT <- import_list(SINGLE_XLSX_PATH, rbind = TRUE)
    

    Source: rdrr.io

  • 2020-12-09 01:01

    I would use a nested loop like this to go through each sheet of each file. It might not be the fastest solution but it is the simplest.

    library(xlsx)
    # List every .xlsx file under the working directory (pattern is a regex, not a glob)
    file.list <- list.files(recursive = TRUE, pattern = "\\.xlsx$")
    
    for (i in seq_along(file.list)) {
      wb <- loadWorkbook(file.list[i])            # select a file and load the workbook
      sheet <- getSheets(wb)                      # get its sheet list
    
      for (j in seq_along(sheet)) {
        tmp <- read.xlsx(file.list[i], sheetIndex = j, colIndex = c(1:6, 8:10, 12:17, 19),
                         sheetName = NULL, startRow = 4, endRow = NULL,
                         as.data.frame = TRUE, header = FALSE)
        # the first sheet starts the dataset; every later sheet is appended to it
        if (i == 1 && j == 1) dataset <- tmp else dataset <- rbind(dataset, tmp)
      }
    }
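
    On the speed point: calling `rbind` inside the loop copies the growing data frame on every pass. A common alternative is to collect the pieces in a list and bind once at the end. The sketch below shows only that pattern; the small `data.frame(...)` call is a made-up stand-in for the `read.xlsx` call, and the loop bounds are arbitrary.

    ```r
    # Collect one data frame per (file, sheet) pair, then bind them in a single step.
    chunks <- list()
    for (i in 1:3) {          # stand-in for looping over files
      for (j in 1:2) {        # stand-in for looping over sheets
        tmp <- data.frame(file = i, sheet = j, value = i * 10 + j)
        chunks[[length(chunks) + 1]] <- tmp
      }
    }
    dataset <- do.call(rbind, chunks)
    ```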
    

    You can clean NA values after the loading phase.
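
    What "cleaning" means depends on the data. One possible reading, sketched here in base R on a made-up data frame, is to drop the rows and columns that contain nothing but `NA`:

    ```r
    # Toy stand-in for the combined dataset: row 2 and column X3 are entirely NA.
    dataset <- data.frame(X1 = c(1, NA, 3),
                          X2 = c("a", NA, NA),
                          X3 = c(NA, NA, NA))

    # Flag rows and columns in which every cell is NA.
    all_na_rows <- rowSums(is.na(dataset)) == ncol(dataset)
    all_na_cols <- colSums(is.na(dataset)) == nrow(dataset)

    # Keep everything else; drop = FALSE preserves the data frame structure.
    cleaned <- dataset[!all_na_rows, !all_na_cols, drop = FALSE]
    ```

    Rows that are only partly `NA` are kept, so any stricter rule (for example `complete.cases`) would have to be applied separately.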
