Reading Excel in R: how to find the start cell in messy spreadsheets

暗喜 2020-12-28 10:17

I'm trying to write R code to read data from a mess of old spreadsheets. The exact location of the data varies from sheet to sheet: the only constant is that the first co

7 Answers
  •  半阙折子戏
    2020-12-28 10:39

    Here is a tidyverse alternative that reads the workbook only once, which avoids the multiple-reads issue discussed above. In my benchmarks, however, Rafael Zayas's answer is still faster.

    library("tidyxl")
    library("dplyr")
    
    tidy_solution <- function() {
        # Read the workbook once: one row per cell, with row, col and typed value columns
        raw <- xlsx_cells("messyExcel.xlsx")
    
        # Locate the header cell by its text
        start <- raw %>%
            filter(character == "Monthly return") %>%
            select(row, col)
    
        # Dates sit below the header row, one column to the left of the header
        month.col <- raw %>%
            filter(row >= start$row + 1, col == start$col - 1) %>%
            select(month = date)
    
        # Returns sit directly below the header
        return.col <- raw %>%
            filter(row >= start$row + 1, col == start$col) %>%
            select(monthly_return = numeric)
     
        # Return the combined table explicitly
        cbind(month.col, return.col)
    }
    
    # My Solution
                expr     min       lq     mean   median      uq     max neval
    tidy_solution() 29.0372 30.40305 32.13793 31.36925 32.9812 56.6455   100
    
    # Rafael's
                    expr     min      lq     mean   median       uq     max neval
    original_solution() 21.4405 23.8009 25.86874 25.10865 26.99945 59.4128   100
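
    The lookup logic can also be tried without the spreadsheet. Below is a minimal sketch in base R, assuming a toy table shaped like the one-row-per-cell output of xlsx_cells(). The table contents, the "Month" header and the names `cells`, `months`, `returns` and `result` are made up for illustration and are not from the original workbook.

```r
# Toy stand-in for the long-format table returned by tidyxl::xlsx_cells():
# one row per cell, with its position and typed value columns.
# All values here are invented for illustration.
cells <- data.frame(
  row = c(1, 3, 3, 4, 4, 5, 5),
  col = c(1, 2, 3, 2, 3, 2, 3),
  character = c("Some title", "Month", "Monthly return", NA, NA, NA, NA),
  numeric = c(NA, NA, NA, NA, 0.012, NA, -0.004),
  date = as.Date(c(NA, NA, NA, "2020-01-31", NA, "2020-02-29", NA)),
  stringsAsFactors = FALSE
)

# Locate the header cell by its text.
start <- cells[which(cells$character == "Monthly return"), c("row", "col")]
stopifnot(nrow(start) == 1)  # fail loudly if the header is missing or repeated

# Keep the cells below the header: dates one column to the left,
# returns in the header's own column.
below   <- cells$row > start$row
months  <- cells$date[below & cells$col == start$col - 1]
returns <- cells$numeric[below & cells$col == start$col]

result <- data.frame(month = months, monthly_return = returns)
print(result)
```

    The stopifnot() check is a design choice for messy inputs: if a sheet has no "Monthly return" cell, or more than one, the script stops instead of silently returning misaligned columns.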
    
