Reading Excel in R: how to find the start cell in messy spreadsheets

暗喜 2020-12-28 10:17

I'm trying to write R code to read data from a mess of old spreadsheets. The exact location of the data varies from sheet to sheet: the only constant is that the first co

7 answers

半阙折子戏 (original poster)

2020-12-28 10:39

This is a tidy alternative that reads the workbook only once, which avoids the multiple-reads issue discussed above. In my benchmarks, however, Rafael Zayas's answer is still faster.

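library("tidyxl")
library("dplyr")

tidy_solution <- function() {
    # Read every cell of the workbook once: one row per cell, with the
    # value stored in a type-specific column (character, numeric, date, ...).
    raw <- xlsx_cells("messyExcel.xlsx")

    # Locate the header cell that anchors the data block.
    start <- raw %>%
        filter(character == "Monthly return") %>%
        select(row, col)

    # Dates sit one column to the left of the header, below its row.
    month.col <- raw %>%
        filter(row > start$row, col == start$col - 1) %>%
        arrange(row) %>%
        pull(date)

    # Returns sit directly below the header.
    return.col <- raw %>%
        filter(row > start$row, col == start$col) %>%
        arrange(row) %>%
        pull(numeric)

    # Return the combined result visibly rather than ending on an assignment.
    data.frame(month = month.col, monthly_return = return.col)
}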
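
To see the start-cell lookup without needing an Excel file, here is a minimal sketch on a hand-built table. The `cells` data frame is a hypothetical stand-in for what `xlsx_cells()` returns (one row per cell, with `row`, `col`, and type-specific value columns); the values in it are made up purely for illustration.

```r
library(dplyr)

# Hypothetical stand-in for tidyxl::xlsx_cells() output (illustrative values only).
cells <- data.frame(
  row       = c(1, 3, 3, 4, 4, 5, 5),
  col       = c(1, 2, 3, 2, 3, 2, 3),
  character = c("Some title", "Month", "Monthly return", NA, NA, NA, NA),
  numeric   = c(NA, NA, NA, NA, 0.012, NA, -0.004),
  date      = as.Date(c(NA, NA, NA, "2020-01-31", NA, "2020-02-29", NA)),
  stringsAsFactors = FALSE
)

# Find the header cell; rows where `character` is NA are dropped by filter().
start <- cells %>%
  filter(character == "Monthly return") %>%
  select(row, col)

# Keep only the cells below the header, in sheet order.
below <- cells %>%
  filter(row > start$row) %>%
  arrange(row)

# Dates come from the column left of the header, returns from the header's column.
result <- data.frame(
  month          = below %>% filter(col == start$col - 1) %>% pull(date),
  monthly_return = below %>% filter(col == start$col) %>% pull(numeric)
)

print(result)
```

Because the lookup works on cell coordinates, the same code applies wherever the header happens to sit on a given sheet, as long as the header text itself is constant.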

# My Solution
            expr     min       lq     mean   median      uq     max neval
tidy_solution() 29.0372 30.40305 32.13793 31.36925 32.9812 56.6455   100

# Rafael's
                expr     min      lq     mean   median       uq     max neval
original_solution() 21.4405 23.8009 25.86874 25.10865 26.99945 59.4128   100
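
The column layout of these timings (expr, min, lq, mean, median, uq, max, neval) matches what the microbenchmark package prints, so I am assuming that is what produced them. A minimal sketch of such a comparison follows; `f_fast` and `f_slow` are hypothetical toy functions standing in for the two solutions, not the actual code being compared.

```r
library(microbenchmark)

# Hypothetical stand-ins for the two solutions being compared.
f_fast <- function() sum(sqrt(1:1000))
f_slow <- function() {
  total <- 0
  for (i in 1:1000) total <- total + sqrt(i)
  total
}

# Run each expression 100 times and collect the timing distribution.
timings <- microbenchmark(f_fast(), f_slow(), times = 100)

# Prints one row per expression with min, lq, mean, median, uq, max, neval.
print(timings)
```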
