I'm trying to write R code to read data from a mess of old spreadsheets. The exact location of the data varies from sheet to sheet: the only constant is that the first co
Here is how I would tackle it.
STEP 1
Read the Excel spreadsheet in without the headers.
STEP 2
Find the row index of your target string ("Monthly return" in this case).
STEP 3
Filter from the identified row (or column or both), prettify a little and done.
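To see how the lookup in step 2 behaves before touching a real workbook, here is a small sketch on a toy data frame. The data frame and its values are made up for illustration; it only stands in for what a sheet read with header = FALSE might look like.

```r
# Toy stand-in for a sheet read with header = FALSE (made-up values).
input <- data.frame(
  X1 = c(NA, "Notes", "Date", "2017-01-31", "2017-02-28"),
  X2 = c(NA, NA, "Monthly return", "0.012", "-0.004"),
  stringsAsFactors = FALSE
)

# Comparing a data frame with a string gives a logical matrix;
# arr.ind = TRUE returns the (row, col) position of each TRUE cell.
# NA cells compare as NA and are ignored by which().
hit <- which(input == "Monthly return", arr.ind = TRUE)
start_idx <- hit[1, "row"]

# Keep only the rows below the header row.
output <- input[(start_idx + 1):nrow(input), ]
```

Here start_idx is the row that holds the header text, so everything from the next row down is data.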
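Here is what a sample function looks like. It finds the header row for your example no matter where it sits in the spreadsheet; the column names it assigns assume the sheet reads in as two columns (date and return). You can play around with regex to make the match more robust.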
Function Definition:
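library(xlsx)
extract_return <- function(path = getwd(), filename = "Mysheet.xlsx", sheetnum = 1) {
  # Build the full path to the workbook
  filepath <- file.path(path, filename)
  # Step 1: read the sheet without treating any row as a header
  input <- read.xlsx(filepath, sheetnum, header = FALSE)
  # Step 2: row index of the first cell equal to "Monthly return"
  start_idx <- which(input == "Monthly return", arr.ind = TRUE)[1]
  if (is.na(start_idx)) stop("'Monthly return' not found in sheet ", sheetnum)
  # Step 3: keep the header row and everything below it
  output <- input[start_idx:nrow(input), ]
  rownames(output) <- NULL
  colnames(output) <- c("Date", "Monthly Return")
  # Drop the original header row now that the columns are named
  output <- output[-1, ]
  return(output)
}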
Example:
final_df <- extract_return(
  path = "~/Desktop",
  filename = "Apr2017.xlsx",
  sheetnum = 2
)
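If the header text is not always spelled exactly "Monthly return", a regex match is one way to make the lookup more tolerant. The sketch below is only an illustration: find_header() is a hypothetical helper, the pattern is a guess at likely variations, and the toy data frame is made up. It also assumes the dates sit in the column immediately to the left of the matched header, which may not hold for your sheets.

```r
# Hypothetical helper: locate the first cell whose text matches `pattern`,
# ignoring case and surrounding whitespace. Returns c(row = ..., col = ...).
find_header <- function(input, pattern = "^\\s*monthly\\s+returns?\\s*$") {
  # Flatten the data frame to a matrix so every cell can be tested as text.
  cells <- as.matrix(input)
  # grepl() drops the dimensions, so rebuild the matrix shape (NA cells become FALSE).
  is_hit <- matrix(grepl(pattern, cells, ignore.case = TRUE), nrow = nrow(cells))
  hit <- which(is_hit, arr.ind = TRUE)
  if (nrow(hit) == 0) stop("No cell matches pattern: ", pattern)
  hit[1, ]
}

# Toy stand-in for a sheet read with header = FALSE (made-up values).
input <- data.frame(
  X1 = c(NA, NA, NA, NA),
  X2 = c("Report", "Date", "2017-01-31", "2017-02-28"),
  X3 = c(NA, " monthly Return ", "0.012", "-0.004"),
  stringsAsFactors = FALSE
)

pos <- find_header(input)
r <- pos[["row"]]
k <- pos[["col"]]

# Assumption: the date column is directly left of the header column (needs k > 1).
output <- input[(r + 1):nrow(input), c(k - 1, k)]
rownames(output) <- NULL
colnames(output) <- c("Date", "Monthly Return")
```

Using the column index as well as the row index means extra empty or unrelated columns in the sheet do not shift the result.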
No matter how many rows or columns you have, the idea remains the same. Give it a try and let me know.