Question
This feels like it should be a simple problem, but for whatever reason I'm not getting it.
I have a number of tables that I need to pull from SQL and other sources, and the code gets long. It's cumbersome, and I'd like to condense it by writing a function for each table, so that when I call, let's say, load_table1, it runs the SQL code in R to load table1 with all the manipulations it needs. I've tried searching, but functions are such a common topic that I can't find the right answer.
The goal is to have a function that contains the code that reads the data using read_xlsx() or dbGetQuery(), manipulates it, and then loads the resulting table into R.
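A minimal sketch of that shape, assuming the DBI and dplyr packages; the name load_table1, the table name table1 and the query text are hypothetical placeholders. An R function returns the value of its last expression, so the function does not have to create the table in your workspace itself; the caller assigns the result to whatever name it wants.

```r
library(DBI)
library(dplyr)

# Hypothetical loader: `con` is an open DBI connection and "table1" is a
# placeholder table name. The piped result is the last expression, so it
# is what the function returns.
load_table1 <- function(con) {
  dbGetQuery(con, "SELECT * FROM table1") %>%
    mutate(SerialPrefix = substr(SerialNumber, 1, 3))
}

# Usage with a throwaway in-memory SQLite database (needs the RSQLite package)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "table1",
             data.frame(Make = c("Toyota", "Mitsubishi"),
                        SerialNumber = c("ABC1", "ZYX123"),
                        stringsAsFactors = FALSE))
table1 <- load_table1(con)   # the loaded table lives wherever you assign it
dbDisconnect(con)
```

With a real connection you would just call table1 <- load_table1(con). A spreadsheet version has the same shape, with readxl::read_xlsx(path) in place of the query.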
For example, suppose I have a table like the one below, where the YearOfManuf column contains errors. I need to scrub those errors and replace them with the average YearOfManuf for the SerialNumber prefix (its first three characters) within each Make:
```r
library(data.table)

# YearOfManuf mixes numbers with "TEXT", so R stores the whole column as character
data.table(Make = c("Toyota", "Toyota", "Toyota", "Toyota", "Toyota", "Toyota",
                    "Mitsubishi", "Mitsubishi", "Mitsubishi", "Mitsubishi", "Mitsubishi", "Mitsubishi"),
           SerialNumber = c("ABC1", "ABC2", "ABC3", "ABC4", "ABC5", "ABC6", "ABC123", "ABC456", "ABC789", "ZYX123", "ZYX456", "ZYX789"),
           YearOfManuf = c(2017, "TEXT", 2010, 2019, 2070, 2019, 1999, 2000, 0, 1960, 2070, 2019))
```
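For reference, this is what the replacement averages should come out to on this sample, assuming the 1990 to 2020 validity window from the code below, so that "TEXT", 0, 1960 and 2070 count as errors and are dropped before averaging:

```latex
\begin{aligned}
\text{Toyota, ABC:}     &\quad \frac{2017 + 2010 + 2019 + 2019}{4} = 2016.25 \;\to\; 2016 \\
\text{Mitsubishi, ABC:} &\quad \frac{1999 + 2000}{2} = 1999.5 \;\to\; 2000 \\
\text{Mitsubishi, ZYX:} &\quad \frac{2019}{1} = 2019
\end{aligned}
```

So ABC2 and ABC5 should be filled with 2016, ABC789 with 2000, and ZYX123 and ZYX456 with 2019.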
I would like to read the table above and manipulate it in the same section of code, like this:
```r
library(data.table)
library(dplyr)

example_table <-
  data.table(Make = c("Toyota", "Toyota", "Toyota", "Toyota", "Toyota", "Toyota",
                      "Mitsubishi", "Mitsubishi", "Mitsubishi", "Mitsubishi", "Mitsubishi", "Mitsubishi"),
             SerialNumber = c("ABC1", "ABC2", "ABC3", "ABC4", "ABC5", "ABC6", "ABC123", "ABC456", "ABC789", "ZYX123", "ZYX456", "ZYX789"),
             YearOfManuf = c(2017, "TEXT", 2010, 2019, 2070, 2019, 1999, 2000, 0, 1960, 2070, 2019)) %>%
  mutate(OriginalYearCol = YearOfManuf,
         SerialPrefix = substr(SerialNumber, 1, 3),   # substr() is 1-based
         # Convert to numbers; non-numeric entries such as "TEXT" become NA
         YearOfManuf = suppressWarnings(as.numeric(YearOfManuf)),
         # Treat implausible years as missing (compared as numbers, not strings)
         YearOfManuf = case_when(YearOfManuf > 2020 ~ NA_real_,
                                 YearOfManuf < 1990 ~ NA_real_,
                                 TRUE ~ YearOfManuf)) %>%
  group_by(Make, SerialPrefix) %>%
  mutate(AverageMakeModelYear = round(mean(YearOfManuf, na.rm = TRUE), 0),
         # Fill missing years with the average for that Make and prefix
         YearOfManuf = case_when(is.na(YearOfManuf) ~ AverageMakeModelYear,
                                 TRUE ~ YearOfManuf)) %>%
  ungroup()
```
The ultimate goal is to wrap the above in a function so that I have a tidier code chunk that is easier to reference and search when loading the tables I need.
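One way to get there, sketched under the assumption that dplyr is available: put the cleaning steps in their own function that takes a data frame and returns the cleaned one, then write thin loaders that read the data and pass it through. The names clean_year_of_manuf and load_table1_xlsx, and the min_year/max_year arguments, are hypothetical; the defaults match the hard-coded 1990 and 2020 limits above.

```r
library(dplyr)

# Hypothetical cleaning function: the same steps as the pipeline above,
# with the validity window exposed as arguments.
clean_year_of_manuf <- function(df, min_year = 1990, max_year = 2020) {
  df %>%
    mutate(OriginalYearCol = YearOfManuf,
           SerialPrefix = substr(SerialNumber, 1, 3),
           # Non-numeric entries such as "TEXT" become NA
           YearOfManuf = suppressWarnings(as.numeric(YearOfManuf)),
           # Out-of-range years are treated as errors (NA years stay NA)
           YearOfManuf = if_else(YearOfManuf < min_year | YearOfManuf > max_year,
                                 NA_real_, YearOfManuf)) %>%
    group_by(Make, SerialPrefix) %>%
    mutate(AverageMakeModelYear = round(mean(YearOfManuf, na.rm = TRUE), 0),
           # Fill the gaps with the Make/prefix average
           YearOfManuf = coalesce(YearOfManuf, AverageMakeModelYear)) %>%
    ungroup()
}

# Hypothetical loader; `path` is a placeholder for your spreadsheet.
# readxl is only needed when this function is actually called.
load_table1_xlsx <- function(path) {
  clean_year_of_manuf(readxl::read_xlsx(path))
}

# Usage with the example data from the question
raw <- data.frame(
  Make = rep(c("Toyota", "Mitsubishi"), each = 6),
  SerialNumber = c("ABC1", "ABC2", "ABC3", "ABC4", "ABC5", "ABC6",
                   "ABC123", "ABC456", "ABC789", "ZYX123", "ZYX456", "ZYX789"),
  YearOfManuf = c(2017, "TEXT", 2010, 2019, 2070, 2019, 1999, 2000, 0, 1960, 2070, 2019),
  stringsAsFactors = FALSE
)
example_table <- clean_year_of_manuf(raw)
```

Keeping the cleaning separate from the reading means the same logic applies whether the data came from read_xlsx() or dbGetQuery(), and it can be checked on a small in-memory table like this one without touching the database or the file.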
Source: https://stackoverflow.com/questions/59726326/user-defined-function-for-reading-and-manipulating-frequently-used-excel-files-i