User-defined function for reading and manipulating frequently used Excel files into R

Posted by 假装没事ソ on 2020-01-16 08:38:23

Question


It feels like this should be a simple problem but for whatever reason I am not getting it.

I've got a bunch of tables that I have to pull from SQL and other sources, and the code gets long and cumbersome. I'd like to condense it by writing a function for each table, so that calling, say, load_table1() runs the SQL code in R and loads table1 with all the manipulations it needs. I've tried searching, but functions are such a common topic that I can't find the right answer.

The goal is to have a function that contains the code that reads the data using read_xlsx() or dbGetQuery(), manipulates the data, and then loads the resulting table into R.

So, as an example, if I have a table that looks like the data below, where there are errors in the YearOfManuf column, I need to scrub the errors and replace those values with the average YearOfManuf for the same SerialNumber prefix within each Make:

data.table(Make = c("Toyota", "Toyota", "Toyota", "Toyota", "Toyota", "Toyota",
                "Mitsubishi", "Mitsubishi", "Mitsubishi", "Mitsubishi", "Mitsubishi", "Mitsubishi"),
           SerialNumber = c("ABC1", "ABC2", "ABC3", "ABC4", "ABC5", "ABC6", "ABC123", "ABC456", "ABC789", "ZYX123", "ZYX456", "ZYX789"),
           YearOfManuf = c(2017, "TEXT", 2010, 2019, 2070, 2019, 1999, 2000, 0, 1960, 2070, 2019))
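The cleaning rule described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper names clean_year() and fill_with_group_mean() are hypothetical, and the valid range of 1990 to 2020 is taken from the bounds used in the pipeline later in this post.

```r
# Hypothetical helper (not from the original post): coerce to numeric and
# set years outside an assumed valid range to NA.
clean_year <- function(x, min_year = 1990, max_year = 2020) {
  year <- suppressWarnings(as.numeric(x))  # non-numeric entries such as "TEXT" become NA
  year[!is.na(year) & (year < min_year | year > max_year)] <- NA
  year
}

# Hypothetical helper: replace each NA with the rounded mean of the
# valid years that share the same group label.
fill_with_group_mean <- function(year, group) {
  group_mean <- ave(year, group, FUN = function(v) round(mean(v, na.rm = TRUE)))
  ifelse(is.na(year), group_mean, year)
}

# The Toyota rows from the example data all share the prefix "ABC".
years  <- c(2017, "TEXT", 2010, 2019, 2070, 2019)
prefix <- rep("ABC", 6)
cleaned <- fill_with_group_mean(clean_year(years), prefix)
print(cleaned)
```

For grouping by both Make and serial prefix, the group argument could be a combined label such as paste(Make, SerialPrefix).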

So I would like to read the above table and manipulate it in the same section of code, as below:

library(data.table)
library(dplyr)

example_table <-
  data.table(Make = c("Toyota", "Toyota", "Toyota", "Toyota", "Toyota", "Toyota",
                      "Mitsubishi", "Mitsubishi", "Mitsubishi", "Mitsubishi", "Mitsubishi", "Mitsubishi"),
             SerialNumber = c("ABC1", "ABC2", "ABC3", "ABC4", "ABC5", "ABC6", "ABC123", "ABC456", "ABC789", "ZYX123", "ZYX456", "ZYX789"),
             YearOfManuf = c(2017, "TEXT", 2010, 2019, 2070, 2019, 1999, 2000, 0, 1960, 2070, 2019)) %>%
  mutate(OriginalYearCol = YearOfManuf,
         # substr() positions are 1-based
         SerialPrefix = substr(SerialNumber, 1, 3),
         # The column is character because of "TEXT"; non-numeric entries become NA
         YearOfManuf = suppressWarnings(as.numeric(YearOfManuf)),
         # Compare as numbers (not strings) and treat out-of-range years as errors
         YearOfManuf = case_when(YearOfManuf > 2020 ~ NA_real_,
                                 YearOfManuf < 1990 ~ NA_real_,
                                 TRUE ~ YearOfManuf)) %>%
  group_by(Make, SerialPrefix) %>%
  # Replace each scrubbed value with the rounded group average
  mutate(AverageMakeModelYear = round(mean(YearOfManuf, na.rm = TRUE), 0),
         YearOfManuf = case_when(is.na(YearOfManuf) ~ AverageMakeModelYear,
                                 TRUE ~ YearOfManuf))

The ultimate goal is to write the above as a function, so that I have a tidier, more easily referenced and searchable code chunk for loading the tables I need.
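One possible shape for such a function is sketched below. It is not a definitive answer: the name load_example_table() and its read_fn, min_year and max_year arguments are assumptions made for illustration. Passing the read step in as a function keeps the cleaning logic independent of whether the data comes from read_xlsx() or dbGetQuery().

```r
library(dplyr)

# Hypothetical loader (names are illustrative, not from the original post).
# `read_fn` is any zero-argument function that returns the raw table, e.g.
#   function() readxl::read_xlsx(path)       # `path` is a placeholder
#   function() DBI::dbGetQuery(con, query)   # `con` and `query` are placeholders
load_example_table <- function(read_fn, min_year = 1990, max_year = 2020) {
  read_fn() %>%
    mutate(
      OriginalYearCol = YearOfManuf,
      SerialPrefix = substr(SerialNumber, 1, 3),
      # Non-numeric entries become NA
      YearOfManuf = suppressWarnings(as.numeric(YearOfManuf)),
      # Keep only years inside the assumed valid range
      YearOfManuf = if_else(YearOfManuf >= min_year & YearOfManuf <= max_year,
                            YearOfManuf, NA_real_)
    ) %>%
    group_by(Make, SerialPrefix) %>%
    mutate(
      AverageMakeModelYear = round(mean(YearOfManuf, na.rm = TRUE)),
      # Fill scrubbed values with the group average
      YearOfManuf = coalesce(YearOfManuf, AverageMakeModelYear)
    ) %>%
    ungroup()
}

# Usage with an in-memory stand-in for the real data source
raw <- data.frame(
  Make = c("Toyota", "Toyota", "Toyota"),
  SerialNumber = c("ABC1", "ABC2", "ABC3"),
  YearOfManuf = c("2017", "TEXT", "2011")
)
cars <- load_example_table(function() raw)
print(cars)
```

With one such function per table, a script can call each loader by name instead of repeating the read-and-clean pipeline inline.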

Source: https://stackoverflow.com/questions/59726326/user-defined-function-for-reading-and-manipulating-frequently-used-excel-files-i
