问题
I have an example dataset, garden
, as shown below. The real thing is thousands of rows. I also have an example list. productFruit
. I want to know the calories
of every fruit
, considering the usage
reported in garden
. I basically want to loop through all the rows in my table, check if the usage is recorded
in the productFruit
list and the return either the calories
or one of the following error messages:
- "usage out of scope" if no
usage
has been found in theproductFruit
list - "fruit out of scope" if no
usage
has been found in theproductFruit
list - "erroneous data" if data is missing
garden
:
fruit = c("Apple", "Kiwi", "Banana", "Orange", "Blueberry")
usage = c("cooking", "cooking", "NA", "drinking", "medicine")
reported = c(200, 500, 77, 520, 303)
garden <- cbind(fruit, usage, reported)
garden <- as.data.table(garden)
productFruit
:
productFruit <- list(Basket = c('DUH'),
type = list (
Apple = list(ID = 1,
color = "poor",
usage = list(eating = list(ID = 1,
quality = "good",
calories = 500),
medicine = list(ID = 2,
quality = "poor",
calories = 300))),
Orange = list(ID = c(1,2,3),
color = c(3,4,5),
usage = list(eating = list(ID = 1,
quality = "poor",
calories = 420),
cooking = list(ID = 2,
quality = "questionable",
calories = 600),
drinking = list(ID = 3,
quality = "good",
calories = 800),
medicine = list(ID = 4,
quality = "good",
calories = 0))),
Banana = list(ID = c(1,2,3),
color = c(3,4,5),
usage = list(cooking = list(ID = 1,
quality = "good",
calories = 49),
drinking = list(ID = 2,
quality = "questionable",
calories = 11),
medicine = list(ID = 3,
quality = "poor",
calories = 55)))))
I tried to break it down into smaller steps and doing this with loops, but i have very little experience with lists
and was getting many errors. Any ideas how to solve this in an efficient & readable way? Below one of my many attempts to just match the fruits
. I am aware that the field do not match, i was just trying to get the loop to run at all...
for (i in seq_len(nrow(garden))){
if (garden$fruit[i] == productFruit$type){
garden$calories = productFruit$type[[i]]$ID
}
garden$calories = "error"
}
The desired output is this:
fruit = c("Apple", "Kiwi", "Banana", "Orange", "Blueberry")
usage = c("cooking", "cooking", "NA", "drinking", "medicine")
reported = c(200, 500, 77, 520, 303)
calories = c("usage out of scope", "fruit out of scope", "erroneous data", 800, "fruit out of scope")
garden_with_calories <- cbind(fruit, usage, reported, calories)
garden_with_calories <- as.data.table(garden)
回答1:
Update
For large dataset, for
loop is not recommended. Following codes are the alternative
Step 1 check if fruit exist on product list
fruitExist <- fruit %in% names(productFruit$type)
Step 2 for every fruit, check if corresponding usage is exist on product list
usageExist <- sapply(fruit, function(f){
sapply(usage, `%in%`, x = names(productFruit$type[[f]][["usage"]]))})
usageExist <- as.data.frame(unique(sapply(usageExist[sapply(usageExist, is.logical)], colSums)))
usageExist$usage <- row.names(usageExist)
Step 3 extract calories
calories <- data.frame(unique(
sapply(fruit, function(f){
sapply(usage, function(u){productFruit$type[[f]][["usage"]][[u]][["calories"]]})}
)))
calories <- unlist(as.data.frame(unique(
sapply(fruit, function(f){
sapply(usage, function(u){productFruit$type[[f]][["usage"]][[u]][["calories"]]})}
))))
calories <- as.data.frame(calories)
names(calories) <- "cal"
calories$fruitUsage <- row.names(calories)
Step 4 combine and finalize
library(tidyverse)
garden %>%
mutate(fruitExist = fruitExist) %>%
left_join(usageExist %>% pivot_longer(-usage, names_to = "fruit", values_to = "usageExist")) %>%
left_join(calories %>% separate(fruitUsage, c("fruit","usage"))) %>%
mutate(calories = case_when(
fruit == "NA" | usage == "NA" ~ "erroneous data",
usageExist == FALSE ~ "usage out of scope",
fruitExist == FALSE ~ "fruit out of scope",
TRUE ~ as.character(cal))) %>%
select(fruit, usage, reported, calories)
Output
garden
# fruit usage reported calories
# 1 Apple cooking 200 usage out of scope
# 2 Kiwi cooking 500 fruit out of scope
# 3 Banana NA 77 erroneous data
# 4 Orange drinking 520 800
# 5 Blueberry medicine 303 fruit out of scope
Previus codes
Try this:
cal <- as.character()
for(i in 1:length(fruit)){
fruitName <- fruit[i]
usageName <- usage[i]
if(fruitName == "NA" | usageName == "NA") {
out <- "erroneous data"
} else if(!(fruitName %in% names(productFruit[["type"]]))){
out <- "fruit out of scope"
} else if(!(usageName %in% names(productFruit[["type"]][[fruitName]][["usage"]]))){
out <- "usage out of scope"
} else {
out <- productFruit[["type"]][[fruitName]][["usage"]][[usageName]][["calories"]]
}
cal <- c(cal, out)
}
garden$calories <- cal
garden
# fruit usage reported calories
# 1: Apple cooking 200 usage out of scope
# 2: Kiwi cooking 500 fruit out of scope
# 3: Banana NA 77 erroneous data
# 4: Orange drinking 520 800
# 5: Blueberry medicine 303 fruit out of scope
回答2:
I made this code in Base R
that finds and reports only the fruits and their respective usage that are actually present. I know its not exactly what you asked for but by the time I realised that it was a bit too late. Its a very different approach to other proposed solutions.
FruitNames <- unlist(lapply(productFruit,names)[2])
UsageByFruit <- lapply(FruitNames, function(X) names(productFruit[["type"]][[X]][["usage"]]))
LengthByFruit<- lapply(UsageByFruit, length)
gardenlength <- sum(unlist(LengthByFruit))
garden <- data.frame(matrix(ncol=3,nrow=gardenlength, dimnames=list(NULL, c("Fruit", "Usage", "Calories"))))
garden[,2] <- unlist(UsageByFruit)
garden[,1] <- unlist(lapply(1:length(FruitNames), function(X) replicate(LengthByFruit[[X]],FruitNames[X])))
garden[,3] <- unlist(lapply(1:length(FruitNames), function(X) unlist(lapply(unlist(UsageByFruit[X]), function(Y) productFruit[["type"]][[FruitNames[X]]][["usage"]][[Y]][["calories"]] ))))
Output:
> garden
Fruit Usage Calories
1 Apple eating 500
2 Apple medicine 300
3 Orange eating 420
4 Orange cooking 600
5 Orange drinking 800
6 Orange medicine 0
7 Banana cooking 49
8 Banana drinking 11
9 Banana medicine 55
回答3:
Extracting data from nested lists can be very tedious. Here is some code that works for the example you provided, but might still struggle, in case you have entries that vary from the example data. You'll probably have to make it more robust and check that the data has the class
you expect it to be etc.
library(tidyverse)
Step 1:
We create a some code that extracts one fruit at a time:
# this creates a tibble with a column for each usage entry (eating, drinking,
# etc.)
type_df <- as.tibble(productFruit$type[[1]]$usage)
# With map*() we apply as.tibble() to each column to get a one-row data frame
# per "usage" case. We use map_dfr() in order to bind togeter the resulting
# rows into one dataframe. This is the line that might need to be made more
# robust in order to not fail on unexpected input.)
res <- map_dfr(type_df, as.tibble, .id = "usage")
# When there is no usage entry, `res` will be empty and we create a dummy
# dataframe for that case that has `NA` for the "colories" column.
if (nrow(res) < 1)
tibble(calories = NA)
else
res
Step 2:
Now we put the previous lines into a function, so we can apply it to all fruits.
extract_fruit_data <-
function(fruit) {
type_df <- as.tibble(fruit$usage)
res <- map_dfr(type_df, as.tibble, .id = "usage")
if (nrow(res) < 1)
tibble(calories = NA)
else
res
}
Step 3:
We apply extract_fruit_data
to each fruit's entry and bind togther the
resulting rows using map_dfr()
. Then we drop and rename some of the variables,
in preparation for the next step.
fruits_df <-
map_dfr(productFruit$type, extract_fruit_data, .id = "type") %>%
select(-ID, -quality) %>%
rename(fruit = type)
Step 4:
We join the two datasets with left_join()
that way each entry in garden,
is kept and those entries that are not matched in fruits_df gets an NA
in the calories column. With case_when()
we classify each column,
according to your specifications
left_join(garden, fruits_df) %>%
mutate(calories = case_when(
usage == "NA" ~ "erroneous data",
!fruit %in% fruits_df$fruit ~ "fruit out of scope",
is.na(calories) ~ "usage out of scope",
TRUE ~ as.character(calories)
))
来源:https://stackoverflow.com/questions/60873305/nested-loops-through-a-structured-list-in-r