This is my sample dataset:
vector1 <-
data.frame(
"name" = "a",
"age" = 10,
"fruit" = c("orange", "cherry", "app
The OP has requested to complete each data.frame in list so that all combinations of default fruit and tags 1:2 appear in the result, with count set to 0 for the additional rows. In the end, each data.frame should consist of at least 4 x 2 = 8 rows.
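To make the 4 x 2 = 8 figure concrete, here is a minimal sketch of the cross join on its own. The contents of default are an assumption read off the results shown in this answer; the question defines its own vector.

```r
library(data.table)

# Assumed default fruits; the original question supplies its own `default`
default <- c("apple", "cherry", "mango", "orange")

# CJ() returns one row for every combination of its arguments
grid <- CJ(fruit = default, tag = 1:2)

nrow(grid)  # 4 fruits x 2 tags = 8 combinations
```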
I want to propose two different approaches:
1. Use lapply() and the CJ() (cross join) function from data.table to return a list.
2. Combine the data.frames in list into one large data.table using rbindlist() and apply the required transformations to the whole data.table.

lapply() and CJ()
library(data.table)
lapply(list, function(x) setDT(x)[
CJ(name = name, age = age, fruit = default, tag = 1:2, unique = TRUE),
on = .(name, age, fruit, tag)][
is.na(count), count := 0][order(-count, tag)]
)
[[1]]
   name age  fruit count tag
1:    a  10 cherry     1   1
2:    a  10 orange     1   1
3:    a  10  apple     1   2
4:    a  10  apple     0   1
5:    a  10  mango     0   1
6:    a  10 cherry     0   2
7:    a  10  mango     0   2
8:    a  10 orange     0   2

[[2]]
   name age  fruit count tag
1:    b  33  apple     1   2
2:    b  33  mango     1   2
3:    b  33  apple     0   1
4:    b  33 cherry     0   1
5:    b  33  mango     0   1
6:    b  33 orange     0   1
7:    b  33 cherry     0   2
8:    b  33 orange     0   2

[[3]]
   name age  fruit count tag
1:    c  58  apple     1   1
2:    c  58 cherry     1   1
3:    c  58  mango     0   1
4:    c  58 orange     0   1
5:    c  58  apple     0   2
6:    c  58 cherry     0   2
7:    c  58  mango     0   2
8:    c  58 orange     0   2
Ordering by count and tag is not required but helps to compare the result with OP's expected output.
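For readers who want to run the first approach end to end, the following is a minimal sketch. The input list dfs and the default vector are reconstructions inferred from the printed results in this answer, not the OP's original objects, and as.data.table() is used in place of setDT() so that the input data.frames are left unmodified.

```r
library(data.table)

# Hypothetical reconstruction of the input, read off the printed results
dfs <- list(
  data.frame(name = "a", age = 10, fruit = c("orange", "cherry", "apple"),
             count = 1, tag = c(1L, 1L, 2L)),
  data.frame(name = "b", age = 33, fruit = c("apple", "mango"),
             count = 1, tag = c(2L, 2L)),
  data.frame(name = "c", age = 58, fruit = c("cherry", "apple"),
             count = 1, tag = c(1L, 1L))
)
default <- c("apple", "cherry", "mango", "orange")  # assumed

result <- lapply(dfs, function(x) {
  x <- as.data.table(x)  # copy, so the input data.frame is not changed
  # Join x onto all fruit/tag combinations; unmatched rows get count = NA
  x[CJ(name = name, age = age, fruit = default, tag = 1:2, unique = TRUE),
    on = .(name, age, fruit, tag)][
    is.na(count), count := 0][order(-count, tag)]
})

sapply(result, nrow)  # every element has 8 rows
```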
Instead of a list of data.frames with identical structure we can use one large data.table where the origin of each row can be identified by an id column.
Indeed, the OP has asked other questions ("using lapply function and list in r" and "how to loop the dataframe using sqldf?") where he asked for help in handling a list of data.frames. G. Grothendieck had already suggested to rbind the rows together.
The rbindlist() function has the idcol parameter which identifies the origin of each row:
library(data.table)
rbindlist(list, idcol = "df")
   df name age  fruit count tag
1:  1    a  10 orange     1   1
2:  1    a  10 cherry     1   1
3:  1    a  10  apple     1   2
4:  2    b  33  apple     1   2
5:  2    b  33  mango     1   2
6:  3    c  58 cherry     1   1
7:  3    c  58  apple     1   1
Note that df contains the number of the source data.frame in list (or the names of the list elements if list is named).
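To illustrate the named case, here is a small sketch with a made-up list. The object named and its element names first and second are purely hypothetical.

```r
library(data.table)

# Hypothetical named list, for illustration only
named <- list(
  first  = data.frame(x = 1:2),
  second = data.frame(x = 3L)
)

# With a named list, the id column carries the element names
res <- rbindlist(named, idcol = "df")
res$df
```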
Now, we can apply the above solution by grouping over df:
rbindlist(list, idcol = "df")[, .SD[
CJ(name = name, age = age, fruit = default, tag = 1:2, unique = TRUE),
on = .(name, age, fruit, tag)], by = df][
is.na(count), count := 0][order(df, -count, tag)]
    df name age  fruit count tag
 1:  1    a  10 cherry     1   1
 2:  1    a  10 orange     1   1
 3:  1    a  10  apple     1   2
 4:  1    a  10  apple     0   1
 5:  1    a  10  mango     0   1
 6:  1    a  10 cherry     0   2
 7:  1    a  10  mango     0   2
 8:  1    a  10 orange     0   2
 9:  2    b  33  apple     1   2
10:  2    b  33  mango     1   2
11:  2    b  33  apple     0   1
12:  2    b  33 cherry     0   1
13:  2    b  33  mango     0   1
14:  2    b  33 orange     0   1
15:  2    b  33 cherry     0   2
16:  2    b  33 orange     0   2
17:  3    c  58  apple     1   1
18:  3    c  58 cherry     1   1
19:  3    c  58  mango     0   1
20:  3    c  58 orange     0   1
21:  3    c  58  apple     0   2
22:  3    c  58 cherry     0   2
23:  3    c  58  mango     0   2
24:  3    c  58 orange     0   2
    df name age  fruit count tag
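As a quick sanity check on the grouped variant, one can count the rows per source data.frame. This is only a sketch: dfs and default are reconstructions inferred from the printed results, not the OP's original objects.

```r
library(data.table)

# Hypothetical reconstruction of the input, read off the printed results
dfs <- list(
  data.frame(name = "a", age = 10, fruit = c("orange", "cherry", "apple"),
             count = 1, tag = c(1L, 1L, 2L)),
  data.frame(name = "b", age = 33, fruit = c("apple", "mango"),
             count = 1, tag = c(2L, 2L)),
  data.frame(name = "c", age = 58, fruit = c("cherry", "apple"),
             count = 1, tag = c(1L, 1L))
)
default <- c("apple", "cherry", "mango", "orange")  # assumed

# Stack the data.frames, then complete the combinations within each group
completed <- rbindlist(dfs, idcol = "df")[, .SD[
  CJ(name = name, age = age, fruit = default, tag = 1:2, unique = TRUE),
  on = .(name, age, fruit, tag)], by = df][
  is.na(count), count := 0]

# Number of rows per source data.frame; each should be 8
check <- completed[, .N, by = df]
check
```

Keeping everything in one data.table means such checks are a single grouped expression rather than another lapply() call.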