How to create categories conditionally using the values and order of other variables

Question

I would appreciate help creating a function that assigns categories to one variable based on the order in which combinations of other variables' values appear.

Specifically, I want a function that relabels the "variable" column so that:

the first time each combination of values of A, B, and ID appears in the dataset, it gets category E1;

the second time that combination appears, it gets E2;

the third time, it gets E3;

and, in general, the nth time, it gets En.
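The rule above is a running count within groups. As a minimal base-R sketch of it (no data.table needed), the function below groups rows by A, B, and ID with interaction(), numbers the rows of each group in their original order with ave() and seq_along(), and appends that number to the "variable" column. The function name add_occurrence_labels and its keys argument are hypothetical, and it assumes a data.frame that already has a "variable" column holding "E".

```r
# Hypothetical helper (not from the original post): appends the running
# occurrence number of each key combination to the "variable" column.
add_occurrence_labels <- function(df, keys = c("A", "B", "ID")) {
  # interaction() builds one grouping factor from all key columns;
  # drop = TRUE keeps only the combinations that actually occur.
  grp <- interaction(df[keys], drop = TRUE)
  # ave() applies seq_along() within each group, returning the running
  # count (1, 2, 3, ...) aligned with the original row order.
  occurrence <- ave(seq_len(nrow(df)), grp, FUN = seq_along)
  df$variable <- paste0(as.character(df$variable), occurrence)
  df
}

# Base-R version of the sample data (values taken from the question).
df <- data.frame(
  A  = c("a1", "a2", "a1", "a1", "a2", "a1", "a1", "a2", "a1"),
  B  = c("b2", "b2", "b2", "b1", "b2", "b2", "b1", "b2", "b1"),
  ID = c("3", "4", "3", "1", "4", "3", "1", "4", "1"),
  variable = "E",
  value = c(0.621142094943352, 0.742109450696123, 0.39439152996948,
            0.40694392882818, 0.779607277916503, 0.550579323666347,
            0.352622183880119, 0.690660491345867, 0.23378944873769)
)
out <- add_occurrence_labels(df)
print(out$variable)
```

Rows keep their original order; only the labels change.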

#sample data:

library(data.table)  # provides melt() for data.tables

rowdT<-structure(list(A = c("a1", "a2", "a1", "a1", "a2", "a1", "a1", 
            "a2", "a1"), B = c("b2", "b2", "b2", "b1", "b2", "b2", "b1", 
            "b2", "b1"), ID = c("3", "4", "3", "1", "4", "3", "1", "4", "1"
            ), E = c(0.621142094943352, 0.742109450696123, 0.39439152996948, 
            0.40694392882818, 0.779607277916503, 0.550579323666347, 0.352622183880119, 
            0.690660491345867, 0.23378944873769)), class = c("data.table", 
            "data.frame"), row.names = c(NA, -9L))
sampleDT <- melt(rowdT, id.vars = c("A", "B", "ID"))

#input data:

    A  B  ID variable    value
1: a1 b2  3        E 0.6211421
2: a2 b2  4        E 0.7421095
3: a1 b2  3        E 0.3943915
4: a1 b1  1        E 0.4069439
5: a2 b2  4        E 0.7796073
6: a1 b2  3        E 0.5505793
7: a1 b1  1        E 0.3526222
8: a2 b2  4        E 0.6906605
9: a1 b1  1        E 0.2337894

#expected output (rows grouped by category; row labels are the original row numbers):

    A  B  ID variable    value
4: a1 b1  1        E1 0.4069439
1: a1 b2  3        E1 0.6211421
2: a2 b2  4        E1 0.7421095
7: a1 b1  1        E2 0.3526222
3: a1 b2  3        E2 0.3943915
5: a2 b2  4        E2 0.7796073
9: a1 b1  1        E3 0.2337894
6: a1 b2  3        E3 0.5505793
8: a2 b2  4        E3 0.6906605

Thanks in advance for any help.

Answer 1:

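First convert the variable column to a character vector (melt() returns it as a factor) so the labels are stored as plain strings, and then use data.table to number the rows within each group:

sampleDT[, variable := as.character(variable)]

# within each A/B/ID group, .N is the group size, so seq_len(.N) counts 1, 2, 3, ...
sampleDT[, variable := paste0(variable, seq_len(.N)), by = c("A", "B", "ID")]

This numbers the occurrences of each observed combination of A, B, and ID in the order they appear in the data.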
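data.table also has a rowid() function that returns the same within-group running count directly, without a by clause. The sketch below rebuilds the sample data with data.table() rather than the dput() output (an assumption made only to keep the block self-contained) and labels the rows in a single step. This is an alternative to the grouped paste above, not the answerer's original code.

```r
library(data.table)

# Sample data from the question, rebuilt with data.table() for self-containment.
rowdT <- data.table(
  A  = c("a1", "a2", "a1", "a1", "a2", "a1", "a1", "a2", "a1"),
  B  = c("b2", "b2", "b2", "b1", "b2", "b2", "b1", "b2", "b1"),
  ID = c("3", "4", "3", "1", "4", "3", "1", "4", "1"),
  E  = c(0.621142094943352, 0.742109450696123, 0.39439152996948,
         0.40694392882818, 0.779607277916503, 0.550579323666347,
         0.352622183880119, 0.690660491345867, 0.23378944873769)
)
sampleDT <- melt(rowdT, id.vars = c("A", "B", "ID"))

# rowid(A, B, ID) gives 1 for the first row of each A/B/ID combination,
# 2 for the second, and so on, in row order. Replacing the whole column
# turns the factor "variable" into a character column.
sampleDT[, variable := paste0(as.character(variable), rowid(A, B, ID))]
print(sampleDT)
```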

This produces the following output:

    A  B ID variable     value
1: a1 b2  3       E1 0.6211421
2: a2 b2  4       E1 0.7421095
3: a1 b2  3       E2 0.3943915
4: a1 b1  1       E1 0.4069439
5: a2 b2  4       E2 0.7796073
6: a1 b2  3       E3 0.5505793
7: a1 b1  1       E2 0.3526222
8: a2 b2  4       E3 0.6906605
9: a1 b1  1       E3 0.2337894

The rows keep their original order, so reorder them if you need the layout of the expected output.
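One possible way to get the expected layout is data.table's setorder(), sorting by occurrence number and then by A, B, and ID. Sorting on the label string itself would place "E10" before "E2" once a combination occurs ten or more times, so this sketch sorts on the numeric part instead. It assumes the labels are a non-numeric prefix followed by the count; the temporary column name occ is hypothetical.

```r
library(data.table)

# Labelled rows as produced by the answer's code (values from the question).
sampleDT <- data.table(
  A  = c("a1", "a2", "a1", "a1", "a2", "a1", "a1", "a2", "a1"),
  B  = c("b2", "b2", "b2", "b1", "b2", "b2", "b1", "b2", "b1"),
  ID = c("3", "4", "3", "1", "4", "3", "1", "4", "1"),
  variable = c("E1", "E1", "E2", "E1", "E2", "E3", "E2", "E3", "E3"),
  value = c(0.6211421, 0.7421095, 0.3943915, 0.4069439, 0.7796073,
            0.5505793, 0.3526222, 0.6906605, 0.2337894)
)

# Strip the leading non-digit prefix to recover the occurrence number
# (occ is a hypothetical temporary column), sort in place, then drop it.
sampleDT[, occ := as.integer(sub("^[^0-9]+", "", variable))]
setorder(sampleDT, occ, A, B, ID)
sampleDT[, occ := NULL]
print(sampleDT)
```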

Source: https://stackoverflow.com/questions/54488761/how-to-create-categories-conditionally-using-other-variables-values-and-sequence

Tags

function

data.table

reshape

tidyr