How to create categories conditionally using other variables' values and sequence

Submitted by 孤街浪徒 on 2019-12-08 05:09:52

Question


I would appreciate any help creating a function that builds the categories of one variable from the order in which combinations of other variables' values appear.

Specifically, I want a function that:

  1. creates category E1 in the column named variable the first time that each combination of values of A, B, and ID appears in the dataset.
  2. creates category E2 in the column named variable the second time that each combination of values of A, B, and ID appears in the dataset.
  3. creates category E3 in the column named variable the third time that each combination of values of A, B, and ID appears in the dataset.
  4. in general, creates category En in the column named variable the nth time that each combination of values of A, B, and ID appears in the dataset.

#sample data:

library(data.table)

rowdT<-structure(list(A = c("a1", "a2", "a1", "a1", "a2", "a1", "a1", 
            "a2", "a1"), B = c("b2", "b2", "b2", "b1", "b2", "b2", "b1", 
            "b2", "b1"), ID = c("3", "4", "3", "1", "4", "3", "1", "4", "1"
            ), E = c(0.621142094943352, 0.742109450696123, 0.39439152996948, 
            0.40694392882818, 0.779607277916503, 0.550579323666347, 0.352622183880119, 
            0.690660491345867, 0.23378944873769)), class = c("data.table", 
            "data.frame"), row.names = c(NA, -9L))     
# structure() alone does not set data.table's internal attributes; setDT() restores them
setDT(rowdT)
# long format: one row per (A, B, ID) occurrence, with columns variable and value
sampleDT <- melt(rowdT, id.vars = c("A", "B", "ID"))

#input data:

    A  B  ID variable    value
1: a1 b2  3        E 0.6211421
2: a2 b2  4        E 0.7421095
3: a1 b2  3        E 0.3943915
4: a1 b1  1        E 0.4069439
5: a2 b2  4        E 0.7796073
6: a1 b2  3        E 0.5505793
7: a1 b1  1        E 0.3526222
8: a2 b2  4        E 0.6906605
9: a1 b1  1        E 0.2337894

#expected output:

    A  B  ID variable    value
4: a1 b1  1        E1 0.4069439
1: a1 b2  3        E1 0.6211421
2: a2 b2  4        E1 0.7421095
7: a1 b1  1        E2 0.3526222
3: a1 b2  3        E2 0.3943915
5: a2 b2  4        E2 0.7796073
9: a1 b1  1        E3 0.2337894
6: a1 b2  3        E3 0.5505793
8: a2 b2  4        E3 0.6906605

Thanks in advance for any help.


Answer 1:


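First convert variable to a character vector (melt returns it as a factor, so new labels would otherwise be coerced against the existing levels), and then append a per-group counter with data.table: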

sampleDT$variable = as.character(sampleDT$variable)

sampleDT[, variable := paste(variable,1:.N,sep = ""), by = c("A", "B", "ID")]

This numbers the rows within each observed combination of A, B, and ID in order of appearance, so the nth occurrence of a combination is labelled En.
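Since the question asks for a function, the same idea can be wrapped in one. The following is only a sketch: the name label_occurrences and its arguments are hypothetical, not part of the answer, and the demo table uses the rounded values of the first five rows of the question's sample data.

```r
library(data.table)

# Hypothetical helper (name and arguments are illustrative, not from the answer):
# appends a per-group occurrence counter to the labels stored in `col`.
label_occurrences <- function(dt, by_cols, col = "variable") {
  out <- copy(dt)                          # work on a copy; the caller's table is untouched
  out[, (col) := as.character(get(col))]   # factor -> character, as in the answer
  # seq_len(.N) counts 1, 2, ..., n within each group, in order of appearance
  out[, (col) := paste0(get(col), seq_len(.N)), by = by_cols]
  out[]
}

# First five rows of the question's sample data (values rounded as printed there)
dt <- data.table(
  A        = c("a1", "a2", "a1", "a1", "a2"),
  B        = c("b2", "b2", "b2", "b1", "b2"),
  ID       = c("3", "4", "3", "1", "4"),
  variable = factor(rep("E", 5)),
  value    = c(0.6211421, 0.7421095, 0.3943915, 0.4069439, 0.7796073)
)

res <- label_occurrences(dt, by_cols = c("A", "B", "ID"))
print(res)
```

Working on a copy is a design choice for this sketch; the answer's own statements modify sampleDT by reference instead.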

For the sample data, this gives the following output:

    A  B ID variable     value
1: a1 b2  3       E1 0.6211421
2: a2 b2  4       E1 0.7421095
3: a1 b2  3       E2 0.3943915
4: a1 b1  1       E1 0.4069439
5: a2 b2  4       E2 0.7796073
6: a1 b2  3       E3 0.5505793
7: a1 b1  1       E2 0.3526222
8: a2 b2  4       E3 0.6906605
9: a1 b1  1       E3 0.2337894

which you can reorder if necessary.
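One possible way to do that reordering, so the rows match the expected output in the question, is sketched below. It assumes the rounded values printed in the question, and occ is a hypothetical helper column introduced here only for sorting; it is not part of the answer.

```r
library(data.table)

# Sample data in long format (values rounded as printed in the question)
sampleDT <- data.table(
  A        = c("a1", "a2", "a1", "a1", "a2", "a1", "a1", "a2", "a1"),
  B        = c("b2", "b2", "b2", "b1", "b2", "b2", "b1", "b2", "b1"),
  ID       = c("3", "4", "3", "1", "4", "3", "1", "4", "1"),
  variable = "E",
  value    = c(0.6211421, 0.7421095, 0.3943915, 0.4069439, 0.7796073,
               0.5505793, 0.3526222, 0.6906605, 0.2337894)
)

# occ (hypothetical name): numeric occurrence counter within each A/B/ID group
sampleDT[, occ := seq_len(.N), by = .(A, B, ID)]
sampleDT[, variable := paste0(variable, occ)]

# Sort by the numeric counter first, then by the grouping columns
setorder(sampleDT, occ, A, B, ID)
sampleDT[, occ := NULL]   # drop the helper column again

print(sampleDT)
```

Sorting on the numeric counter rather than on the character labels avoids "E10" being placed before "E2" once a combination occurs ten or more times.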



Source: https://stackoverflow.com/questions/54488761/how-to-create-categories-conditionally-using-other-variables-values-and-sequence
