How to apply a function to all combinations of rows in a data frame?

Posted by 谁说我不能喝 on 2019-12-10 21:38:54

Question


I am having trouble with the following problem, which concerns the data frame 'annotations' below (simplified here by limiting the number of columns).

require(irr)
# data
annotations <- read.table(text = "Obj1    Obj2    Obj3
Rater1     a       b       c
Rater2     a       b       b
Rater3     a       b       c", header = TRUE, stringsAsFactors = FALSE)

I would like to apply the function agree from the irr package to all combinations (not permutations) of rows, resulting in the following.

Agreement rater 1-2: 67%
Agreement rater 1-3: 100%
Agreement rater 2-3: 67%

More generally, I need to run a function on every combination of rows, and the function needs access to several (or all) columns.

I have worked out parts of the answer to the problem; I have generated a list of combinations running combn(rownames(annotations), 2), but I don't see how to use this list without writing inefficient for loops.
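
For reference, a minimal sketch of what that combn call returns on the three-rater data above: a character matrix with one column per unordered pair of raters. Only base R is used here; the variable name rater_pairs is chosen for illustration.

```r
annotations <- read.table(text = "Obj1    Obj2    Obj3
Rater1     a       b       c
Rater2     a       b       b
Rater3     a       b       c", header = TRUE, stringsAsFactors = FALSE)

# Each column of the result holds one unordered pair of row names
rater_pairs <- combn(rownames(annotations), 2)

dim(rater_pairs)   # 2 rows (the two members of a pair), choose(3, 2) = 3 columns
rater_pairs[, 2]   # the second pair: "Rater1" "Rater3"
```

Because combn enumerates combinations rather than permutations, the pair Rater3/Rater1 does not appear a second time.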

I have tried apply, as in apply(annotations, 1, agree), but I can only get this to work on one row, not the combinations mentioned before.

Does anyone have an idea how to proceed?

UPDATE: The following solution, based on your suggestions, works. (I have used kappa2 from the irr package instead of agree, but the solution to the main question remains the same.)

require(irr) #require the irr library for agreement calculations
annotations <- read.table(text = "Obj1    Obj2    Obj3
Rater1     a       b       c
Rater2     a       b       b
Rater3     a       b       c
Rater4     c       a       a", header = TRUE, stringsAsFactors = FALSE)

annotations <- t(annotations) # transpose annotations so that raters become columns

# For every pair of columns (combination of 2 raters), apply kappa2 and keep its value.
# combn collects the results itself, so nothing needs to be appended inside FUN.
kappa_values <- combn(colnames(annotations), 2, FUN = function(x) {
  kappa2(matrix(c(annotations[, x[1]], annotations[, x[2]]), ncol = 2))$value
})
kappa_values # display the kappa values, one per pair of raters

Answer 1:


You are close; you just need to apply your function over the result of combn instead. I am not familiar with the function you are referring to, but the approach should work the same once you plug it in.

First, save the combinations as a list rather than a matrix, because that makes it easier to add names (which I build by pasting the two entries of each pair together):

toCheck <- combn(rownames(annotations), 2, simplify = FALSE)

names(toCheck) <-
  sapply(toCheck, paste, collapse = " - ")

Then use sapply to work through the combinations. Here I am using mean for the comparison, but substitute whatever function you need. If your function returns more than a single value, use lapply instead and then process the result to print it as desired.

sapply(toCheck, function(x){
  mean(annotations[x[1], ] == annotations[x[2], ])
})

Which returns:

Rater1 - Rater2 Rater1 - Rater3 Rater2 - Rater3 
      0.6666667       1.0000000       0.6666667 
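
The lapply variant mentioned above can be sketched as follows, assuming the per-pair function returns several values at once. The helper pair_summary and its column names are hypothetical and chosen for illustration; only base R is used.

```r
annotations <- read.table(text = "Obj1    Obj2    Obj3
Rater1     a       b       c
Rater2     a       b       b
Rater3     a       b       c", header = TRUE, stringsAsFactors = FALSE)

toCheck <- combn(rownames(annotations), 2, simplify = FALSE)
names(toCheck) <- sapply(toCheck, paste, collapse = " - ")

# Hypothetical helper: returns a one-row data frame with several statistics
pair_summary <- function(x) {
  same <- unlist(annotations[x[1], ]) == unlist(annotations[x[2], ])
  data.frame(n_objects = length(same),
             n_agree   = sum(same),
             pct_agree = 100 * mean(same))
}

# lapply keeps one result per pair; rbind stacks them into a single table
result <- do.call(rbind, lapply(toCheck, pair_summary))
result
```

Returning a one-row data frame per pair keeps the combining step to a single rbind, at the cost of some overhead when there are very many pairs.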



Answer 2:


The example below applies the function f(x) := 2x + 5 to all entries of a column that corresponds to the combinations. In place of f(x) := 2x + 5, you can substitute your own function:

Step1: Build the matrix of combinations. (The following was written for my own use case.)

causalitycombinations <- function (nvars, ncausers, ndependents)
{
    independents <- combn(nvars, ncausers)
    swingnumber <- dim(combn(nvars - ncausers, ndependents))[[2]]
    numberofallcombinations <- dim(combn(nvars, ncausers))[[2]] * swingnumber
    dependents <- matrix(, nrow = dim(combn(nvars, ncausers))[[2]] * swingnumber, ncol = ndependents)
    for (i in as.integer(1:dim(combn(nvars, ncausers))[[2]])) {
        dependents[(swingnumber * (i - 1) + 1):(swingnumber * i), ] <- t(combn(setdiff(seq(1:nvars), independents[, i]), ndependents))
    }
    swingedindependents <- matrix(, nrow = dim(combn(nvars, ncausers))[[2]] * swingnumber, ncol = ncausers)
    for (i in as.integer(1:dim(combn(nvars, ncausers))[[2]])) {
        for (j in as.integer(1:swingnumber)) {
            swingedindependents[(i - 1) * swingnumber + j, ] <- independents[, i]
        }
    }
    independentsdependents <- cbind(swingedindependents, dependents)
    others <- matrix(, nrow = dim(combn(nvars, ncausers))[[2]] * swingnumber, ncol = nvars - ncausers - ndependents)
    for (i in as.integer(1:((dim(combn(nvars, ncausers))[[2]]) * swingnumber))) {
        others[i, ] <- setdiff(seq(1:nvars), independentsdependents[i, ])
    }
    causalitiestemplate <- cbind(independentsdependents, others)
    causalitiestemplate
}

causalitycombinations(3,1,1)
#     [,1] [,2] [,3]
#[1,]    1    2    3
#[2,]    1    3    2
#[3,]    2    1    3
#[4,]    2    3    1
#[5,]    3    1    2
#[6,]    3    2    1
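
As a quick sanity check on the output above, the number of rows can be worked out from the function body: one block per choice of causers, times the number of ways to pick the dependents from what remains. The helper name expected_rows is hypothetical and not part of the original answer.

```r
# Hypothetical helper: expected row count of causalitycombinations()
expected_rows <- function(nvars, ncausers, ndependents) {
  choose(nvars, ncausers) * choose(nvars - ncausers, ndependents)
}

expected_rows(3, 1, 1)  # 3 * 2 = 6, matching the six rows shown above
expected_rows(4, 2, 1)  # 6 * 2 = 12
```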

Step2: Append the data to the combinations
(one can append multiple columns, I added only 1 for simplicity)

set.seed(1)
mydataframer <- cbind(causalitycombinations(3,1,1), rnorm(6))
mydataframer
 #     [,1] [,2] [,3]       [,4]
 #[1,]    1    2    3 -0.6264538
 #[2,]    1    3    2  0.1836433
 #[3,]    2    1    3 -0.8356286
 #[4,]    2    3    1  1.5952808
 #[5,]    3    1    2  0.3295078
 #[6,]    3    2    1 -0.8204684

Step3: Apply the function via lapply, iterating over the row indices of the combined matrix

lapply(1: dim(mydataframer)[[1]], function(x) {2*mydataframer[x,4] + 5})

# 3.747092
# 5.367287
# 3.328743
# 8.190562
# 5.659016
# 3.359063
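
Note that lapply returns a list with one element per row. For a function like this one, which works element-wise, a plain numeric vector may be more convenient. A minimal sketch, in which values stands in for column 4 of mydataframer and f is the example function from the text:

```r
set.seed(1)
values <- rnorm(6)            # same draws as column 4 of mydataframer above

f <- function(x) 2 * x + 5    # the example function

as_list    <- lapply(seq_along(values), function(i) f(values[i]))  # list of 6
as_vector  <- vapply(values, f, numeric(1))                        # numeric vector
vectorized <- f(values)                                            # arithmetic is already vectorized

all.equal(unlist(as_list), vectorized)  # TRUE
```

vapply is useful when the function is not vectorized but always returns a single number; when it is vectorized, calling it directly on the column is the simplest option.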

That is it.

By the way, the ?irr::agree help file states that the n*m ratings matrix/data frame is "n subjects, m raters". Hence, the questioner may be better off laying out the data as:

annotations <- read.table(text = "Rater1    Rater2    Rater3
Subject1     a       b       c
Subject2     a       b       b
Subject3     a       b       c", header = TRUE, stringsAsFactors = FALSE)

annotations
#         Rater1 Rater2 Rater3
# Subject1      a      b      c
# Subject2      a      b      b
# Subject3      a      b      c
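
If the data are already stored with raters in rows, as in the question, one way to reach a subjects-by-raters layout is to transpose. A minimal sketch in base R; the names by_rater and by_subject are chosen for illustration:

```r
by_rater <- read.table(text = "Obj1 Obj2 Obj3
Rater1 a b c
Rater2 a b b
Rater3 a b c", header = TRUE, stringsAsFactors = FALSE)

# t() yields a character matrix with objects (subjects) in rows and raters in columns
by_subject <- as.data.frame(t(by_rater), stringsAsFactors = FALSE)

dim(by_subject)        # 3 subjects, 3 raters
colnames(by_subject)   # "Rater1" "Rater2" "Rater3"
by_subject$Rater2      # Rater2's ratings of Obj1..Obj3: "a" "b" "b"
```

Transposing keeps each rater's ratings together, whereas only relabeling the rows and columns would change which ratings belong to which rater.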

Also, one thing still needs to be clarified: whether the questioner wants to loop over all such combinations of annotations. If that is the case, i.e.,

annotations
#         Rater1 Rater2 Rater3
# Subject1      a      a      a
# Subject2      a      a      a
# Subject3      a      a      a

annotations
#         Rater1 Rater2 Rater3
# Subject1      a      a      b
# Subject2      a      a      a
# Subject3      a      a      a

annotations
#         Rater1 Rater2 Rater3
# Subject1      a      a      c
# Subject2      a      a      a
# Subject3      a      a      a

annotations
#         Rater1 Rater2 Rater3
# Subject1      a      b      a
# Subject2      a      a      a
# Subject3      a      a      a

# .... after consuming all Subject1 possibilities, this time consuming Subject2 possibilities,

annotations
#         Rater1 Rater2 Rater3
# Subject1      a      a      a
# Subject2      a      a      b
# Subject3      a      a      a

and then Subject3 possibilities, and thereby collecting all the possibilities of agreements, then the problem changes completely.

The irr::agree function is designed for multiple rows. Observe from its help file:

data(video)
video
#   rater1 rater2 rater3 rater4
# 1       4      4      3      4
# 2       4      4      4      5
# ..............................
# 20      4      5      5      4

agree(video)     # Simple percentage agreement
# Percentage agreement (Tolerance=0)
# Subjects = 20; Raters = 4; %-agree = 35 

agree(video, 1)  # Extended percentage agreement
# Percentage agreement (Tolerance=1)
# Subjects = 20; Raters = 4; %-agree = 90 

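whereas if the questioner wants to apply agree row-wise (only 1 subject!), the result is all-or-nothing: with a single subject, %-agree is 0 unless all raters give the same rating. For rows where the raters differ, such as these, it is 0: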

agree(video[1,])
# Percentage agreement (Tolerance=0)
# Subjects = 1; Raters = 4; %-agree = 0

...

agree(video[20,])
# Percentage agreement (Tolerance=0)
# Subjects = 1; Raters = 4; %-agree = 0
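
To see why a single subject gives so little information, here is a minimal base-R sketch of simple percentage agreement, assuming it means the share of subjects on which all raters give exactly the same rating. The helper pct_agree is hypothetical and is not irr's implementation; the ratings are made up.

```r
# Hypothetical helper, NOT irr::agree: share of rows (subjects) on which
# every column (rater) holds the same rating, as a percentage
pct_agree <- function(ratings) {
  ratings <- as.matrix(ratings)
  all_same <- apply(ratings, 1, function(r) length(unique(r)) == 1)
  100 * mean(all_same)
}

# Made-up ratings: 3 subjects (rows) by 4 raters (columns)
ratings <- matrix(c(4, 4, 3, 4,
                    5, 5, 5, 5,
                    4, 5, 5, 4), ncol = 4, byrow = TRUE)

pct_agree(ratings)                     # one of three subjects is unanimous
pct_agree(ratings[1, , drop = FALSE])  # single subject, raters differ: 0
pct_agree(ratings[2, , drop = FALSE])  # single subject, unanimous: 100
```

With one subject the statistic can only take the values 0 or 100, so row-wise application says little about how often raters agree overall.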


Source: https://stackoverflow.com/questions/40516059/how-to-apply-a-function-to-all-combinations-of-rows-in-a-data-frame
