Category Overlap Analysis

Submitted by 不打扰是莪最后的温柔 on 2021-02-08 03:11:11

Question


I am trying to perform some category overlap analysis and need help.

I have data made up of customer service tickets. The tickets are labeled with category data. Tickets can contain multiple category labels.

I have a query that pulls ticket IDs and categories, so a ticket with more than one category appears on multiple rows. I am looking for a way to show the category overlap, for example: how many tickets have category A, how many have both A and B, both B and C, etc.

I would like to be able to perform this in Excel or R so that it can easily be incorporated into reports for my management.

An example of my query output is as follows:

category  ticket_id
A         3975472
D         3975472
B         3975472
P         3969484
B         3969484
S         3969484
P         3968360
C         3968360
D         3964048
A         3964048
C         3963748
E         3963748
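To try the answers below, the sample output can be recreated as an R data frame. This is a minimal sketch: the name `df` matches the one Answer 1 assumes, and in practice the rows would come from the query itself (for example an exported file read with `read.csv()`).

```r
# The sample query output from the question: one row per (category, ticket_id) pair
df <- data.frame(
  category  = c("A", "D", "B", "P", "B", "S", "P", "C", "D", "A", "C", "E"),
  ticket_id = c(3975472, 3975472, 3975472, 3969484, 3969484, 3969484,
                3968360, 3968360, 3964048, 3964048, 3963748, 3963748)
)

# Tickets with more than one category appear on several rows
table(df$ticket_id)
```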

Thank you!

I was hoping to achieve an output such as:


Answer 1:


In Excel you could do this with a PivotTable, for example with ticket_id in Rows, category in Columns, and Count of category in Values.

In R, assuming the data is in a data frame named df, you could do something like this:

table(df$ticket_id, df$category)
#         A B C D E P S
# 3963748 0 0 1 0 1 0 0
# 3964048 1 0 0 1 0 0 0
# 3968360 0 0 1 0 0 1 0
# 3969484 0 1 0 0 0 1 1
# 3975472 1 1 0 1 0 0 0
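The table above is a ticket-by-category incidence matrix; it does not count overlaps yet, but one more step does. Multiplying its transpose by itself gives a category-by-category matrix whose cell [i, j] is the number of tickets carrying both i and j, with the per-category ticket totals on the diagonal. A minimal base-R sketch, assuming the same `df` (the name `overlap` is just illustrative):

```r
df <- data.frame(
  category  = c("A", "D", "B", "P", "B", "S", "P", "C", "D", "A", "C", "E"),
  ticket_id = c(3975472, 3975472, 3975472, 3969484, 3969484, 3969484,
                3968360, 3968360, 3964048, 3964048, 3963748, 3963748)
)

# 1/0 incidence matrix; "> 0" guards against a category listed twice on one ticket
incidence <- 1 * (table(df$ticket_id, df$category) > 0)

# crossprod(x) is t(x) %*% x: cell [i, j] = tickets with both category i and j,
# and the diagonal holds the number of tickets in each category
overlap <- crossprod(incidence)
overlap
```

For this sample, A and D share two tickets, while pairs such as A and C share none. The resulting matrix can be written out for Excel with `write.csv(overlap)`.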



Answer 2:


This was an interesting question. I hope the code below solves it. I use the reshape2 package to rearrange the data.

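set.seed(1)
# Create a sample dataset: 1000 random (category, ticket) pairs.
# Note: sample() changed in R 3.6.0, so newer R versions produce
# different numbers than the output shown here.
dat <- data.frame(category = sample(x = letters[1:6], size = 1000, replace = TRUE),
                  ticket = sample(x = 1000:1500, size = 1000, replace = TRUE))
dat <- unique(dat)  # one row per (category, ticket) pair
dat <- dat[order(dat$ticket, dat$category), ]
head(dat)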

#     category ticket
# 311        a   1000
# 916        c   1000
# 978        d   1000
# 949        f   1000
# 72         f   1001
# 597        c   1002

library(reshape2)

# same as table(), but returns a data frame
tab <- dcast(dat, ticket ~ category, length)

# create all possible 2-way combinations of categories
# (named cats rather than levels to avoid masking base::levels())
cats <- sort(unique(dat$category))
combs <- data.frame(rows = rep(cats, times = length(cats)), cols = rep(cats, each = length(cats)))

# count the tickets that carry both categories of each combination
combs$count <- apply(combs, 1, function(x) sum(tab[, x[1]] & tab[, x[2]]))

overlap <- dcast(combs, rows ~ cols, value.var = "count")  # reshape into a square table
overlap

#   rows   a   b   c   d   e   f
# 1    a 140  38  36  41  36  42
# 2    b  38 128  48  32  41  39
# 3    c  36  48 161  35  49  36
# 4    d  41  32  35 123  32  35
# 5    e  36  41  49  32 139  38
# 6    f  42  39  36  35  38 138
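For a management report it can be easier to read the overlap as a ranked list of category pairs than as a square matrix. The sketch below is an alternative presentation, not part of the answer: it rebuilds the overlap matrix in base R with `crossprod()` instead of the reshape2 steps, uses the question's sample data, keeps each unordered pair once via the upper triangle, and sorts by count. The names `incidence`, `overlap`, and `pairs` are illustrative.

```r
df <- data.frame(
  category  = c("A", "D", "B", "P", "B", "S", "P", "C", "D", "A", "C", "E"),
  ticket_id = c(3975472, 3975472, 3975472, 3969484, 3969484, 3969484,
                3968360, 3968360, 3964048, 3964048, 3963748, 3963748)
)

# Category-by-category overlap matrix (tickets sharing both categories)
incidence <- 1 * (table(df$ticket_id, df$category) > 0)
overlap <- crossprod(incidence)

# Keep each unordered pair once: upper triangle, diagonal excluded
idx <- which(upper.tri(overlap), arr.ind = TRUE)
pairs <- data.frame(
  first   = rownames(overlap)[idx[, "row"]],
  second  = colnames(overlap)[idx[, "col"]],
  tickets = overlap[idx]
)

# Drop pairs that never co-occur and list the most common pairs first
pairs <- pairs[pairs$tickets > 0, ]
pairs <- pairs[order(-pairs$tickets, pairs$first, pairs$second), ]
rownames(pairs) <- NULL
pairs
```

With the sample data, A and D come first (two shared tickets), followed by seven pairs that share one ticket each.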

Let me know if any of this needs further explanation.
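Both answers count pairwise overlap, so a ticket labeled A, B and D counts toward A+B, A+D and B+D. If "how many tickets have A and B" should instead mean tickets whose label set is exactly A and B, one option (an assumption about the intended reading, sketched in base R on the question's sample data) is to collapse each ticket's sorted categories into a single key and count the keys. The names `keys` and `combo_counts` are illustrative.

```r
df <- data.frame(
  category  = c("A", "D", "B", "P", "B", "S", "P", "C", "D", "A", "C", "E"),
  ticket_id = c(3975472, 3975472, 3975472, 3969484, 3969484, 3969484,
                3968360, 3968360, 3964048, 3964048, 3963748, 3963748)
)

# One key per ticket, e.g. "A+B+D"; unique() drops repeated labels on a ticket
keys <- tapply(df$category, df$ticket_id,
               function(x) paste(sort(unique(x)), collapse = "+"))

# Number of tickets with each exact combination, most common first
combo_counts <- sort(table(keys), decreasing = TRUE)
combo_counts
```

In the sample every ticket has a different combination, so each key appears once; on real data the counts show which exact label sets are most common.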



Source: https://stackoverflow.com/questions/32188560/category-overlap-analysis
