Question
I am trying to perform some category overlap analysis and need help.
I have data made up of customer service tickets. The tickets are labeled with category data. Tickets can contain multiple category labels.
I have a query that pulls ticket IDs and categories, so I get multiple rows for any ID with more than one category. I am looking for a way to show the category overlap, for example: how many tickets have category A, how many have both A and B, both B and C, and so on.
I would like to be able to perform this in Excel or R so that it can easily be incorporated into reports for my management.
An example of my query output is as follows:
category ticket_id
A 3975472
D 3975472
B 3975472
P 3969484
B 3969484
S 3969484
P 3968360
C 3968360
D 3964048
A 3964048
C 3963748
E 3963748
Thank you!
I was hoping to achieve an output such as:
Answer 1:
In Excel you could do this with a Pivot table:
In R, assuming the data is in a data frame named df, you could do something like this:
table(df$ticket_id, df$category)
#           A B C D E P S
#   3963748 0 0 1 0 1 0 0
#   3964048 1 0 0 1 0 0 0
#   3968360 0 0 1 0 0 1 0
#   3969484 0 1 0 0 0 1 1
#   3975472 1 1 0 1 0 0 0
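One possible way to turn a ticket-by-category table into pairwise overlap counts is to multiply it by its own transpose, which base R's crossprod() does. This is only a minimal sketch: it assumes the sample rows from the question are typed into df by hand, whereas the real data would come from the query.

```r
# Sample rows copied from the question's query output (typed in by hand here)
df <- data.frame(
  category  = c("A", "D", "B", "P", "B", "S", "P", "C", "D", "A", "C", "E"),
  ticket_id = c(3975472, 3975472, 3975472, 3969484, 3969484, 3969484,
                3968360, 3968360, 3964048, 3964048, 3963748, 3963748)
)

# Ticket-by-category counts, as in table() above
inc <- table(df$ticket_id, df$category)

# Clip to 0/1 so a duplicated (ticket, category) row is not counted twice
inc[inc > 1] <- 1

# t(inc) %*% inc: cell [i, j] is the number of tickets carrying both i and j
overlap <- crossprod(inc)
overlap
```

The diagonal holds the number of tickets per category, and each off-diagonal cell holds the number of tickets that carry both categories. For the sample rows, A and D appear together on 2 tickets and A and B on 1.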
Answer 2:
This was an interesting question. I hope the code below solves it. I use the reshape2 package to rearrange the data.
set.seed(1)
# creating a sample dataset
dat <- data.frame(category = sample(x = letters[1:6], size = 1000, replace = TRUE), ticket = sample(x = 1000:1500, size = 1000, replace = TRUE))
dat <- unique(dat)
dat <- dat[order(dat$ticket, dat$category),]
head(dat)
#     category ticket
# 311        a   1000
# 916        c   1000
# 978        d   1000
# 949        f   1000
# 72         f   1001
# 597        c   1002
library(reshape2)
# same as the table() function, but returns a data frame (one row per ticket, one 0/1 column per category)
tab <- dcast(dat, ticket ~ category, length, value.var = "ticket")
# create all possible 2-way combinations of categories
levels <- sort(unique(dat$category))
combs <- data.frame(rows = rep(levels, times = length(levels)), cols = rep(levels, each = length(levels)))
# for each combination, count the tickets that carry both categories
combs$count <- apply(combs, 1, function(x) sum(tab[, x[1]] & tab[, x[2]]))
# convert the output into a square matrix
overlap <- dcast(combs, rows ~ cols, value.var = "count")
#   rows   a   b   c   d   e   f
# 1    a 140  38  36  41  36  42
# 2    b  38 128  48  32  41  39
# 3    c  36  48 161  35  49  36
# 4    d  41  32  35 123  32  35
# 5    e  36  41  49  32 139  38
# 6    f  42  39  36  35  38 138
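The square matrix counts pairs. If a report also needs the number of tickets per exact label set (for example, tickets labeled exactly A, B and D), one option is to collapse each ticket's categories into a single string and tabulate those strings. This is a rough sketch in base R, not part of the approach above. It assumes the sample rows from the question, stored under this answer's column names (category, ticket), and the "+" separator is an arbitrary choice.

```r
# Sample rows from the question, using the column names of this answer
dat <- data.frame(
  category = c("A", "D", "B", "P", "B", "S", "P", "C", "D", "A", "C", "E"),
  ticket   = c(3975472, 3975472, 3975472, 3969484, 3969484, 3969484,
               3968360, 3968360, 3964048, 3964048, 3963748, 3963748)
)

# Collapse each ticket's labels into one sorted string such as "A+B+D"
combo <- tapply(dat$category, dat$ticket,
                function(x) paste(sort(unique(x)), collapse = "+"))

# Count how many tickets share each exact combination
combo_counts <- sort(table(combo), decreasing = TRUE)
combo_counts
```

With only five tickets in the sample, every combination occurs once. On real data the most common label sets sort to the top.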
Let me know if any of this needs further explaining.
Source: https://stackoverflow.com/questions/32188560/category-overlap-analysis