Main sequences from Arules Sequence Mining in R

Question

How can I remove the sub-sequences from the output of the cspade algorithm in the arulesSequences package in R? For example, suppose my data (Sample.txt) is as follows:

Column names: sequenceID, eventID, size, item

1   1   1   A
1   2   1   B
1   3   1   C
1   4   1   D
2   1   1   A
2   2   1   B
2   3   1   C
3   1   1   A
3   2   1   B
3   3   1   C
3   4   1   D
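To reproduce the example without hand-editing a file, the sample data can be written out with base R. This is a minimal sketch: `sample_events` and `sample_path` are illustrative names, and the file is written to a temporary directory without a header row, so pass `sample_path` to `read_baskets()` in place of `"Sample.txt"`.

```r
# Recreate Sample.txt: columns sequenceID, eventID, size, item,
# tab-separated, with no header row (read_baskets() expects none here)
sample_events <- data.frame(
  sequenceID = rep(1:3, times = c(4, 3, 4)),
  eventID    = c(1:4, 1:3, 1:4),
  size       = 1L,
  item       = c("A", "B", "C", "D", "A", "B", "C", "A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Illustrative location; any writable path works
sample_path <- file.path(tempdir(), "Sample.txt")
write.table(sample_events, sample_path, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

readLines(sample_path)
```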

After running the following arulesSequences code:

library("arulesSequences")
# Sample.txt must not contain a header row (column names) when it is read in
SymptomArulesSeq <- read_baskets("Sample.txt", sep = "[ \t]+",
                                 info = c("sequenceID", "eventID", "size"))
# Mine all sequential patterns with a minimum support of 0.1
s1 <- cspade(SymptomArulesSeq, parameter = list(support = 0.1),
             control = list(verbose = TRUE), tmpdir = tempdir())
summary(s1)
as(s1, "data.frame")

sequence    support
<{A}>   1
<{B}>   1
<{C}>   1
<{D}>   0.6666667
<{A},{D}>   0.6666667
<{B},{D}>   0.6666667
<{C},{D}>   0.6666667
<{B},{C},{D}>   0.6666667
<{A},{C},{D}>   0.6666667
<{A},{B},{C},{D}>   0.6666667
<{A},{B},{D}>   0.6666667
<{A},{C}>   1
<{B},{C}>   1
<{A},{B},{C}>   1
<{A},{B}>   1
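The gapped patterns in this output are expected: cSPADE counts a pattern as present in a sequence when its elements occur in that order, with any number of other events allowed in between, and the reported support is the fraction of sequences that contain it. The base-R sketch below recomputes support that way for single-item events; `contains_subsequence()` and `subsequence_support()` are hypothetical helpers, not part of arulesSequences. It shows why <{A},{B},{D}> gets support 0.667 even though C sits between B and D in sequences 1 and 3.

```r
# Event data from the question: one item per event, already ordered by eventID
events <- data.frame(
  sequenceID = rep(1:3, times = c(4, 3, 4)),
  eventID    = c(1:4, 1:3, 1:4),
  item       = c("A", "B", "C", "D", "A", "B", "C", "A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# One character vector of items per sequenceID (split keeps row order)
sequences <- split(events$item, events$sequenceID)

# TRUE if the items of `pattern` occur in `s` in the same order, with any
# number of other items allowed in between (greedy left-to-right match)
contains_subsequence <- function(s, pattern) {
  pos <- 0
  for (p in pattern) {
    hit <- which(s == p & seq_along(s) > pos)
    if (length(hit) == 0) return(FALSE)
    pos <- hit[1]
  }
  TRUE
}

# Relative support: fraction of sequences that contain the pattern
subsequence_support <- function(pattern, sequences) {
  mean(vapply(sequences, contains_subsequence, logical(1), pattern = pattern))
}

subsequence_support(c("A", "B", "D"), sequences)  # 2 of 3 sequences; C is skipped
subsequence_support(c("A", "C"), sequences)       # all 3 sequences; B is skipped
```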

How can I find the full-length sequences without losing the items in between?

From the data, the main full-length sequences starting from A are A (1), A->B (1), A->B->C (1) and A->B->C->D (0.67). How can I remove the intermediate sub-sequences and get only these results?
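If the goal is only the prefixes that actually open each sequence, they can also be counted straight from the event data without running cSPADE. This is a minimal sketch, assuming every event holds a single item and that "support" means the fraction of sequences that begin with the prefix. That is a narrower notion than cSPADE's support, although for this data the numbers agree with the ones listed above.

```r
# Event data from the question: one item per event, already ordered by eventID
events <- data.frame(
  sequenceID = rep(1:3, times = c(4, 3, 4)),
  eventID    = c(1:4, 1:3, 1:4),
  item       = c("A", "B", "C", "D", "A", "B", "C", "A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# One character vector of items per sequenceID (split keeps row order)
sequences <- split(events$item, events$sequenceID)

# Every prefix of every sequence, written like "A->B->C"
prefixes <- unlist(lapply(sequences, function(s) {
  vapply(seq_along(s), function(k) paste(s[seq_len(k)], collapse = "->"),
         character(1))
}), use.names = FALSE)

# Each sequence yields a given prefix at most once, so count / #sequences
# is the fraction of sequences that start with that prefix
prefix_counts <- table(prefixes)
main_sequences <- data.frame(
  sequence = names(prefix_counts),
  support  = as.numeric(prefix_counts) / length(sequences),
  stringsAsFactors = FALSE
)
main_sequences
```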

The challenge is how to eliminate the sequences that start in the middle, such as B and B->C, and also sequences such as A->B->D, which skip an item (here the actual sequence is lost because item C is dropped).
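Another option is to post-filter the cspade output: keep a pattern only if it matches the first events of at least one input sequence, with nothing skipped. That drops B and B->C (they never open a sequence) as well as A->B->D and A->C (they skip C or B). Below is a base-R sketch, assuming single-item elements and assuming this prefix rule is what "main sequence" should mean; `parse_sequence()` and `is_prefix()` are hypothetical helpers, and the pattern table is copied from the output shown in the question.

```r
# Pattern table copied from as(s1, "data.frame") in the question
s1_df <- data.frame(
  sequence = c("<{A}>", "<{B}>", "<{C}>", "<{D}>", "<{A},{D}>", "<{B},{D}>",
               "<{C},{D}>", "<{B},{C},{D}>", "<{A},{C},{D}>",
               "<{A},{B},{C},{D}>", "<{A},{B},{D}>", "<{A},{C}>",
               "<{B},{C}>", "<{A},{B},{C}>", "<{A},{B}>"),
  support = c(1, 1, 1, rep(0.6666667, 8), 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Input sequences from Sample.txt, one character vector per sequenceID
sequences <- list(c("A", "B", "C", "D"), c("A", "B", "C"), c("A", "B", "C", "D"))

# Hypothetical helper: "<{A},{B}>" -> c("A", "B")
parse_sequence <- function(x) {
  strsplit(gsub("^<\\{|\\}>$", "", x), "},{", fixed = TRUE)[[1]]
}

# Hypothetical helper: TRUE if pattern equals the first length(pattern) items of s
is_prefix <- function(pattern, s) {
  length(pattern) <= length(s) && all(s[seq_along(pattern)] == pattern)
}

patterns <- lapply(as.character(s1_df$sequence), parse_sequence)

# Keep a pattern if it is a gap-free prefix of at least one input sequence
keep <- vapply(patterns, function(p) {
  any(vapply(sequences, is_prefix, logical(1), pattern = p))
}, logical(1))

main_sequences <- s1_df[keep, ]
# Order the surviving patterns from shortest to longest
main_sequences <- main_sequences[order(lengths(patterns[keep])), ]
rownames(main_sequences) <- NULL
main_sequences
```

To apply it to the real result, replace `s1_df` with `as(s1, "data.frame")`. If you would rather constrain the mining step itself, the `maxgap` element of cspade's `parameter` list (for example `list(support = 0.1, maxgap = 1)`) should limit the eventID gap between consecutive elements and so exclude gapped patterns such as <{A},{B},{D}>; it would not, however, remove patterns such as <{B},{C}> that start mid-sequence.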

Source: https://stackoverflow.com/questions/24415268/main-sequences-from-arules-sequence-mining-in-r

