Association rule in R - removing redundant rule (arules)

Posted by 白昼怎懂夜的黑 on 2019-11-30 10:15:29

For your code to work, you need an interest measure (confidence or lift), and rules.sorted needs to be sorted by that measure. That said, the code is very inefficient, since is.subset() creates a matrix of size n^2, where n is the number of rules. Also, is.subset() for rules merges the RHS and LHS of each rule, which is not correct for this purpose. So don't worry too much about the implementation details.

A more efficient way to do this is now implemented as function is.redundant() in package arules (available in version 1.4-2). This explanation comes from the manual page:

A rule is redundant if a more general rule with the same or a higher confidence exists. That is, a more specific rule is redundant if it is only equally or even less predictive than a more general rule. A rule is more general if it has the same RHS but one or more items removed from the LHS. Formally, a rule X -> Y is redundant if

for some X' subset X, conf(X' -> Y) >= conf(X -> Y).

This is equivalent to a negative or zero improvement as defined by Bayardo et al. (2000). In this implementation, measures other than confidence, e.g. improvement of lift, can be used as well.
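To make the definition concrete, here is a minimal base R sketch that applies it to a hand-made set of five rules. The rule set, the confidence values and the helper is_more_general() are made up for illustration; this is not how arules implements is.redundant(), and only rules that appear in the list are compared with each other.

```r
# Toy rule set (hypothetical): LHS item set, single RHS item, confidence.
rules <- list(
  list(lhs = c("a"),           rhs = "y", conf = 0.80),
  list(lhs = c("b"),           rhs = "y", conf = 0.60),
  list(lhs = c("a", "b"),      rhs = "y", conf = 0.75),
  list(lhs = c("a", "c"),      rhs = "y", conf = 0.90),
  list(lhs = c("a", "b", "c"), rhs = "y", conf = 0.90)
)

# TRUE if `general` has the same RHS as `specific` and its LHS is a
# proper subset of the LHS of `specific` (hypothetical helper).
is_more_general <- function(general, specific) {
  identical(general$rhs, specific$rhs) &&
    length(general$lhs) < length(specific$lhs) &&
    all(general$lhs %in% specific$lhs)
}

# A rule is redundant if some more general rule has the same or a
# higher confidence.
redundant <- vapply(rules, function(r) {
  any(vapply(rules, function(g) {
    is_more_general(g, r) && g$conf >= r$conf
  }, logical(1)))
}, logical(1))

print(redundant)
# FALSE FALSE  TRUE FALSE  TRUE
```

Here {a, b} -> y is redundant because {a} -> y already reaches a higher confidence, and {a, b, c} -> y is redundant because {a, c} -> y reaches the same confidence with fewer items.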

Check out the examples in ?is.redundant.
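As a rough sketch of how the measure can be switched, the following mines rules from the Groceries data set that ships with arules and flags redundant rules once by confidence and once by lift. The data set and the threshold values are assumptions chosen only to get a reasonable number of rules; they are not taken from the question.

```r
library(arules)

# Example data bundled with arules (assumption: not the asker's data).
data("Groceries")

# Thresholds picked for illustration only.
rules <- apriori(
  Groceries,
  parameter = list(supp = 0.005, conf = 0.5, maxlen = 3),
  control = list(verbose = FALSE)
)

# Default: redundancy judged by confidence.
redundant_conf <- is.redundant(rules)

# Same check, but based on lift instead of confidence.
redundant_lift <- is.redundant(rules, measure = "lift")

# How many rules each variant flags.
sum(redundant_conf)
sum(redundant_lift)

# Keep only the non-redundant rules.
rules_pruned <- rules[!redundant_conf]
```

Because redundancy only compares rules with the same RHS, and lift for a fixed RHS is confidence divided by a constant, the two variants should normally flag the same rules.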

Removing redundant rules with the arules package:

Run the apriori algorithm:

rules <- apriori(transDat, parameter = list(supp = 0.01, conf = 0.5, target = "rules", maxlen = 3))

Remove redundant rules:

rules <- rules[!is.redundant(rules)]

Inspect:

arules::inspect(rules)

Create a dataframe:

df <- data.frame(
  lhs = labels(lhs(rules)),
  rhs = labels(rhs(rules)),
  quality(rules)  # accessor for the quality slot (support, confidence, lift, ...)
)
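For reference, a self-contained sketch of the whole pipeline, ending with two ways to get a data frame. It assumes the bundled Groceries data set in place of transDat, which is not shown in the answer. The second variant relies on the coercion from rules to data.frame provided by arules, which puts each rule into a single text column instead of separate lhs and rhs columns.

```r
library(arules)

# Stand-in for transDat (assumption: any transactions object works here).
data("Groceries")

rules <- apriori(
  Groceries,
  parameter = list(supp = 0.01, conf = 0.5, target = "rules", maxlen = 3),
  control = list(verbose = FALSE)
)
rules <- rules[!is.redundant(rules)]

# Variant 1: separate lhs and rhs columns plus the quality measures.
df <- data.frame(
  lhs = labels(lhs(rules)),
  rhs = labels(rhs(rules)),
  quality(rules)
)

# Variant 2: coercion supplied by arules; one "rules" column plus measures.
df_alt <- as(rules, "data.frame")

head(df)
head(df_alt)
```

Variant 1 is easier to filter by antecedent or consequent; variant 2 is shorter when only a printable table is needed.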

Just check the help page for is.redundant() in RStudio. It clearly states the following.

Suppose there are two rules:

rule1 X -> Y with confidence cf1

rule2 X' -> Y with confidence cf2, where X' is a proper subset of X

rule1 is said to be redundant if rule2 has the same or a higher confidence than rule1, i.e. cf2 >= cf1.

In other words, if a rule whose LHS is a subset of another rule's LHS gives the same RHS with at least as much confidence, the more specific rule is said to be redundant.

  1. The lower triangle, diagonal included, is set to NA so that a rule is not counted as a subset of itself.

  2. That is insufficient information: rules cannot be called redundant on the basis of subsetting alone; the confidence value has to be taken into account.
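Both points can be illustrated with a small base R sketch of the subset-matrix idea. The question's original code is not reproduced in this thread, so the three rules, their item sets and the variable names below are a hypothetical reconstruction. The rules are assumed to be sorted by decreasing confidence, and each item set merges LHS and RHS, as the matrix approach does.

```r
# Hypothetical rules, already sorted by decreasing confidence:
#   r1: {a, c} -> y (0.90), r2: {a} -> y (0.80), r3: {a, b} -> y (0.75)
items <- list(
  r1 = c("a", "c", "y"),
  r2 = c("a", "y"),
  r3 = c("a", "b", "y")
)

# subset_matrix[i, j] is TRUE if the items of rule i are all in rule j.
n <- length(items)
subset_matrix <- matrix(FALSE, n, n,
                        dimnames = list(names(items), names(items)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    subset_matrix[i, j] <- all(items[[i]] %in% items[[j]])
  }
}

# Without masking, every rule is a subset of itself, so all get flagged.
unmasked <- colSums(subset_matrix) >= 1

# Mask the diagonal and lower triangle: only a higher-ranked rule
# (one with higher confidence) can now make a rule redundant.
subset_matrix[lower.tri(subset_matrix, diag = TRUE)] <- NA
redundant <- colSums(subset_matrix, na.rm = TRUE) >= 1

print(unmasked)
#   r1   r2   r3
# TRUE TRUE TRUE
print(redundant)
#    r1    r2    r3
# FALSE FALSE  TRUE
```

Only r3 is flagged after masking: r2 is a subset of r1 as well, but it ranks below r1, so r1 is kept. The result is therefore only meaningful when the rules are sorted by the chosen measure, which is how the confidence value enters this approach.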
