Apriori Algorithm

前端未结

关注

 4  2091

I\'ve heard about the Apriori algorithm several times before but never got the time or the opportunity to dig into it, can anyone explain to me in a simple way the workings

相关标签:

4条回答

不思量自难忘°

2020-12-29 17:27

The best introduction to Apriori can be downloaded from this book:

http://www-users.cs.umn.edu/~kumar/dmbook/index.php

you can download the chapter 6 for free which explain Apriori very clearly.

Moreover, if you want to download a Java version of Apriori and other algorithms for frequent itemset mining, you can check my website:

http://www.philippe-fournier-viger.com/spmf/

0 讨论(0)
发布评论:

提交评论
- 加载中...
长发绾君心

2020-12-29 17:31
Apriori Algorithm

It is a candidate-generation-and-test approach for frequent pattern mining in datasets. There are two things you have to remember.

Apriori Pruning Principle - If any itemset is infrequent, then its superset should not be generated/tested.

Apriori Property - A given (k+1)-itemset is a candidate (k+1)-itemset only if everyone of its k-itemset subsets are frequent.

Now, here is the apriori algorithm in 4 steps.
1. Initially, scan the database/dataset once to get the frequent 1-itemset.
2. Generate length k+1 candidate itemsets from length k frequent itemsets.
3. Test the candidates against the database/dataset.
4. Terminate when no frequent or candidate set can be generated.
Solved Example

Suppose there is a transaction database as follows with 4 transactions including their transaction IDs and items bought with them. Assume the minimum support - min_sup is 2. The term support is the number of transactions in which a certain itemset is present/included.

Transaction DB
```
tid  | items
-------------
10   | A,C,D
20   | B,C,E
30   | A,B,C,E
40   | B,E
```
Now, let's create the candidate 1-itemsets by the 1st scan of the DB. It is simply called as the set of C_1 as follows.
```
itemset | sup
-------------
  {A}   |  2
  {B}   |  3
  {C}   |  3
  {D}   |  1
  {E}   |  3
```
If we test this with min_sup, we can see {D} does not satisfy the min_sup of 2. So, it will not be included in the frequent 1-itemset, which we simply call as the set of L_1 as follows.
```
itemset | sup
-------------
  {A}   |  2
  {B}   |  3
  {C}   |  3
  {E}   |  3
```
Now, let's scan the DB for the 2nd time, and generate candidate 2-itemsets, which we simply call as the set of C_2 as follows.
```
itemset | sup
-------------
  {A,B} |  1
  {A,C} |  2
  {A,E} |  1
  {B,C} |  2
  {B,E} |  3
  {C,E} |  2
```
As you can see, {A,B} and {A,E} itemsets do not satisfy the min_sup of 2 and hence they will not be included in the frequent 2-itemset, L_2
```
itemset | sup
-------------
  {A,C} |  2
  {B,C} |  2
  {B,E} |  3
  {C,E} |  2
```
Now let's do a 3rd scan of the DB and get candidate 3-itemsets, C_3 as follows.
```
itemset  | sup
-------------
 {A,B,C} |  1
 {A,B,E} |  1
 {A,C,E} |  1
 {B,C,E} |  2
```
You can see that, {A,B,C}, {A,B,E} and {A,C,E} does not satisfy min_sup of 2. So they will not be included in frequent 3-itemset, L_3 as follows.
```
itemset  | sup
-------------
 {B,C,E} |  2
```
Now, finally, we can calculate the support (supp), confidence (conf) and lift (interestingness value) values of the Association/Correlation Rules that can be generated by the itemset {B,C,E} as follows.
```
         Rule       | supp  |  conf   |  lift
    -------------------------------------------
     B ->  C & E    |  50%  |  66.67% |  1.33
     E ->  B & C    |  50%  |  66.67% |  1.33
     C ->  E & B    |  50%  |  66.67% |  1.77
     B & C ->  E    |  50%  |  100%   |  1.33
     E & B ->  C    |  50%  |  66.67% |  1.77
     C & E ->  B    |  50%  |  100%   |  1.33
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
自闭症患者

2020-12-29 17:37

Well, I would assume you've read the wikipedia entry but you said "a basic example would make it a lot easier for me to understand". Wikipedia has just that so I'll assume you haven't read it and suggest that you do.

Read the wikipedia article.

0 讨论(0)
发布评论:

提交评论
- 加载中...
野趣味

2020-12-29 17:37

See Top 10 algorithms in data mining (free access) or The Top Ten Algorithms in Data Mining. The latter gives a detailed description of the algorithm, together with details on how to get optimized implementations.

0 讨论(0)
发布评论:

提交评论
- 加载中...

Apriori Algorithm

Apriori Algorithm

Solved Example