R Basket Analysis using arules package with unique order number but duplicate order combinations

Submitted by 可紊 on 2019-12-03 00:35:24
Sanjay Roy

You must remove the duplicates. If you are working from a .csv file, run Data -> Remove Duplicates in Excel before processing it. arules throws an error when it finds duplicates, and that is why you are getting the error.

Another way is to detect the duplicates in your itemset with duplicated() and remove them with unique().
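A minimal sketch of the duplicated()/unique() approach in base R. The data frame and its column names (OrderNumber, ItemType) are made up for illustration; substitute your own columns.

```r
# Hypothetical order data: one row per (order number, item type) pair.
orders <- data.frame(
  OrderNumber = c(1, 1, 1, 2, 2, 3),
  ItemType    = c("Grocery", "Grocery", "Dairy", "Grocery", "Bakery", "Dairy"),
  stringsAsFactors = FALSE
)

# duplicated() flags every row that repeats an earlier (OrderNumber, ItemType) pair.
dup_rows <- duplicated(orders)

# unique() keeps the first occurrence of each pair; equivalent to orders[!dup_rows, ].
orders_dedup <- unique(orders)

print(orders[dup_rows, ])  # the rows that would be dropped
print(orders_dedup)        # the de-duplicated data
```

Checking dup_rows before dropping anything is a cheap way to confirm that the repeats really are redundant rows and not, say, quantities you wanted to keep.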

Or a simpler approach can be found in this SO post:

Association analysis with duplicate transactions using arules package in R

SophiaAP

OK, after hours of searching and reading all the PDFs I could find, I finally found the answer (and the most helpful walkthrough of apriori/basket analysis ever!) in the Data Mining Desktop Survival Guide by Graham Williams:

The read.transactions function can also read data from a file with transaction ID and a single item per line (using the format="single" option).

So there was no need to do all those transformations after import. I should have imported straight from the original csv file, specifying the "single" format option instead of "basket". I also had to make sure the file contained no column names and that each pairing of item type and order number appeared only once (for instance, if a person ordered two items from the "Grocery" category, that has to be represented by a single row). The cols=c(2,1) option gives the column holding the transaction ID first and the column holding the item second, so here the order number is read from column 2 and ItemType from column 1.

tr <- read.transactions(file='dataset.csv', format='single', sep=',', cols=c(2,1))
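For anyone who wants to try this end to end, here is a small self-contained sketch. The sample rows and the temporary file are invented for illustration. The rm.duplicates argument is not something the answer above used; it is an existing read.transactions option that removes repeated items within a transaction, shown here as an alternative to cleaning the file by hand.

```r
library(arules)

# Hypothetical data, no header row: item type in column 1, order number in column 2.
# Order 1 lists "Grocery" twice on purpose.
rows <- c("Grocery,1", "Grocery,1", "Dairy,1",
          "Grocery,2", "Bakery,2",
          "Dairy,3")
path <- tempfile(fileext = ".csv")
writeLines(rows, path)

# cols = c(2, 1): transaction IDs come from column 2, items from column 1.
# rm.duplicates = TRUE drops the repeated "Grocery" entry in order 1.
tr <- read.transactions(file = path, format = "single", sep = ",",
                        cols = c(2, 1), rm.duplicates = TRUE)

inspect(tr)  # one itemset per order number
```

If your file has the order number first and the item second, swap the argument to cols = c(1, 2).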