fpgrowth

Appending column name to column value using Spark

Submitted by 柔情痞子 on 2021-01-28 20:05:44
Question: I have data in a comma-separated file, which I have loaded into a Spark data frame. The data looks like:

    A B C
    1 2 3
    4 5 6
    7 8 9

I want to transform the above data frame in PySpark so that each value is prefixed with its column name:

    A   B   C
    A_1 B_2 C_3
    A_4 B_5 C_6
    ...

Then convert it to a list of lists using PySpark:

    [['A_1', 'B_2', 'C_3'], ['A_4', 'B_5', 'C_6']]

And then run the FP-Growth algorithm on that data set using PySpark. The code I have tried so far:

    from pyspark.sql.functions import col, size
    from pyspark.sql
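The prefixing step the question describes can be sketched in plain Python, independent of Spark. The helper name `rows_to_transactions` is illustrative, not from the question; it only shows the shape of the transform, assuming each row is a tuple aligned with the column list.

```python
# Plain-Python sketch of the transform described above (no Spark needed):
# prefix each value with its column name, then collect each row as a list,
# yielding FP-Growth-style "transactions".

def rows_to_transactions(columns, rows):
    """Turn tabular rows into transactions like ['A_1', 'B_2', 'C_3']."""
    return [[f"{c}_{v}" for c, v in zip(columns, row)] for row in rows]

columns = ["A", "B", "C"]
rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
transactions = rows_to_transactions(columns, rows)
# transactions == [['A_1', 'B_2', 'C_3'], ['A_4', 'B_5', 'C_6'], ['A_7', 'B_8', 'C_9']]
```

Inside Spark, one way to express the same idea is to build each prefixed column with `concat(lit("A_"), col("A").cast("string"))` from `pyspark.sql.functions` and then combine the columns with `array(...)`, giving an array column that `pyspark.ml.fpm.FPGrowth` can consume as `itemsCol`.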

Maximum Pattern Length fpGrowth (Apache) PySpark

Submitted by 六月ゝ 毕业季﹏ on 2019-12-11 14:18:14
Question: I am trying to mine association rules using PySpark. I first fit an FPGrowth model and pass its results to the association-rules step. However, I wish to add a maximum pattern length parameter to limit the number of items on the LHS and RHS: I only want to keep patterns of length 2 for associations between items.

    ## fit model
    from pyspark.ml.fpm import FPGrowth
    fpGrowth_1 = FPGrowth(itemsCol="collect_set(title_name)", minSupport=.001, minConfidence=0.001)
    model_working_1 = fpGrowth_1.fit
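`pyspark.ml.fpm.FPGrowth` exposes `minSupport` and `minConfidence` but no maximum-pattern-length parameter, so a common workaround is to filter the mined itemsets (or rules) by size after fitting. A plain-Python sketch of that filter, with the hypothetical helper name `limit_pattern_length`:

```python
# FPGrowth in pyspark.ml.fpm has no maxPatternLength parameter, so one
# workaround is to filter the mined frequent itemsets by length afterwards.
# Here itemsets are modeled as (items, frequency) pairs for illustration.

def limit_pattern_length(freq_itemsets, max_len=2):
    """Keep only frequent itemsets with at most `max_len` items."""
    return [(items, freq) for items, freq in freq_itemsets if len(items) <= max_len]

itemsets = [(("a",), 10), (("a", "b"), 6), (("a", "b", "c"), 3)]
short_itemsets = limit_pattern_length(itemsets, max_len=2)
# short_itemsets == [(("a",), 10), (("a", "b"), 6)]
```

On the Spark side the equivalent is to filter the model's result DataFrames with `size()` from `pyspark.sql.functions`, e.g. `model.freqItemsets.filter(size("items") <= 2)`, or for the rules, a filter on `size("antecedent") + size("consequent")`.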