How can I match all the key-value pairs in Python? My code is running too long

挽巷 2020-12-22 03:23

User-item affinity and recommendations:
I am creating a table that implements a "customers who bought this item also bought" algorithm.
Input dataset

2 Answers
  •  慢半拍i (original poster)
    2020-12-22 03:39

    Yes, the algorithm can be improved. You are recalculating the user list for each item inside the inner loop many times. Instead, build a dictionary that maps each item to its users once, outside the loops.

    import itertools

    import pandas as pd

    # `main` is the DataFrame from the question, with `userId` and `productId` columns

    # get unique items
    items = set(main.productId)

    n_users = len(set(main.userId))

    # make a dictionary of item and users who bought that item
    item_users = main.groupby('productId')['userId'].apply(set).to_dict()

    # iterate over combinations of item1 and item2 and store scores
    result = []
    for item1, item2 in itertools.combinations(items, 2):
        # share of all users who bought both items
        score = len(item_users[item1] & item_users[item2]) / n_users
        result.append((item1, item2, score))
        result.append((item2, item1, score))  # store score for reverse order as well

    # convert results to a dataframe
    result = pd.DataFrame(result, columns=["item1", "item2", "score"])
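
    To check the scoring logic in isolation, here is a minimal pandas-free sketch of the same idea: build the item-to-users lookup once, then intersect sets per pair. The toy purchase records and the function name `item_pair_scores` are assumptions made up for illustration, since the question's input data is not reproduced here.

```python
import itertools
from collections import defaultdict

# Hypothetical purchase records as (userId, productId) pairs.
purchases = [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (3, "C")]


def item_pair_scores(purchases):
    """Return {(item1, item2): score} for every ordered pair of distinct items."""
    # Build the item -> set-of-users lookup once, outside the pair loop.
    item_users = defaultdict(set)
    for user, item in purchases:
        item_users[item].add(user)
    n_users = len({user for user, _ in purchases})

    scores = {}
    # sorted() only makes the iteration order reproducible.
    for item1, item2 in itertools.combinations(sorted(item_users), 2):
        # Share of all users who bought both items.
        score = len(item_users[item1] & item_users[item2]) / n_users
        scores[(item1, item2)] = score
        scores[(item2, item1)] = score  # same score for the reverse order
    return scores


scores = item_pair_scores(purchases)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

    With these records, users 1 and 2 bought both A and B, so that pair scores 2/3, while every pair involving C scores 0.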
    

    Timing differences:

    Original implementation from the question

    # 3 loops, best of 3: 41.8 ms per loop

    Mark's Method 2

    # 3 loops, best of 3: 19.9 ms per loop

    Implementation in this answer

    # 3 loops, best of 3: 3.01 ms per loop
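
    The figures above come from the answerer's own data and machine. To run a comparison of this kind yourself, a rough sketch with `timeit` could look like the following. The synthetic data, its sizes, and the function names `scores_recompute` and `scores_precomputed` are all assumptions, and the first function is only a stand-in for a version that rebuilds user sets inside the loop, not the actual code from the question.

```python
import itertools
import random
import timeit
from collections import defaultdict

random.seed(0)
# Synthetic (userId, productId) records; the sizes are arbitrary.
purchases = [(random.randrange(50), random.randrange(20)) for _ in range(500)]
items = sorted({item for _, item in purchases})
n_users = len({user for user, _ in purchases})


def scores_recompute():
    """Stand-in for the slow pattern: rebuild both user sets for every pair."""
    out = {}
    for i1, i2 in itertools.combinations(items, 2):
        users1 = {u for u, i in purchases if i == i1}
        users2 = {u for u, i in purchases if i == i2}
        out[(i1, i2)] = len(users1 & users2) / n_users
    return out


def scores_precomputed():
    """Build the item -> users lookup once, then only intersect sets."""
    item_users = defaultdict(set)
    for u, i in purchases:
        item_users[i].add(u)
    out = {}
    for i1, i2 in itertools.combinations(items, 2):
        out[(i1, i2)] = len(item_users[i1] & item_users[i2]) / n_users
    return out


# Both variants must agree before their timings are worth comparing.
assert scores_recompute() == scores_precomputed()

for fn in (scores_recompute, scores_precomputed):
    # Best of 3 repeats, each running the function 3 times.
    best = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per loop")
```

    Absolute numbers will differ from those quoted above; only the relative gap between the two variants is meaningful.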
