How to select similar sets in SQL

被撕碎了的回忆 2020-12-13 15:38

I have the following tables:

Order
----
ID (pk)

OrderItem
----
OrderID (fk -> Order.ID)
ItemID (fk -> Item.ID)
Quantity

Item
----
ID (pk)
         


        
8 Answers
  •  借酒劲吻你
    2020-12-13 16:04

    This approach takes Quantity into account by using the Extended Jaccard Coefficient, also known as Tanimoto similarity. It treats each order as a vector indexed by ItemID, with Quantity as the value of each component, and computes the similarity of every order to one target order. It does cost a table scan, but it doesn't require an N^2 computation over all possible pairs of orders.

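    -- Bind the target order's ID to both ? placeholders.
    -- Assumes (OrderID, ItemID) is unique in OrderItem.
    SELECT OrderID, coef
    FROM (
        SELECT
            v1.OrderID,
            1.0 * SUM(v1.Quantity * COALESCE(v2.Quantity, 0)) /
            (SUM(v1.Quantity * v1.Quantity) +
             -- squared norm of the whole target order, not just the shared items
             (SELECT SUM(Quantity * Quantity) FROM OrderItem WHERE OrderID = ?) -
             SUM(v1.Quantity * COALESCE(v2.Quantity, 0))) AS coef
        FROM
            -- LEFT JOIN keeps the items of each order that the target order lacks
            OrderItem v1 LEFT JOIN OrderItem v2
            ON v1.ItemID = v2.ItemID
            AND v2.OrderID = ?
        GROUP BY v1.OrderID
    ) AS s
    WHERE coef > 0.85;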
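
To try the idea on concrete data, here is a minimal sketch that runs a variant of the query in an in-memory SQLite database through Python's `sqlite3` module. The sample rows, the `similar_orders` helper, and the `:target` / `:threshold` parameter names are assumptions made for illustration. The variant uses a LEFT JOIN plus a scalar subquery for the target order's squared norm, so it is not necessarily the exact query the answer intended.

```python
import sqlite3

# Hypothetical sample data: (OrderID, ItemID, Quantity).
ROWS = [
    (1, 10, 2), (1, 11, 1),  # target order
    (2, 10, 2), (2, 11, 1),  # same items and quantities as order 1
    (3, 10, 2), (3, 12, 5),  # shares one item with order 1
    (4, 13, 1),              # shares nothing with order 1
]

# LEFT JOIN keeps every item of each candidate order; the scalar subquery
# supplies the squared norm of the whole target order.
QUERY = """
SELECT OrderID, coef FROM (
    SELECT
        v1.OrderID AS OrderID,
        1.0 * SUM(v1.Quantity * COALESCE(v2.Quantity, 0)) /
        (SUM(v1.Quantity * v1.Quantity)
         + (SELECT SUM(Quantity * Quantity)
            FROM OrderItem WHERE OrderID = :target)
         - SUM(v1.Quantity * COALESCE(v2.Quantity, 0))) AS coef
    FROM OrderItem v1
    LEFT JOIN OrderItem v2
        ON v1.ItemID = v2.ItemID AND v2.OrderID = :target
    GROUP BY v1.OrderID
) AS s
WHERE coef > :threshold
ORDER BY OrderID
"""


def similar_orders(rows, target, threshold):
    """Return (OrderID, coef) pairs whose similarity to `target` exceeds `threshold`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE OrderItem ("
        "OrderID INTEGER, ItemID INTEGER, Quantity INTEGER, "
        "PRIMARY KEY (OrderID, ItemID))"
    )
    conn.executemany("INSERT INTO OrderItem VALUES (?, ?, ?)", rows)
    result = conn.execute(
        QUERY, {"target": target, "threshold": threshold}
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(similar_orders(ROWS, 1, 0.85))
```

With this sample data, orders 1 and 2 come back with a coefficient of 1.0, while order 3 (4 / 30) and order 4 (0) fall below the 0.85 threshold. Note that the target order always matches itself, so a real query would probably exclude it.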
    

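    Formula for the Extended Jaccard Coefficient (Tanimoto similarity) of two Quantity vectors A and B:

    T(A, B) = (A · B) / (|A|^2 + |B|^2 - A · B)

    Here A · B is the sum of the Quantity products over the items the two orders share, and |A|^2 is the sum of the squared quantities of one order.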
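
As a sanity check of the coefficient on concrete numbers, here is a minimal Python sketch that computes it directly for two orders represented as ItemID -> Quantity dictionaries. The `tanimoto` function name, the sample orders, and the convention of returning 0.0 for two empty orders are assumptions made for illustration.

```python
def tanimoto(a, b):
    """Extended Jaccard (Tanimoto) coefficient of two sparse vectors.

    `a` and `b` map ItemID -> Quantity; an item missing from an order
    counts as quantity 0.
    """
    # Dot product: only items present in both orders contribute.
    dot = sum(qty * b[item] for item, qty in a.items() if item in b)
    # Squared norms use every item of each order.
    norm_a = sum(qty * qty for qty in a.values())
    norm_b = sum(qty * qty for qty in b.values())
    denom = norm_a + norm_b - dot
    # Assumption: two empty orders are given similarity 0.0.
    return dot / denom if denom else 0.0


# Hypothetical orders.
order_a = {1: 3, 2: 1}
order_b = {1: 3, 2: 1, 3: 1}  # order_a plus one extra item
order_c = {1: 1}              # one shared item, smaller quantity

if __name__ == "__main__":
    print(tanimoto(order_a, order_b))  # 10 / 11
    print(tanimoto(order_a, order_c))  # 3 / 8
```

For `order_a` and `order_b` the dot product is 10 and the squared norms are 10 and 11, giving 10 / (10 + 11 - 10) = 10/11, which clears a 0.85 threshold. For `order_a` and `order_c` the result is 3 / (10 + 1 - 3) = 0.375.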
