I have the following tables:

Order
----
ID (pk)

OrderItem
----
OrderID (fk -> Order.ID)
ItemID (fk -> Item.ID)
Quantity

Item
----
ID (pk)

This approach takes the Quantity into account by using the Extended Jaccard Coefficient, also known as Tanimoto similarity. Each Order is treated as a vector indexed by ItemID whose components are the item Quantities (0 for items not in the order), and the query computes the similarity between one given Order (the ? parameter) and all Orders. It costs a table scan, but it avoids the N^2 computation of the similarity between every possible pair of Orders.
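SELECT OrderID, coef
FROM (
  SELECT
    v1.OrderID AS OrderID,
    -- * 1.0 avoids integer division when Quantity is an integer column
    SUM(v1.Quantity * v2.Quantity) * 1.0 /
    ( m1.SqMag + m2.SqMag -
      SUM(v1.Quantity * v2.Quantity) ) AS coef
  FROM
    -- dot product: only items shared with the target order contribute
    OrderItem v1 INNER JOIN OrderItem v2
      ON v1.ItemID = v2.ItemID
     AND v2.OrderID = ?
    -- squared magnitude of each order, taken over ALL of its items
    INNER JOIN (SELECT OrderID, SUM(Quantity * Quantity) AS SqMag
                FROM OrderItem GROUP BY OrderID) m1 ON m1.OrderID = v1.OrderID
    INNER JOIN (SELECT OrderID, SUM(Quantity * Quantity) AS SqMag
                FROM OrderItem GROUP BY OrderID) m2 ON m2.OrderID = v2.OrderID
  GROUP BY v1.OrderID, m1.SqMag, m2.SqMag
) s
WHERE coef > 0.85;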
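To see the idea end to end, here is a minimal sketch that runs a query of this shape against an in-memory SQLite database and cross-checks it with a pure-Python computation of the same coefficient. It assumes an inner join for the dot product plus precomputed per-order squared magnitudes (so the denominator also counts items the two orders do not share). The sample rows and the similar_orders and tanimoto helpers are hypothetical, made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample OrderItem rows: (OrderID, ItemID, Quantity).
ROWS = [
    (1, 10, 2), (1, 20, 1),
    (2, 10, 2), (2, 20, 1),  # same items and quantities as order 1
    (3, 10, 1), (3, 30, 5),  # shares only item 10 with order 1
    (4, 40, 3),              # shares no items with order 1
]

# Dot product over shared items; squared magnitudes over all items of each order.
QUERY = """
SELECT OrderID, coef
FROM (
  SELECT v1.OrderID AS OrderID,
         SUM(v1.Quantity * v2.Quantity) * 1.0 /
         (m1.SqMag + m2.SqMag - SUM(v1.Quantity * v2.Quantity)) AS coef
  FROM OrderItem v1
  JOIN OrderItem v2 ON v1.ItemID = v2.ItemID AND v2.OrderID = ?
  JOIN (SELECT OrderID, SUM(Quantity * Quantity) AS SqMag
        FROM OrderItem GROUP BY OrderID) m1 ON m1.OrderID = v1.OrderID
  JOIN (SELECT OrderID, SUM(Quantity * Quantity) AS SqMag
        FROM OrderItem GROUP BY OrderID) m2 ON m2.OrderID = v2.OrderID
  GROUP BY v1.OrderID, m1.SqMag, m2.SqMag
) s
WHERE coef > ?
ORDER BY OrderID
"""


def similar_orders(conn, order_id, threshold=0.85):
    """Hypothetical helper: [(OrderID, coef)] for orders above the threshold."""
    return conn.execute(QUERY, (order_id, threshold)).fetchall()


def tanimoto(a, b):
    """Extended Jaccard / Tanimoto coefficient of two {ItemID: Quantity} dicts."""
    dot = sum(q * b.get(item, 0) for item, q in a.items())
    sq_norm_a = sum(q * q for q in a.values())
    sq_norm_b = sum(q * q for q in b.values())
    return dot / (sq_norm_a + sq_norm_b - dot)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderItem (OrderID INTEGER, ItemID INTEGER, Quantity INTEGER)")
conn.executemany("INSERT INTO OrderItem VALUES (?, ?, ?)", ROWS)

# Group rows into per-order vectors for the pure-Python cross-check.
orders = defaultdict(dict)
for oid, item, qty in ROWS:
    orders[oid][item] = qty

print(similar_orders(conn, 1))  # → [(1, 1.0), (2, 1.0)]
print(tanimoto(orders[1], orders[3]))  # 2 / (5 + 26 - 2)
```

Note that the target order always matches itself with a coefficient of 1.0; adding AND v1.OrderID <> v2.OrderID to the join condition would exclude it.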
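Formula for the Extended Jaccard Coefficient of two orders with quantity vectors A and B, where a_i and b_i are the quantities of item i in each order (0 if the item is absent):

$$T(A,B) = \frac{A \cdot B}{\|A\|^2 + \|B\|^2 - A \cdot B} = \frac{\sum_i a_i b_i}{\sum_i a_i^2 + \sum_i b_i^2 - \sum_i a_i b_i}$$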
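As a worked instance of the coefficient, take two hypothetical orders (quantities made up for illustration): A = {item 10: 3, item 20: 1} and B = {item 10: 2, item 20: 1, item 30: 1}.

```latex
\begin{aligned}
A \cdot B &= 3 \cdot 2 + 1 \cdot 1 + 0 \cdot 1 = 7 \\
\|A\|^2 &= 3^2 + 1^2 = 10, \qquad \|B\|^2 = 2^2 + 1^2 + 1^2 = 6 \\
T(A,B) &= \frac{7}{10 + 6 - 7} = \frac{7}{9} \approx 0.778
\end{aligned}
```

So this pair would not pass the coef > 0.85 filter, even though B contains every item in A: the extra item 30 and the different quantity of item 10 both lower the score.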
