I have the following tables:
    Order
    ----
    ID (pk)

    OrderItem
    ----
    OrderID (fk -> Order.ID)
    ItemID (fk -> Item.ID)
    Quantity

    Item
    ----
    ID (pk)
I would try something like this for speed, listing orders by similarity to order @OrderId. The self-joined alias INTS is supposed to be the intersection with that order, and the Similarity value is my attempt to calculate the Jaccard index (the number of shared items divided by the number of items that appear in either order).
I am not using the Quantity field at all here, but I think it could be included without slowing the query down too much, if we figure out a way to quantify similarity that takes quantity into account. Below, I am counting any item that appears in both orders as a match. You could join on quantity as well, or use a measure where a match that also agrees on quantity counts double. I don't know whether that is reasonable.
    SELECT
        OI.OrderId,
        -- |intersection| / (|this order| + |order @OrderId| - |intersection|)
        1.0*COUNT(INTS.ItemId) /
        (COUNT(*)
        + (SELECT COUNT(*) FROM OrderItem WHERE OrderID = @OrderId)
        - COUNT(INTS.ItemId)) AS Similarity
    FROM
        OrderItem OI
    -- LEFT JOIN, so that COUNT(*) counts every item of each order and not only the matches
    LEFT JOIN
        OrderItem INTS ON INTS.ItemID = OI.ItemID AND INTS.OrderId=@OrderId
    GROUP BY
        OI.OrderId
    HAVING
        1.0*COUNT(INTS.ItemId) /
        (COUNT(*)
        + (SELECT COUNT(*) FROM OrderItem WHERE OrderID = @OrderId)
        - COUNT(INTS.ItemId)) > 0.85
    ORDER BY
        Similarity DESC
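To check the arithmetic, here is a small sketch that runs the same Jaccard calculation against made-up data. The table variable @OI and its rows are hypothetical and only stand in for OrderItem; the threshold is left out so every order shows up. It uses a LEFT JOIN so that COUNT(*) is the full item count of each order.

```sql
-- Hypothetical sample data; @OI stands in for the OrderItem table.
DECLARE @OI TABLE (OrderID int, ItemID int, Quantity int);
INSERT INTO @OI (OrderID, ItemID, Quantity) VALUES
    (1, 10, 1), (1, 20, 1), (1, 30, 1),
    (2, 10, 1), (2, 20, 1), (2, 30, 1), (2, 40, 1),
    (3, 10, 1), (3, 50, 1);

DECLARE @OrderId int = 1;
DECLARE @ItemCount int = (SELECT COUNT(*) FROM @OI WHERE OrderID = @OrderId);

SELECT
    OI.OrderID,
    COUNT(INTS.ItemID) AS SharedItems,   -- size of the intersection
    1.0 * COUNT(INTS.ItemID)
        / (COUNT(*) + @ItemCount - COUNT(INTS.ItemID)) AS Similarity
FROM
    @OI OI
LEFT JOIN
    @OI INTS ON INTS.ItemID = OI.ItemID AND INTS.OrderID = @OrderId
GROUP BY
    OI.OrderID
ORDER BY
    Similarity DESC;
-- Order 1: 3 / (3 + 3 - 3) = 1.00
-- Order 2: 3 / (4 + 3 - 3) = 0.75
-- Order 3: 1 / (2 + 3 - 1) = 0.25
```

With the 0.85 threshold from the query, only order 1 itself would survive in this sample.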
This approach also presupposes that OrderId/ItemId combinations are unique in OrderItem. I realize this might not be the case, and it could be worked around using a view.
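One way such a view might look, as a sketch: it collapses duplicate OrderID/ItemID rows into one row each. The view name OrderItemDistinct is made up, and summing the quantities of the duplicates is my assumption about what duplicate rows mean; the similarity queries would then read from the view instead of OrderItem.

```sql
-- Hypothetical view: one row per (OrderID, ItemID) pair.
-- Assumes duplicate rows should have their quantities added together.
CREATE VIEW OrderItemDistinct AS
SELECT
    OrderID,
    ItemID,
    SUM(Quantity) AS Quantity
FROM
    OrderItem
GROUP BY
    OrderID,
    ItemID;
```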
I'm sure there are better ways, but one way to weigh in the quantity difference would be to replace the numerator COUNT(INTS.ItemId) with a SUM over something like this (supposing all quantities are positive), so that a match counts for less, decreasing slowly towards 0, the more the quantities differ.
    1/(ABS(LOG(OI.quantity)-LOG(INTS.quantity))+1)
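To get a feel for how quickly that weight drops off, here is a sketch that evaluates it for a few made-up quantity pairs. The derived table q and its columns A and B are hypothetical stand-ins for OI.Quantity and INTS.Quantity; LOG is the natural logarithm in T-SQL.

```sql
-- Hypothetical quantity pairs (A, B) for a matched item in two orders.
SELECT
    q.A,
    q.B,
    1 / (ABS(LOG(q.A) - LOG(q.B)) + 1) AS Weight
FROM
    (VALUES (2, 2), (2, 4), (1, 10)) AS q(A, B);
-- (2, 2):  equal quantities give a weight of exactly 1
-- (2, 4):  roughly 0.59
-- (1, 10): roughly 0.30
```

Because the weight depends only on the ratio of the two quantities, 2 versus 4 scores the same as 50 versus 100.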
Added: a more readable solution using the Tanimoto similarity, as suggested by JRideout:
    DECLARE
        @ItemCount INT,
        @OrderId INT
    SELECT
        @OrderId = 1
    -- number of items in the reference order
    SELECT
        @ItemCount = COUNT(*)
    FROM
        OrderItem
    WHERE
        OrderID = @OrderId
    SELECT
        OI.OrderId,
        -- per-item Tanimoto term a*b / (a*a + b*b - a*b), summed over the shared items
        -- and divided by the number of items that appear in either order
        SUM(1.0* OI.Quantity*INTS.Quantity/(OI.Quantity*OI.Quantity+INTS.Quantity*INTS.Quantity-OI.Quantity*INTS.Quantity )) /
        (COUNT(*) + @ItemCount - COUNT(INTS.ItemId)) AS Similarity
    FROM
        OrderItem OI
    LEFT JOIN
        OrderItem INTS ON INTS.ItemID = OI.ItemID AND INTS.OrderId=@OrderId
    GROUP BY
        OI.OrderId
    HAVING
        SUM(1.0* OI.Quantity*INTS.Quantity/(OI.Quantity*OI.Quantity+INTS.Quantity*INTS.Quantity-OI.Quantity*INTS.Quantity )) /
        (COUNT(*) + @ItemCount - COUNT(INTS.ItemId)) > 0.85
    ORDER BY
        Similarity DESC
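For intuition, the per-item term inside the SUM can be tried on its own. This is only a sketch with made-up quantity pairs; the derived table q and its columns A and B are hypothetical stand-ins for OI.Quantity and INTS.Quantity.

```sql
-- Hypothetical quantity pairs (A, B) for a matched item in two orders.
SELECT
    q.A,
    q.B,
    1.0 * q.A * q.B / (q.A * q.A + q.B * q.B - q.A * q.B) AS Term
FROM
    (VALUES (2, 2), (2, 4), (1, 10)) AS q(A, B);
-- (2, 2):  4 / (4 + 4 - 4)      = 1
-- (2, 4):  8 / (4 + 16 - 8)     = 8/12,  about 0.67
-- (1, 10): 10 / (1 + 100 - 10)  = 10/91, about 0.11
```

Items that do not match contribute nothing to the SUM, since INTS.Quantity is NULL for them, but they still enlarge the denominator, which is what pulls the similarity down.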