
How to select similar sets in SQL

被撕碎了的回忆 2020-12-13 15:38

I have the following tables:

```
Order
----
ID (pk)

OrderItem
----
OrderID (fk -> Order.ID)
ItemID (fk -> Item.ID)
Quantity

Item
----
ID (pk)
```

8 answers

借酒劲吻你 (OP)

2020-12-13 16:04
This approach takes Quantity into account by using the Extended Jaccard coefficient, also known as Tanimoto similarity. Each order is treated as a vector indexed by ItemID whose components are the Quantity values (0 for items not in the order), and the query computes the similarity of one given order (the `?` parameter) to every order. It costs a scan of OrderItem, but it does not require computing all N^2 pairwise similarities.
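```
WITH target AS (SELECT ? AS ID),        -- the order to compare against (bound once)
norms AS (                              -- squared magnitude of every order's vector
    SELECT OrderID, SUM(Quantity * Quantity) AS sq
    FROM OrderItem
    GROUP BY OrderID
),
dots AS (                               -- dot product with the target (shared items)
    SELECT v1.OrderID, SUM(v1.Quantity * v2.Quantity) AS dot
    FROM OrderItem v1
    JOIN OrderItem v2 ON v2.ItemID = v1.ItemID
    JOIN target t ON v2.OrderID = t.ID
    GROUP BY v1.OrderID
),
scores AS (                             -- 1.0 * avoids integer division
    SELECT d.OrderID, 1.0 * d.dot / (n1.sq + n2.sq - d.dot) AS coef
    FROM dots d
    CROSS JOIN target t
    JOIN norms n1 ON n1.OrderID = d.OrderID
    JOIN norms n2 ON n2.OrderID = t.ID  -- full magnitude of the target order
)
SELECT OrderID, coef FROM scores
WHERE coef > 0.85;
```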
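Formula for the Extended Jaccard Coefficient, where A and B are the Quantity vectors of two orders indexed by ItemID:

$$T(A,B) = \frac{A \cdot B}{\|A\|^2 + \|B\|^2 - A \cdot B}, \qquad A \cdot B = \sum_i A_i B_i$$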
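As a cross-check for the SQL, here is a minimal Python sketch of the same computation. It assumes the OrderItem rows have already been loaded into memory as `(OrderID, ItemID, Quantity)` tuples; the function name `tanimoto_to_target` and that row format are hypothetical. It builds the target order's vector first, then makes a single pass over all rows, accumulating each order's squared magnitude and its dot product with the target. This is the same one-against-all pattern, not a comparison of all N^2 pairs.

```python
from collections import defaultdict


def tanimoto_to_target(rows, target_id, threshold=0.85):
    """Return {OrderID: coef} for orders whose extended Jaccard (Tanimoto)
    similarity to order `target_id` exceeds `threshold`.

    `rows` is a hypothetical in-memory copy of OrderItem:
    an iterable of (OrderID, ItemID, Quantity) tuples.
    """
    rows = list(rows)

    # Pass 1: the target order's quantity vector, keyed by ItemID, and ||B||^2.
    target = {item: qty for order, item, qty in rows if order == target_id}
    target_sq = sum(q * q for q in target.values())

    # Pass 2: one scan accumulating ||A||^2 and A.B for every order.
    sq = defaultdict(float)
    dot = defaultdict(float)
    for order, item, qty in rows:
        sq[order] += qty * qty
        dot[order] += qty * target.get(item, 0.0)  # items not in target add 0

    # T(A, B) = A.B / (||A||^2 + ||B||^2 - A.B)
    result = {}
    for order, a_sq in sq.items():
        denom = a_sq + target_sq - dot[order]
        coef = dot[order] / denom if denom else 0.0
        if coef > threshold:
            result[order] = coef
    return result


if __name__ == "__main__":
    rows = [(1, 10, 2), (1, 20, 1), (2, 10, 2), (2, 20, 1), (3, 10, 1), (3, 30, 5)]
    print(tanimoto_to_target(rows, 1))  # orders 1 and 2 are identical: coef 1.0
```

For the sample rows, order 3 shares only item 10 with order 1, so its coefficient is 2 / (26 + 5 - 2), about 0.069, well below the 0.85 cutoff.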