Question
This is related to a question I asked previously, for which lag/lead was suggested. However, the data I'm working with are more complex than I first thought, so I need a more robust solution. This screenshot shows an issue I need to tackle:

Within a single serial number, a shipment event defines a new reference window. So records 2, 3 and 4 relate to 1, record 6 relates to 5, and so forth. I need to mark the records for which the BillToId doesn't match the parent shipment.
I'm trying to understand whether I could even use the LAG function to compare records 2, 3 and 4 back to 1 when the number of post-shipment events varies (duplicates are allowed). I was thinking I might be better off with another fact table that first identifies the parent rowid alongside each record.
So then my question becomes how do I efficiently identify which shipment each row belongs to? Am I forced to run a subquery for each record? I'm working right now with over 2 million total rows. I would later make this query part of the ETL process so it would be processing smaller chunks of data.
Answer 1:
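Here is an approach that uses the cumulative sum functionality in SQL Server (windowed aggregates with an order by, available from SQL Server 2012). The idea is to assign each "Ship" activity a value of 1 and everything else a value of 0. A cumulative sum of these values within each serial number, ordered by rowid, then gives every shipment and the records that follow it the same group number, which identifies each group that should have the same billtoid. After that, the ship information can be assigned to all records in the same group: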
select rowid, dateid, billtoid, activitytypeid, serialnumber
from (select t.*,
             -- billtoid of the shipment that opens each group
             max(case when activitytypeid = 'Ship' then billtoid end) over
                 (partition by serialnumber, cumships) as ship_billtoid
      from (select t.*,
                   -- running count of shipments within each serial number
                   sum(case when activitytypeid = 'Ship' then 1 else 0 end) over
                       (partition by serialnumber order by rowid) as cumships
            from t
           ) t
     ) t
-- rows before the first shipment have a null ship_billtoid and are not returned
where billtoid <> ship_billtoid;
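To see how the grouping behaves, here is a minimal sketch that runs the same query against a handful of made-up rows. It uses Python's built-in sqlite3 module purely as a convenient stand-in for SQL Server (this assumes a bundled SQLite of 3.25 or later, which is when window functions were added). The sample rows, the 'Repair' and 'Return' activity names, the column types, and the find_mismatches helper are all hypothetical. Only the query, plus an added order by, comes from the answer above.

```python
import sqlite3

# Hypothetical sample rows:
# (rowid, dateid, billtoid, activitytypeid, serialnumber)
SAMPLE_ROWS = [
    (1, 20140101, 100, "Ship",   "A"),
    (2, 20140102, 100, "Repair", "A"),
    (3, 20140103, 200, "Return", "A"),  # differs from shipment in row 1
    (4, 20140104, 100, "Repair", "A"),
    (5, 20140105, 300, "Ship",   "A"),  # opens a new reference window
    (6, 20140106, 300, "Repair", "A"),
    (7, 20140107, 400, "Repair", "B"),  # no shipment yet for serial B
    (8, 20140108, 500, "Ship",   "B"),
    (9, 20140109, 600, "Return", "B"),  # differs from shipment in row 8
]

# The query from the answer, with an "order by" added for stable output.
QUERY = """
select rowid, dateid, billtoid, activitytypeid, serialnumber
from (select t.*,
             max(case when activitytypeid = 'Ship' then billtoid end) over
                 (partition by serialnumber, cumships) as ship_billtoid
      from (select t.*,
                   sum(case when activitytypeid = 'Ship' then 1 else 0 end) over
                       (partition by serialnumber order by rowid) as cumships
            from t
           ) t
     ) t
where billtoid <> ship_billtoid
order by rowid
"""


def find_mismatches(rows):
    """Return the rowids whose billtoid differs from the parent shipment's."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed column types; the original post does not state them.
        conn.execute(
            "create table t ("
            " rowid integer primary key,"
            " dateid integer,"
            " billtoid integer,"
            " activitytypeid text,"
            " serialnumber text)"
        )
        conn.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(find_mismatches(SAMPLE_ROWS))
```

With these sample rows, the running sum for serial A is 1, 1, 1, 1, 2, 2, so rows 2 to 4 are compared against the shipment in row 1 and row 6 against the shipment in row 5. Rows 3 and 9 are the ones returned. Row 7 falls into a group with no shipment, so its ship_billtoid is null and the comparison filters it out. If such rows should be reported as well, the where clause would need an explicit null check.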
Source: https://stackoverflow.com/questions/21635251/identifying-parent-records-for-many-transactions