Approximate matching of two lists of events (with duration)

丶灬走出姿态 submitted on 2019-12-21 18:39:07

Question


I have a black-box algorithm that analyses a time series and "detects" certain events in it. It returns a list of events, each with a start time and an end time. The events do not overlap. I also have a list of the "true" events, again with a start time and an end time for each event, and these do not overlap either.

I want to compare the two lists and match detected and true events that fall within a certain time tolerance (True Positives). The complication is that the algorithm may detect events that are not really there (False Positives) or might miss events that were there (False Negatives).

What algorithm optimally pairs events from the two lists while leaving the false positives and false negatives unpaired? I am fairly sure I am not the first to tackle this problem and that such a method exists, but I haven't been able to find it, perhaps because I don't know the right terminology.

Speed requirement: The lists will contain no more than a few hundred entries, and speed is not a major factor. Accuracy is more important. Anything taking less than a few seconds on an ordinary computer will be fine.


Answer 1:


Here's a quadratic-time algorithm that gives a maximum likelihood estimate with respect to the following model. Let A1 < ... < Am be the true intervals and let B1 < ... < Bn be the reported intervals. The quantity sub(i, j) is the log-likelihood that Ai becomes Bj. The quantity del(i) is the log-likelihood that Ai is deleted. The quantity ins(j) is the log-likelihood that Bj is inserted. Make independence assumptions everywhere! I'm going to choose sub, del, and ins so that, for every i < i' and every j < j', we have

sub(i, j') + sub(i', j) <= max {sub(i, j )       + sub(i', j')
                               ,del(i) + ins(j') + sub(i', j )
                               ,sub(i, j')       + del(i') + ins(j)
                               }.

This ensures that the optimal matching between intervals is noncrossing and thus that we can use the following Levenshtein-like dynamic program.

The dynamic program is presented as a memoized recursive function, score(i, j), that computes the optimal score of matching A1, ..., Ai with B1, ..., Bj. The root of the call tree is score(m, n). The function can be modified to also return the sequence of sub(i, j) operations (that is, the matched pairs) in the optimal solution.

score(i, j) | i == 0 && j == 0 =      0
            | i >  0 && j == 0 =      del(i)    + score(i - 1, 0    )
            | i == 0 && j >  0 =      ins(j)    + score(0    , j - 1)
            | i >  0 && j >  0 = max {sub(i, j) + score(i - 1, j - 1)
                                     ,del(i)    + score(i - 1, j    )
                                     ,ins(j)    + score(i    , j - 1)
                                     }
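A minimal Python sketch of the recurrence above, written bottom-up instead of as memoized recursion so that a few hundred intervals cannot hit Python's recursion limit, and extended with the traceback that returns the matched pairs. The function name `align` and the parameter name `dele` (since `del` is a Python keyword) are illustrative choices, not part of the answer. The usage example plugs in the squared-error costs that the answer defines further down, on made-up intervals.

```python
from typing import Callable, List, Optional, Tuple


def align(
    m: int,
    n: int,
    sub: Callable[[int, int], float],
    dele: Callable[[int], float],
    ins: Callable[[int], float],
) -> Tuple[float, List[Tuple[int, int]]]:
    """Bottom-up form of score(i, j) with a traceback.

    Indices are 1-based, as in the recurrence. Returns score(m, n) and the
    (i, j) pairs of the sub operations on the optimal path, in order.
    """
    score = [[0.0] * (n + 1) for _ in range(m + 1)]
    move: List[List[Optional[str]]] = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):  # j == 0: only deletions are possible
        score[i][0] = dele(i) + score[i - 1][0]
        move[i][0] = "del"
    for j in range(1, n + 1):  # i == 0: only insertions are possible
        score[0][j] = ins(j) + score[0][j - 1]
        move[0][j] = "ins"
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            options = [
                (sub(i, j) + score[i - 1][j - 1], "sub"),
                (dele(i) + score[i - 1][j], "del"),
                (ins(j) + score[i][j - 1], "ins"),
            ]
            # key= compares scores only; ties go to the first option (sub).
            score[i][j], move[i][j] = max(options, key=lambda o: o[0])

    # Walk back from (m, n), collecting the matched (true, detected) pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = m, n
    while i > 0 or j > 0:
        op = move[i][j]
        if op == "sub":
            pairs.append((i, j))
            i, j = i - 1, j - 1
        elif op == "del":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[m][n], pairs


# Made-up example data: true events A and detected events B as (start, end).
A = [(0.0, 10.0), (20.0, 30.0), (50.0, 55.0)]
B = [(1.0, 11.0), (49.0, 56.0), (70.0, 72.0)]


def sub_cost(i: int, j: int) -> float:
    (s, t), (u, v) = A[i - 1], B[j - 1]
    return -((u - s) ** 2 + (v - t) ** 2)


best, matches = align(
    len(A),
    len(B),
    sub_cost,
    lambda i: -((A[i - 1][1] - A[i - 1][0]) ** 2),
    lambda j: -((B[j - 1][1] - B[j - 1][0]) ** 2),
)
print(best, matches)  # -108.0 [(1, 1), (3, 2)]
```

Here A1 is matched to B1 and A3 to B2, A2 is left unmatched (a false negative), and B3 is left unmatched (a false positive). Both loops together run in O(mn) time and memory, which matches the quadratic bound stated in the answer.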

Here are some possible definitions for sub, del, and ins. I'm not sure if they will be any good; you may want to multiply their values by constants or use powers other than 2. If Ai = [s, t] and Bj = [u, v], then define

sub(i, j) = -(|u - s|^2 + |v - t|^2)
del(i) = -(t - s)^2
ins(j) = -(v - u)^2.
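The definitions above can be written as interval-level Python functions. Because the answer is unsure whether they are "any good", the sketch also includes a brute-force check of the noncrossing inequality stated earlier. The names `sub_cost`, `del_cost`, `ins_cost`, and `noncrossing_ok` are hypothetical, and the check is O(m²n²), so it is intended for small test inputs while you experiment with other exponents or constants, not for full-size lists.

```python
from itertools import combinations
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end)


def sub_cost(a: Interval, b: Interval) -> float:
    """Log-likelihood that true interval a = [s, t] is reported as b = [u, v]."""
    (s, t), (u, v) = a, b
    # abs() is redundant for the exponent 2 but matters for odd exponents.
    return -(abs(u - s) ** 2 + abs(v - t) ** 2)


def del_cost(a: Interval) -> float:
    """Log-likelihood that true interval a is missed (false negative)."""
    s, t = a
    return -((t - s) ** 2)


def ins_cost(b: Interval) -> float:
    """Log-likelihood that detected interval b is spurious (false positive)."""
    u, v = b
    return -((v - u) ** 2)


def noncrossing_ok(A: List[Interval], B: List[Interval]) -> bool:
    """Check the exchange inequality for every i < i' and j < j'.

    combinations() keeps list order, so a precedes a2 and b precedes b2
    when A and B are sorted.
    """
    for a, a2 in combinations(A, 2):
        for b, b2 in combinations(B, 2):
            crossing = sub_cost(a, b2) + sub_cost(a2, b)
            best = max(
                sub_cost(a, b) + sub_cost(a2, b2),
                del_cost(a) + ins_cost(b2) + sub_cost(a2, b),
                sub_cost(a, b2) + del_cost(a2) + ins_cost(b),
            )
            if crossing > best:
                return False
    return True


A = [(0.0, 10.0), (20.0, 30.0), (50.0, 55.0)]  # made-up true events
B = [(1.0, 11.0), (49.0, 56.0), (70.0, 72.0)]  # made-up detections
print(sub_cost(A[0], B[0]), del_cost(A[2]), ins_cost(B[2]))  # -2.0 -25.0 -4.0
print(noncrossing_ok(A, B))  # True
```

For these squared costs the check always passes on sorted, non-overlapping lists. With s < s', t < t', u < u', and v < v', expanding the squares shows that the non-crossing sum minus the crossing sum equals 2(u' - u)(s' - s) + 2(v' - v)(t' - t) ≥ 0, so the first alternative in the inequality is already large enough.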

(Apologies to the undoubtedly extant academic who published something like this in the bioinformatics literature many decades ago.)



Source: https://stackoverflow.com/questions/22174839/approximate-matching-of-two-lists-of-events-with-duration
