How to calculate correlation between time periods

Submitted by 拟墨画扇 on 2020-01-06 10:10:42

Question


If I have two lists of time intervals:

List1:
1. 2010-06-06 to 2010-12-12
2. 2010-05-04 to 2010-11-02
3. 2010-02-04 to 2010-10-08
4. 2010-04-01 to 2010-08-02
5. 2010-01-03 to 2010-02-02

and List2:
1. 2010-06-08 to 2010-12-14
2. 2010-04-04 to 2010-10-10
3. 2010-02-02 to 2010-12-16

What would be the best way to calculate some sort of correlation or similarity factor between the two lists?

Thanks!


Answer 1:


You could try cross-correlation.

However, be aware that your data are vectors (start, length), and cross-correlation algorithms assume a functional dependency between the two series. Whether that assumption holds depends on the semantics of your data, which are not clear from the question.
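As a rough sketch of what cross-correlation could look like here: one option (my assumption, not something stated in the question) is to turn each list into a daily series counting how many intervals are active on each day of 2010, then compute the Pearson correlation at several day shifts. The helper names `daily_counts`, `pearson` and `lagged_correlation`, the 2010 window and the lag range of ±30 days are all illustrative choices.

```python
from datetime import date
from math import sqrt

def daily_counts(intervals, first_day, last_day):
    """Count how many intervals are active on each day of the window (ends inclusive)."""
    counts = [0] * ((last_day - first_day).days + 1)
    for start, end in intervals:
        lo, hi = max(start, first_day), min(end, last_day)
        for offset in range((lo - first_day).days, (hi - first_day).days + 1):
            counts[offset] += 1
    return counts

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

def lagged_correlation(xs, ys, lag):
    """Correlate xs[t] with ys[t + lag] over the part where both are defined."""
    if lag >= 0:
        a, b = xs[:len(xs) - lag], ys[lag:]
    else:
        a, b = xs[-lag:], ys[:len(ys) + lag]
    return pearson(a, b)

list1 = [
    (date(2010, 6, 6), date(2010, 12, 12)),
    (date(2010, 5, 4), date(2010, 11, 2)),
    (date(2010, 2, 4), date(2010, 10, 8)),
    (date(2010, 4, 1), date(2010, 8, 2)),
    (date(2010, 1, 3), date(2010, 2, 2)),
]
list2 = [
    (date(2010, 6, 8), date(2010, 12, 14)),
    (date(2010, 4, 4), date(2010, 10, 10)),
    (date(2010, 2, 2), date(2010, 12, 16)),
]

# Assumed observation window: the calendar year 2010.
window = (date(2010, 1, 1), date(2010, 12, 31))
series1 = daily_counts(list1, *window)
series2 = daily_counts(list2, *window)

# Correlation at each shift from -30 to +30 days; the peak hints at a time offset.
by_lag = {lag: lagged_correlation(series1, series2, lag) for lag in range(-30, 31)}
best_lag = max(by_lag, key=by_lag.get)
print(best_lag, round(by_lag[best_lag], 3))
```

With only a handful of intervals the peak is not statistically meaningful; it only shows the mechanics.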

HTH!

A more useful link for your current problem here.




Answer 2:


Is that the extent of the data or just a sample to give an idea of the structure you have?

Just a few ideas about how to look at this... My apologies if it is redundant to your current state in looking at this set.

Two basic ideas come to mind for comparing intervals like these: absolute or relative. A relative comparison would ignore absolute time and look for repeating structures or signatures that occur in both groups, but not necessarily at the same time. An absolute comparison would treat only simultaneous events as relevant: it doesn't matter that something happens every week in both groups if the occurrences are separated by a year. You may be able to decide between the two if you know something about the origin of the data.

If this is the grand total of data available for your decision about associations, it will come down to some assumptions about what constitutes "correlation". For instance, if you have a specific model for what is going on (e.g. a time-to-start, time-to-stop (failure) model), you could evaluate the likelihood of observing one sequence given the other. However, without more example data it seems unlikely you'd be able to draw any firm conclusions.

The first intervals in the two groups are nearly identical, so they will contribute strongly to any correlation measure I can think of for the two groups. If there is a random model for this set, I would expect many such models to flag these two observations as "unlikely" just because of that.

One way to assess "similarity" would be to ask what portion of the time axis is covered (possibly generalized to multiple coverage) and compare the two groups on that basis.
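A minimal sketch of the coverage idea, assuming day resolution and inclusive end dates: collect the set of days each list covers and compare the two sets with a Jaccard index (shared days divided by days covered by either list). The Jaccard index is my choice of comparison, not something the approach prescribes, and the names `covered_days` and `coverage_jaccard` are made up for illustration.

```python
from datetime import date, timedelta

def covered_days(intervals):
    """Return the set of calendar days covered by at least one interval (ends inclusive)."""
    days = set()
    for start, end in intervals:
        for offset in range((end - start).days + 1):
            days.add(start + timedelta(days=offset))
    return days

def coverage_jaccard(intervals_a, intervals_b):
    """Days covered by both lists divided by days covered by either list."""
    days_a, days_b = covered_days(intervals_a), covered_days(intervals_b)
    union = days_a | days_b
    return len(days_a & days_b) / len(union) if union else 0.0

list1 = [
    (date(2010, 6, 6), date(2010, 12, 12)),
    (date(2010, 5, 4), date(2010, 11, 2)),
    (date(2010, 2, 4), date(2010, 10, 8)),
    (date(2010, 4, 1), date(2010, 8, 2)),
    (date(2010, 1, 3), date(2010, 2, 2)),
]
list2 = [
    (date(2010, 6, 8), date(2010, 12, 14)),
    (date(2010, 4, 4), date(2010, 10, 10)),
    (date(2010, 2, 2), date(2010, 12, 16)),
]

score = coverage_jaccard(list1, list2)
print(f"coverage similarity: {score:.3f}")
```

This version ignores how many intervals overlap on a given day, so two lists can score high even when their internal structure differs.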

Another possibility is to assign a function that adds one for each sequence that occurs during any particular day in the overall interval of these events. That way you have a continuous function with a rudimentary description of multiple events covering the same date. Calculating a correlation between the two groups might give you suggestions of structural similarity, but again you would need more groups of data to make any conclusions.
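The counting function could be sketched roughly as follows, assuming day resolution, inclusive end dates, and the calendar year 2010 as the overall interval (all three are my assumptions). Each list becomes a per-day count of active intervals, and the two count series are compared with a Pearson correlation. `daily_counts` and `pearson` are illustrative helper names.

```python
from datetime import date
from math import sqrt

def daily_counts(intervals, first_day, last_day):
    """For each day in the window, count the intervals that include that day."""
    counts = [0] * ((last_day - first_day).days + 1)
    for start, end in intervals:
        lo, hi = max(start, first_day), min(end, last_day)
        for offset in range((lo - first_day).days, (hi - first_day).days + 1):
            counts[offset] += 1
    return counts

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

list1 = [
    (date(2010, 6, 6), date(2010, 12, 12)),
    (date(2010, 5, 4), date(2010, 11, 2)),
    (date(2010, 2, 4), date(2010, 10, 8)),
    (date(2010, 4, 1), date(2010, 8, 2)),
    (date(2010, 1, 3), date(2010, 2, 2)),
]
list2 = [
    (date(2010, 6, 8), date(2010, 12, 14)),
    (date(2010, 4, 4), date(2010, 10, 10)),
    (date(2010, 2, 2), date(2010, 12, 16)),
]

# Assumed overall interval: the calendar year 2010.
first_day, last_day = date(2010, 1, 1), date(2010, 12, 31)
counts1 = daily_counts(list1, first_day, last_day)
counts2 = daily_counts(list2, first_day, last_day)

r = pearson(counts1, counts2)
print(f"correlation of daily counts: {r:.3f}")
```

The result depends on the chosen window, since days on which neither list is active still enter the correlation.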

Ok that was a little rambling. Good luck with your project!



Source: https://stackoverflow.com/questions/4466255/how-to-calculate-correlation-between-time-periods
