How do you measure similarity between 2 series of data?

Submitted by 谁说胖子不能爱 on 2019-12-04 15:17:33

Question


I need a similarity measure between two arrays of data. It can be any kind of measure you like: a difference, a correlation, or something else.

For example:

 1, 2, 3, 4, 5 < Series 1
 2, 3, 4, 5, 6 < Series 2

should score as far more similar to each other than these two series:

 1, 2, 3, 4, 5 < Series 1
 1, 1, 5, 8, 7 < Series 2

Any suggestions?

Is there source code available for it?


Answer 1:


You can calculate the sample Pearson product-moment correlation coefficient. Its computational form "suggests a convenient single-pass algorithm for calculating sample correlations": write a loop that accumulates sum(xi), sum(yi), sum(xi^2), sum(yi^2), and sum(xi*yi), then insert these sums into the formula.
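A minimal sketch of that single-pass approach, assuming both series have the same length n >= 2 and neither series is constant (otherwise the denominator is zero). The function name `pearson` is hypothetical. It plugs the five sums into the standard form r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: sample Pearson correlation from single-pass sums.
// Assumes x.size() == y.size() >= 2 and that neither series is constant.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    const double n = static_cast<double>(x.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, syy = 0.0, sxy = 0.0;

    // One pass over the data accumulates all five sums.
    for (std::size_t i = 0; i < x.size(); ++i) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        syy += y[i] * y[i];
        sxy += x[i] * y[i];
    }

    // r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))
    const double num = n * sxy - sx * sy;
    const double den = std::sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
    return num / den;
}
```

With the question's data, pearson({1, 2, 3, 4, 5}, {2, 3, 4, 5, 6}) is exactly 1.0, while pearson({1, 2, 3, 4, 5}, {1, 1, 5, 8, 7}) is about 0.914. Keep in mind that Pearson's r measures linear association, not closeness of values: {1, 2, 3, 4, 5} and {10, 20, 30, 40, 50} also score 1.0. The single-pass sums can also lose precision through cancellation when values are large relative to their spread. A two-pass computation (subtract the means first) is more robust in that case.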




Answer 2:


If your definition of similarity is how many elements the two series have in common, you can use a multiset intersection (note that this ignores the order in which the values appear):

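#include <algorithm>
#include <iterator>
#include <set>

int main() {
    std::multiset<int> Series1{ 1, 2, 3, 4, 5 };
    std::multiset<int> Series2{ 2, 3, 4, 5, 6 };
    std::multiset<int> Intersection;

    // set_intersection needs sorted input; a multiset always keeps its elements sorted.
    // A multiset has no push_back, so std::back_inserter would not compile here.
    std::set_intersection(Series1.begin(), Series1.end(),
                          Series2.begin(), Series2.end(),
                          std::inserter(Intersection, Intersection.end()));

    int similarity = static_cast<int>(Intersection.size()); // = 4
}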



Answer 3:


Another approach is to calculate the mutual information between the two series. MIToolbox provides implementations in MATLAB and C: http://www.cs.man.ac.uk/~pococka4/MIToolbox.html
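The idea can be sketched without a toolbox. The following is an illustrative plug-in estimate, not MIToolbox's API. It assumes the series hold discrete values (each distinct value is treated as a symbol) and have the same non-zero length. Continuous data would first need to be binned. The function name `mutual_information` is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper: plug-in mutual information estimate, in bits, between two
// discrete series of equal, non-zero length. Probabilities are relative frequencies.
double mutual_information(const std::vector<int>& x, const std::vector<int>& y) {
    const double n = static_cast<double>(x.size());
    std::map<int, std::size_t> cx, cy;                 // marginal counts
    std::map<std::pair<int, int>, std::size_t> cxy;    // joint counts

    for (std::size_t i = 0; i < x.size(); ++i) {
        ++cx[x[i]];
        ++cy[y[i]];
        ++cxy[{x[i], y[i]}];
    }

    // I(X;Y) = sum over observed pairs (a, b) of p(a,b) * log2(p(a,b) / (p(a) * p(b)))
    double mi = 0.0;
    for (const auto& [ab, count] : cxy) {
        const double pab = count / n;
        const double pa = cx[ab.first] / n;
        const double pb = cy[ab.second] / n;
        mi += pab * std::log2(pab / (pa * pb));
    }
    return mi;
}
```

For the question's data this gives log2(5), about 2.32 bits, for {1, 2, 3, 4, 5} vs {2, 3, 4, 5, 6}, and about 1.92 bits for {1, 2, 3, 4, 5} vs {1, 1, 5, 8, 7}. Mutual information measures how well one series predicts the other, not whether their values are close. Any one-to-one relabelling of the second series gives the same score. With only a handful of samples, the estimate is also very noisy.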



Source: https://stackoverflow.com/questions/8371091/how-do-you-measure-similarity-between-2-series-of-data
