Efficient way to find the difference between two data sets

Submitted by 微笑、不失礼 on 2019-12-25 02:15:00

Question


I have two copies of data; here 1 represents my volumes and 2 represents my issues. I have to compare COPY2 with COPY1 and find all the elements that are missing from COPY2 (COPY1 will always be a superset, and COPY2 will either be equal to it or be a subset of it). I need to get the missing volumes and issues in COPY2, so that from the following figure (scenario) I get this result:

Missing files: 1-C, 1-D, 2-C, 2-C, 3-A, 3-B, 4-E.

Questions:

  1. What data structure should I use to store the above values (volume and issue) in Java?
  2. How should I implement this scenario in Java in the most efficient manner to find the difference between these two copies?

Answer 1:


I suggest a flat HashSet<VolumeIssue>. Each VolumeIssue instance corresponds to one categorized issue, such as 1-C.

In that case, all you need to find the difference is one call:

copy1.removeAll(copy2);

What is left in copy1 are all the issues present in copy1 and missing from copy2.

Note that your VolumeIssue class must properly implement equals and hashCode for this to work.
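As a minimal sketch of what such a class could look like: the field names (volume, issue), their types, and the helper class VolumeIssueDiff are assumptions made for illustration and do not come from the question. Note that removeAll modifies copy1 in place, so copy it first if the original set is still needed.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class: one instance per (volume, issue) pair such as 1-C.
final class VolumeIssue {
    private final int volume;
    private final String issue;

    VolumeIssue(int volume, String issue) {
        this.volume = volume;
        this.issue = Objects.requireNonNull(issue);
    }

    // Two instances are equal when both volume and issue match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof VolumeIssue)) return false;
        VolumeIssue other = (VolumeIssue) o;
        return volume == other.volume && issue.equals(other.issue);
    }

    // Must be consistent with equals so HashSet lookups find equal instances.
    @Override
    public int hashCode() {
        return Objects.hash(volume, issue);
    }

    @Override
    public String toString() {
        return volume + "-" + issue;
    }
}

// Hypothetical helper wrapping the single removeAll call from the answer.
final class VolumeIssueDiff {
    // Removes every element of copy2 from copy1 (mutating copy1) and returns copy1.
    static Set<VolumeIssue> removeFound(Set<VolumeIssue> copy1, Set<VolumeIssue> copy2) {
        copy1.removeAll(copy2);
        return copy1;
    }

    public static void main(String[] args) {
        Set<VolumeIssue> copy1 = new HashSet<>();
        copy1.add(new VolumeIssue(1, "A"));
        copy1.add(new VolumeIssue(1, "C"));
        copy1.add(new VolumeIssue(2, "C"));

        Set<VolumeIssue> copy2 = new HashSet<>();
        copy2.add(new VolumeIssue(1, "A"));

        // Prints the two missing entries; HashSet does not guarantee their order.
        System.out.println(removeFound(copy1, copy2));
    }
}
```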




Answer 2:


Since you've added the Guava tag, I'd go for a variation of Marco Topolnik's answer. Instead of removing one set from the other, use Sets.difference(left, right). From its documentation:

Returns an unmodifiable view of the difference of two sets. The returned set contains all elements that are contained by set1 and not contained by set2. set2 may also contain elements not present in set1; these are simply ignored. The iteration order of the returned set matches that of set1.
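For readers without Guava on the classpath, a rough standard-library approximation of the same idea is sketched below. This is not Guava's implementation: the class name SetDifference and the method are hypothetical, and the result is an unmodifiable snapshot rather than a live view, so later changes to either input set are not reflected in it. Like the documented behaviour, it leaves both inputs untouched and follows the iteration order of set1.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper; a snapshot-based stand-in for a set-difference view.
final class SetDifference {
    // Returns an unmodifiable set of the elements in set1 that are not in set2.
    static <T> Set<T> difference(Set<T> set1, Set<T> set2) {
        // LinkedHashSet keeps the elements in the order set1 iterates them.
        Set<T> result = new LinkedHashSet<>();
        for (T element : set1) {
            if (!set2.contains(element)) {
                result.add(element);
            }
        }
        return Collections.unmodifiableSet(result);
    }

    public static void main(String[] args) {
        Set<String> copy1 = new LinkedHashSet<>(Arrays.asList("1-A", "1-B", "1-C", "1-D"));
        Set<String> copy2 = new HashSet<>(Arrays.asList("1-A", "1-B"));

        System.out.println(difference(copy1, copy2)); // prints [1-C, 1-D]
        System.out.println(copy1.size());             // prints 4, copy1 is unchanged
    }
}
```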




Answer 3:


What data structure should I use to store the above values (volume and issue) in Java?

You can use a HashMap of key-value pairs.

The key is the volume and the value is a List of issues.

How should I implement this scenario in Java in the most efficient manner to find the difference between these two copies?

Look up the same key in both HashMaps so that you get two Lists of values, then find the difference between those two lists.

Suppose you have the two lists of values stored under the same key in the two maps.

Then:

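  Collection<Issue> diff = new ArrayList<>(list1);  // copy, so list1 is not modified
  diff.removeAll(list2);  // removeAll returns a boolean, so the result is read from diff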
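Applied to every volume, the map-based approach could be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: volumes are modelled as Integer keys, issues as String values, and the class name VolumeMapDiff is hypothetical. A volume that is absent from the second map is treated as having no issues there, so all of its issues are reported as missing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: per-volume difference between two volume -> issues maps.
final class VolumeMapDiff {
    // Returns, for each volume in copy1, the issues that copy2 lacks.
    // Volumes with nothing missing are left out of the result.
    static Map<Integer, List<String>> missing(Map<Integer, List<String>> copy1,
                                              Map<Integer, List<String>> copy2) {
        Map<Integer, List<String>> result = new TreeMap<>(); // sorted by volume
        for (Map.Entry<Integer, List<String>> entry : copy1.entrySet()) {
            List<String> present = copy2.getOrDefault(entry.getKey(), Collections.emptyList());
            // Work on a copy so the lists stored in copy1 are not modified.
            List<String> diff = new ArrayList<>(entry.getValue());
            // Passing a HashSet makes each membership check a hash lookup.
            diff.removeAll(new HashSet<>(present));
            if (!diff.isEmpty()) {
                result.put(entry.getKey(), diff);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> copy1 = new HashMap<>();
        copy1.put(1, Arrays.asList("A", "B", "C", "D"));
        copy1.put(3, Arrays.asList("A", "B"));

        Map<Integer, List<String>> copy2 = new HashMap<>();
        copy2.put(1, Arrays.asList("A", "B"));

        System.out.println(missing(copy1, copy2)); // prints {1=[C, D], 3=[A, B]}
    }
}
```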


Source: https://stackoverflow.com/questions/20825684/efficient-way-to-find-the-difference-between-two-data-sets
