How to compare two huge List<String> in Java?

核能气质少年 提交于 2019-12-21 02:44:15

问题


My application generates 2 big lists (up to 3.5mill string records). I need the best and fastest way to compare it. Currently I am doing it like this:

List list1 = ListUtils.subtract(sourceDbResults, hiveResults);
List list2 = ListUtils.subtract(hiveResults, sourceDbResults);

But this method is really expensive on memory as i see from jconsole and sometimes process even stack on it. Any good solutions or ideas?

Element positions/order in the list are always the same, so I dont need to deal with it. After comparing I need to know if the list are the same and to get the differences from these list if they are not the same. Subtract works perfect for small lists.


回答1:


Given that you've said your two lists are already sorted, they can be compared in O(N) time, which is much faster than your current solution that uses ListUtils. The following method does this using a similar algorithm to the one that merges two sorted lists that can be found in most textbooks.

import java.util.*;

public class CompareSortedLists {
    public static void main(String[] args) {
        List<Integer> sourceDbResults = Arrays.asList(1, 2, 3, 4, 5, 8);
        List<Integer> hiveResults = Arrays.asList(2, 3, 6, 7);
        List<Integer> inSourceDb_notInHive = new ArrayList<>();
        List<Integer> inHive_notInSourceDb = new ArrayList<>();

        compareSortedLists(
                sourceDbResults, hiveResults,
                inSourceDb_notInHive, inHive_notInSourceDb);

        assert inSourceDb_notInHive.equals(Arrays.asList(1, 4, 5, 8));
        assert inHive_notInSourceDb.equals(Arrays.asList(6, 7));
    }

    /**
     * Compares two sorted lists (or other iterable collections in ascending order).
     * Adds to onlyInList1 any and all elements in list1 that are not in list2; and
     * conversely to onlyInList2. The caller must ensure the two input lists are
     * already sorted and should initialize onlyInList1 and onlyInList2 to empty,
     * writable collections.
     */
    public static <T extends Comparable<? super T>> void compareSortedLists(
            Iterable<T> list1, Iterable<T> list2,
            Collection<T> onlyInList1, Collection<T> onlyInList2) {
        Iterator<T> it1 = list1.iterator();
        Iterator<T> it2 = list2.iterator();
        T e1 = it1.hasNext() ? it1.next() : null;
        T e2 = it2.hasNext() ? it2.next() : null;
        while (e1 != null || e2 != null) {
            if (e2 == null) {  // No more elements in list2, some remaining in list1
                onlyInList1.add(e1);
                e1 = it1.hasNext() ? it1.next() : null;
            }
            else if (e1 == null) {  // No more elements in list1, some remaining in list2
                onlyInList2.add(e2);
                e2 = it2.hasNext() ? it2.next() : null;
            }
            else {
                int comp = e1.compareTo(e2);
                if (comp < 0) {
                    onlyInList1.add(e1);
                    e1 = it1.hasNext() ? it1.next() : null;
                }
                else if (comp > 0) {
                    onlyInList2.add(e2);
                    e2 = it2.hasNext() ? it2.next() : null;
                }
                else /* comp == 0 */ {
                    e1 = it1.hasNext() ? it1.next() : null;
                    e2 = it2.hasNext() ? it2.next() : null;
                }
            }
        }
    }
}

The above method uses no external libraries, and can be used with any version of Java from 6 upwards. If you use a PeekingIterator, such as the one from Apache Commons Collections, or Guava, or write your own, then you can make the method simpler, especially if you also use Java 8:

public static <T extends Comparable<? super T>> void compareSortedLists(
        Iterable<T> list1, Iterable<T> list2,
        Collection<T> onlyInList1, Collection<T> onlyInList2) {
    PeekingIterator<T> it1 = new PeekingIterator<>(list1.iterator());
    PeekingIterator<T> it2 = new PeekingIterator<>(list2.iterator());
    while (it1.hasNext() && it2.hasNext()) {
        int comp = it1.peek().compareTo(it2.peek());
        if (comp < 0)
            onlyInList1.add(it1.next());
        else if (comp > 0)
            onlyInList2.add(it2.next());
        else /* comp == 0 */ {
            it1.next();
            it2.next();
        }
    }
    it1.forEachRemaining(onlyInList1::add);
    it2.forEachRemaining(onlyInList2::add);
}


来源:https://stackoverflow.com/questions/41723028/how-to-compare-two-huge-liststring-in-java

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!