Extract the difference between two strings in Java

Posted by ぃ、小莉子 on 2019-11-26 22:41:18

google-diff-match-patch

The Diff Match and Patch libraries offer robust algorithms to perform the operations required for synchronizing plain text.

Diff:

Compare two blocks of plain text and efficiently return a list of differences.

Match:

Given a search string, find its best fuzzy match in a block of plain text. Weighted for both accuracy and location.

Patch:

Apply a list of patches onto plain text. Use best-effort to apply patch even when the underlying text doesn't match.

Currently available in Java, JavaScript, Dart, C++, C#, Objective C, Lua and Python. Regardless of language, each library features the same API and the same functionality. All versions also have comprehensive test harnesses.

The project's "Line or word diffs" wiki page describes how to do line-by-line diffs.

Fly

One can use the StringUtils from Apache Commons. Here is the StringUtils API.

public static String difference(String str1, String str2) {
    if (str1 == null) {
        return str2;
    }
    if (str2 == null) {
        return str1;
    }
    int at = indexOfDifference(str1, str2);
    if (at == -1) {
        return EMPTY;
    }
    return str2.substring(at);
}
public static int indexOfDifference(String str1, String str2) {
    if (str1 == str2) {
        return -1;
    }
    if (str1 == null || str2 == null) {
        return 0;
    }
    int i;
    for (i = 0; i < str1.length() && i < str2.length(); ++i) {
        if (str1.charAt(i) != str2.charAt(i)) {
            break;
        }
    }
    if (i < str2.length() || i < str1.length()) {
        return i;
    }
    return -1;
}
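
To see what these two methods return in practice, here is a minimal, self-contained sketch. The class name `DifferenceDemo` is hypothetical, and the logic follows the excerpt above with the `EMPTY` constant inlined as `""`, so it runs without the Commons Lang dependency. Note that `difference` only returns the tail of the second string from the first mismatch onward, so it is not a full diff.

```java
class DifferenceDemo {
    // Same logic as the StringUtils excerpt; EMPTY is inlined as "".
    static String difference(String str1, String str2) {
        if (str1 == null) {
            return str2;
        }
        if (str2 == null) {
            return str1;
        }
        int at = indexOfDifference(str1, str2);
        return at == -1 ? "" : str2.substring(at);
    }

    // Index of the first position where the strings differ, or -1 if they are equal.
    static int indexOfDifference(String str1, String str2) {
        if (str1 == str2) {
            return -1;
        }
        if (str1 == null || str2 == null) {
            return 0;
        }
        int i = 0;
        while (i < str1.length() && i < str2.length() && str1.charAt(i) == str2.charAt(i)) {
            i++;
        }
        return (i < str1.length() || i < str2.length()) ? i : -1;
    }

    public static void main(String[] args) {
        System.out.println(difference("i am a machine", "i am a robot")); // robot
        System.out.println(difference("abc", "abXc"));                    // Xc
        System.out.println(difference("abc", "abc").isEmpty());           // true
    }
}
```

In the second call the result is "Xc" rather than just "X": everything in the second string after the first differing position is returned, including characters the two strings share.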

I used StringTokenizer to solve this. Below is the code snippet:

// Returns the space-separated words of sourceStr that do not occur in anotherStr.
public static List<String> findNotMatching(String sourceStr, String anotherStr) {
    List<String> missingWords = new ArrayList<String>();
    StringTokenizer at = new StringTokenizer(sourceStr, " ");
    while (at.hasMoreTokens()) {
        String token = at.nextToken();
        boolean found = false;
        // Rescan anotherStr from the start for every source token.
        StringTokenizer bt = new StringTokenizer(anotherStr, " ");
        while (bt.hasMoreTokens()) {
            if (token.equals(bt.nextToken())) {
                found = true;
                break;
            }
        }
        if (!found) {
            missingWords.add(token);
        }
    }
    return missingWords;
}
Aditya Rai

Convert the strings to lists, then use the method described in "How to remove common values from two array list" to get the result.
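
A minimal sketch of that list-based idea, assuming the strings are split on whitespace and that the wanted result is the words of the first string that do not occur in the second. The class and method names are hypothetical; `List.removeAll` is the standard collection method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordDifference {
    // Splits on runs of whitespace; a blank string yields an empty list.
    static List<String> words(String s) {
        String trimmed = s.trim();
        return trimmed.isEmpty()
                ? new ArrayList<>()
                : new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Words of 'source' that do not occur anywhere in 'other'.
    static List<String> wordsOnlyIn(String source, String other) {
        List<String> result = words(source);
        result.removeAll(words(other));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordsOnlyIn("How are you today", "How are they")); // [you, today]
    }
}
```

`removeAll` drops every occurrence of a shared word, so word order and repetition in the second string are ignored.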

If you prefer not to use an external library, you can use the following Java snippet to efficiently compute the difference:

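import java.util.HashMap;
import java.util.Map;

/**
 * Returns an array of size 2. The entries contain a minimal set of characters
 * that have to be removed from the corresponding input strings in order to
 * make the strings equal.
 */
public String[] difference(String a, String b) {
    return diffHelper(a, b, new HashMap<>());
}

private String[] diffHelper(String a, String b, Map<Long, String[]> lookup) {
    // a and b are always suffixes of the original inputs, so the pair of
    // lengths identifies the subproblem.
    long key = ((long) a.length()) << 32 | b.length();
    // Explicit get/put instead of computeIfAbsent: the recursion adds entries
    // to the map, which HashMap.computeIfAbsent does not support (Java 9+
    // throws ConcurrentModificationException).
    String[] cached = lookup.get(key);
    if (cached != null) {
        return cached;
    }
    String[] result;
    if (a.isEmpty() || b.isEmpty()) {
        result = new String[]{a, b};
    } else if (a.charAt(0) == b.charAt(0)) {
        result = diffHelper(a.substring(1), b.substring(1), lookup);
    } else {
        String[] aa = diffHelper(a.substring(1), b, lookup);
        String[] bb = diffHelper(a, b.substring(1), lookup);
        if (aa[0].length() + aa[1].length() < bb[0].length() + bb[1].length()) {
            result = new String[]{a.charAt(0) + aa[0], aa[1]};
        } else {
            result = new String[]{bb[0], b.charAt(0) + bb[1]};
        }
    }
    lookup.put(key, result);
    return result;
}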

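This approach uses dynamic programming in the form of memoized recursion. It tries all combinations in a brute-force way but caches the result for each pair of suffixes, so for inputs of lengths n and m it solves at most O(n·m) subproblems, i.e. O(n^2) for strings of similar length (not counting the substring copies and string concatenations done in each step).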

Examples:

String hear = "Hi My name is Deepak"
        + "\n"
        + "How are you ?"
        + "\n"
        + "\n"
        + "How is everyone";
String dear = "Hi My name is Deepak"
        + "\n"
        + "How are you ?"
        + "\n"
        + "Hey there \n"
        + "How is everyone";
difference(hear, dear); // returns {"","Hey there "}

difference("Honda", "Hyundai"); // returns {"o","yui"}

difference("Toyota", "Coyote"); // returns {"Ta","Ce"}
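
The same dynamic-programming idea can also be written bottom-up, which avoids deep recursion on long inputs. The following is a sketch under that assumption, not the code from the answer: the class name `TableDiff` is hypothetical, and it fills a longest-common-subsequence table and then walks it to collect the skipped characters. It uses O(n·m) memory for the table.

```java
class TableDiff {
    // Returns {removedFromA, removedFromB}: characters to delete from each
    // string so that both are reduced to a longest common subsequence.
    static String[] difference(String a, String b) {
        int n = a.length(), m = b.length();
        // lcs[i][j] = length of the longest common subsequence of
        // a.substring(i) and b.substring(j).
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                if (a.charAt(i) == b.charAt(j)) {
                    lcs[i][j] = lcs[i + 1][j + 1] + 1;
                } else {
                    lcs[i][j] = Math.max(lcs[i + 1][j], lcs[i][j + 1]);
                }
            }
        }
        // Walk the table from the front and record every character that is skipped.
        StringBuilder fromA = new StringBuilder();
        StringBuilder fromB = new StringBuilder();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.charAt(i) == b.charAt(j)) {
                i++;
                j++;
            } else if (lcs[i + 1][j] > lcs[i][j + 1]) {
                fromA.append(a.charAt(i++)); // dropping a[i] keeps the longer common part
            } else {
                fromB.append(b.charAt(j++)); // otherwise (including ties) drop b[j]
            }
        }
        // Whatever is left in either string has no partner and must be removed.
        fromA.append(a.substring(i));
        fromB.append(b.substring(j));
        return new String[]{fromA.toString(), fromB.toString()};
    }

    public static void main(String[] args) {
        String[] r = difference("Honda", "Hyundai");
        System.out.println(r[0] + " / " + r[1]); // o / yui
    }
}
```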

I was looking for a solution but couldn't find the one I needed, so I created a utility class that compares two versions of a text, new and old, and produces a result text with the changes wrapped in [added] and [deleted] tags. The tags can easily be replaced with a highlighter of your choice, for example an HTML tag. See string-version-comparison.

Any comments will be appreciated.

*It might not work well with long texts, because of the higher probability of the same phrases being found as deleted.
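
The utility's own code is not reproduced here. As a rough illustration of the general idea only, the following sketch compares two texts word by word using a longest-common-subsequence table and wraps the differing words in tags. The class name `TaggedDiff`, splitting on single spaces, and the closing-tag form `[/added]` and `[/deleted]` are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

class TaggedDiff {
    // Words found only in oldText are tagged [deleted]; words found only in
    // newText are tagged [added]. Words are separated by single spaces.
    static String compare(String oldText, String newText) {
        String[] o = oldText.split(" ");
        String[] n = newText.split(" ");
        // lcs[i][j] = number of words in the longest common subsequence of
        // o[i..] and n[j..].
        int[][] lcs = new int[o.length + 1][n.length + 1];
        for (int i = o.length - 1; i >= 0; i--) {
            for (int j = n.length - 1; j >= 0; j--) {
                lcs[i][j] = o[i].equals(n[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < o.length || j < n.length) {
            if (i < o.length && j < n.length && o[i].equals(n[j])) {
                out.add(o[i]); // unchanged word
                i++;
                j++;
            } else if (j == n.length || (i < o.length && lcs[i + 1][j] >= lcs[i][j + 1])) {
                out.add("[deleted]" + o[i++] + "[/deleted]");
            } else {
                out.add("[added]" + n[j++] + "[/added]");
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(compare("the quick brown fox", "the slow brown fox jumps"));
        // the [deleted]quick[/deleted] [added]slow[/added] brown fox [added]jumps[/added]
    }
}
```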

You should use StringUtils from Apache Commons.

What about this snippet?

public static void strDiff(String hear, String dear){
    String[] hr = dear.split("\n");
    for (String h : hr) {
        if (!hear.contains(h)) {
            System.err.println(h);
        }
    }
}
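
Note that `String.contains` is a substring test, so a line of `dear` also counts as present when it merely occurs inside a longer line of `hear`. If exact line matching is wanted, a sketch along the following lines could be used. The class and method names are hypothetical, and the method returns the lines instead of printing them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LineDiff {
    // Lines of 'dear' that do not appear as a whole line in 'hear'.
    static List<String> newLines(String hear, String dear) {
        Set<String> known = new HashSet<>(Arrays.asList(hear.split("\n")));
        List<String> result = new ArrayList<>();
        for (String line : dear.split("\n")) {
            if (!known.contains(line)) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String hear = "Hi My name is Deepak\nHow are you ?\n\nHow is everyone";
        String dear = "Hi My name is Deepak\nHow are you ?\nHey there \nHow is everyone";
        System.out.println(newLines(hear, dear)); // [Hey there ]
    }
}
```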