Checking for duplicate Javascript objects

孤人 提交于 2019-12-03 21:35:12

I'd suggest you create an MD5 hash of the JSON.stringify(o) and store that in your hashmap with a reference to your stored object as the data for the hash. And to make sure that there are no object key order differences in the JSON.stringify(), you have to create a copy of the object that orders the keys.

Then, when each new object comes in, you check it against the hash map. If you find a match in the hash map, then you compare the incoming object with the actual object that you've stored to see if they are truly duplicates (since there can be MD5 hash collisions). That way, you have a manageable hash table (with only MD5 hashes in it).

Here's code to create a canonical string representation of an object (including nested objects or objects within arrays) that handles object keys that might be in a different order if you just called JSON.stringify().

// Code to do a canonical JSON.stringify() that puts object properties 
// in a consistent order
// Does not allow circular references (child containing reference to parent)
JSON.stringifyCanonical = function(obj) {
    // compatible with either browser or node.js
    var Set = typeof window === "object" ? window.Set : global.Set;

    // poor man's Set polyfill
    if (typeof Set !== "function") {
        Set = function(s) {
            if (s) {
                this.data = s.data.slice();
            } else {
                this.data = [];
            }
        };
        Set.prototype = {
            add: function(item) {
                this.data.push(item);
            },
            has: function(item) {
                return this.data.indexOf(item) !== -1;
            }
        };
    }

    function orderKeys(obj, parents) {
        if (typeof obj !== "object") {
            throw new Error("orderKeys() expects object type");
        }
        var set = new Set(parents);
        if (set.has(obj)) {
            throw new Error("circular object in stringifyCanonical()");
        }
        set.add(obj);
        var tempObj, item, i;
        if (Array.isArray(obj)) {
            // no need to re-order an array
            // but need to check it for embedded objects that need to be ordered
            tempObj = [];
            for (i = 0; i < obj.length; i++) {
                item = obj[i];
                if (typeof item === "object") {
                    tempObj[i] = orderKeys(item, set);
                } else {
                    tempObj[i] = item;
                }
            }
        } else {
            tempObj = {};
            // get keys, sort them and build new object
            Object.keys(obj).sort().forEach(function(item) {
                if (typeof obj[item] === "object") {
                    tempObj[item] = orderKeys(obj[item], set);
                } else {
                    tempObj[item] = obj[item];
                }
            });
        }
        return tempObj;
    }

    return JSON.stringify(orderKeys(obj));
}

And, the algorithm

var myHashMap = {};

function processObject(o) {
    var stringifiedCandidate = JSON.stringifyCanonical(o);
    var hash = CreateMD5(stringifiedCandidate);
    var list = [], found = false;
    // is it in the hashmap?
    if (!myHashMap[hash] {
        // not in the hash table, so it's a unique object
        myObjects.push(o);
        list.push(myObjects.length - 1);    // put a reference to the object with this hash value in the list
        myHashMap[hash] = list;             // store the list in the hash table for future comparisons
    } else {
        // the hash does exist in the hash table, check for an exact object match to see if it's really a duplicate
        list = myHashMap[hash];             // get the list of other object indexes with this hash value
        // loop through the list
        for (var i = 0; i < list.length; i++) {
            if (stringifiedCandidate === JSON.stringifyCanonical(myObjects[list[i]])) {
                found = true;       // found an exact object match
                break;
            }
        }
        // if not found, it's not an exact duplicate, even though there was a hash match
        if (!found) {
            myObjects.push(o);
            myHashMap[hash].push(myObjects.length - 1);
        }
    }
}

Test case for jsonStringifyCanonical() is here: https://jsfiddle.net/jfriend00/zfrtpqcL/

  1. Maybe. For example if You know what kind object goes by You could write better indexing and searching system than JS objects' keys. But You could only do that with JavaScript and object keys are written in C...
  2. Must Your hashing be lossless or not? If can than try to lose compression (MD5). I guessing You will lose some speed and gain some memory. By the way, do JSON.stringify(o) guarantees same key ordering. Because {foo: 1, bar: 2} and {bar: 2, foo: 1} is equal as objects, but not as strings.
  3. Cost memory

One possible optimization:

Instead of using getJSON use $.get and pass "text" as dataType param. Than You can use result as Your hash and convert to object afterwards.

Actually by writing last sentence I though about another solution:

  • Collect all results with $.get into array
  • Sort it with buildin (c speed) Array.sort
  • Now You can easily spot and remove duplicates with one for

Again different JSON strings can make same JavaScript object.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!