Find all duplicate documents in a MongoDB collection by a key field

Backend · Open · 5 answers · 477 views

温柔的废话 2020-11-28 22:00

Suppose I have a collection with some set of documents, something like this:

{ "_id" : ObjectId("4f127fa55e7242718200002d"), "id" : 1, "name" : "foo" }


        
5 Answers
  •  [愿得一人]
    2020-11-28 22:41

    For a generic Mongo solution, see the MongoDB cookbook recipe for finding duplicates using group. Note that the aggregation framework is faster and more powerful, in that it can also return the _ids of the duplicate records.
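    A hedged sketch of such an aggregation pipeline, runnable in the mongo shell against the prb collection used below (the sample documents here are invented for illustration, and the same grouping logic is shown applied to them in plain JavaScript):

    ```javascript
    // Group by "name", collect the _ids of each group, and keep only
    // groups with more than one document (i.e. the duplicates).
    const pipeline = [
      { $group: { _id: "$name", ids: { $addToSet: "$_id" }, count: { $sum: 1 } } },
      { $match: { count: { $gt: 1 } } }
    ];
    // In the mongo shell: db.prb.aggregate(pipeline);

    // The same grouping logic, applied to hypothetical in-memory documents:
    const docs = [
      { _id: 1, name: "foo" },
      { _id: 2, name: "MongoDB" },
      { _id: 3, name: "MongoDB" }
    ];
    const groups = {};
    for (const d of docs) {
      if (!groups[d.name]) groups[d.name] = [];
      groups[d.name].push(d._id);
    }
    const duplicates = Object.entries(groups)
      .filter(([, ids]) => ids.length > 1)
      .map(([name, ids]) => ({ name, ids }));
    console.log(JSON.stringify(duplicates)); // → [{"name":"MongoDB","ids":[2,3]}]
    ```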

    The accepted answer (using mapReduce) is not that efficient. Instead, we can use the group method, shown here with the legacy PHP Mongo driver:

    $connection = 'mongodb://localhost:27017';
    $con        = new Mongo($connection); // mongo db connection
    
    $db         = $con->test; // database
    $collection = $db->prb;   // collection
    
    // select the name field and group by it
    $keys       = array("name" => 1);
    
    // set initial values
    $initial    = array("count" => 0);
    
    // JavaScript reduce function, run once per grouped document
    $reduce     = "function (obj, prev) { prev.count++; }";
    
    $g          = $collection->group($keys, $initial, $reduce);
    
    echo "<pre>";
    print_r($g);
    echo "</pre>";
    
    The output will be:

    Array
    (
        [retval] => Array
            (
                [0] => Array
                    (
                        [name] => 
                        [count] => 1
                    )
    
                [1] => Array
                    (
                        [name] => MongoDB
                        [count] => 2
                    )
    
            )
    
        [count] => 3
        [keys] => 2
        [ok] => 1
    )
    

    The equivalent SQL query would be: SELECT name, COUNT(name) FROM prb GROUP BY name. Note that we still need to filter the result array, keeping only entries whose count is greater than 1. Again, refer to the MongoDB cookbook recipe for finding duplicates using group for the canonical solution using group.
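    That filtering step can be sketched in plain JavaScript; the retval array here is a hand-copied stand-in for the group() result printed above:

    ```javascript
    // The retval entries as shown in the output above.
    const retval = [
      { name: "", count: 1 },
      { name: "MongoDB", count: 2 }
    ];

    // Keep only the names that occur more than once, i.e. the duplicates.
    const duplicates = retval.filter(g => g.count > 1);
    console.log(JSON.stringify(duplicates)); // → [{"name":"MongoDB","count":2}]
    ```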
