How to find duplicates in a nested array in Cosmos DB without GROUP BY and COUNT

Question

I am trying to find duplicates in a nested array in a collection. In ye olde SQL, I would do this with some sort of GROUP BY and a COUNT. Cosmos DB doesn't support GROUP BY (as far as I can see), so I am trying to find a workaround. One limitation is that I only have access to the Data Explorer in the Azure portal (don't ask).

To explain in more detail, suppose you have a collection like the following. Note that the first item has a duplicate name ("A") in its "stuff" array:

[
    {
        "id": "1",
        "Name": "Item with duplicate stuff",
        "stuff" : [
            {
                "name" : "A"
            },
            {
                "name" : "B"
            },
            {
                "name" : "A"
            }
        ]
    },
    {
        "id": "2",
        "Name": "Item with unique stuff",
        "stuff" : [
            {
                "name" : "A"
            },
            {
                "name" : "B"
            },
            {
                "name" : "C"
            }
        ]
    }
]

I want to find all the items in my collection that have duplicates in the "stuff" property. So in this case it would return the item with id "1". Something like this would do nicely:

[
    {
        "id": "1"
    } 
]

Nothing I have tried has worked, and none of it is fit to show here.

Answer 1:

Yes, as you mentioned, Cosmos DB currently does not support GROUP BY, nor any other aggregation.

However, you can achieve a group-by using documentdb-lumenize. You load cube.string as a stored procedure, then call it with an aggregation configuration:

{cubeConfig: {groupBy: "name", field: "stuff.name", f: "max"}}

That should do what you want.

Alternatively, if you still want to use the SQL API, you can try using a JOIN, as explained in the answer here.

Personally, I faced the same issue, and I had to handle it with custom logic after retrieving the records that matched my filter conditions.
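As a rough illustration of that kind of custom logic, here is a minimal Python sketch that groups and counts the names client-side once the documents have been retrieved. The function name `find_items_with_duplicates` and its parameters are hypothetical, and the sketch assumes the documents are already available as plain dictionaries shaped like the sample in the question; it does not talk to Cosmos DB itself.

```python
from collections import Counter


def find_items_with_duplicates(items, array_key="stuff", field="name"):
    """Map each item id to the sorted list of values that occur more than once.

    Items whose nested array contains no repeated value are left out.
    """
    result = {}
    for item in items:
        # Count how often each value of `field` occurs in the nested array.
        counts = Counter(entry[field] for entry in item.get(array_key, []))
        repeated = sorted(value for value, n in counts.items() if n > 1)
        if repeated:
            result[item["id"]] = repeated
    return result


# Sample documents copied from the question.
items = [
    {"id": "1", "Name": "Item with duplicate stuff",
     "stuff": [{"name": "A"}, {"name": "B"}, {"name": "A"}]},
    {"id": "2", "Name": "Item with unique stuff",
     "stuff": [{"name": "A"}, {"name": "B"}, {"name": "C"}]},
]

print(find_items_with_duplicates(items))  # {'1': ['A']}
```

This is essentially the GROUP BY and COUNT from the question done in application code, so it only helps if you can run code against the retrieved data.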

EDIT

Following the comment below, the first sentence should read: Yes, as you mentioned, Cosmos DB currently does not support GROUP BY ~~nor any other aggregation~~.

Answer 2:

Cosmos DB supports subqueries and the DISTINCT keyword, so something like this should work:

    SELECT n2
    FROM c
    JOIN (SELECT DISTINCT VALUE s.name FROM s IN c['stuff']) n2

Result for the sample item with id "2" (whose names are A, B and C):

[
    {
        "n2": "A"
    },
    {
        "n2": "B"
    },
    {
        "n2": "C"
    }
]

Ref: https://docs.microsoft.com/en-gb/azure/cosmos-db/sql-query-subquery
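The distinct list on its own does not flag duplicates; the missing step is to compare it with the original array, since an item contains a duplicate exactly when it has fewer distinct names than entries. The Python sketch below mirrors that comparison on the sample data from the question. The function name `ids_with_duplicate_names` is hypothetical and the code runs on local dictionaries, not on Cosmos DB. The same comparison could presumably be expressed in the query itself (for example with ARRAY_LENGTH), but that is not tested here.

```python
def ids_with_duplicate_names(items):
    """Return [{"id": ...}] for items whose "stuff" array repeats a name."""
    matches = []
    for item in items:
        names = [s["name"] for s in item.get("stuff", [])]
        # The set plays the role of SELECT DISTINCT VALUE s.name.
        distinct_names = set(names)
        # Fewer distinct names than entries means at least one name repeats.
        if len(distinct_names) < len(names):
            matches.append({"id": item["id"]})
    return matches


# Sample documents copied from the question.
items = [
    {"id": "1", "Name": "Item with duplicate stuff",
     "stuff": [{"name": "A"}, {"name": "B"}, {"name": "A"}]},
    {"id": "2", "Name": "Item with unique stuff",
     "stuff": [{"name": "A"}, {"name": "B"}, {"name": "C"}]},
]

print(ids_with_duplicate_names(items))  # [{'id': '1'}]
```

The output has the same shape as the result the question asks for.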

P.S. Cosmos DB now also supports GROUP BY: https://docs.microsoft.com/en-gb/azure/cosmos-db/sql-query-group-by

Source: https://stackoverflow.com/questions/50586721/how-to-find-duplicates-in-a-nested-array-in-cosmos-db-without-group-by-and-count

Tags

azure

azure-cosmosdb

azure-cosmosdb-sqlapi