I ran into a problem: Elasticsearch does not return the count of unique documents when I just use a terms aggregation on a nested field.
Here is an example of ou
I think you need a reverse_nested aggregation: you want to aggregate on a nested value, but count the ROOT documents rather than the nested ones.
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "last_name": "smith"
          }
        }
      ]
    }
  },
  "aggs": {
    "location": {
      "nested": {
        "path": "location"
      },
      "aggs": {
        "state": {
          "terms": {
            "field": "location.state",
            "size": 10
          },
          "aggs": {
            "top_reverse_nested": {
              "reverse_nested": {}
            }
          }
        }
      }
    }
  }
}
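If you build this request from code, it can help to parameterize the nested path and field so the reverse_nested part is not forgotten. The following is a minimal sketch that only constructs the request body as a Python dict; the function name and its parameters are hypothetical, and sending the body to Elasticsearch is left to whatever client you use.

```python
import json


def build_unique_root_count_request(last_name, nested_path, field, size=10):
    """Hypothetical helper: terms on a nested field plus reverse_nested for root-doc counts."""
    return {
        "query": {"bool": {"must": [{"term": {"last_name": last_name}}]}},
        "aggs": {
            nested_path: {
                # Step into the nested objects at nested_path.
                "nested": {"path": nested_path},
                "aggs": {
                    field: {
                        # Bucket the nested objects by the nested field's value.
                        "terms": {"field": f"{nested_path}.{field}", "size": size},
                        # Join back to the root documents to count them per bucket.
                        "aggs": {"top_reverse_nested": {"reverse_nested": {}}},
                    }
                },
            }
        },
    }


body = build_unique_root_count_request("smith", "location", "state")
print(json.dumps(body, indent=2))
```

With nested_path="location" and field="state", the dict has the same structure as the JSON request above.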
The response would look something like this:
"aggregations": {
  "location": {
    "doc_count": 6,
    "state": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "ny",
          "doc_count": 4,
          "top_reverse_nested": {
            "doc_count": 2
          }
        },
        {
          "key": "ca",
          "doc_count": 2,
          "top_reverse_nested": {
            "doc_count": 2
          }
        }
      ]
    }
  }
}
What you are looking for is the doc_count under top_reverse_nested: it is the number of root documents in each bucket.
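To pull those numbers out of the response programmatically, you only need to walk down to the buckets and read top_reverse_nested.doc_count. A minimal sketch, assuming the response has already been decoded into a Python dict; the function name is hypothetical and the default aggregation names match the query above.

```python
def root_counts_per_term(response, nested_agg="location", terms_agg="state"):
    """Hypothetical helper: map each bucket key to its root-document count."""
    buckets = response["aggregations"][nested_agg][terms_agg]["buckets"]
    return {b["key"]: b["top_reverse_nested"]["doc_count"] for b in buckets}


# The "aggregations" part of the example response, wrapped in a dict.
response = {
    "aggregations": {
        "location": {
            "doc_count": 6,
            "state": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {"key": "ny", "doc_count": 4, "top_reverse_nested": {"doc_count": 2}},
                    {"key": "ca", "doc_count": 2, "top_reverse_nested": {"doc_count": 2}},
                ],
            },
        }
    }
}

print(root_counts_per_term(response))  # {'ny': 2, 'ca': 2}
```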
One caveat, if I'm not mistaken: "doc_count": 6 is the NESTED document count, so don't mistake these numbers for root-document counts. The same goes for each bucket's own doc_count (4 for ny, 2 for ca). For example, a root document with three matching nested objects contributes 3 to these counts, not 1.
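To make the difference between the two kinds of counts concrete, here is a plain-Python sketch (no Elasticsearch involved) that mimics what the nested terms buckets and the reverse_nested sub-aggregation count. The documents and function names are hypothetical: two root documents, each with three nested location objects, chosen so the numbers line up with the example response.

```python
from collections import defaultdict

# Hypothetical root documents, each with a nested "location" array.
DOCS = [
    {"_id": "1", "location": [{"state": "ny"}, {"state": "ny"}, {"state": "ca"}]},
    {"_id": "2", "location": [{"state": "ny"}, {"state": "ny"}, {"state": "ca"}]},
]


def nested_counts(docs, path, field):
    """Like a terms bucket doc_count inside a nested agg: one hit per nested object."""
    counts = defaultdict(int)
    for doc in docs:
        for obj in doc.get(path, []):
            counts[obj[field]] += 1
    return dict(counts)


def root_counts(docs, path, field):
    """Like reverse_nested under those buckets: distinct root documents per term."""
    roots = defaultdict(set)
    for doc in docs:
        for obj in doc.get(path, []):
            roots[obj[field]].add(doc["_id"])
    return {term: len(ids) for term, ids in roots.items()}


# Like the top-level nested doc_count: every nested object is counted.
total_nested = sum(len(doc["location"]) for doc in DOCS)
print(total_nested)                              # 6
print(nested_counts(DOCS, "location", "state"))  # {'ny': 4, 'ca': 2}
print(root_counts(DOCS, "location", "state"))    # {'ny': 2, 'ca': 2}
```

The only difference between the two functions is whether a match adds 1 to a counter or adds the root document's id to a set, which is exactly why a single root document with three matching nested objects shows up as 3 in the nested counts but 1 under reverse_nested.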