There is a list of conversations, and every conversation has a list of messages. Every message has several fields, one of which is an action
field. We need to consider that
I solved it using Elasticsearch's scripted_metric aggregation. Note that the index was also changed from its initial state.
The query:
{
  "size": 0,
  "aggs": {
    "intentPathsCountAgg": {
      "scripted_metric": {
        "init_script": "state.messagesList = new ArrayList();",
        "map_script": "long currentMessageTime = doc['messageReceivedEvent.context.timestamp'].value.millis; Map currentMessage = ['conversationId': doc['messageReceivedEvent.context.conversationId.keyword'].value, 'time': currentMessageTime, 'intentsPath': doc['brainQueryRequestEvent.brainQueryRequest.user_data.intentsHistoryPath.keyword'].value]; state.messagesList.add(currentMessage);",
        "combine_script": "return state",
        "reduce_script": "List messages = new ArrayList(); Map conversationsMap = new HashMap(); Map intentsMap = new HashMap(); String[] ifElseWorkaround = new String[1]; for (state in states) { messages.addAll(state.messagesList);} messages.stream().forEach((message) -> { Map existingMessage = conversationsMap.get(message.conversationId); if(existingMessage == null || message.time > existingMessage.time) { conversationsMap.put(message.conversationId, ['time': message.time, 'intentsPath': message.intentsPath]); } else { ifElseWorkaround[0] = ''; } }); conversationsMap.entrySet().forEach(conversation -> { if (intentsMap.containsKey(conversation.getValue().intentsPath)) { long intentsCount = intentsMap.get(conversation.getValue().intentsPath) + 1; intentsMap.put(conversation.getValue().intentsPath, intentsCount); } else {intentsMap.put(conversation.getValue().intentsPath, 1L);} }); return intentsMap.entrySet().stream().map(intentPath -> [intentPath.getKey().toString(): intentPath.getValue()]).collect(Collectors.toSet()) "
      }
    }
  }
}
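The same script formatted for readability, written as a TypeScript object (note that this version returns 'path'/'count' pairs instead of one single-entry map per path):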
scripted_metric: {
  init_script: 'state.messagesList = new ArrayList();',
  map_script: `
    // Collect the fields needed from every message document.
    long currentMessageTime = doc['messageReceivedEvent.context.timestamp'].value.millis;
    Map currentMessage = [
      'conversationId': doc['messageReceivedEvent.context.conversationId.keyword'].value,
      'time': currentMessageTime,
      'intentsPath': doc['brainQueryRequestEvent.brainQueryRequest.user_data.intentsHistoryPath.keyword'].value
    ];
    state.messagesList.add(currentMessage);`,
  combine_script: 'return state',
  reduce_script: `
    List messages = new ArrayList();
    Map conversationsMap = new HashMap();
    Map intentsMap = new HashMap();
    boolean[] ifElseWorkaround = new boolean[1];
    // Merge the per-shard message lists.
    for (state in states) {
      messages.addAll(state.messagesList);
    }
    // Keep only the latest message of each conversation.
    messages.stream().forEach(message -> {
      Map existingMessage = conversationsMap.get(message.conversationId);
      if(existingMessage == null || message.time > existingMessage.time) {
        conversationsMap.put(message.conversationId, ['time': message.time, 'intentsPath': message.intentsPath]);
      } else {
        // No-op branch, kept as a workaround.
        ifElseWorkaround[0] = true;
      }
    });
    // Count how many conversations end on each intents path.
    conversationsMap.entrySet().forEach(conversation -> {
      if (intentsMap.containsKey(conversation.getValue().intentsPath)) {
        long intentsCount = intentsMap.get(conversation.getValue().intentsPath) + 1;
        intentsMap.put(conversation.getValue().intentsPath, intentsCount);
      } else {
        intentsMap.put(conversation.getValue().intentsPath, 1L);
      }
    });
    return intentsMap.entrySet().stream().map(intentPath -> [
      'path': intentPath.getKey().toString(),
      'count': intentPath.getValue()
    ]).collect(Collectors.toSet())`
}
The response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 11,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "intentPathsCountAgg": {
      "value": [
        {
          "smallTalk.greet -> smallTalk.greet2 -> smallTalk.greet3": 2
        },
        {
          "smallTalk.greet -> smallTalk.greet2 -> smallTalk.greet3 -> smallTalk.greet4": 1
        },
        {
          "smallTalk.greet -> smallTalk.greet2": 1
        }
      ]
    }
  }
}
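For readers less familiar with Painless, the reduce step can be sketched in plain TypeScript. This is only an illustration of the logic described above (keep the latest message per conversation, then count conversations per intents path); the `Message` interface, the `countIntentPaths` function and the sample data are hypothetical and are not part of the Elasticsearch query.

```typescript
// Hypothetical shape of one entry collected by map_script.
interface Message {
  conversationId: string;
  time: number; // epoch milliseconds
  intentsPath: string;
}

// Keep only the latest message of each conversation, then count
// how many conversations end on each intents path.
function countIntentPaths(messages: Message[]): Map<string, number> {
  const latestByConversation = new Map<string, Message>();
  messages.forEach((message) => {
    const existing = latestByConversation.get(message.conversationId);
    if (existing === undefined || message.time > existing.time) {
      latestByConversation.set(message.conversationId, message);
    }
  });

  const counts = new Map<string, number>();
  latestByConversation.forEach((message) => {
    counts.set(message.intentsPath, (counts.get(message.intentsPath) || 0) + 1);
  });
  return counts;
}

// Made-up sample data: three conversations, two of which end on the same path.
const sampleMessages: Message[] = [
  { conversationId: "c1", time: 1, intentsPath: "smallTalk.greet" },
  { conversationId: "c1", time: 2, intentsPath: "smallTalk.greet -> smallTalk.greet2" },
  { conversationId: "c2", time: 5, intentsPath: "smallTalk.greet -> smallTalk.greet2" },
  { conversationId: "c3", time: 3, intentsPath: "smallTalk.greet" },
];

const pathCounts = countIntentPaths(sampleMessages);
```

With this sample, the earlier message of conversation c1 is discarded, so only its final path is counted.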
Using a script in a terms aggregation, we can create buckets keyed on the first character of "context.action". A similar terms sub-aggregation then collects every "context.action" value under its parent bucket, e.g. A -> A.1 -> A.1.1 ...
The "conversations" script returns the first character (A, B, C, etc.). The "sub_conversations" script returns every context.action under [A]; its length check ignores [A] itself. The "count" script gives the number of distinct context.action values under A.
Query:
{
  "size": 0,
  "aggs": {
    "conversations": {
      "terms": {
        "script": {
          "source": "def term=doc['context.action'].value; return term.substring(0,1);"
        },
        "size": 10
      },
      "aggs": {
        "sub_conversations": {
          "terms": {
            "script": {
              "source": "if(doc['context.action'].value.length()>1) return doc['context.action'];"
            },
            "size": 10
          }
        },
        "count": {
          "cardinality": {
            "script": {
              "source": "if(doc['context.action'].value.length()>1) return doc['context.action'];"
            }
          }
        }
      }
    }
  }
}
Since Elasticsearch cannot join different documents, you will have to build the combined key on the client side by iterating over the aggregation buckets.
Result:
"aggregations" : {
  "conversations" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "A",
        "doc_count" : 6,
        "sub_conversations" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "A.1",
              "doc_count" : 2
            },
            {
              "key" : "A.1.1",
              "doc_count" : 2
            }
          ]
        },
        "count" : {
          "value" : 2
        }
      },
      {
        "key" : "B",
        "doc_count" : 2,
        "sub_conversations" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "B.1",
              "doc_count" : 1
            }
          ]
        },
        "count" : {
          "value" : 1
        }
      }
    ]
  }
}
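The client-side step mentioned above could look roughly like the following TypeScript sketch. It assumes the combined key is the parent bucket key followed by its sub-bucket keys joined with " -> " (as in the A -> A.1 -> A.1.1 example); the interfaces and the `combinedKeys` function are hypothetical names, and the sample data mirrors the result shown above.

```typescript
// Minimal, assumed shapes for the parts of the response used here.
interface Bucket {
  key: string;
  doc_count: number;
}

interface ParentBucket extends Bucket {
  sub_conversations: { buckets: Bucket[] };
}

// Build one combined key per parent bucket: "A -> A.1 -> A.1.1".
// Sub-bucket keys are sorted as strings so that a prefix such as "A.1"
// comes before "A.1.1" (an assumption about the desired ordering).
function combinedKeys(buckets: ParentBucket[]): string[] {
  return buckets.map((parent) => {
    const subKeys = parent.sub_conversations.buckets.map((b) => b.key).sort();
    return [parent.key].concat(subKeys).join(" -> ");
  });
}

// Buckets copied from the aggregation result above.
const conversationBuckets: ParentBucket[] = [
  {
    key: "A",
    doc_count: 6,
    sub_conversations: {
      buckets: [
        { key: "A.1", doc_count: 2 },
        { key: "A.1.1", doc_count: 2 },
      ],
    },
  },
  {
    key: "B",
    doc_count: 2,
    sub_conversations: { buckets: [{ key: "B.1", doc_count: 1 }] },
  },
];

const keys = combinedKeys(conversationBuckets);
// keys is ["A -> A.1 -> A.1.1", "B -> B.1"]
```

If the hierarchy needs a different ordering or separator, only the `sort` and `join` calls would change.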