Elasticsearch: mapping the result of collapse / performing operations on grouped documents

傲寒 2020-12-21 17:31

There is a list of conversations, and every conversation has a list of messages. Each message has several fields, including an action field. We need to consider that

2 answers
  • 2020-12-21 18:08

    I solved it using Elasticsearch's scripted_metric aggregation. Note that the index has also changed from its initial state in the question.

    The script:

    {
       "size": 0,
       "aggs": {
            "intentPathsCountAgg": {
                "scripted_metric": {
                    "init_script": "state.messagesList = new ArrayList();",
                    "map_script": "long currentMessageTime = doc['messageReceivedEvent.context.timestamp'].value.millis; Map currentMessage = ['conversationId': doc['messageReceivedEvent.context.conversationId.keyword'].value, 'time': currentMessageTime, 'intentsPath': doc['brainQueryRequestEvent.brainQueryRequest.user_data.intentsHistoryPath.keyword'].value]; state.messagesList.add(currentMessage);",
                    "combine_script": "return state",
                    "reduce_script": "List messages = new ArrayList(); Map conversationsMap = new HashMap(); Map intentsMap = new HashMap(); String[] ifElseWorkaround = new String[1]; for (state in states) { messages.addAll(state.messagesList);} messages.stream().forEach((message) -> { Map existingMessage = conversationsMap.get(message.conversationId); if(existingMessage == null || message.time > existingMessage.time) { conversationsMap.put(message.conversationId, ['time': message.time, 'intentsPath': message.intentsPath]); } else { ifElseWorkaround[0] = ''; } }); conversationsMap.entrySet().forEach(conversation -> { if (intentsMap.containsKey(conversation.getValue().intentsPath)) { long intentsCount = intentsMap.get(conversation.getValue().intentsPath) + 1; intentsMap.put(conversation.getValue().intentsPath, intentsCount); } else {intentsMap.put(conversation.getValue().intentsPath, 1L);} }); return intentsMap.entrySet().stream().map(intentPath -> [intentPath.getKey().toString(): intentPath.getValue()]).collect(Collectors.toSet()) "
                }
            }
        }
    }
    

    Formatted script (for readability, written as TypeScript):

    scripted_metric: {
      init_script: 'state.messagesList = new ArrayList();',
      // Runs once per document: records the conversation id, timestamp and intents path.
      map_script: `
        long currentMessageTime = doc['messageReceivedEvent.context.timestamp'].value.millis;
        Map currentMessage = [
          'conversationId': doc['messageReceivedEvent.context.conversationId.keyword'].value,
          'time': currentMessageTime,
          'intentsPath': doc['brainQueryRequestEvent.brainQueryRequest.user_data.intentsHistoryPath.keyword'].value
        ];
        state.messagesList.add(currentMessage);`,
      combine_script: 'return state',
      reduce_script: `
        List messages = new ArrayList();
        Map conversationsMap = new HashMap();
        Map intentsMap = new HashMap();
        boolean[] ifElseWorkaround = new boolean[1];
    
        for (state in states) {
          messages.addAll(state.messagesList);
        }
    
        messages.stream().forEach(message -> {
          Map existingMessage = conversationsMap.get(message.conversationId);
          if(existingMessage == null || message.time > existingMessage.time) {
            conversationsMap.put(message.conversationId, ['time': message.time, 'intentsPath': message.intentsPath]);
          } else {
            ifElseWorkaround[0] = true;
          }
        });
    
        conversationsMap.entrySet().forEach(conversation -> {
          if (intentsMap.containsKey(conversation.getValue().intentsPath)) {
            long intentsCount = intentsMap.get(conversation.getValue().intentsPath) + 1;
            intentsMap.put(conversation.getValue().intentsPath, intentsCount);
          } else {
            intentsMap.put(conversation.getValue().intentsPath, 1L);
          }
        });
    
        return intentsMap.entrySet().stream().map(intentPath -> [
          'path': intentPath.getKey().toString(),
          'count': intentPath.getValue()
        ]).collect(Collectors.toSet())`
    }
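
    The formatted scripts above are multi-line template literals, so they still have to be placed into a request body before being sent. The following is a minimal sketch of that step, not the author's code: `ScriptedMetricScripts` and `buildIntentPathsRequest` are hypothetical names, and the script strings are short placeholders standing in for the real ones.

```typescript
// Hypothetical type: the four script strings a scripted_metric aggregation takes.
interface ScriptedMetricScripts {
  init_script: string;
  map_script: string;
  combine_script: string;
  reduce_script: string;
}

// Hypothetical helper: wraps the scripts in the same request body as the JSON query above.
function buildIntentPathsRequest(scripted_metric: ScriptedMetricScripts): string {
  const body = {
    size: 0, // only the aggregation result is needed, not the hits
    aggs: { intentPathsCountAgg: { scripted_metric } },
  };
  // JSON.stringify escapes the newlines of multi-line template literals as \n,
  // so the formatted scripts can be passed in unchanged.
  return JSON.stringify(body);
}

// Placeholder scripts for illustration only; substitute the real ones.
const requestJson = buildIntentPathsRequest({
  init_script: "state.messagesList = new ArrayList();",
  map_script: `
    state.messagesList.add(1);`,
  combine_script: "return state",
  reduce_script: "return states",
});
```

    The resulting string can be sent as the body of a `_search` request with whichever HTTP client is already in use.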
    

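    The response (from the one-line query above, which returns each intents path as the key of a single-entry map; the formatted variant would instead return objects with 'path' and 'count' fields):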

    {
        "took": 2,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": {
                "value": 11,
                "relation": "eq"
            },
            "max_score": null,
            "hits": []
        },
        "aggregations": {
            "intentPathsCountAgg": {
                "value": [
                    {
                        "smallTalk.greet -> smallTalk.greet2 -> smallTalk.greet3": 2
                    },
                    {
                        "smallTalk.greet -> smallTalk.greet2 -> smallTalk.greet3  -> smallTalk.greet4": 1
                    },
                    {
                        "smallTalk.greet -> smallTalk.greet2": 1
                    }
                ]
            }
        }
    }
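
    To make the logic of the reduce_script easier to follow, here is a rough TypeScript sketch of the same two steps run over plain objects: keep the latest message of each conversation, then count how many conversations end on each intents path. It is an illustration only, not what Elasticsearch executes; `MessageEntry`, `countFinalIntentPaths` and the sample data are made up for the example.

```typescript
// Hypothetical shape mirroring what map_script collects for each document.
interface MessageEntry {
  conversationId: string;
  time: number; // epoch milliseconds
  intentsPath: string;
}

// Step 1: keep only the most recent message of every conversation.
// Step 2: count how many conversations end on each intents path.
function countFinalIntentPaths(messages: MessageEntry[]): Record<string, number> {
  const latestByConversation: Record<string, MessageEntry> = {};
  for (const message of messages) {
    const existing = latestByConversation[message.conversationId];
    if (existing === undefined || message.time > existing.time) {
      latestByConversation[message.conversationId] = message;
    }
  }

  const counts: Record<string, number> = {};
  for (const conversationId of Object.keys(latestByConversation)) {
    const path = latestByConversation[conversationId].intentsPath;
    counts[path] = (counts[path] || 0) + 1;
  }
  return counts;
}

// Made-up sample data: three conversations, one of them with two messages.
const sampleMessages: MessageEntry[] = [
  { conversationId: "c1", time: 1, intentsPath: "smallTalk.greet" },
  { conversationId: "c1", time: 2, intentsPath: "smallTalk.greet -> smallTalk.greet2" },
  { conversationId: "c2", time: 5, intentsPath: "smallTalk.greet -> smallTalk.greet2" },
  { conversationId: "c3", time: 3, intentsPath: "smallTalk.greet" },
];
const pathCounts = countFinalIntentPaths(sampleMessages);
```

    Only the later message of c1 is counted, which is the same "last message wins" rule the reduce_script applies with its time comparison.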
    
  • 2020-12-21 18:11

    Using a script in a terms aggregation, we can create buckets on the first character of "context.action". Using a similar terms sub-aggregation, we can get all the "context.action" values under each parent bucket, e.g. A -> A.1 -> A.1.1 ...

    Query:

    {
      "size": 0,
      "aggs": {
        "conversations": {
          "terms": {
            "script": {
              "source": "def term=doc['context.action'].value; return term.substring(0,1);"
            },
            "size": 10
          },
          "aggs": {
            "sub_conversations": {
              "terms": {
                "script": {
                  "source": "if(doc['context.action'].value.length()>1) return doc['context.action'];"
                },
                "size": 10
              }
            },
            "count": {
              "cardinality": {
                "script": {
                  "source": "if(doc['context.action'].value.length()>1) return doc['context.action'];"
                }
              }
            }
          }
        }
      }
    }

    The "conversations" script returns the first character of the action (e.g. A, B, C). The "sub_conversations" script returns every "context.action" under that bucket; the length check skips the bare parent value (e.g. A). "count" is the number of distinct "context.action" values under the bucket.
    

    Since Elasticsearch cannot join different documents, you will have to build the combined key on the client side by iterating over the aggregation buckets.

    Result:

      "aggregations" : {
        "conversations" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "A",
              "doc_count" : 6,
              "sub_conversations" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  {
                    "key" : "A.1",
                    "doc_count" : 2
                  },
                  {
                    "key" : "A.1.1",
                    "doc_count" : 2
                  }
                ]
              },
              "count" : {
                "value" : 2
              }
            },
            {
              "key" : "B",
              "doc_count" : 2,
              "sub_conversations" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  {
                    "key" : "B.1",
                    "doc_count" : 1
                  }
                ]
              },
              "count" : {
                "value" : 1
              }
            }
          ]
        }
      }
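
    The client-side step mentioned above could look roughly like the following sketch. The interfaces cover only the fields used from the result shown here, `combinedKeys` is a hypothetical helper name, and the sketch assumes that sorting the sub-bucket keys lexicographically gives the intended order (A.1 before A.1.1), which may not hold for every naming scheme.

```typescript
// Minimal types covering only the fields used from the aggregation result.
interface TermsBucket {
  key: string;
  doc_count: number;
}
interface ConversationBucket extends TermsBucket {
  sub_conversations: { buckets: TermsBucket[] };
  count: { value: number };
}

// Join each parent key with its sub-bucket keys into one path string.
// Assumption: lexicographic order of the sub-keys is the intended order.
function combinedKeys(buckets: ConversationBucket[]): string[] {
  return buckets.map((bucket) => {
    const subKeys = bucket.sub_conversations.buckets.map((sub) => sub.key).sort();
    return [bucket.key].concat(subKeys).join(" -> ");
  });
}

// Buckets copied from the result above, reduced to the typed fields.
const sampleBuckets: ConversationBucket[] = [
  {
    key: "A",
    doc_count: 6,
    sub_conversations: {
      buckets: [
        { key: "A.1", doc_count: 2 },
        { key: "A.1.1", doc_count: 2 },
      ],
    },
    count: { value: 2 },
  },
  {
    key: "B",
    doc_count: 2,
    sub_conversations: { buckets: [{ key: "B.1", doc_count: 1 }] },
    count: { value: 1 },
  },
];
const combinedPaths = combinedKeys(sampleBuckets);
// combinedPaths is ["A -> A.1 -> A.1.1", "B -> B.1"]
```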
    