Cumulative traffic by time of day with Elasticsearch

耗尽温柔 submitted on 2019-12-22 17:48:31

Question


I'm receiving requests/events from a large number of client applications. I'd like to use Elasticsearch to find out when my highest traffic point is.

One thing I've tried is a filter aggregation containing a date histogram, which in turn contains a "terms" aggregation that derives the hour of the day from each document via a script. The following is my attempt, and it performs terribly (as I'd expect, since I'm executing a script per document).

{
  "aggs": {
    "sites_within_range": {
      "filter" : { 
        "range" : { 
          "occurred" : { 
            "gt" : "now-1M"
          }
        } 
      },

      "aggs": {
        "sites_over_time": {
          "date_histogram": {
            "field": "occurred",
            "interval": "week"
          },
          "aggs":{
            "site_names": {
              "terms": {
                "script": "doc['occurred'].date.getHourOfDay()",
                "size": 10000
              }
            }
          }
        }
      }

    }
  }
}

I've also considered storing the date elements I want to query as distinct fields of the document, e.g.:

{
    "date": "actual datetime",
    "day": "monday",
    "hour": 8,
    "minute": 37
}

This also smells like the wrong answer to me.


Edit: after some investigation, it looks like I might be interested in the new cardinality / percentiles aggregations coming in 1.1?


Answer 1:


The same kind of problem has been solved in this thread.

Adapting the solution to your problem, we need a script that converts the date into the hour of day (using your `occurred` field):

Date date = new Date(doc['occurred'].value);
java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('HH');
format.format(date)

And use it in a query:

{
    "aggs": {
        "withinRange": {
            "filter": {
                "range": {
                    "occurred": {
                        "gt": "now-1M"
                    }
                }
            },
            "aggs": {
                "perHourOfDay": {
                    "terms": {
                        "script": "Date date = new Date(doc['occurred'].value); java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('HH'); format.format(date)"
                    }
                }
            }
        }
    }
}

This gives you the traffic by hour of day.
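Once the buckets come back, finding the busiest hour is a small client-side step. The following is a minimal sketch, assuming the usual terms-aggregation bucket shape (each bucket has a `key` and a `doc_count`). The function name `peak_hour` and the sample data are hypothetical and hand-written for illustration, not real results.

```python
def peak_hour(buckets):
    """Return (hour, doc_count) for the busiest bucket, or None if there are none."""
    if not buckets:
        return None
    best = max(buckets, key=lambda b: b["doc_count"])
    # The 'HH' script yields string keys such as "08"; int() also accepts numeric keys.
    return int(best["key"]), best["doc_count"]


# Hypothetical buckets in the shape a terms aggregation returns.
sample_buckets = [
    {"key": "08", "doc_count": 120},
    {"key": "14", "doc_count": 310},
    {"key": "21", "doc_count": 95},
]

print(peak_hour(sample_buckets))  # (14, 310)
```

Note that a terms aggregation returns only a limited number of buckets by default, so set `size` to 24 if you want every hour of the day in the response.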

Nota bene: Storing the hours/days/minutes in your document is the most efficient way of doing that kind of aggregation. My answer assumes you don't want to store that information. Scripts usually aren't very efficient, because they run once per document.
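If you do decide to store the date parts, they can be derived on the client just before indexing. Below is a rough sketch of that idea, assuming the timestamp arrives as an ISO 8601 string in the `occurred` field. The helper `with_date_parts`, the aggregation names, and the sample event are made up for illustration. The derived hour is in whatever time zone the timestamp itself uses.

```python
from datetime import datetime

DAY_NAMES = ("monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday")


def with_date_parts(doc, field="occurred"):
    """Return a copy of doc with day/hour/minute derived from its timestamp."""
    ts = datetime.fromisoformat(doc[field])
    enriched = dict(doc)
    enriched["day"] = DAY_NAMES[ts.weekday()]  # weekday(): Monday == 0
    enriched["hour"] = ts.hour
    enriched["minute"] = ts.minute
    return enriched


# Query body that aggregates on the stored field instead of running a script.
hour_query = {
    "aggs": {
        "within_range": {
            "filter": {"range": {"occurred": {"gt": "now-1M"}}},
            "aggs": {
                "per_hour": {"terms": {"field": "hour", "size": 24}}
            },
        }
    }
}

# Hypothetical event, enriched before it is sent to the index.
event = with_date_parts({"occurred": "2014-03-20T08:37:00"})
print(event["day"], event["hour"], event["minute"])  # thursday 8 37
```

With `hour` stored as a numeric field, the terms aggregation reads indexed values directly, which avoids the per-document script cost described above.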



Source: https://stackoverflow.com/questions/22533859/cumulative-traffic-by-time-of-day-with-elasticsearch
