aggregation

Elasticsearch count terms ignoring spaces

牧云@^-^@ submitted on 2019-12-04 09:42:39
Question: Using ES 1.2.1. My aggregation:

    { "size": 0, "aggs": { "cities": { "terms": { "field": "city", "size": 300000 } } } }

The issue is that some city names have spaces in them and so aggregate separately. For instance, Los Angeles comes back as:

    { "key": "Los", "doc_count": 2230 },
    { "key": "Angeles", "doc_count": 2230 },

I assume it has to do with the analyzer? Which one would I use so that terms are not split on spaces?

Answer 1: For fields that you want to perform aggregations on, I would recommend either the keyword analyzer or do not…
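A minimal sketch of the direction the answer points in, assuming the elasticsearch-py client and ES 1.x mapping conventions; the index name "cities_index" and type "doc" are placeholders, and changing the mapping of an already-indexed field generally requires reindexing:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    # Map "city" as not_analyzed so the terms aggregation sees "Los Angeles"
    # as one bucket instead of the tokens "Los" and "Angeles".
    mapping = {
        "properties": {
            "city": {"type": "string", "index": "not_analyzed"}
        }
    }
    es.indices.put_mapping(index="cities_index", doc_type="doc", body=mapping)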

Elasticsearch - group by day of week and hour

落爺英雄遲暮 submitted on 2019-12-04 08:46:00
I need to get some data grouped by day of week and hour, for example:

    curl -XGET http://localhost:9200/testing/hello/_search?pretty=true -d '
    { "size": 0, "aggs": { "articles_over_time": { "date_histogram": { "field": "date", "interval": "hour", "format": "E - k" } } } }'

This gives me:

    { "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": { "total": 2857, "max_score": 0.0, "hits": [] },
      "aggregations": { "articles_over_time": { "buckets": [
        { "key_as_string": "Fri - 17", "key": 1391792400000, "doc_count": 6 }, ... { "key_as…
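The "format": "E - k" setting only changes how each bucket key is printed; the histogram still returns one bucket per calendar hour. To actually collapse the counts into weekday/hour totals across all weeks, one option is to post-process the buckets, as in this sketch ("response" stands for the parsed JSON returned by the query above):

    from collections import Counter
    from datetime import datetime, timezone

    # "response" is assumed to be the parsed JSON of the search response shown above.
    buckets = response["aggregations"]["articles_over_time"]["buckets"]

    # Fold the per-hour buckets into (weekday, hour) totals across all weeks.
    totals = Counter()
    for b in buckets:
        ts = datetime.fromtimestamp(b["key"] / 1000, tz=timezone.utc)  # "key" is epoch millis
        totals[(ts.strftime("%a"), ts.hour)] += b["doc_count"]

    for (day, hour), count in sorted(totals.items()):
        print(f"{day} {hour:02d}:00 -> {count} articles")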

Selecting positive aggregate value and ignoring negative in Postgres SQL

微笑、不失礼 submitted on 2019-12-04 07:51:16
I must apply a certain transformation fn(argument). Here argument is equal to value, but not when value is negative: once you get the first negative value, you "wait" until it sums up with consecutive values and this sum becomes positive, and then you do fn(argument). See the table I want to get:

    value  argument
    ---------------
       2       2
       3       3
     -10       0
       4       0
       3       0
      10       7
       1       1

I could have summed all values and applied fn to the sum, but fn can be different for different rows, and it is essential to know the row number to choose a concrete fn. As I want a PostgreSQL solution, it looks like window functions fit…
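It may help to see the carry-forward rule written out imperatively first. This is only a plain-Python illustration of the "argument" column from the table above (function and variable names are mine), not the window-function solution the question is after:

    def arguments(values):
        """Pass each value through unchanged, but once a negative value appears,
        emit 0 until later values have paid off the running deficit, then emit
        the positive remainder."""
        carry = 0           # accumulated deficit still to be absorbed (<= 0)
        out = []
        for v in values:
            if carry == 0 and v >= 0:
                out.append(v)            # normal case: argument == value
            elif carry + v > 0:
                out.append(carry + v)    # deficit just turned positive again
                carry = 0
            else:
                out.append(0)            # entering, or still inside, the deficit
                carry = carry + v
        return out

    print(arguments([2, 3, -10, 4, 3, 10, 1]))   # [2, 3, 0, 0, 0, 7, 1]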

How to release Maven multi-module project with inter-project dependencies?

安稳与你 submitted on 2019-12-04 06:07:48
Let's say we have a three-layer project: DB, Business, Web, plus an aggregating pom.

    Project
    |- DB
    |  |- pom.xml
    |- Business
    |  |- pom.xml
    |- pom.xml

All modules are meant to be released and branched together, so the aggregator pom is configured to assign the same version to all submodules. We have the following versions:

    DB-0.1-SNAPSHOT
    Business-0.1-SNAPSHOT, which depends on DB-0.1-SNAPSHOT
    Web-0.1-SNAPSHOT, which depends on Business-0.1-SNAPSHOT

When doing release:prepare, all versions are updated to 0.1, but prepare fails because there is no DB-0.1 in the repository yet. One solution is to create different projects for…

Finding duplicate values

感情迁移 submitted on 2019-12-04 05:28:54
Question: I have a table called Transfers, and I would like to find all the records that have duplicate values in three columns: Doc ID, Amount and Date. Basically, what I need is to find rows where doc id, amount and date are all the same. What is the best query I can use to find these duplicates? I tried the following query:

    select transfers.doc_id, transfers.date, transfers.amount
    from transfers
    where transfers.date between $P{StartDate} and $P{EndDate}
    group by doc_id
    having doc_id > 1;

Here is what results am…
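The usual pattern here is to GROUP BY all three columns and keep only the groups that occur more than once (HAVING COUNT(*) > 1). A self-contained sketch of that pattern, run against an in-memory SQLite table with made-up rows; the real table, column types and the $P{...} report parameters are whatever the original report uses:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transfers (doc_id TEXT, date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO transfers VALUES (?, ?, ?)",
        [("A1", "2019-01-02", 100.0),
         ("A1", "2019-01-02", 100.0),   # duplicate of the row above
         ("B7", "2019-01-03", 55.5)],
    )

    # Group by the three columns and keep only combinations that appear more than once.
    rows = conn.execute("""
        SELECT doc_id, date, amount, COUNT(*) AS dup_count
        FROM transfers
        GROUP BY doc_id, date, amount
        HAVING COUNT(*) > 1
    """).fetchall()
    print(rows)   # [('A1', '2019-01-02', 100.0, 2)]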

Performing aggregation through date and time in SQL

╄→гoц情女王★ submitted on 2019-12-04 03:07:05
I have a data set which contains observations over several weeks at a 2-minute frequency. I want to increase the time interval from 2 minutes to 5 minutes. The problem is that the frequency of the observations is not always the same: in theory there should be 5 observations every 10 minutes, but usually that is not the case. Please let me know how I can aggregate the observations with an average function, with respect to the time and date of the observations. In other words, aggregation over every 5 minutes while the number of observations is not the same for each 5-minute time…
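The exact syntax depends on the database, but the general idea is to truncate each timestamp to the start of its 5-minute window and then GROUP BY that window with AVG(), so windows holding 1, 2 or 3 observations all average correctly. A sketch of that pattern, demonstrated against an in-memory SQLite table with made-up readings:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obs (ts TEXT, value REAL)")
    conn.executemany("INSERT INTO obs VALUES (?, ?)", [
        ("2019-12-04 03:00:00", 1.0),
        ("2019-12-04 03:02:00", 3.0),
        ("2019-12-04 03:04:30", 5.0),   # irregular spacing is fine
        ("2019-12-04 03:06:00", 7.0),
    ])

    # Truncate each timestamp to its 5-minute window (epoch seconds / 300),
    # then average however many observations landed in that window.
    rows = conn.execute("""
        SELECT datetime((CAST(strftime('%s', ts) AS INTEGER) / 300) * 300, 'unixepoch') AS window_start,
               AVG(value) AS avg_value,
               COUNT(*)   AS n_obs
        FROM obs
        GROUP BY window_start
        ORDER BY window_start
    """).fetchall()
    for r in rows:
        print(r)   # ('2019-12-04 03:00:00', 3.0, 3) then ('2019-12-04 03:05:00', 7.0, 1)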

Elasticsearch filter aggregations on minimal doc count

我们两清 submitted on 2019-12-04 02:55:26
I am really new to the Elasticsearch world. Let's say I have a nested aggregation on two fields, field1 and field2:

    { ... aggs: { field1: { terms: { field: 'field1' }, aggs: { field2: { terms: { field: 'field2' } } } } } }

This piece of code works perfectly and gives me something like this:

    aggregations: { field1: { buckets: [
      { key: "foo", doc_count: 123456, field2: { buckets: [
          { key: "bar", doc_count: 34323 }, { key: "baz", doc_count: 10 }, { key: "foobar", doc_count: 36785 }, ... ] },
      { key: "fooOO", doc_count: 423424, field2: { buckets: [
          { key: "bar", doc_count: 35 }, { key: "baz", doc_count:…
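If the goal is to hide the small sub-buckets, the terms aggregation accepts a min_doc_count parameter, so buckets below a threshold are simply not returned. A sketch with an arbitrary threshold (the field names are the ones from the question; "es" would be an elasticsearch-py client):

    MIN_DOCS = 100   # arbitrary threshold for illustration

    # Same nested aggregation as above, but field2 sub-buckets with fewer
    # than MIN_DOCS documents are dropped from the response.
    body = {
        "size": 0,
        "aggs": {
            "field1": {
                "terms": {"field": "field1"},
                "aggs": {
                    "field2": {
                        "terms": {"field": "field2", "min_doc_count": MIN_DOCS}
                    }
                },
            }
        },
    }
    # result = es.search(index="my_index", body=body)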

Django Postgresql ArrayField aggregation

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-04 00:36:05
Question: In my Django application, using PostgreSQL, I have a model with an ArrayField of CharFields. I would like to know if there's a DB-level way to aggregate and get a list of all the strings in the table. For example:

    ['dog', 'cat']
    ['dog']
    ['cat']

would yield ['dog', 'cat']. I know how to do that in Python but would like to find a way to aggregate this at the DB level. Using Django 1.8.4.

Answer 1: In PostgreSQL you can do the following:

    SELECT DISTINCT UNNEST(array_column) FROM the_table;

So if your…
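One way to express that UNNEST/DISTINCT query from the ORM is a small Func expression, sketched below; the Item model and its tags ArrayField are placeholders for whatever the real model looks like, and on Django 1.8 a raw SQL query is an equally valid fallback:

    from django.db.models import F, Func

    class Unnest(Func):
        """Exposes PostgreSQL's UNNEST() so it can be used in annotate()."""
        function = 'UNNEST'

    # Hypothetical model: Item has tags = ArrayField(models.CharField(...)).
    distinct_strings = (
        Item.objects
        .annotate(tag=Unnest(F('tags')))
        .values_list('tag', flat=True)
        .distinct()
    )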

grouping every N values

孤者浪人 submitted on 2019-12-03 17:13:53
I have a table like this in PostgreSQL. I want to perform aggregation functions like mean and max over every 16 records, based on ID (which is the primary key). For example, I have to calculate the mean value for the first 16 records, then the second 16 records, and so on.

    +----+----------+
    | ID | rainfall |
    +----+----------+
    |  1 |    110.2 |
    |  2 |     56.6 |
    |  3 |     65.6 |
    |  4 |     75.9 |
    +----+----------+

The first approach that comes to mind is to use row_number() to annotate the table, then group by blocks of 16 rows:

    SELECT min(id) AS first_id, max(id) AS last_id, avg(rainfall) AS avg_this_16
    FROM ( SELECT id,…
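Filling in the shape of that idea (a sketch, not a drop-in answer): number the rows, put every 16 consecutive rows into one block, then aggregate per block. The demo below runs against an in-memory SQLite table with random data just to show the pattern; the table name is a placeholder, and window functions need SQLite 3.25+ (PostgreSQL supports row_number() natively):

    import random
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rainfall_obs (id INTEGER PRIMARY KEY, rainfall REAL)")
    conn.executemany("INSERT INTO rainfall_obs VALUES (?, ?)",
                     [(i, round(random.uniform(50, 120), 1)) for i in range(1, 65)])

    # Number the rows by id, assign each row to a block of 16, then
    # compute the per-block aggregates.
    rows = conn.execute("""
        SELECT MIN(id)       AS first_id,
               MAX(id)       AS last_id,
               AVG(rainfall) AS avg_this_16,
               MAX(rainfall) AS max_this_16
        FROM (
            SELECT id, rainfall,
                   (ROW_NUMBER() OVER (ORDER BY id) - 1) / 16 AS block_no
            FROM rainfall_obs
        )
        GROUP BY block_no
        ORDER BY block_no
    """).fetchall()
    for r in rows:
        print(r)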

Maven Inheritance and Aggregation Example Architecture

给你一囗甜甜゛ submitted on 2019-12-03 11:43:19
Question: I have a question regarding how best to restructure a number of individual Maven projects using a combination of inheritance and aggregation. Setting the scene: there are three existing Maven projects (separate code bases), all developed by the same team. One project is an API; let's call it project-api. The other two projects are web apps which utilise project-api; let's call them web-app1 and web-app2. All three projects have a couple of basic dependencies, like log4j and junit, in common. Aside from that,…