aggregation

Convert strings to floats at aggregation time?

一笑奈何 submitted on 2019-12-18 16:58:24
Question: Is there any way to convert strings to floats when specifying a histogram aggregation? I have documents with fields that are floats but are not parsed by Elasticsearch as such, and when I attempt to run a sum over a string field it throws the following error:

    ClassCastException[org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData cannot be cast to org.elasticsearch.index.fielddata.IndexNumericFieldData]

I know I could change the mapping, but for the use case that I have, …
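One workaround that avoids remapping, sketched under the assumption of an ES 1.x-era scripted metric aggregation (the index name and the price field are placeholders; a scripted sum parses every value at query time and is far slower than native numeric fielddata):

    # "my_index" and "price" are placeholder names; the script parses the
    # string field on the fly instead of relying on numeric fielddata
    curl -XPOST 'localhost:9200/my_index/_search' -d '{
      "size": 0,
      "aggs": {
        "price_sum": {
          "sum": { "script": "Float.parseFloat(doc[\"price\"].value)" }
        }
      }
    }'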

How to average/sum data in a day in SQL Server 2005

假装没事ソ submitted on 2019-12-17 21:25:42
Question: I'm trying to average data per day in SQL Server 2005. Here is what my data looks like if I use a simple query such as:

    SELECT timestamp, FEED FROM ROASTER_FEED ORDER BY timestamp

Data:

    timestamp            Feed
    02/07/2011 12:00:01  1246
    02/07/2011 12:00:01  1234
    02/07/2011 12:00:01  1387
    02/07/2011 12:00:02  1425
    02/07/2011 12:00:03  1263
    ...
    02/07/2011 11:00:01  1153
    02/07/2011 11:00:01  1348
    02/07/2011 11:00:01  1387
    02/07/2011 11:00:02  1425
    02/07/2011 11:00:03  1223
    ....
    03/07/2011 12:00:01  1226
    03/07/2011 12…
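A minimal sketch of per-day aggregation on SQL Server 2005, which predates the DATE type, so the datetime is truncated to midnight with the DATEADD/DATEDIFF idiom; the table and column names follow the excerpt above:

    -- Truncate each timestamp to the start of its day, then aggregate.
    -- CAST to float so AVG does not do integer division on an int column.
    SELECT
        DATEADD(DAY, DATEDIFF(DAY, 0, timestamp), 0) AS day_start,
        AVG(CAST(FEED AS float)) AS avg_feed,
        SUM(FEED) AS sum_feed
    FROM ROASTER_FEED
    GROUP BY DATEADD(DAY, DATEDIFF(DAY, 0, timestamp), 0)
    ORDER BY day_start;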

Cumulative product in Spark?

守給你的承諾、 submitted on 2019-12-17 20:59:46
Question: I'm trying to implement a cumulative product in Spark Scala, but I really don't know how to do it. I have the following dataframe:

Input data:

    +--+--+--------+----+
    |A |B |  date  | val|
    +--+--+--------+----+
    |rr|gg|20171103|  2 |
    |hh|jj|20171103|  3 |
    |rr|gg|20171104|  4 |
    |hh|jj|20171104|  5 |
    |rr|gg|20171105|  6 |
    |hh|jj|20171105|  7 |
    +--+--+--------+----+

And I would like to have the following output:

Output data:

    +--+--+--------+-----+
    |A |B |  date  | val |
    +--+--+--------+-----+
    |rr|gg|20171105|  48 |
    …
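A sketch of one common workaround, assuming val stays strictly positive: Spark has no built-in cumulative product, but exp(running sum of log(val)) over a window gives the same result. Column names follow the excerpt above:

    // Cumulative product via logs: prod(x1..xn) = exp(sum(log(xi))).
    // Only valid while "val" is strictly positive (log of 0 or a negative
    // value yields null/NaN).
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, exp, log, sum}

    val w = Window.partitionBy("A", "B")
                  .orderBy("date")
                  .rowsBetween(Window.unboundedPreceding, Window.currentRow)

    val result = df.withColumn("cumprod", exp(sum(log(col("val"))).over(w)))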

how to return the count of unique documents by using elasticsearch aggregation

可紊 submitted on 2019-12-17 19:34:36
Question: I ran into a problem where Elasticsearch could not return the count of unique documents when just using a terms aggregation on a nested field. Here is an example of our model:

    {
      ...,
      "location" : [
        {"city" : "new york", "state" : "ny"},
        {"city" : "woodbury", "state" : "ny"},
        ...
      ],
      ...
    }

I want to aggregate on the state field, but this document will be counted twice in the 'ny' bucket, since 'ny' appears twice in the document. So I'm wondering whether there is a way to grab the count of distinct …
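A minimal sketch of the usual fix, assuming location is mapped as a nested type: add a reverse_nested sub-aggregation under the terms bucket, so that each bucket also reports the count at the parent-document level instead of per nested object (the index name is a placeholder):

    # read parent_docs.doc_count inside each "states" bucket for the
    # per-document (deduplicated) count
    curl -XGET 'localhost:9200/my_index/_search' -d '{
      "size": 0,
      "aggs": {
        "locations": {
          "nested": { "path": "location" },
          "aggs": {
            "states": {
              "terms": { "field": "location.state" },
              "aggs": {
                "parent_docs": { "reverse_nested": {} }
              }
            }
          }
        }
      }
    }'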

Aggregate bitwise-OR in a subquery

本小妞迷上赌 submitted on 2019-12-17 16:33:47
Question: Given the following table:

    CREATE TABLE BitValues ( n int )

Is it possible to compute the bitwise OR of n across all rows within a subquery? For example, if BitValues contains these 4 rows:

    +---+
    | n |
    +---+
    | 1 |
    | 2 |
    | 4 |
    | 3 |
    +---+

I would expect the subquery to return 7. Is there a way to do this inline, without creating a UDF?

Answer 1:

    WITH Bits AS (
        SELECT 1 AS BitMask
        UNION ALL SELECT 2
        UNION ALL SELECT 4
        UNION ALL SELECT 8
        UNION ALL SELECT 16
    )
    SELECT SUM(DISTINCT BitMask)
    FROM (
        SELECT …
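A self-contained sketch of the same SUM(DISTINCT BitMask) idea, runnable against the table above: a bit is set in the OR-aggregate exactly when n & BitMask is non-zero for some row, and summing each such distinct mask reassembles the value. Extend the Bits CTE with more powers of two to cover wider values:

    WITH Bits AS (
        SELECT 1 AS BitMask UNION ALL SELECT 2 UNION ALL
        SELECT 4 UNION ALL SELECT 8 UNION ALL SELECT 16
    )
    SELECT SUM(DISTINCT BitMask) AS OrOfN   -- returns 7 for rows {1, 2, 4, 3}
    FROM BitValues
    JOIN Bits ON BitValues.n & Bits.BitMask <> 0;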

Find Most Common Value and Corresponding Count Using Spark Groupby Aggregates

梦想的初衷 submitted on 2019-12-13 15:44:53
Question: I am trying to use Spark (Scala) dataframes to do group-by aggregates for the mode and the corresponding count. For example, suppose we have the following dataframe:

    Category  Color   Number  Letter
    1         Red     4       A
    1         Yellow  Null    B
    3         Green   8       C
    2         Blue    Null    A
    1         Green   9       A
    3         Green   8       B
    3         Yellow  Null    C
    2         Blue    9       B
    3         Blue    8       B
    1         Blue    Null    Null
    1         Red     7       C
    2         Green   Null    C
    1         Yellow  7       Null
    3         Red     Null    B

Now we want to group by Category, then Color, and then find the size of the grouping, the count of non-null values of Number, the total …
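A sketch of the counting part under the column names above: group by Category and Color, then take the group size, the non-null count of Number, and its sum (count(col) skips nulls, while count("*") counts every row). The per-group mode itself typically needs a second pass, counting occurrences and keeping the most frequent value per group, which is beyond this excerpt:

    import org.apache.spark.sql.functions.{col, count, sum}

    val grouped = df.groupBy("Category", "Color").agg(
      count("*").alias("group_size"),          // all rows in the group
      count(col("Number")).alias("number_non_nulls"),  // nulls excluded
      sum(col("Number")).alias("number_total")
    )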

Elasticsearch Aggregation Broken after upgrade to 1.7.3

白昼怎懂夜的黑 submitted on 2019-12-13 08:45:12
Question: It was working before the upgrade to 1.7.3, but now it is telling me "Data too large for [Gender]". I ran:

    curl -XGET localhost:9200/_nodes/stats/indices/fielddata?fields=*

and it produced:

    {"fielddata":{"memory_size_in_bytes":642066528,"evictions":0,
      "fields":{"Markers":{"memory_size_in_bytes":196538816},
        "RegistrationDate":{"memory_size_in_bytes":101759288},
        "Abbreviation":{"memory_size_in_bytes":185815224},
        "Gender":{"memory_size_in_bytes":52988320},
        "Birthdate":{"memory_size_in_bytes…
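"Data too large" is the fielddata circuit breaker tripping. A hedged sketch of two common mitigations in the 1.x line (the percentages are illustrative, not recommendations):

    # 1) raise the fielddata breaker limit dynamically (cluster-wide):
    curl -XPUT 'localhost:9200/_cluster/settings' -d '{
      "persistent": { "indices.breaker.fielddata.limit": "70%" }
    }'
    # 2) or cap the fielddata cache in elasticsearch.yml (needs a restart):
    #    indices.fielddata.cache.size: 40%

Longer term, remapping low-cardinality fields such as Gender with doc_values moves them off the heap entirely.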

data.table: cumulative values by grouping variable [duplicate]

久未见 submitted on 2019-12-13 08:27:47
Question: This question already has answers here: Calculate cumulative sum within each ID (group) (4 answers). Closed last year. I have data:

    set.seed(42)
    dat <- data.table(id=1:8, group=c(1,1,2,2,2,3,3,3), val=rnorm(8))
    > dat
       id group         val
    1:  1     1  1.37095845
    2:  2     1 -0.56469817
    3:  3     2  0.36312841
    4:  4     2  0.63286260
    5:  5     2  0.40426832
    6:  6     3 -0.10612452
    7:  7     3  1.51152200
    8:  8     3 -0.09465904

and I would like to obtain the cumulative values of val within each level of group:

    > res
       id group cum
    1:  1     1   1…
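A minimal sketch of the standard data.table approach: grouped assignment with := runs cumsum() separately within each level of group and adds the result as a new column by reference:

    # cumulative sum of "val" within each "group", added in place
    library(data.table)
    dat[, cum := cumsum(val), by = group]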

How to calculate difference between metrics in different aggregations in elasticsearch

旧街凉风 submitted on 2019-12-13 05:14:22
Question: I want to calculate the difference of nested aggregations between two dates. To be more concrete, is it possible to calculate

    date_1.buckets.field_1.buckets.field_2.buckets.field_3.value
      - date_2.buckets.field_1.buckets.field_2.buckets.field_3.value

given the request/response below? Is that possible with Elasticsearch v1.0.1? The aggregation query request looks like this:

    {
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": {
            "bool": {
              "must": [
                { "terms":…
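For context: 1.0.1 predates pipeline aggregations, so sibling-bucket arithmetic like this has to be done client-side after the response comes back. From Elasticsearch 2.0, the serial_diff pipeline aggregation can subtract a metric in each date bucket from the previous one; a sketch with hypothetical field and aggregation names:

    # serial_diff with lag 1 emits, per bucket, the metric minus the value
    # from the previous date bucket (ES 2.0+ only)
    curl -XGET 'localhost:9200/my_index/_search' -d '{
      "size": 0,
      "aggs": {
        "per_date": {
          "date_histogram": { "field": "date", "interval": "day" },
          "aggs": {
            "field_3_sum": { "sum": { "field": "field_3" } },
            "field_3_diff": {
              "serial_diff": { "buckets_path": "field_3_sum", "lag": 1 }
            }
          }
        }
      }
    }'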