aggregation

Convert strings to floats at aggregation time?

一笑奈何 submitted on 2019-12-18 16:58:24
Question: Is there any way to convert strings to floats when specifying a histogram aggregation? I have documents with fields that are floats but are not parsed by Elasticsearch as such, and when I attempt to run a sum over a string field it throws the following error:

    ClassCastException[org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData cannot be cast to org.elasticsearch.index.fielddata.IndexNumericFieldData]

I know I could change the mapping, but for the use case that I have, …
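One workaround that avoids remapping, sketched under the assumption of an ES 1.x-era scripted metric aggregation (the index name and the price field are placeholders; a scripted sum parses every value at query time and is far slower than native numeric fielddata):

    # "my_index" and "price" are placeholder names; the script parses the
    # string field on the fly instead of relying on numeric fielddata
    curl -XPOST 'localhost:9200/my_index/_search' -d '{
      "size": 0,
      "aggs": {
        "price_sum": {
          "sum": { "script": "Float.parseFloat(doc[\"price\"].value)" }
        }
      }
    }'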

How to average/sum data in a day in SQL Server 2005

假装没事ソ submitted on 2019-12-17 21:25:42
Question: I'm trying to average data per day in SQL Server 2005. Here is what my data looks like if I use a simple query such as:

    SELECT timestamp, FEED FROM ROASTER_FEED ORDER BY timestamp

Data:

    timestamp            Feed
    02/07/2011 12:00:01  1246
    02/07/2011 12:00:01  1234
    02/07/2011 12:00:01  1387
    02/07/2011 12:00:02  1425
    02/07/2011 12:00:03  1263
    ...
    02/07/2011 11:00:01  1153
    02/07/2011 11:00:01  1348
    02/07/2011 11:00:01  1387
    02/07/2011 11:00:02  1425
    02/07/2011 11:00:03  1223
    ....
    03/07/2011 12:00:01  1226
    03/07/2011 12…
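A minimal sketch of per-day aggregation on SQL Server 2005, which predates the DATE type, so the datetime is truncated to midnight with the DATEADD/DATEDIFF idiom; the table and column names follow the excerpt above:

    -- Truncate each timestamp to the start of its day, then aggregate.
    -- CAST to float so AVG does not do integer division on an int column.
    SELECT
        DATEADD(DAY, DATEDIFF(DAY, 0, timestamp), 0) AS day_start,
        AVG(CAST(FEED AS float)) AS avg_feed,
        SUM(FEED) AS sum_feed
    FROM ROASTER_FEED
    GROUP BY DATEADD(DAY, DATEDIFF(DAY, 0, timestamp), 0)
    ORDER BY day_start;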

Cumulative product in Spark?

守給你的承諾、 submitted on 2019-12-17 20:59:46
Question: I'm trying to implement a cumulative product in Spark Scala, but I really don't know how to do it. I have the following dataframe:

Input data:

    +--+--+--------+----+
    |A |B |  date  | val|
    +--+--+--------+----+
    |rr|gg|20171103|  2 |
    |hh|jj|20171103|  3 |
    |rr|gg|20171104|  4 |
    |hh|jj|20171104|  5 |
    |rr|gg|20171105|  6 |
    |hh|jj|20171105|  7 |
    +--+--+--------+----+

And I would like to have the following output:

Output data:

    +--+--+--------+-----+
    |A |B |  date  | val |
    +--+--+--------+-----+
    |rr|gg|20171105|  48 |
    …
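A sketch of one common workaround, assuming val stays strictly positive: Spark has no built-in cumulative product, but exp(running sum of log(val)) over a window gives the same result. Column names follow the excerpt above:

    // Cumulative product via logs: prod(x1..xn) = exp(sum(log(xi))).
    // Only valid while "val" is strictly positive (log of 0 or a negative
    // value yields null/NaN).
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, exp, log, sum}

    val w = Window.partitionBy("A", "B")
                  .orderBy("date")
                  .rowsBetween(Window.unboundedPreceding, Window.currentRow)

    val result = df.withColumn("cumprod", exp(sum(log(col("val"))).over(w)))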

how to return the count of unique documents by using elasticsearch aggregation

可紊 submitted on 2019-12-17 19:34:36
Question: I ran into a problem where Elasticsearch could not return the count of unique documents when just using a terms aggregation on a nested field. Here is an example of our model:

    {
      ...,
      "location" : [
        {"city" : "new york", "state" : "ny"},
        {"city" : "woodbury", "state" : "ny"},
        ...
      ],
      ...
    }

I want to aggregate on the state field, but this document will be counted twice in the 'ny' bucket, since 'ny' appears twice in the document. So I'm wondering whether there is a way to grab the count of distinct …
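A minimal sketch of the usual fix, assuming location is mapped as a nested type: add a reverse_nested sub-aggregation under the terms bucket, so that each bucket also reports the count at the parent-document level instead of per nested object (the index name is a placeholder):

    # read parent_docs.doc_count inside each "states" bucket for the
    # per-document (deduplicated) count
    curl -XGET 'localhost:9200/my_index/_search' -d '{
      "size": 0,
      "aggs": {
        "locations": {
          "nested": { "path": "location" },
          "aggs": {
            "states": {
              "terms": { "field": "location.state" },
              "aggs": {
                "parent_docs": { "reverse_nested": {} }
              }
            }
          }
        }
      }
    }'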

Aggregate bitwise-OR in a subquery

本小妞迷上赌 submitted on 2019-12-17 16:33:47
Question: Given the following table:

    CREATE TABLE BitValues ( n int )

Is it possible to compute the bitwise OR of n across all rows within a subquery? For example, if BitValues contains these 4 rows:

    +---+
    | n |
    +---+
    | 1 |
    | 2 |
    | 4 |
    | 3 |
    +---+

I would expect the subquery to return 7. Is there a way to do this inline, without creating a UDF?

Answer 1:

    WITH Bits AS (
        SELECT 1 AS BitMask
        UNION ALL SELECT 2
        UNION ALL SELECT 4
        UNION ALL SELECT 8
        UNION ALL SELECT 16
    )
    SELECT SUM(DISTINCT BitMask)
    FROM (
        SELECT …
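A self-contained sketch of the same SUM(DISTINCT BitMask) idea, runnable against the table above: a bit is set in the OR-aggregate exactly when n & BitMask is non-zero for some row, and summing each such distinct mask reassembles the value. Extend the Bits CTE with more powers of two to cover wider values:

    WITH Bits AS (
        SELECT 1 AS BitMask UNION ALL SELECT 2 UNION ALL
        SELECT 4 UNION ALL SELECT 8 UNION ALL SELECT 16
    )
    SELECT SUM(DISTINCT BitMask) AS OrOfN   -- returns 7 for rows {1, 2, 4, 3}
    FROM BitValues
    JOIN Bits ON BitValues.n & Bits.BitMask <> 0;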

Find Most Common Value and Corresponding Count Using Spark Groupby Aggregates

梦想的初衷 submitted on 2019-12-13 15:44:53
Question: I am trying to use Spark (Scala) dataframes to do group-by aggregates for the mode and the corresponding count. For example, suppose we have the following dataframe:

    Category  Color   Number  Letter
    1         Red     4       A
    1         Yellow  Null    B
    3         Green   8       C
    2         Blue    Null    A
    1         Green   9       A
    3         Green   8       B
    3         Yellow  Null    C
    2         Blue    9       B
    3         Blue    8       B
    1         Blue    Null    Null
    1         Red     7       C
    2         Green   Null    C
    1         Yellow  7       Null
    3         Red     Null    B

Now we want to group by Category, then Color, and then find the size of the grouping, the count of non-null values of Number, the total …
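A sketch of the counting part under the column names above: group by Category and Color, then take the group size, the non-null count of Number, and its sum (count(col) skips nulls, while count("*") counts every row). The per-group mode itself typically needs a second pass, counting occurrences and keeping the most frequent value per group, which is beyond this excerpt:

    import org.apache.spark.sql.functions.{col, count, sum}

    val grouped = df.groupBy("Category", "Color").agg(
      count("*").alias("group_size"),          // all rows in the group
      count(col("Number")).alias("number_non_nulls"),  // nulls excluded
      sum(col("Number")).alias("number_total")
    )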

Elasticsearch Aggregation Broken after upgrade to 1.7.3

白昼怎懂夜的黑 submitted on 2019-12-13 08:45:12
Question: It was working before the upgrade to 1.7.3, but now it is telling me "Data too large for [Gender]". I ran:

    curl -XGET localhost:9200/_nodes/stats/indices/fielddata?fields=*

and it produced:

    {"fielddata":{"memory_size_in_bytes":642066528,"evictions":0,
      "fields":{"Markers":{"memory_size_in_bytes":196538816},
        "RegistrationDate":{"memory_size_in_bytes":101759288},
        "Abbreviation":{"memory_size_in_bytes":185815224},
        "Gender":{"memory_size_in_bytes":52988320},
        "Birthdate":{"memory_size_in_bytes…
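"Data too large" is the fielddata circuit breaker tripping. A hedged sketch of two common mitigations in the 1.x line (the percentages are illustrative, not recommendations):

    # 1) raise the fielddata breaker limit dynamically (cluster-wide):
    curl -XPUT 'localhost:9200/_cluster/settings' -d '{
      "persistent": { "indices.breaker.fielddata.limit": "70%" }
    }'
    # 2) or cap the fielddata cache in elasticsearch.yml (needs a restart):
    #    indices.fielddata.cache.size: 40%

Longer term, remapping low-cardinality fields such as Gender with doc_values moves them off the heap entirely.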

data.table: cumulative values by grouping variable [duplicate]

久未见 submitted on 2019-12-13 08:27:47
Question: This question already has answers here: Calculate cumulative sum within each ID (group) (4 answers). Closed last year. I have data:

    set.seed(42)
    dat <- data.table(id=1:8, group=c(1,1,2,2,2,3,3,3), val=rnorm(8))
    > dat
       id group         val
    1:  1     1  1.37095845
    2:  2     1 -0.56469817
    3:  3     2  0.36312841
    4:  4     2  0.63286260
    5:  5     2  0.40426832
    6:  6     3 -0.10612452
    7:  7     3  1.51152200
    8:  8     3 -0.09465904

and I would like to obtain the cumulative values of val within each level of group:

    > res
       id group cum
    1:  1     1   1…
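A minimal sketch of the standard data.table approach: grouped assignment with := runs cumsum() separately within each level of group and adds the result as a new column by reference:

    # cumulative sum of "val" within each "group", added in place
    library(data.table)
    dat[, cum := cumsum(val), by = group]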

How to calculate difference between metrics in different aggregations in elasticsearch

旧街凉风 submitted on 2019-12-13 05:14:22
Question: I want to calculate the difference of nested aggregations between two dates. To be more concrete, is it possible to calculate

    date_1.buckets.field_1.buckets.field_2.buckets.field_3.value
      - date_2.buckets.field_1.buckets.field_2.buckets.field_3.value

given the request/response below? Is that possible with Elasticsearch v1.0.1? The aggregation query request looks like this:

    {
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": {
            "bool": {
              "must": [
                { "terms":…
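For context: 1.0.1 predates pipeline aggregations, so sibling-bucket arithmetic like this has to be done client-side after the response comes back. From Elasticsearch 2.0, the serial_diff pipeline aggregation can subtract a metric in each date bucket from the previous one; a sketch with hypothetical field and aggregation names:

    # serial_diff with lag 1 emits, per bucket, the metric minus the value
    # from the previous date bucket (ES 2.0+ only)
    curl -XGET 'localhost:9200/my_index/_search' -d '{
      "size": 0,
      "aggs": {
        "per_date": {
          "date_histogram": { "field": "date", "interval": "day" },
          "aggs": {
            "field_3_sum": { "sum": { "field": "field_3" } },
            "field_3_diff": {
              "serial_diff": { "buckets_path": "field_3_sum", "lag": 1 }
            }
          }
        }
      }
    }'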