aggregation

How to compute the sum of orders over a 12 months period sliding by 1 month per customer in Spark

ぐ巨炮叔叔 submitted on 2019-12-07 10:23:34
Question: I am relatively new to Spark with Scala. Currently I am trying to aggregate order data in Spark over a 12-month period that slides monthly. Below is a simple sample of my data; I tried to format it so you can easily test it: import spark.implicits._ import org.apache.spark.sql._ import org.apache.spark.sql.functions._ var sample = Seq(("C1","01/01/2016", 20), ("C1","02/01/2016", 5), ("C1","03/01/2016", 2), ("C1","04/01/2016", 3), ("C1","05/01/2017", 5), ("C1","08/01/2017", 5), ("C1","01/02
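To make the requested result concrete, here is a minimal plain-Python sketch of a 12-month sum that slides by one month per customer. It is not Spark code and not the asker's solution. It assumes the dates are MM/dd/yyyy (the excerpt does not say), that a window covers the current month plus the 11 months before it, and it uses only the six complete rows visible above. The function name `rolling_12_month_sums` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def rolling_12_month_sums(rows, date_format="%m/%d/%Y"):
    """Return {(customer, "YYYY-MM"): sum of orders in the 12 months ending that month}."""
    # Step 1: total the orders per customer and calendar month.
    # A month is stored as a single integer index: year * 12 + (month - 1).
    monthly = defaultdict(lambda: defaultdict(int))
    for customer, date_text, amount in rows:
        d = datetime.strptime(date_text, date_format)  # assumed MM/dd/yyyy
        monthly[customer][d.year * 12 + d.month - 1] += amount

    # Step 2: slide a 12-month window one month at a time,
    # from each customer's first order month to the last one.
    result = {}
    for customer, totals in monthly.items():
        first, last = min(totals), max(totals)
        for end in range(first, last + 1):
            window = sum(totals.get(m, 0) for m in range(end - 11, end + 1))
            label = f"{end // 12}-{end % 12 + 1:02d}"
            result[(customer, label)] = window
    return result


# The six complete rows from the excerpt above (the seventh row is cut off).
sample = [
    ("C1", "01/01/2016", 20),
    ("C1", "02/01/2016", 5),
    ("C1", "03/01/2016", 2),
    ("C1", "04/01/2016", 3),
    ("C1", "05/01/2017", 5),
    ("C1", "08/01/2017", 5),
]

sums = rolling_12_month_sums(sample)
for (customer, month), total in sorted(sums.items()):
    print(customer, month, total)
```

Months without orders still get a row here, because the window keeps sliding through them; whether that matches the asker's intent is not clear from the excerpt.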

UML - association or aggregation (simple code snippets)

孤街浪徒 submitted on 2019-12-07 10:01:20
Question: It drives me crazy how many books contradict themselves. Class A {} class B {void UseA(A a)} //some say this is an association, no reference is held but communication is possible Class A {} class B {A a;} //some say this is aggregation, a reference is held But many say that holding a reference is still just an association, and for aggregation they use a list. IMHO this is the same; it is still a reference. I am very confused, and I would like to understand the problem. E.g. here: http:/

How to use Addfields in MongoDB C# Aggregation Pipeline

无人久伴 submitted on 2019-12-07 09:16:17
Question: MongoDB's aggregation pipeline has an "AddFields" stage that allows you to project new fields to the pipeline's output document without knowing what fields already existed. It seems this has not been included in the C# driver for MongoDB (using version 2.7). Does anyone know if there are any alternatives to this? Maybe a flag on the "Project" stage? Answer 1: As discussed in "Using $addFields in MongoDB Driver for C#", you can build the aggregation stage yourself with a BsonDocument. To use the

ElasticSearch 5.1 Fielddata is disabled in text field by default [ERROR: trying to use aggregation on field]

半世苍凉 submitted on 2019-12-07 00:57:31
Question: Having this field in my mapping: "answer": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, I try to execute this aggregation: "aggs": { "answer": { "terms": { "field": "answer" } }, but I get this error: "type": "illegal_argument_exception", "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [answer] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory." Do

Spotfire - Finding Percentage of Subtotals

我是研究僧i submitted on 2019-12-06 14:42:10
I'm trying to turn a cross table into a table that shows the subtotals and the percentage within each Group, where the percentage is the sales of each product divided by the total sales in its group, so for Product A = 20 / (20+40+30) = 22%. So far, I've managed to use Spotfire's built-in subtotal function and the following expression to almost achieve the table I want: Sum([Sales]) / Sum([Sales]) OVER (Intersect(Parent([Axis.Rows]),All([Axis.Rows]))) but the only problem is that the percentage for my subtotal row doesn't seem to equal 100%, instead it is
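The percentage described above is simple enough to check by hand. The sketch below does that arithmetic in plain Python; it is not a Spotfire expression. Only Product A's sales figure (20) and the group total (20+40+30) come from the question; the product labels "B" and "C" and the function name `share_of_group` are hypothetical.

```python
def share_of_group(sales_by_product):
    """Return each product's sales as a fraction of its group's total sales."""
    group_total = sum(sales_by_product.values())
    return {product: sales / group_total for product, sales in sales_by_product.items()}


# One group with the three sales figures from the question's arithmetic.
group = {"A": 20, "B": 40, "C": 30}

shares = share_of_group(group)
subtotal_share = sum(shares.values())  # the subtotal row should come out at 100%

for product, share in shares.items():
    print(f"{product}: {share:.0%}")
print(f"Subtotal: {subtotal_share:.0%}")
```

The denominator is the total of the product's own group, so the shares within one group add up to 100%, which is the value the subtotal row is expected to show.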

Python Pandas: Groupby and Apply multi-column operation

时光总嘲笑我的痴心妄想 submitted on 2019-12-06 14:30:55
Question: df1 is a DataFrame with 4 columns. I want to create a new DataFrame (df2) by grouping df1 on column 'A', with a multi-column operation on columns 'C' and 'D': Column 'AA' = mean(C)+mean(D), Column 'BB' = std(D). df1= pd.DataFrame({ 'A' : ['foo', 'bar', 'foo', 'bar','foo', 'bar', 'foo', 'foo'], 'B' : ['one', 'one', 'two', 'three','two', 'two', 'one', 'three'], 'C' : np.random.randn(8), 'D' : np.random.randn(8)}) A B C D 0 foo one 1.652675 -1.983378 1 bar one 0.926656 -0.598756 2 foo two 0.131381 0
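As a reference for what df2 should contain, the following sketch computes the two requested columns with the standard library only. It deliberately does not use pandas, and it is not the answer from the original thread. The input rows are made up for illustration (the question's values are random), and the function name `group_stats` is hypothetical. `statistics.stdev` is the sample standard deviation, which is also what the pandas `std` method computes by default.

```python
from collections import defaultdict
from statistics import mean, stdev


def group_stats(rows):
    """Group (A, C, D) rows by A and compute AA = mean(C) + mean(D), BB = std(D)."""
    c_values = defaultdict(list)
    d_values = defaultdict(list)
    for a, c, d in rows:
        c_values[a].append(c)
        d_values[a].append(d)

    return {
        a: {
            "AA": mean(c_values[a]) + mean(d_values[a]),
            "BB": stdev(d_values[a]),  # sample standard deviation (n - 1 denominator)
        }
        for a in c_values
    }


# Hypothetical (A, C, D) rows; column B plays no part in the requested result.
rows = [
    ("foo", 1.0, 2.0),
    ("bar", 2.0, 1.0),
    ("foo", 3.0, 4.0),
    ("bar", 4.0, 5.0),
]

result = group_stats(rows)
print(result)
```

Each group needs at least two rows, because the sample standard deviation is undefined for a single value.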

Elasticsearch Pipelining through a Child Aggregation

大憨熊 submitted on 2019-12-06 14:12:41
I am trying to sum up data through a children aggregation in Elasticsearch 2.1. With pipeline aggregations, I am trying to get the child aggregation data summed up on the parent level of the aggregation: { "query": { "match_all": {} }, "aggs": { "unit": { "terms": { "size": 500, "field": "unit_id" }, "aggs": { "total_active_ministers_by_unit": { "sum_bucket": { "buckets_path": "ministers>active_minister_by_ministry.value" } }, "ministers": { "children": { "type": "member_ministry" }, "aggs": { "active_minister_by_ministry": { "sum_bucket": { "buckets_path": "ministry>active_minister._count" } }, "ministry":

Row Aggregation after Cross Join in BigQuery

末鹿安然 submitted on 2019-12-06 11:12:31
Question: Say you have the following table in BigQuery: A = user1 | 0 0 | user2 | 0 3 | user3 | 4 0 | After a cross join, you have dist = |user1 user2 0 0 , 0 3 | #comma is just showing user val separation |user1 user3 0 0 , 4 0 | |user2 user3 0 3 , 4 0 | How can you perform row aggregation in BigQuery to compute a pairwise aggregation across rows? As a typical use case, you could compute the Euclidean distance between the two users. I want to compute the following metric between the two users: sum(min
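The "typical use case" mentioned above, the Euclidean distance between each pair of users, can be sketched in plain Python with the three rows from the question. This is not BigQuery SQL, and it does not implement the asker's own metric, which is cut off in the excerpt. The function name `pairwise_distances` is hypothetical.

```python
from itertools import combinations
from math import dist

# The table A from the question: one two-value row per user.
users = {
    "user1": (0, 0),
    "user2": (0, 3),
    "user3": (4, 0),
}


def pairwise_distances(vectors):
    """Return the Euclidean distance for every unordered pair of users."""
    return {
        (a, b): dist(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


distances = pairwise_distances(users)
for (a, b), value in distances.items():
    print(a, b, value)
```

`combinations` yields each pair once, which mirrors the three rows of the cross-join result shown in the question (user1/user2, user1/user3, user2/user3).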

Producing histogram Map for IntStream raises compile-time-error

扶醉桌前 submitted on 2019-12-06 10:24:41
Question: I'm interested in building a Huffman Coding prototype. To that end, I want to begin by producing a histogram of the characters that make up an input Java String. I've seen many solutions on SO and elsewhere (e.g. here) that depend on using the collect() methods for Streams as well as static imports of Function.identity() and Collectors.counting() in a very specific and intuitive way. However, when using a piece of code eerily similar to the one I linked to above: private List<HuffmanTrieNode>

Sumproduct using Django's aggregation

你离开我真会死。 submitted on 2019-12-06 08:53:11
Question: Is it possible using Django's aggregation capabilities to calculate a sumproduct? Background: I am modeling an invoice, which can contain multiple items. The many-to-many relationship between the Invoice and Item models is handled through the InvoiceItem intermediary table. The total amount of the invoice, amount_invoiced, is calculated by summing the product of unit_price and quantity for each item on a given invoice. Below is the code that I'm currently using to accomplish this, but
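For reference, the quantity being asked for is just the sum of unit_price times quantity over an invoice's line items. The sketch below states that in plain Python; it is not Django ORM code and not the asker's current implementation, which is cut off in the excerpt. The line-item values and the function name `amount_invoiced` (borrowed from the field name in the question) are illustrative assumptions.

```python
from decimal import Decimal


def amount_invoiced(line_items):
    """Sum unit_price * quantity over (unit_price, quantity) pairs."""
    return sum(
        (unit_price * quantity for unit_price, quantity in line_items),
        Decimal("0"),  # start value keeps the result a Decimal, even for no items
    )


# Hypothetical line items for one invoice: (unit_price, quantity).
line_items = [
    (Decimal("9.99"), 3),
    (Decimal("24.50"), 1),
    (Decimal("0.75"), 10),
]

total = amount_invoiced(line_items)
print(total)
```

`Decimal` is used here because binary floats can introduce rounding errors in currency amounts.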