aggregation

Spark aggregate on multiple columns within partition without shuffle

强颜欢笑 submitted on 2019-12-06 07:02:59
Question: I'm trying to aggregate a dataframe on multiple columns. I know that everything I need for the aggregation is within the partition; that is, there's no need for a shuffle, because all of the data for the aggregation are local to the partition. Taking an example, if I have something like

    val sales = sc.parallelize(List(
      ("West",  "Apple",  2.0, 10),
      ("West",  "Apple",  3.0, 15),
      ("West",  "Orange", 5.0, 15),
      ("South", "Orange", 3.0, 9),
      ("South", "Orange", 6.0, 18),
      ("East",  "Milk",   5.0, 5)))
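Not part of the excerpt, but a minimal sketch of the idea in Scala, assuming (as the question states) that every (region, product) group already lives entirely inside one partition: mapPartitions folds each partition locally and never triggers a shuffle, so a single pass yields complete totals.

    // Sketch: per-partition aggregation of (price, quantity) totals.
    // Correct only under the stated assumption that no key spans partitions.
    val aggregated = sales.mapPartitions { iter =>
      val acc = scala.collection.mutable.Map.empty[(String, String), (Double, Int)]
      iter.foreach { case (region, product, price, qty) =>
        val (priceSum, qtySum) = acc.getOrElse((region, product), (0.0, 0))
        acc((region, product)) = (priceSum + price, qtySum + qty)
      }
      acc.iterator // one ((region, product), (priceSum, qtySum)) row per local group
    }
    aggregated.collect().foreach(println)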

Elasticsearch - group by day of week and hour

强颜欢笑 submitted on 2019-12-06 03:41:39
Question: I need to get some data grouped by day of week and hour. For example,

    curl -XGET http://localhost:9200/testing/hello/_search?pretty=true -d '
    {
      "size": 0,
      "aggs": {
        "articles_over_time": {
          "date_histogram": {
            "field": "date",
            "interval": "hour",
            "format": "E - k"
          }
        }
      }
    }'

gives me this:

    {
      "took": 2,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": { "total": 2857, "max_score": 0.0, "hits": [] },
      "aggregations": {
        "articles_over_time": {
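The excerpt is cut off, but as a hedged sketch of how such a query can be issued from code: with the "E - k" format each hourly bucket is keyed by day-of-week and hour-of-day, so buckets sharing a key can then be merged client-side. The Scala below uses the JDK 11 HTTP client and assumes the localhost endpoint and testing/hello index from the question.

    import java.net.URI
    import java.net.http.{HttpClient, HttpRequest, HttpResponse}

    val query =
      """{
        |  "size": 0,
        |  "aggs": {
        |    "articles_over_time": {
        |      "date_histogram": { "field": "date", "interval": "hour", "format": "E - k" }
        |    }
        |  }
        |}""".stripMargin

    // _search also accepts POST, which is friendlier to HTTP clients than GET with a body.
    val request = HttpRequest.newBuilder()
      .uri(URI.create("http://localhost:9200/testing/hello/_search"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(query))
      .build()

    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(response.body()) // then sum doc_count per "E - k" key client-side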

Aggregating array of values in elasticsearch

橙三吉。 submitted on 2019-12-06 02:35:16
Question: I need to aggregate an array as follows. Two document examples:

    {
      "_index": "log",
      "_type": "travels",
      "_id": "tnQsGy4lS0K6uT3Hwzzo-g",
      "_score": 1,
      "_source": {
        "state": "saopaulo",
        "date": "2014-10-30T17",
        "traveler": "patrick",
        "registry": "123123",
        "cities": {
          "saopaulo": 1,
          "riodejaneiro": 2,
          "total": 2
        },
        "reasons": [ "Entrega de encomenda" ],
        "from": [ "CompraRapida" ]
      }
    },
    {
      "_index": "log",
      "_type": "travels",
      "_id": "tnQsGy4lS0K6uT3Hwzzo-g",
      "_score": 1,
      "_source": {
        "state":
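The ask is cut off above, but one relevant fact: Elasticsearch treats a JSON array field as multi-valued, so an ordinary terms aggregation over reasons (or from) already buckets every element of every document's array. A hedged sketch of such a query body, reusing the field names from the documents; it could be sent the same way as the HTTP sketch above, and depending on the mapping the field may need to be not_analyzed to avoid bucketing individual tokens.

    // Hypothetical query: one bucket per distinct array element in "reasons".
    val reasonsAgg =
      """{
        |  "size": 0,
        |  "aggs": {
        |    "by_reason": { "terms": { "field": "reasons" } }
        |  }
        |}""".stripMargin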

How to release Maven multi-module project with inter-project dependencies?

杀马特。学长 韩版系。学妹 submitted on 2019-12-06 00:43:19
Question: Let's say we have a 3-layer project: DB, Business, Web, and an aggregating POM.

    Project
    |- DB
    |  |- pom.xml
    |- Business
    |  |- pom.xml
    |- pom.xml

All modules are meant to be released and branched together, so the aggregator POM is configured to assign the same version to all submodules. We have the following versions:

    DB-0.1-SNAPSHOT
    Business-0.1-SNAPSHOT  (depends on DB-0.1-SNAPSHOT)
    Web-0.1-SNAPSHOT       (depends on Business-0.1-SNAPSHOT)

When doing release:prepare, all versions are updated to 0.1, but prepare

SAPUI5 routing - Control with ID idAppControl could not be found

瘦欲@ submitted on 2019-12-05 21:42:34
First of all, I'm aware that similar questions have been asked before, but none of the answers could solve my problem. A short look into my code: my Component.js looks like this:

    routes: [
      {
        pattern: "", // home page
        name: util.Constants.Tile,
        view: util.Constants.Tile,
        viewId: util.Constants.Tile,
        targetAggregation: "pages"
        // targetControl: "idAppControl"
      },
      {
        pattern: "firstExample",
        name: util.Constants.FirstExample,
        view: util.Constants.FirstExample,
        viewId: util.Constants.FirstExample,
        targetAggregation: "pages",
        targetControl: "idAppControl",
        subroutes: [
          {
            pattern: "firstExample",
            name: util

elasticsearch aggregation to sort by ratio of aggregations

两盒软妹~` submitted on 2019-12-05 21:32:36
I have a scenario in analytics where I want to find the 20 least-performing outlets out of 1000+, where performance = transactionCount / visitCount per month at an outlet. The mappings are:

    {
      "CustomerVisit": {
        "properties": {
          "outlet":       { "type": "string", "index": "not_analyzed" },
          "customerName": { "type": "string", "index": "not_analyzed" },
          "visitMonth":   { "type": "Date" },
          "visit": {
            "type": "nested",
            "properties": {
              "visitStatus":       { "type": "long" },
              "transactionStatus": { "type": "long" },
              "remarks":           { "type": "string", "index": "not_analyzed" }
            }
          }
        }
      }
    }

Now, what I want is
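The excerpt stops before the actual ask, but a common shape for "sort outlets by a ratio of two sub-aggregations" is a terms aggregation per outlet with sums over the nested visit fields, a bucket_script pipeline aggregation computing the ratio, and a bucket_sort keeping the 20 lowest. The query below is a hedged sketch, not from the original thread; the aggregation names are made up, and bucket_sort requires Elasticsearch 6.1+.

    val worstOutlets =
      """{
        |  "size": 0,
        |  "aggs": {
        |    "per_outlet": {
        |      "terms": { "field": "outlet", "size": 10000 },
        |      "aggs": {
        |        "visits": {
        |          "nested": { "path": "visit" },
        |          "aggs": {
        |            "visitCount":       { "sum": { "field": "visit.visitStatus" } },
        |            "transactionCount": { "sum": { "field": "visit.transactionStatus" } }
        |          }
        |        },
        |        "performance": {
        |          "bucket_script": {
        |            "buckets_path": { "t": "visits>transactionCount", "v": "visits>visitCount" },
        |            "script": "params.t / params.v"
        |          }
        |        },
        |        "bottom20": {
        |          "bucket_sort": { "sort": [{ "performance": { "order": "asc" } }], "size": 20 }
        |        }
        |      }
        |    }
        |  }
        |}""".stripMargin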

MongoDB C# Aggregation with LINQ

烂漫一生 submitted on 2019-12-05 18:25:12
I have a mongo object with these fields:

    DateTime TimeStamp;
    float Value;

How can I get the aggregation pipeline, in C#, with LINQ, to get the minimum, maximum and average of Value over a specific timestamp range? I have seen a few aggregation examples, but I don't quite get it. Having an example on a simple case like this would certainly (hopefully) make me understand it.

You can use LINQ syntax, which gets translated into the Aggregation Framework's syntax. Assuming you have the following Model class:

    public class Model
    {
        public DateTime Timestamp { get; set; }
        public float Value { get; set; }
    }

you

How to use Addfields in MongoDB C# Aggregation Pipeline

那年仲夏 submitted on 2019-12-05 18:13:54
MongoDB's aggregation pipeline has an "AddFields" stage that allows you to project new fields to the pipeline's output document without knowing what fields already existed. It seems this has not been included in the C# driver for MongoDB (using version 2.7). Does anyone know if there are any alternatives to this? Maybe a flag on the "Project" stage?

As discussed in Using $addFields in MongoDB Driver for C#, you can build the aggregation stage yourself with a BsonDocument. To use the example from https://docs.mongodb.com/manual/reference/operator/aggregation/addFields/:

    { $addFields: {

UML - association or aggregation (simple code snippets)

久未见 submitted on 2019-12-05 16:58:47
It drives me crazy how many books contradict themselves.

    class A {}
    class B { void UseA(A a) }  // some say this is an association: no reference is held, but communication is possible

    class A {}
    class B { A a; }            // some say this is aggregation: a reference is held

But many say that holding a reference is still just an association, and for aggregation they use a list; IMHO this is the same, it is still a reference. I am very confused and would like to understand the problem. E.g. here: http://aviadezra.blogspot.cz/2009/05/uml-association-aggregation-composition.html, what is the difference between
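A minimal sketch of the usual textbook distinction, in Scala (the class names are made up): association is any link at all, e.g. a method parameter; aggregation is a held reference to a part whose lifetime is managed elsewhere; composition is a part created and owned by the whole, dying with it.

    class Engine
    class Gearbox

    class Car(val engine: Engine) {
      // Aggregation: Car holds a reference to an Engine that was built
      // elsewhere and can outlive (or be shared beyond) this Car.

      // Composition: the Gearbox is created and owned here, never shared;
      // conceptually it dies with the Car.
      private val gearbox = new Gearbox
    }

    class CarWash {
      // Association only: CarWash communicates with a Car through a
      // parameter but keeps no lasting reference to it.
      def wash(car: Car): Unit = println(s"washing a car with engine ${car.engine}")
    }

In UML terms, aggregation and composition are both associations with whole-part semantics; what separates them is ownership and lifetime, not whether the reference sits in a field or in a list.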

How to compute the sum of orders over a 12 months period sliding by 1 month per customer in Spark

狂风中的少年 submitted on 2019-12-05 15:48:03
I am relatively new to Spark with Scala. Currently I am trying to aggregate order data in Spark over a 12-month period that slides monthly. Below is a simple sample of my data; I tried to format it so you can easily test it.

    import spark.implicits._
    import org.apache.spark.sql._
    import org.apache.spark.sql.functions._

    var sample = Seq(
      ("C1", "01/01/2016", 20),
      ("C1", "02/01/2016", 5),
      ("C1", "03/01/2016", 2),
      ("C1", "04/01/2016", 3),
      ("C1", "05/01/2017", 5),
      ("C1", "08/01/2017", 5),
      ("C1", "01/02/2017", 10),
      ("C1", "01/02/2017", 10),
      ("C1", "01/03/2017", 10)).toDF("id", "order_date", "orders")
    sample =
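A hedged sketch of one way to get a trailing 12-month sum per customer, building on the sample DataFrame above and assuming order_date really is dd/MM/yyyy: map each date to a month index, then use a range-based window covering the current month and the 11 before it.

    import org.apache.spark.sql.expressions.Window

    val withMonth = sample
      .withColumn("ts", to_date(col("order_date"), "dd/MM/yyyy"))
      .withColumn("month_idx", year(col("ts")) * 12 + month(col("ts")))

    // Range window per customer: all rows whose month index falls within
    // the 12-month span ending at the current row's month.
    val w = Window
      .partitionBy("id")
      .orderBy("month_idx")
      .rangeBetween(-11, 0)

    val rolling = withMonth.withColumn("orders_12m", sum("orders").over(w))
    rolling.show(false)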