aggregate

Enrich a Kafka Stream with data from KTables

旧城冷巷雨未停 posted on 2021-02-11 15:01:35
Question: I currently maintain a financial application. Among the many calculations it performs, one determines: 1) What percentage of the total transaction amount does a new incoming transaction account for? 2) What percentage of the given customer's total transaction amount does the new transaction account for? For the sake of simplicity, let's assume that the transaction data will be cut off at 6 am
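The two percentages described in this question can be sketched as plain arithmetic over running totals. The snippet below is only an illustration in Python, not Kafka Streams code: the dictionaries stand in for the aggregated tables, the function name `share_of_totals` and the sample data are hypothetical, and it assumes the totals include the new transaction itself.

```python
from collections import defaultdict


def share_of_totals(history, new_txn):
    """Return (pct_of_overall_total, pct_of_customer_total) for new_txn.

    history is an iterable of (customer, amount) pairs seen so far.
    Assumption: both totals include the new transaction itself.
    """
    overall = 0.0
    per_customer = defaultdict(float)
    for customer, amount in history:
        overall += amount
        per_customer[customer] += amount

    customer, amount = new_txn
    overall += amount
    per_customer[customer] += amount

    return (100.0 * amount / overall,
            100.0 * amount / per_customer[customer])


# Hypothetical sample data, not taken from the question.
history = [("alice", 100.0), ("bob", 300.0), ("alice", 50.0)]
pct_all, pct_customer = share_of_totals(history, ("alice", 50.0))
```

With this sample data the new transaction is 50 out of an overall 500 and 50 out of 200 for the customer.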

R: Using aggregate within a function, not working

你。 posted on 2021-02-11 12:28:17
Question: I am fairly new to R, so apologies if an answer to this already exists that I was unable to find. I cannot replicate the exact error I get with my own dataset, but since an error is produced nonetheless, here we go. What I am trying to do is create a function that calculates the average of several columns conditional on the values of other ones. Let's say that

    d1 <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)
    d2 <- c(1:12)
    d3 <- c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2)
    df <- cbind(d1, d2, d3)
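One reading of "the average of several columns conditional on the values of other ones" is a grouped mean. The following is a minimal Python sketch of that idea using the three vectors from the excerpt; it does not reproduce R's `aggregate`, and the helper name `grouped_mean` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# The three vectors from the excerpt, as Python lists.
d1 = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
d2 = list(range(1, 13))
d3 = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2]


def grouped_mean(values, keys):
    """Average `values` separately for each distinct entry of `keys`."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    return {key: mean(vals) for key, vals in groups.items()}


mean_d2_by_d1 = grouped_mean(d2, d1)
mean_d2_by_d3 = grouped_mean(d2, d3)
```

Wrapping the grouping in a function that takes the value column and the key column as arguments mirrors the asker's goal of reusing the calculation for several columns.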

Percentage of factor levels by group in R [duplicate]

我们两清 posted on 2021-02-08 10:25:10
Question: This question already has answers here: Relative frequencies / proportions with dplyr (9 answers); Extend contigency table with proportions (percentages) (6 answers). Closed 7 months ago. I am trying to calculate the percentage of the different levels of a factor within a group. I have nested data and would like to see what percentage of schools in each country are private schools (a factor with 2 levels). However, I cannot figure out how to do that.

    # my data:
    CNT <- c("A", "A", "A", "A", "A", "B"
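The calculation asked for here, the share of each factor level within each group, can be sketched in a few lines. This is a Python illustration rather than R, and because the excerpt's data is cut off, the rows below are hypothetical; `level_percentages` is likewise an illustrative name.

```python
from collections import Counter, defaultdict


def level_percentages(rows):
    """Map each group to {level: percentage of that group's rows}."""
    counts = defaultdict(Counter)
    for group, level in rows:
        counts[group][level] += 1
    return {
        group: {level: 100.0 * n / sum(c.values()) for level, n in c.items()}
        for group, c in counts.items()
    }


# Hypothetical (country, school type) rows; the original data is truncated.
rows = [
    ("A", "private"), ("A", "public"), ("A", "public"), ("A", "public"),
    ("B", "private"), ("B", "public"),
]
result = level_percentages(rows)
```

The key point is that the denominator is the row count of the group, not of the whole table.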

Pandas Multiindex Groupby aggregate column with value from another column

邮差的信 posted on 2021-02-08 08:41:13
Question: I have a pandas dataframe with a multiindex where I want to aggregate the duplicate key rows as follows:

    import numpy as np
    import pandas as pd
    df = pd.DataFrame({'S':[0,5,0,5,0,3,5,0],'Q':[6,4,10,6,2,5,17,4],'A': ['A1','A1','A1','A1','A2','A2','A2','A2'], 'B':['B1','B1','B2','B2','B1','B1','B1','B2']})
    df.set_index(['A','B'])

            Q  S
    A  B
    A1 B1   6  0
       B1   4  5
       B2  10  0
       B2   6  5
    A2 B1   2  0
       B1   5  3
       B1  17  5
       B2   4  0

and I would like to groupby this dataframe to aggregate the Q values (sum) and keep the S value

Aggregating more than two properties Java 8

泪湿孤枕 posted on 2021-02-07 10:59:34
Question: To put it very simply, I have

    class Per {
        int a;
        long b;
        double c;
        String d;
    }

Let's say I have 3000 objects of type Per collected in a List<Per> pers. Now I want to achieve: skip the object if it is null or d is null or blank; sum of a; sum of b; aggregated value of an operation performed on c. The old way is

    int totalA = 0;
    long totalB = 0L;
    long totalC = 0L;
    for (Per per : pers) {
        if (per.d != null && !per.d.trim().equals("")) {
            totalA += per.a;
            totalB += per.b;
            totalC += someOperation(per.c);
        }
    }

someOperation
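The single-pass aggregation described above (filter, then accumulate three totals together) can be sketched as follows. This is a Python illustration, not a Java 8 streams solution. The excerpt is cut off before `someOperation` is explained, so `some_operation` here is a hypothetical stand-in that simply rounds its argument; unlike the loop in the excerpt, the sketch also skips null objects, as the stated requirements ask.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Per:
    a: int
    b: int
    c: float
    d: Optional[str]


def some_operation(c: float) -> int:
    # Hypothetical stand-in: the real operation is not shown in the excerpt.
    return round(c)


def aggregate(pers: Iterable[Optional[Per]]) -> Tuple[int, int, int]:
    """Sum a, b, and some_operation(c) over entries with a non-blank d."""
    total_a = total_b = total_c = 0
    for per in pers:
        # Skip null objects and objects whose d is null or blank.
        if per is None or per.d is None or not per.d.strip():
            continue
        total_a += per.a
        total_b += per.b
        total_c += some_operation(per.c)
    return total_a, total_b, total_c


# Hypothetical sample data.
pers = [Per(1, 10, 1.6, "x"), None, Per(2, 20, 2.4, "  "),
        Per(3, 30, 3.2, "y"), Per(4, 40, 4.0, None)]
totals = aggregate(pers)
```

Only the first and fourth entries pass the filter, so the three totals are built from those two objects alone.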

'Could not interpret input' error with Seaborn when plotting groupbys

眉间皱痕 posted on 2021-02-07 05:20:15
Question: Say I have this dataframe

    from pandas import DataFrame

    d = {
        'Path' : ['abc', 'abc', 'ghi','ghi', 'jkl','jkl'],
        'Detail' : ['foo', 'bar', 'bar','foo','foo','foo'],
        'Program': ['prog1','prog1','prog1','prog2','prog3','prog3'],
        'Value' : [30, 20, 10, 40, 40, 50],
        'Field' : [50, 70, 10, 20, 30, 30]
    }
    df = DataFrame(d)
    df.set_index(['Path', 'Detail'], inplace=True)
    df

                 Field Program  Value
    Path Detail
    abc  foo        50   prog1     30
         bar        70   prog1     20
    ghi  bar        10   prog1     10
         foo        20   prog2     40
    jkl  foo        30   prog3     40
         foo        30   prog3     50

I can aggregate it

'Could not interpret input' error with Seaborn when plotting groupbys

对着背影说爱祢 posted on 2021-02-07 05:19:19
Question: Say I have this dataframe

    from pandas import DataFrame

    d = {
        'Path' : ['abc', 'abc', 'ghi','ghi', 'jkl','jkl'],
        'Detail' : ['foo', 'bar', 'bar','foo','foo','foo'],
        'Program': ['prog1','prog1','prog1','prog2','prog3','prog3'],
        'Value' : [30, 20, 10, 40, 40, 50],
        'Field' : [50, 70, 10, 20, 30, 30]
    }
    df = DataFrame(d)
    df.set_index(['Path', 'Detail'], inplace=True)
    df

                 Field Program  Value
    Path Detail
    abc  foo        50   prog1     30
         bar        70   prog1     20
    ghi  bar        10   prog1     10
         foo        20   prog2     40
    jkl  foo        30   prog3     40
         foo        30   prog3     50

I can aggregate it

Partition data around a match query during aggregation

你。 posted on 2021-02-05 11:18:30
Question: What I have been trying to get my head around is how to perform some kind of partitioning (split by predicate) in a Mongo query. My current query looks like:

    db.posts.aggregate([
      {"$match": {
        $and: [
          {$or: [{"toggled": false}, {"toggled": true, "status": "INACTIVE"}]},
          {"updatedAt": {$gte: 1549786260000}}
        ]
      }},
      {"$unwind": "$interests"},
      {"$group": {"_id": {"iid": "$interests", "pid": "$publisher"}, "count": {"$sum": 1}}},
      {"$project": {_id: 0, "iid": "$_id.iid", "pid": "$_id.pid", "count": 1}}
    ])

This
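"Split by predicate" can be illustrated outside MongoDB with a small partition helper. The sketch below is plain Python over in-memory dictionaries, not a pipeline stage. The predicate mirrors the `$match` conditions in the excerpt, while the sample documents and the names `partition` and `matches` are hypothetical. Since the excerpt is cut off, this may not be exactly the split the asker wanted.

```python
def partition(items, predicate):
    """Split items into (matching, non_matching) in a single pass."""
    matching, non_matching = [], []
    for item in items:
        (matching if predicate(item) else non_matching).append(item)
    return matching, non_matching


def matches(post):
    # Mirrors the $match stage: (not toggled, or toggled and INACTIVE)
    # and updatedAt >= 1549786260000.
    toggled_ok = (post["toggled"] is False
                  or (post["toggled"] is True
                      and post.get("status") == "INACTIVE"))
    return toggled_ok and post["updatedAt"] >= 1549786260000


# Hypothetical documents, not taken from the question.
posts = [
    {"_id": 1, "toggled": False, "updatedAt": 1549786260001},
    {"_id": 2, "toggled": True, "status": "ACTIVE", "updatedAt": 1549786260001},
    {"_id": 3, "toggled": True, "status": "INACTIVE", "updatedAt": 1549786260000},
    {"_id": 4, "toggled": False, "updatedAt": 1549786259999},
]
kept, rest = partition(posts, matches)
```

Unlike a plain filter, the partition keeps the non-matching documents available for a second aggregation instead of discarding them.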

MongoDB Orders/sales aggregation group Per Month Sum Total + Count Field

戏子无情 posted on 2021-02-05 06:48:25
Question: Who knows a better solution for grouping orders by date, summing the total, and counting by source? Of course I can group by source, but then I get totals for that source only; I can alter the result afterwards to get the desired result. But I would like to know whether it is possible in one simple $group statement, e.g. ordersByApp = 1, ordersByWEB = 2. Orders collection:

    { _id: 'XCUZO0', date: "2020-02-01T00:00:03.243Z", total: 9.99, source: 'APP' },
    { _id: 'XCUZO1', date: "2020-01-05T00:00:03.243Z", total: 9
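The result shape asked for here, one bucket per month holding a summed total together with per-source counters, can be emulated in plain Python to make the idea concrete. This is not a MongoDB pipeline. In a pipeline, a conditional sum (for example `$sum` over a `$cond` expression) is one commonly used way to express such counters inside a single `$group`, but that is not shown or verified here. Only the first document below comes from the excerpt; the others and the name `summarize_by_month` are hypothetical.

```python
from collections import defaultdict


def summarize_by_month(orders):
    """Group orders by 'YYYY-MM'; sum totals and count orders per source."""
    summary = defaultdict(
        lambda: {"total": 0.0, "count": 0, "by_source": defaultdict(int)}
    )
    for order in orders:
        month = order["date"][:7]  # ISO-8601 string, so this is 'YYYY-MM'
        bucket = summary[month]
        bucket["total"] += order["total"]
        bucket["count"] += 1
        bucket["by_source"][order["source"]] += 1
    return summary


orders = [
    {"_id": "XCUZO0", "date": "2020-02-01T00:00:03.243Z",
     "total": 9.99, "source": "APP"},
    # Hypothetical documents added for illustration.
    {"_id": "H1", "date": "2020-02-10T08:00:00.000Z",
     "total": 5.00, "source": "WEB"},
    {"_id": "H2", "date": "2020-02-15T08:00:00.000Z",
     "total": 5.00, "source": "WEB"},
    {"_id": "H3", "date": "2020-01-05T08:00:00.000Z",
     "total": 20.00, "source": "APP"},
]
summary = summarize_by_month(orders)
```

For February this yields one APP order and two WEB orders, matching the "ordersByApp = 1, ordersByWEB = 2" shape mentioned in the question.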

MongoDB Orders/sales aggregation group Per Month Sum Total + Count Field

≯℡__Kan透↙ posted on 2021-02-05 06:47:29
Question: Who knows a better solution for grouping orders by date, summing the total, and counting by source? Of course I can group by source, but then I get totals for that source only; I can alter the result afterwards to get the desired result. But I would like to know whether it is possible in one simple $group statement, e.g. ordersByApp = 1, ordersByWEB = 2. Orders collection:

    { _id: 'XCUZO0', date: "2020-02-01T00:00:03.243Z", total: 9.99, source: 'APP' },
    { _id: 'XCUZO1', date: "2020-01-05T00:00:03.243Z", total: 9