aggregate-functions

How to find the pathing flow and rank them using Pig or Hive?

冷眼眸甩不掉的悲伤 submitted on 2019-12-14 03:25:36
Question: Below is an example for my use case. Answer 1: You can refer to this question, where an OP was asking something similar. If I understand your problem correctly, you want to remove duplicates from the path, but only when they occur next to each other, so 1 -> 1 -> 2 -> 1 would become 1 -> 2 -> 1. If that is correct, then you can't just group and take distinct values (as I'm sure you have noticed), because that removes all duplicates. An easy solution is to write a UDF to remove those duplicates while
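Besides a UDF, adjacent duplicates can also be dropped with a window function. A minimal HiveQL sketch, assuming a hypothetical table user_path(session_id, step_order, page):

```sql
-- Not the UDF approach: keep a step only when it differs from the step
-- immediately before it in the same session.
SELECT session_id, step_order, page
FROM (
    SELECT session_id, step_order, page,
           LAG(page) OVER (PARTITION BY session_id ORDER BY step_order) AS prev_page
    FROM user_path
) t
WHERE prev_page IS NULL      -- first step of a session
   OR page <> prev_page;     -- drop steps that repeat the previous one
```

This turns 1 -> 1 -> 2 -> 1 into 1 -> 2 -> 1 while leaving the non-adjacent repeat of 1 in place.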

SQL SUM function without grouping data

£可爱£侵袭症+ submitted on 2019-12-14 02:47:40
Question: I need to sum a variable broken down by certain other variables. I would normally do this with GROUP BY, but in this case I do not want to roll up the data: I want to keep the original rows alongside some sort of aggregated sum.

-ID-- --amount--
1     23
1     11
1     8
1     7
2     10
2     20
2     15
2     10

Result:

-ID-- --amount-- ---SUM
1     23         49
1     11         49
1     8          49
1     7          49
2     10         55
2     20         55
2     15         55
2     10         55

Answer 1: You could use a subquery to get the total for each id and join that back to your table: select t.id
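A minimal sketch of the subquery-join idea the answer starts to describe, plus the window-function equivalent; the table name t is an assumption:

```sql
-- Join each row back to its per-id total.
SELECT t.id,
       t.amount,
       totals.total_amount AS sum_amount
FROM t
JOIN (
    SELECT id, SUM(amount) AS total_amount
    FROM t
    GROUP BY id
) totals ON totals.id = t.id;

-- On databases with window functions, the same result without a join:
SELECT id,
       amount,
       SUM(amount) OVER (PARTITION BY id) AS sum_amount
FROM t;
```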

ROW_NUMBER() shows unexpected values

情到浓时终转凉″ submitted on 2019-12-14 02:36:07
Question: My table has values like this (RowCount is generated by the query below):

ID      Date_trans  Time_trans Price RowCount
------- ----------- ---------- ----- --------
1699093 22-Feb-2011 09:30:00   58.07 1
1699094 22-Feb-2011 09:30:00   58.08 1
1699095 22-Feb-2011 09:30:00   58.08 2
1699096 22-Feb-2011 09:30:00   58.08 3
1699097 22-Feb-2011 09:30:00   58.13 1
1699098 22-Feb-2011 09:30:00   58.13 2
1699099 22-Feb-2011 09:30:00   58.12 1
1699100 22-Feb-2011 09:30:08   58.13 3
1699101 22-Feb-2011 09:30:09   57.96 1
1699102
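A hedged guess at the shape of query that produces a RowCount like the one above; ROW_NUMBER() numbers rows within each partition, so every row with the same Price shares a partition whether or not those prices are consecutive (the table name trades is an assumption):

```sql
SELECT ID,
       Date_trans,
       Time_trans,
       Price,
       ROW_NUMBER() OVER (PARTITION BY Price
                          ORDER BY ID) AS RowCount   -- counts per Price value, not per consecutive run
FROM trades;
```

That is why 1699100 gets RowCount 3: it is the third row overall with Price 58.13, even though the earlier 58.13 rows are not adjacent to it.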

Postgres aggregate function for calculating vector average of wind speed (vector magnitude) and wind direction (vector direction)

六眼飞鱼酱① submitted on 2019-12-14 01:36:38
Question: I have a table with two columns, wind_speed and wind_direction. I want a custom aggregate function that returns the average wind_speed and wind_direction. Together they define a vector: wind_speed is its magnitude and wind_direction its direction. The avg_wind_direction function should return the average wind speed as the magnitude, and the wind direction as the direction, of the average vector. SELECT avg_wind_direction(wind_speed,
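A minimal sketch of the vector average in plain SQL, without defining a custom aggregate; it assumes wind_direction is stored in degrees and a hypothetical table wind_obs, and the meteorological sign convention may need adjusting:

```sql
SELECT sqrt(avg(u) ^ 2 + avg(v) ^ 2)   AS avg_speed,       -- magnitude of the mean vector
       degrees(atan2(avg(u), avg(v)))  AS avg_direction    -- may need normalizing into 0..360
FROM (
    SELECT wind_speed * sin(radians(wind_direction)) AS u, -- east-west component
           wind_speed * cos(radians(wind_direction)) AS v  -- north-south component
    FROM wind_obs
) components;
```

The same arithmetic can be wrapped in CREATE AGGREGATE with a state of (sum_u, sum_v, n) if a true avg_wind_direction(wind_speed, wind_direction) aggregate is wanted.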

Problem when grouping

穿精又带淫゛_ submitted on 2019-12-13 17:53:15
Question: I have this MySQL query:

SELECT forum_categories.title, forum_messages.author, forum_messages.date AS last_message
FROM forum_categories
JOIN forum_topics ON forum_topics.category_id = forum_categories.id
JOIN forum_messages ON forum_messages.topic_id = forum_topics.id
WHERE forum_categories.id = 6
ORDER BY forum_categories.date ASC

And the output is the following:

Welcome daniel    2010-07-09 22:14:49
Welcome daniel    2010-06-29 22:14:49
Welcome luke      2010-08-10 20:12:20
Welcome skywalker 2010-08-19 22
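The excerpt does not state the desired output, but a common goal with this shape of query is one row per category with its latest message. A hedged sketch of that fix, joining against a per-category MAX(date):

```sql
SELECT c.title,
       m.author,
       m.date AS last_message
FROM forum_categories c
JOIN forum_topics   t ON t.category_id = c.id
JOIN forum_messages m ON m.topic_id   = t.id
JOIN (
    -- newest message date per category
    SELECT t2.category_id, MAX(m2.date) AS max_date
    FROM forum_topics t2
    JOIN forum_messages m2 ON m2.topic_id = t2.id
    GROUP BY t2.category_id
) latest ON latest.category_id = t.category_id
        AND latest.max_date    = m.date
WHERE c.id = 6;
```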

Get count of created entries for each day

杀马特。学长 韩版系。学妹 submitted on 2019-12-13 14:48:21
Question: Let's say I have a query like this:

SELECT COUNT(id), date(created_at)
FROM entries
WHERE date(created_at) >= date(current_date - interval '1 week')
GROUP BY date(created_at)

As you know, I then get back a result like this:

count | date
2     | 15.01.2014
1     | 13.01.2014
9     | 09.01.2014

But I do not get the days of the week on which no entries were created. How can I get a result that also includes the days on which no entries were created?

count | date
2     | 15
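A minimal Postgres sketch: generate every day of the range with generate_series() and LEFT JOIN the entries onto it, so days without entries appear with a count of 0 (table and column names follow the query above):

```sql
SELECT COUNT(e.id)  AS count,
       d.day::date  AS date
FROM generate_series(current_date - interval '1 week',
                     current_date,
                     interval '1 day') AS d(day)      -- one row per day, even empty ones
LEFT JOIN entries e ON date(e.created_at) = d.day::date
GROUP BY d.day
ORDER BY d.day;
```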

Postgres LEFT JOIN with SUM, missing records

倾然丶 夕夏残阳落幕 submitted on 2019-12-13 13:32:02
Question: I am trying to get the count of certain types of records in a related table, using a LEFT JOIN. I have a query that isn't quite right and one that returns the correct results; the correct query has a higher execution cost. I'd like to use the first approach, if I can correct its results. (See http://sqlfiddle.com/#!15/7c20b/5/2)

CREATE TABLE people(
  id SERIAL,
  name varchar not null
);
CREATE TABLE pets(
  id SERIAL,
  name varchar not null,
  kind varchar not null,
  alive boolean
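A hedged sketch of the usual cause of missing rows in this situation: a condition on the LEFT JOINed table placed in WHERE discards people who have no matching pets, so the condition belongs in the ON clause. The join column people_id and the 'dog'/alive filter are assumptions:

```sql
SELECT p.id,
       p.name,
       COUNT(pets.id) AS alive_dogs          -- counts 0 for people with no matching pets
FROM people p
LEFT JOIN pets ON pets.people_id = p.id
              AND pets.kind = 'dog'          -- filtering here keeps unmatched people
              AND pets.alive
GROUP BY p.id, p.name;
```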

Custom aggregation on PySpark dataframes

折月煮酒 submitted on 2019-12-13 12:07:54
Question: I have a PySpark DataFrame with one column containing one-hot encoded vectors. I want to aggregate the different one-hot encoded vectors by vector addition after a groupBy, e.g. for df[userid, action]:

Row 1: ["1234", [1, 0, 0]]
Row 2: ["1234", [0, 1, 0]]

I want the output to be the row ["1234", [1, 1, 0]], so the vector is the sum of all vectors grouped by userid. How can I achieve this? PySpark's sum aggregate operation does not support vector addition. Answer 1: You have several options: Create a user defined aggregate
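The answer's user-defined aggregate is one route; on Spark 2.4+ the element-wise sum can also be written in Spark SQL with built-in higher-order functions, assuming action is an array column of fixed length 3 in a view named events (both assumptions):

```sql
SELECT userid,
       aggregate(
           collect_list(action),                          -- all action vectors for this userid
           array(0, 0, 0),                                -- zero vector start value (match the element type)
           (acc, x) -> zip_with(acc, x, (a, b) -> a + b)  -- element-wise addition
       ) AS action_sum
FROM events
GROUP BY userid;
```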

Multiple SUM using LINQ

拟墨画扇 submitted on 2019-12-13 11:58:09
Question: I have a loop like the following; can I do the same using multiple SUMs?

foreach (var detail in ArticleLedgerEntries.Where(pd => pd.LedgerEntryType == LedgerEntryTypeTypes.Unload
                                                     && pd.InventoryType == InventoryTypes.Finished))
{
    weight += detail.GrossWeight;
    length += detail.Length;
    items  += detail.NrDistaff;
}

Answer 1: Technically speaking, what you have is probably the most efficient way to do what you are asking. However, you could create an extension method on IEnumerable<T> called Each that
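For comparison with the aggregate-function theme of this page, the same "several sums over one filtered set" pattern in SQL is a single SELECT with multiple SUM expressions; the table and column names below are hypothetical:

```sql
SELECT SUM(gross_weight) AS weight,
       SUM(length)       AS length,
       SUM(nr_distaff)   AS items
FROM article_ledger_entries
WHERE ledger_entry_type = 'Unload'
  AND inventory_type    = 'Finished';
```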

Rounding of the numeric values

梦想的初衷 submitted on 2019-12-13 09:45:29
Question: I want to round off the values of two columns:

select a.region as "Regions", a.suminsured,2 as "SumInsured", a.suminsured/b.sum*100 as pct
from (
    SELECT region, sum(suminsured) as suminsured
    FROM "Exposure_commune"
    group by region
) a,
(select sum(suminsured) FROM "Exposure_commune") b

I want the suminsured and pct columns to come out with two decimal places. Can someone tell me what I should do? Answer 1: Use round() with two parameters, which only works for the data type numeric. While being at it,
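A minimal sketch of the round()-with-two-parameters fix the answer points to; because the two-argument form only accepts numeric, both expressions are cast (the casts and the total alias are assumptions, since the column types are not shown):

```sql
SELECT a.region                                            AS "Regions",
       round(a.suminsured::numeric, 2)                     AS "SumInsured",
       round((a.suminsured / b.total * 100)::numeric, 2)   AS pct
FROM (
    SELECT region, sum(suminsured) AS suminsured
    FROM "Exposure_commune"
    GROUP BY region
) a,
(SELECT sum(suminsured) AS total FROM "Exposure_commune") b;
```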