aggregate-functions

How to find the pathing flow and rank them using Pig or Hive?

冷眼眸甩不掉的悲伤 submitted on 2019-12-14 03:25:36
Question: Below is an example for my use case. Answer 1: You can refer to this question, where an OP was asking something similar. If I understand your problem correctly, you want to remove duplicates from the path, but only when they occur next to each other, so 1 -> 1 -> 2 -> 1 would become 1 -> 2 -> 1. If that is correct, then you can't just group and take distinct values (as I'm sure you have noticed), because that removes all duplicates. An easy solution is to write a UDF to remove those duplicates while
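Besides a UDF, adjacent duplicates can also be dropped with a window function. A minimal HiveQL sketch, assuming a hypothetical table user_path(session_id, step_order, page):

```sql
-- Not the UDF approach: keep a step only when it differs from the step
-- immediately before it in the same session.
SELECT session_id, step_order, page
FROM (
    SELECT session_id, step_order, page,
           LAG(page) OVER (PARTITION BY session_id ORDER BY step_order) AS prev_page
    FROM user_path
) t
WHERE prev_page IS NULL      -- first step of a session
   OR page <> prev_page;     -- drop steps that repeat the previous one
```

This turns 1 -> 1 -> 2 -> 1 into 1 -> 2 -> 1 while leaving the non-adjacent repeat of 1 in place.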

SQL SUM function without grouping data

£可爱£侵袭症+ submitted on 2019-12-14 02:47:40
Question: I need to sum a variable broken down by certain other variables. I would normally do this with GROUP BY, but in this case I do not want to roll up the data: I want to keep the original rows alongside some sort of aggregated sum.

-ID-- --amount--
1     23
1     11
1     8
1     7
2     10
2     20
2     15
2     10

Result:

-ID-- --amount-- ---SUM
1     23         49
1     11         49
1     8          49
1     7          49
2     10         55
2     20         55
2     15         55
2     10         55

Answer 1: You could use a subquery to get the total for each id and join that back to your table: select t.id
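A minimal sketch of the subquery-join idea the answer starts to describe, plus the window-function equivalent; the table name t is an assumption:

```sql
-- Join each row back to its per-id total.
SELECT t.id,
       t.amount,
       totals.total_amount AS sum_amount
FROM t
JOIN (
    SELECT id, SUM(amount) AS total_amount
    FROM t
    GROUP BY id
) totals ON totals.id = t.id;

-- On databases with window functions, the same result without a join:
SELECT id,
       amount,
       SUM(amount) OVER (PARTITION BY id) AS sum_amount
FROM t;
```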

ROW_NUMBER() shows unexpected values

情到浓时终转凉″ submitted on 2019-12-14 02:36:07
Question: My table has values like this (RowCount is generated by the query below):

ID      Date_trans  Time_trans Price RowCount
------- ----------- ---------- ----- --------
1699093 22-Feb-2011 09:30:00   58.07 1
1699094 22-Feb-2011 09:30:00   58.08 1
1699095 22-Feb-2011 09:30:00   58.08 2
1699096 22-Feb-2011 09:30:00   58.08 3
1699097 22-Feb-2011 09:30:00   58.13 1
1699098 22-Feb-2011 09:30:00   58.13 2
1699099 22-Feb-2011 09:30:00   58.12 1
1699100 22-Feb-2011 09:30:08   58.13 3
1699101 22-Feb-2011 09:30:09   57.96 1
1699102
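A hedged guess at the shape of query that produces a RowCount like the one above; ROW_NUMBER() numbers rows within each partition, so every row with the same Price shares a partition whether or not those prices are consecutive (the table name trades is an assumption):

```sql
SELECT ID,
       Date_trans,
       Time_trans,
       Price,
       ROW_NUMBER() OVER (PARTITION BY Price
                          ORDER BY ID) AS RowCount   -- counts per Price value, not per consecutive run
FROM trades;
```

That is why 1699100 gets RowCount 3: it is the third row overall with Price 58.13, even though the earlier 58.13 rows are not adjacent to it.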

Postgres aggregate function for calculating vector average of wind speed (vector magnitude) and wind direction (vector direction)

六眼飞鱼酱① submitted on 2019-12-14 01:36:38
Question: I have a table with two columns, wind_speed and wind_direction. I want a custom aggregate function that returns the average wind_speed and wind_direction. Together they define a vector: wind_speed is its magnitude and wind_direction its direction. The avg_wind_direction function should return the average wind speed as the magnitude, and the wind direction as the direction, of the average vector. SELECT avg_wind_direction(wind_speed,
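A minimal sketch of the vector average in plain SQL, without defining a custom aggregate; it assumes wind_direction is stored in degrees and a hypothetical table wind_obs, and the meteorological sign convention may need adjusting:

```sql
SELECT sqrt(avg(u) ^ 2 + avg(v) ^ 2)   AS avg_speed,       -- magnitude of the mean vector
       degrees(atan2(avg(u), avg(v)))  AS avg_direction    -- may need normalizing into 0..360
FROM (
    SELECT wind_speed * sin(radians(wind_direction)) AS u, -- east-west component
           wind_speed * cos(radians(wind_direction)) AS v  -- north-south component
    FROM wind_obs
) components;
```

The same arithmetic can be wrapped in CREATE AGGREGATE with a state of (sum_u, sum_v, n) if a true avg_wind_direction(wind_speed, wind_direction) aggregate is wanted.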

Problem when grouping

穿精又带淫゛_ submitted on 2019-12-13 17:53:15
Question: I have this MySQL query:

SELECT forum_categories.title, forum_messages.author, forum_messages.date AS last_message
FROM forum_categories
JOIN forum_topics ON forum_topics.category_id = forum_categories.id
JOIN forum_messages ON forum_messages.topic_id = forum_topics.id
WHERE forum_categories.id = 6
ORDER BY forum_categories.date ASC

And the output is the following:

Welcome daniel    2010-07-09 22:14:49
Welcome daniel    2010-06-29 22:14:49
Welcome luke      2010-08-10 20:12:20
Welcome skywalker 2010-08-19 22
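The excerpt does not state the desired output, but a common goal with this shape of query is one row per category with its latest message. A hedged sketch of that fix, joining against a per-category MAX(date):

```sql
SELECT c.title,
       m.author,
       m.date AS last_message
FROM forum_categories c
JOIN forum_topics   t ON t.category_id = c.id
JOIN forum_messages m ON m.topic_id   = t.id
JOIN (
    -- newest message date per category
    SELECT t2.category_id, MAX(m2.date) AS max_date
    FROM forum_topics t2
    JOIN forum_messages m2 ON m2.topic_id = t2.id
    GROUP BY t2.category_id
) latest ON latest.category_id = t.category_id
        AND latest.max_date    = m.date
WHERE c.id = 6;
```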

Get count of created entries for each day

杀马特。学长 韩版系。学妹 submitted on 2019-12-13 14:48:21
Question: Let's say I have a query like this:

SELECT COUNT(id), date(created_at)
FROM entries
WHERE date(created_at) >= date(current_date - interval '1 week')
GROUP BY date(created_at)

As you know, I then get back a result like this:

count | date
2     | 15.01.2014
1     | 13.01.2014
9     | 09.01.2014

But I do not get the days of the week on which no entries were created. How can I get a result that also includes the days on which no entries were created?

count | date
2     | 15
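A minimal Postgres sketch: generate every day of the range with generate_series() and LEFT JOIN the entries onto it, so days without entries appear with a count of 0 (table and column names follow the query above):

```sql
SELECT COUNT(e.id)  AS count,
       d.day::date  AS date
FROM generate_series(current_date - interval '1 week',
                     current_date,
                     interval '1 day') AS d(day)      -- one row per day, even empty ones
LEFT JOIN entries e ON date(e.created_at) = d.day::date
GROUP BY d.day
ORDER BY d.day;
```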

Postgres LEFT JOIN with SUM, missing records

倾然丶 夕夏残阳落幕 submitted on 2019-12-13 13:32:02
Question: I am trying to get the count of certain types of records in a related table, using a LEFT JOIN. I have a query that isn't quite right and one that returns the correct results; the correct query has a higher execution cost. I'd like to use the first approach, if I can correct its results. (See http://sqlfiddle.com/#!15/7c20b/5/2)

CREATE TABLE people(
  id SERIAL,
  name varchar not null
);
CREATE TABLE pets(
  id SERIAL,
  name varchar not null,
  kind varchar not null,
  alive boolean
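A hedged sketch of the usual cause of missing rows in this situation: a condition on the LEFT JOINed table placed in WHERE discards people who have no matching pets, so the condition belongs in the ON clause. The join column people_id and the 'dog'/alive filter are assumptions:

```sql
SELECT p.id,
       p.name,
       COUNT(pets.id) AS alive_dogs          -- counts 0 for people with no matching pets
FROM people p
LEFT JOIN pets ON pets.people_id = p.id
              AND pets.kind = 'dog'          -- filtering here keeps unmatched people
              AND pets.alive
GROUP BY p.id, p.name;
```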

Custom aggregation on PySpark dataframes

折月煮酒 submitted on 2019-12-13 12:07:54
Question: I have a PySpark DataFrame with one column containing one-hot encoded vectors. I want to aggregate the different one-hot encoded vectors by vector addition after a groupBy, e.g. for df[userid, action]:

Row 1: ["1234", [1, 0, 0]]
Row 2: ["1234", [0, 1, 0]]

I want the output to be the row ["1234", [1, 1, 0]], so the vector is the sum of all vectors grouped by userid. How can I achieve this? PySpark's sum aggregate operation does not support vector addition. Answer 1: You have several options: Create a user defined aggregate
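The answer's user-defined aggregate is one route; on Spark 2.4+ the element-wise sum can also be written in Spark SQL with built-in higher-order functions, assuming action is an array column of fixed length 3 in a view named events (both assumptions):

```sql
SELECT userid,
       aggregate(
           collect_list(action),                          -- all action vectors for this userid
           array(0, 0, 0),                                -- zero vector start value (match the element type)
           (acc, x) -> zip_with(acc, x, (a, b) -> a + b)  -- element-wise addition
       ) AS action_sum
FROM events
GROUP BY userid;
```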

Multiple SUM using LINQ

拟墨画扇 submitted on 2019-12-13 11:58:09
Question: I have a loop like the following; can I do the same using multiple SUMs?

foreach (var detail in ArticleLedgerEntries.Where(pd => pd.LedgerEntryType == LedgerEntryTypeTypes.Unload
                                                     && pd.InventoryType == InventoryTypes.Finished))
{
    weight += detail.GrossWeight;
    length += detail.Length;
    items  += detail.NrDistaff;
}

Answer 1: Technically speaking, what you have is probably the most efficient way to do what you are asking. However, you could create an extension method on IEnumerable<T> called Each that
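For comparison with the aggregate-function theme of this page, the same "several sums over one filtered set" pattern in SQL is a single SELECT with multiple SUM expressions; the table and column names below are hypothetical:

```sql
SELECT SUM(gross_weight) AS weight,
       SUM(length)       AS length,
       SUM(nr_distaff)   AS items
FROM article_ledger_entries
WHERE ledger_entry_type = 'Unload'
  AND inventory_type    = 'Finished';
```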

Rounding of the numeric values

梦想的初衷 submitted on 2019-12-13 09:45:29
Question: I want to round off the values of two columns:

select a.region as "Regions", a.suminsured,2 as "SumInsured", a.suminsured/b.sum*100 as pct
from (
    SELECT region, sum(suminsured) as suminsured
    FROM "Exposure_commune"
    group by region
) a,
(select sum(suminsured) FROM "Exposure_commune") b

I want the suminsured and pct columns to come out with two decimal places. Can someone tell me what I should do? Answer 1: Use round() with two parameters, which only works for the data type numeric. While being at it,
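A minimal sketch of the round()-with-two-parameters fix the answer points to; because the two-argument form only accepts numeric, both expressions are cast (the casts and the total alias are assumptions, since the column types are not shown):

```sql
SELECT a.region                                            AS "Regions",
       round(a.suminsured::numeric, 2)                     AS "SumInsured",
       round((a.suminsured / b.total * 100)::numeric, 2)   AS pct
FROM (
    SELECT region, sum(suminsured) AS suminsured
    FROM "Exposure_commune"
    GROUP BY region
) a,
(SELECT sum(suminsured) AS total FROM "Exposure_commune") b;
```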