group-by | 易学教程

Conditional filter of grouped factors - dplyr

Question: Say I have this sort of dataframe:

       day value group type id
    1    1   0.1     A    X  1
    2    1   0.4     A    Y  1
    3    2   0.2     A    X  3
    4    2   0.5     A    Y  3
    5    3   0.3     A    X  5
    6    3   0.2     A    Y  6
    7    1   0.1     B    X  3
    8    1   0.3     B    Y  3
    9    2   0.1     B    X 11
    10   2   0.4     B    Y 10
    11   3   0.2     B    X 12
    12   3   0.3     B    Y 12
    13   1   0.1     C    X 12
    14   1   0.3     C    Y 12
    15   2   0.3     C    X  5
    16   2   0.2     C    Y  5
    17   3   0.2     C    X  3
    18   3   0.2     C    Y  2

Data:

    library(dplyr)
    df1 <- data.frame(
      day = rep(1:3,6),
      value = c(0.1,0.2,0.3,0.4,0.5,0.2,0.1,0.1,0.2,0.3,0.4,0.3,
                0.1,0.3,0.2,0.3,0.2,0.2),
      group = rep(LETTERS[1:3],

Update Statement using Join and Group By

Question: I have written the UPDATE statement below, but it fails with the error "Incorrect syntax near the keyword 'GROUP'."

    UPDATE J
    SET J.StatusID = CASE WHEN SUM(DUV.VendorDUQuantity) = SUM(RD.InvoiceQuantity) THEN 1 ELSE J.StatusID END
    FROM PLN_DU_Vendor DUV
    INNER JOIN ENG_Release R ON R.ReleaseID = DUV.ReleaseID
    INNER JOIN ENG_DU_Header H ON H.ReleaseID = R.ReleaseID AND DUV.DUID = H.DUID
    INNER JOIN MKT_JobOrder J ON J.JobOrderID = R.JobOrderID
    INNER JOIN MKT_CustomerOrder CO ON CO.OrderID = J

Find Users who worked for 5 consecutive days with date-range in output

Question: I have a table with data similar to the below:

    Emp  Date       Code
    ---  --------   ----
    E1   11/1/2012  W
    E1   11/1/2012  V
    E2   11/1/2012  W
    E1   11/2/2012  W
    E1   11/3/2012  W
    E1   11/4/2012  W
    E1   11/5/2012  W

I want to get the list of employees within a date range (say, the last 3 months) who worked under code W for 5 consecutive days, with the date range in the output. Each employee can have multiple records for a single day with different codes. The expected output is:

    Emp  Date-Range
    ---  ----------
    E1   11/1 -11/5

Below is what I
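The excerpt is cut off before the asker's own attempt, so the following is only a minimal sketch of one common way to find such runs, written in plain Python rather than SQL. The function name `consecutive_runs` and its parameters are hypothetical; the sample rows are the ones shown in the question. It first deduplicates the days per employee for the chosen code, then walks the sorted days and closes a run whenever a gap appears.

```python
from datetime import date, timedelta


def consecutive_runs(rows, code="W", min_days=5):
    """Return (emp, first_day, last_day) for each run of at least min_days consecutive days."""
    # Collect the distinct days on which each employee logged the given code.
    days_by_emp = {}
    for emp, day, row_code in rows:
        if row_code == code:
            days_by_emp.setdefault(emp, set()).add(day)

    runs = []
    for emp in sorted(days_by_emp):
        days = sorted(days_by_emp[emp])
        start = prev = days[0]
        for day in days[1:]:
            if day - prev != timedelta(days=1):
                # Gap found: keep the finished run if it is long enough, then start a new one.
                if (prev - start).days + 1 >= min_days:
                    runs.append((emp, start, prev))
                start = day
            prev = day
        # Close the final run for this employee.
        if (prev - start).days + 1 >= min_days:
            runs.append((emp, start, prev))
    return runs


# Sample rows copied from the question.
rows = [
    ("E1", date(2012, 11, 1), "W"),
    ("E1", date(2012, 11, 1), "V"),
    ("E2", date(2012, 11, 1), "W"),
    ("E1", date(2012, 11, 2), "W"),
    ("E1", date(2012, 11, 3), "W"),
    ("E1", date(2012, 11, 4), "W"),
    ("E1", date(2012, 11, 5), "W"),
]
result = consecutive_runs(rows)
for emp, first, last in result:
    print(f"{emp} {first.month}/{first.day} - {last.month}/{last.day}")
```

With this logic a run longer than five days is reported once, as a single range, which may or may not be what the asker wants.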

Spark Flatten Seq by reversing groupby, (i.e. repeat header for each sequence in it)

Question: We have an RDD of the following form:

    org.apache.spark.rdd.RDD[((BigInt, String), Seq[(BigInt, Int)])]

We would like to flatten it into a single list of tab-delimited strings to save with saveAsTextFile. By "flatten" I mean repeating the group-by tuple (BigInt, String) for each item in its Seq. So data that looks like

    ((x1,x2), ((y1.1,y1.2), (y2.1, y2.2) .... )) ...

will wind up looking like

    x1 x2 y1.1 y1.2
    x1 x2 y2.1 y2.2

So far the code I've tried mostly flattens it all
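The question is about Scala and Spark; as a language-neutral illustration of the reshaping itself, here is a small plain-Python sketch. The function name `flatten_group` and the sample data are hypothetical. The idea is a per-record function that emits one output line per inner item; in Spark, a function of this shape is the kind of thing one would pass to `flatMap`.

```python
def flatten_group(record):
    """Yield one tab-delimited line per inner item, repeating the group key each time."""
    (k1, k2), items = record
    for v1, v2 in items:
        yield "\t".join(str(x) for x in (k1, k2, v1, v2))


# Hypothetical grouped data: ((key1, key2), [(value1, value2), ...])
grouped = [
    ((1, "a"), [(10, 1), (20, 2)]),
    ((2, "b"), [(30, 3)]),
]

# Apply the function to every record and concatenate the results (a flat map).
lines = [line for record in grouped for line in flatten_group(record)]
for line in lines:
    print(line)
```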

SQLite3 Simulate RIGHT OUTER JOIN with LEFT JOINs and UNION

Question: I have the following SELECT statement, in which I need to sum the task times from table tbTasks and group them by projectId from table tbProjects in order to get a record like this:

    ProjectID = 1, ProjectName = 'My Project', TotalTime = 300 //<--sum of each task time

The query looks like this:

    SELECT tbTasks.projectId, SUM(tbTasks.taskTime) AS totalTime, tbProjects.projectName
    FROM tbTasks
    INNER JOIN tbProjects ON tbTasks.projectId = tbProjects.projectId
    GROUP BY tbTasks.projectId
    ORDER BY tbProjects
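The excerpt ends before the actual problem is stated. Assuming the goal is to keep projects that have no tasks (which an INNER JOIN drops, and which a RIGHT OUTER JOIN from tbTasks would keep), one common workaround is to swap the table order and use a LEFT JOIN. The sketch below runs against an in-memory SQLite database from Python; the `taskId` column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbProjects (projectId INTEGER PRIMARY KEY, projectName TEXT);
    CREATE TABLE tbTasks (taskId INTEGER PRIMARY KEY, projectId INTEGER, taskTime INTEGER);
    INSERT INTO tbProjects VALUES (1, 'My Project'), (2, 'Empty Project');
    INSERT INTO tbTasks (projectId, taskTime) VALUES (1, 100), (1, 200);
""")

# Start from tbProjects so every project appears, even one without tasks.
# COALESCE turns the NULL sum of a task-less project into 0.
rows = conn.execute("""
    SELECT p.projectId, p.projectName, COALESCE(SUM(t.taskTime), 0) AS totalTime
    FROM tbProjects AS p
    LEFT JOIN tbTasks AS t ON t.projectId = p.projectId
    GROUP BY p.projectId, p.projectName
    ORDER BY p.projectName
""").fetchall()

for row in rows:
    print(row)
```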

GROUP_CONCAT return NULL if any value is NULL

Question: How can I make GROUP_CONCAT return NULL if any value in the group is NULL? Here is a test table:

    CREATE TABLE gc (
      a INT(11) NOT NULL,
      b VARCHAR(1) DEFAULT NULL
    );
    INSERT INTO gc (a, b) VALUES (1, 'a'), (1, 'b'), (2, 'c'), (2, NULL), (3, 'e');

And my query:

    SELECT a, GROUP_CONCAT(b) FROM gc GROUP BY a;

This is what I get:

    a | GROUP_CONCAT(b)
    --+----------------
    1 | a,b
    2 | c
    3 | e

This is what I want:

    a | GROUP_CONCAT(b)
    --+----------------
    1 | a,b
    2 | NULL
    3 | e

Answer 1: In an IF expression check if any
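The answer is truncated, so the following is not a reconstruction of it, only a sketch of one way to get the desired result. It relies on the fact that COUNT(b) skips NULLs while COUNT(*) does not, so the two differ exactly when a group contains a NULL. The example uses a CASE expression instead of MySQL's IF and runs on SQLite through Python's `sqlite3` module, with the column types simplified for SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gc (a INTEGER NOT NULL, b TEXT DEFAULT NULL);
    INSERT INTO gc (a, b) VALUES (1, 'a'), (1, 'b'), (2, 'c'), (2, NULL), (3, 'e');
""")

# COUNT(b) ignores NULLs, COUNT(*) counts every row. When they differ,
# the CASE has no matching branch and therefore yields NULL.
rows = conn.execute("""
    SELECT a,
           CASE WHEN COUNT(b) = COUNT(*) THEN GROUP_CONCAT(b) END AS joined
    FROM gc
    GROUP BY a
    ORDER BY a
""").fetchall()

result = dict(rows)
for a, joined in rows:
    print(a, joined)
```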

SQL vs MySQL: Rules about aggregate operations and GROUP BY

Question: In a book I'm currently reading while following a course on databases, the following example of an illegal query using an aggregate operator is given:

Find the name and age of the oldest sailor. Consider the following attempt to answer this query:

    SELECT S.sname, MAX(S.age) FROM Sailors S

The intent is for this query to return not only the maximum age but also the name of the sailors having that age. However, this query is illegal in SQL--if the SELECT clause uses an aggregate operation,
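For reference, a common portable way to express "name and age of the oldest sailor" is to compute the maximum in a subquery and compare against it. The sketch below runs it on SQLite from Python; the `sid` column and all sample rows are hypothetical, and this is not necessarily the formulation the book goes on to give.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, age REAL);
    INSERT INTO Sailors (sname, age)
    VALUES ('Dustin', 45.0), ('Lubber', 55.5), ('Bob', 63.5), ('Art', 25.5);
""")

# The subquery returns a single value (the maximum age); the outer query
# then selects every sailor whose age equals it, so ties are all returned.
oldest = conn.execute("""
    SELECT S.sname, S.age
    FROM Sailors S
    WHERE S.age = (SELECT MAX(S2.age) FROM Sailors S2)
""").fetchall()

print(oldest)
```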

Get MAX from a GROUP BY

Question: I was practicing some SQL when this hit me. I wanted to see how many times a certain commodity came up, and from there get the commodity that came up the most. This shows how many times each commodity comes up:

    mysql> SELECT commodity, COUNT(commodity) count FROM orders GROUP BY commodity ORDER BY count;
    +----------------------+------------+
    | commodity            | count      |
    +----------------------+------------+
    | PERSIAN MELON        |          4 |
    | BEANS                |          6 |
    | CASABA               |         10 |
    | ASPARAGUS            |         11 |
    | EGGPLANT             |         12 |
    |
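One simple way to get only the most frequent commodity is to sort the grouped counts in descending order and keep the first row. The sketch below does this on SQLite from Python with a small made-up `orders` table (the sample rows are hypothetical and do not reproduce the counts above). Note that `LIMIT 1` returns just one row even when several commodities tie for the top count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (commodity TEXT)")

# Hypothetical sample data: 2 x BEANS, 3 x CASABA, 5 x EGGPLANT.
sample = ["BEANS"] * 2 + ["CASABA"] * 3 + ["EGGPLANT"] * 5
conn.executemany("INSERT INTO orders VALUES (?)", [(c,) for c in sample])

# Count per commodity, largest count first, keep only the top row.
top = conn.execute("""
    SELECT commodity, COUNT(commodity) AS cnt
    FROM orders
    GROUP BY commodity
    ORDER BY cnt DESC
    LIMIT 1
""").fetchone()

print(top)
```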

How to use groupby in pandas to calculate a percentage / proportion total based on a criteria in another column

Question: I'm trying to work out how to use the groupby function in pandas to compute the proportions of values per year for a given Yes/No criteria. For example, I have a dataframe called names:

        Name  Number  Year   Sex Criteria
    0  name1     789  1998  Male        N
    1  name1     688  1999  Male        N
    2  name1     639  2000  Male        N
    3  name2     551  1998  Male        Y
    4  name2     499  1999  Male        Y

I can use

    namesgrouped = names.groupby(["Sex", "Year", "Criteria"]).sum()

to get:

                        Number
    Sex  Year Criteria
    Male 1998 N          14507
              Y           2308
         1999 N          14119
              Y           2331

and so

Getting Empty Results For 'COUNT'/'GROUP BY' MySQL Query

Question: I am running into a problem similar to the one posted here: How can I get a non empty result set when using select, group by and count? However, the solution given there is slower, as its author mentions. I was wondering whether there is an alternative solution that does not compromise performance. Also, I don't understand why a query like:

    SELECT `a`, `b`, COUNT(*) as `c` FROM `mytable` WHERE `status` = 1 GROUP BY `a`,`b`

will return an empty result, whereas without the 'GROUP BY' part it shows expected
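The difference the asker describes follows from how aggregation works in SQL: an aggregate query without GROUP BY always produces exactly one row, even over zero input rows, while a query with GROUP BY produces one row per group, and zero matching rows means zero groups. The sketch below demonstrates this on SQLite from Python; the table contents are hypothetical, and the ungrouped query selects only COUNT(*) to stay within standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (a TEXT, b TEXT, status INTEGER);
    INSERT INTO mytable VALUES ('x', 'y', 0);
""")

# No row has status = 1, so there are no groups and no output rows.
grouped = conn.execute(
    "SELECT a, b, COUNT(*) AS c FROM mytable WHERE status = 1 GROUP BY a, b"
).fetchall()

# Without GROUP BY the whole (empty) input is one group: a single row with count 0.
ungrouped = conn.execute(
    "SELECT COUNT(*) AS c FROM mytable WHERE status = 1"
).fetchall()

print(grouped)
print(ungrouped)
```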