top-n

How to get top n companies from a data frame in decreasing order

删除回忆录丶 submitted on 2019-11-26 16:36:54
Question: I am trying to get the top 'n' companies from a data frame. Here is my code:

    data("Forbes2000", package = "HSAUR")
    sort(Forbes2000$profits, decreasing = TRUE)

Now I would like to get the top 50 observations from this sorted vector.

Answer 1: head and tail are really useful functions!

    head(sort(Forbes2000$profits, decreasing = TRUE), n = 50)

If you want the first 50 rows of the data.frame, you can use the arrange function from plyr to sort the data.frame and then use head:

    library(plyr)
    head
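For readers who want the same idea outside R, here is a minimal Python sketch of "sort in decreasing order, then take the first n". The `companies` list and the `top_n` helper are hypothetical stand-ins for the Forbes2000 data, not part of the original answer.

```python
import heapq

# Hypothetical stand-in for Forbes2000: (company name, profits) pairs.
companies = [
    ("Alpha", 3.2),
    ("Bravo", 17.9),
    ("Charlie", -0.4),
    ("Delta", 9.1),
    ("Echo", 12.5),
]


def top_n(rows, n):
    """Return the n rows with the largest profits, in decreasing order."""
    # heapq.nlargest gives the same result as sorting in descending order
    # and slicing the first n rows, without sorting the whole list.
    return heapq.nlargest(n, rows, key=lambda row: row[1])


top3 = top_n(companies, 3)
print(top3)
```

Asking for more rows than exist simply returns every row in decreasing order, which mirrors how `head` behaves on a short vector.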

How to find the employee with the second highest salary?

早过忘川 submitted on 2019-11-26 14:48:12
Question: Is there any predefined function or method available to get the second highest salary from an employee table?

Answer 1: The way to do this is with Oracle's analytic functions. Your particular scenario is just a variant on the solution I provided in another thread. If you are interested in just selecting the second highest salary, then any of DENSE_RANK(), RANK() and ROW_NUMBER() will do the trick:

    SQL> select * from
         ( select sal
                  , rank() over (order by sal desc) as rnk
           from
           ( select
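To see how the three functions named in the answer number the same rows, the following Python sketch emulates ROW_NUMBER, RANK and DENSE_RANK over a descending sort. The salary values and the `ranked` helper are assumptions made for illustration; the tie at 3000 is deliberate.

```python
# Hypothetical salaries; the tie at 3000 is deliberate.
salaries = [5000, 3000, 3000, 2975, 2850]


def ranked(values):
    """Return (value, row_number, rank, dense_rank) tuples, highest value first."""
    out = []
    rank = dense = 0
    prev = None
    for row_number, value in enumerate(sorted(values, reverse=True), start=1):
        if value != prev:
            rank = row_number  # RANK jumps to the current row position
            dense += 1         # DENSE_RANK advances by exactly one
            prev = value
        out.append((value, row_number, rank, dense))
    return out


rows = ranked(salaries)
second_by_row_number = [v for v, rn, r, d in rows if rn == 2]
second_by_rank = [v for v, rn, r, d in rows if r == 2]
second_by_dense_rank = [v for v, rn, r, d in rows if d == 2]
third_by_rank = [v for v, rn, r, d in rows if r == 3]
third_by_dense_rank = [v for v, rn, r, d in rows if d == 3]
```

In this sample all three pick 3000 for position 2, though ROW_NUMBER returns a single row and the other two return both tied rows. The numbering diverges after the tie: no row has RANK 3, while DENSE_RANK 3 is 2975.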

Spark sql top n per group

时光毁灭记忆、已成空白 submitted on 2019-11-26 14:09:26
Question: How can I get the top-n (let's say top 10 or top 3) per group in spark-sql? http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/ provides a tutorial for general SQL. However, Spark does not implement subqueries in the where clause.

Answer 1: You can use the window function feature that was added in Spark 1.4. Suppose that we have a productRevenue table as shown below. the answer to What are the best-selling and the second best-selling products in every category
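The logic behind a "top n per group" query can be sketched in plain Python: partition the rows by group, order each partition in descending order, and keep the first n. The rows below are invented for illustration (the productRevenue table from the answer is not shown in this excerpt), and `top_n_per_group` is a hypothetical helper, not a Spark API.

```python
from collections import defaultdict

# Invented rows: (product, category, revenue).
product_revenue = [
    ("p1", "A", 100),
    ("p2", "A", 300),
    ("p3", "A", 200),
    ("p4", "B", 50),
    ("p5", "B", 70),
]


def top_n_per_group(rows, n):
    """Map each category to its n highest-revenue (product, revenue) pairs."""
    groups = defaultdict(list)
    for product, category, revenue in rows:
        groups[category].append((product, revenue))
    # Sorting each partition and slicing behaves like filtering on
    # row_number() <= n: ties are broken by input order, not kept together.
    return {
        category: sorted(items, key=lambda item: item[1], reverse=True)[:n]
        for category, items in groups.items()
    }


best_two = top_n_per_group(product_revenue, 2)
print(best_two)
```

If tied rows should all be kept, a rank-style comparison on the revenue value would be needed instead of a plain slice.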

Oracle SQL - How to Retrieve highest 5 values of a column

我怕爱的太早我们不能终老 submitted on 2019-11-26 12:28:59
Question: How do you write a query where only a select number of rows are returned with either the highest or lowest column value? For example, a report with the 5 highest-salaried employees?

Answer: The best way to do this is with analytic functions, RANK() or DENSE_RANK() ...

    SQL> select * from (
           select empno
                  , sal
                  , rank() over (order by sal desc) as rnk
           from emp)
         where rnk <= 5
         /

         EMPNO        SAL        RNK
    ---------- ---------- ----------
          7839       5000          1
          7788       3000          2
          7902       3000          2
          7566       2975          4
          8083       2850          5
          7698       2850          5

    6 rows selected.

DENSE_RANK() compresses the gaps when there is a tie:

    SQL> select * from (
           select
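The difference between filtering on RANK() and on DENSE_RANK() can be checked with a short Python sketch built from the definitions: a row's rank is one plus the number of rows with a higher salary, and its dense rank is one plus the number of distinct higher salaries. The first six rows are copied from the output above; the last two rows are assumed extra employees added so the filter has something to exclude.

```python
# (empno, sal). The first six rows come from the output above;
# the last two are assumed extra rows so the filter has something to cut.
emp = [
    (7839, 5000),
    (7788, 3000),
    (7902, 3000),
    (7566, 2975),
    (8083, 2850),
    (7698, 2850),
    (7782, 2450),
    (7499, 1600),
]


def rank(sal, all_sals):
    """RANK: one plus the number of rows with a strictly higher salary."""
    return 1 + sum(1 for s in all_sals if s > sal)


def dense_rank(sal, all_sals):
    """DENSE_RANK: one plus the number of distinct strictly higher salaries."""
    return 1 + len({s for s in all_sals if s > sal})


sals = [sal for _, sal in emp]
top5_rank = [(e, s, rank(s, sals)) for e, s in emp if rank(s, sals) <= 5]
top5_dense = [(e, s, dense_rank(s, sals)) for e, s in emp if dense_rank(s, sals) <= 5]
```

With this data the RANK filter keeps six rows numbered 1, 2, 2, 4, 5, 5, while the DENSE_RANK filter keeps seven rows numbered 1, 2, 2, 3, 4, 4, 5, because closing the gaps lets one more salary level into the top five.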

Oracle SQL - How to Retrieve highest 5 values of a column

不想你离开。 submitted on 2019-11-26 02:58:27
Question: How do you write a query where only a select number of rows are returned with either the highest or lowest column value? For example, a report with the 5 highest-salaried employees?

Answer 1: The best way to do this is with analytic functions, RANK() or DENSE_RANK() ...

    SQL> select * from (
           select empno
                  , sal
                  , rank() over (order by sal desc) as rnk
           from emp)
         where rnk <= 5
         /

         EMPNO        SAL        RNK
    ---------- ---------- ----------
          7839       5000          1
          7788       3000          2
          7902       3000          2
          7566       2975          4
          8083       2850          5
          7698       2850          5

Oracle SELECT TOP 10 records

时光毁灭记忆、已成空白 submitted on 2019-11-26 01:43:05
Question: I have a big problem with an SQL statement in Oracle. I want to select the top 10 records ordered by STORAGE_GB which aren't in a list from another select statement. This one works fine for all records:

    SELECT DISTINCT
      APP_ID,
      NAME,
      STORAGE_GB,
      HISTORY_CREATED,
      TO_CHAR(HISTORY_DATE, 'DD.MM.YYYY') AS HISTORY_DATE
    FROM HISTORY
    WHERE STORAGE_GB IS NOT NULL
      AND APP_ID NOT IN (
        SELECT APP_ID
        FROM HISTORY
        WHERE TO_CHAR(HISTORY_DATE, 'DD.MM.YYYY') = '06.02.2009')

But when I am adding AND

Pandas get topmost n records within each group

≡放荡痞女 submitted on 2019-11-26 00:53:43
Question: Suppose I have a pandas DataFrame like this:

    >>> df = pd.DataFrame({'id':[1,1,1,2,2,2,2,3,4],'value':[1,2,3,1,2,3,4,1,1]})
    >>> df
       id  value
    0   1      1
    1   1      2
    2   1      3
    3   2      1
    4   2      2
    5   2      3
    6   2      4
    7   3      1
    8   4      1

I want to get a new DataFrame with the top 2 records for each id, like this:

       id  value
    0   1      1
    1   1      2
    3   2      1
    4   2      2
    7   3      1
    8   4      1

I can do it by numbering the records within each group after the group by:

    >>> dfN = df.groupby('id').apply(lambda x:x['value'].reset_index()).reset_index()
    >>> dfN
       id  level_1  index  value
    0   1
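The desired result keeps the first two rows of each id in their original order, which can be sketched without pandas by counting how many rows of each id have been seen so far. The `head_per_group` helper is hypothetical; in pandas itself, `df.groupby('id').head(2)` is a commonly used idiom for the same result.

```python
from collections import Counter

# Same data as the DataFrame in the question.
ids = [1, 1, 1, 2, 2, 2, 2, 3, 4]
values = [1, 2, 3, 1, 2, 3, 4, 1, 1]


def head_per_group(rows, n):
    """Keep the first n rows of each group, preserving the original row index."""
    seen = Counter()
    kept = []
    for index, (group_id, value) in enumerate(rows):
        if seen[group_id] < n:
            kept.append((index, group_id, value))
        seen[group_id] += 1
    return kept


top2 = head_per_group(zip(ids, values), 2)
for index, group_id, value in top2:
    print(index, group_id, value)
```

The kept row indexes (0, 1, 3, 4, 7, 8) match the index column of the desired output in the question.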

Oracle SQL query: Retrieve latest values per group based on time [duplicate]

╄→гoц情女王★ submitted on 2019-11-26 00:44:22
Question: This question already has answers here: Fetch the row which has the Max value for a column (34 answers). Closed 2 years ago.

I have the following table in an Oracle DB:

    id  date              quantity
    1   2010-01-04 11:00  152
    2   2010-01-04 11:00  210
    1   2010-01-04 10:45  132
    2   2010-01-04 10:45  318
    4   2010-01-04 10:45  122
    1   2010-01-04 10:30  1
    3   2010-01-04 10:30  214
    2   2010-01-04 10:30  5515
    4   2010-01-04 10:30  210

Now I'd like to retrieve the latest value (and its time) per id. Example output:

    id  date              quantity
    1   2010

Oracle SQL query: Retrieve latest values per group based on time [duplicate]

孤街醉人 submitted on 2019-11-25 22:44:16
Question: This question already has an answer here: Fetch the row which has the Max value for a column (34 answers).

I have the following table in an Oracle DB:

    id  date              quantity
    1   2010-01-04 11:00  152
    2   2010-01-04 11:00  210
    1   2010-01-04 10:45  132
    2   2010-01-04 10:45  318
    4   2010-01-04 10:45  122
    1   2010-01-04 10:30  1
    3   2010-01-04 10:30  214
    2   2010-01-04 10:30  5515
    4   2010-01-04 10:30  210

Now I'd like to retrieve the latest value (and its time) per id. Example output:

    id  date              quantity
    1   2010-01-04 11:00  152
    2   2010-01-04 11:00  210
    3   2010-01-04 10:30  214
    4   2010-01-04 10:45  122

I just can't figure out how to put that
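The "latest row per id" logic can be sketched in Python as a single pass that remembers, for each id, the row with the greatest timestamp. The data is copied from the question; `latest_per_id` is a hypothetical helper that shows the idea only and is not the Oracle query the asker was looking for.

```python
# (id, date, quantity) rows copied from the table in the question.
rows = [
    (1, "2010-01-04 11:00", 152),
    (2, "2010-01-04 11:00", 210),
    (1, "2010-01-04 10:45", 132),
    (2, "2010-01-04 10:45", 318),
    (4, "2010-01-04 10:45", 122),
    (1, "2010-01-04 10:30", 1),
    (3, "2010-01-04 10:30", 214),
    (2, "2010-01-04 10:30", 5515),
    (4, "2010-01-04 10:30", 210),
]


def latest_per_id(rows):
    """Return one row per id: the row with the latest timestamp, ordered by id."""
    latest = {}
    for row in rows:
        row_id, timestamp, _quantity = row
        # "YYYY-MM-DD HH:MM" strings sort chronologically when compared as text.
        if row_id not in latest or timestamp > latest[row_id][1]:
            latest[row_id] = row
    return [latest[row_id] for row_id in sorted(latest)]


result = latest_per_id(rows)
for row in result:
    print(*row)
```

The four rows produced match the example output in the question.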