top-n

Oracle: I need to select n rows from every k rows of a table

倾然丶 夕夏残阳落幕 submitted on 2020-01-17 06:10:14
Question: For example, my table has 10000 rows. First I divide it into 5 sets of 2000 (k) rows. Then from each set of 2000 rows I select only the top 100 (n) rows. With this approach I am trying to scan some rows of the table in a specific pattern.
Answer 1: Assuming you order the rows 1 to 10000 using some logic and want to output only rows 1-100, 2001-2100, 4001-4100, etc., you can use the ROWNUM pseudocolumn:
    SELECT * FROM ( SELECT t.*, ROWNUM AS rn -- Secondly, assign a row number to the ordered rows
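The pattern described in the answer (keep the first n rows of every block of k ordered rows) comes down to a modulo test on the row number. The following is a minimal sketch of that test in Python rather than Oracle SQL; the function name `first_n_of_every_k` and the sample data are made up for illustration.

```python
def first_n_of_every_k(rows, k, n):
    """Keep the first n rows out of every consecutive block of k rows.

    `rows` is assumed to be ordered already. Positions are counted from 0,
    so the 1-based row number rn is kept when (rn - 1) % k < n.
    """
    return [row for pos, row in enumerate(rows) if pos % k < n]


# Hypothetical data: row numbers 1..10000, as in the question.
rows = list(range(1, 10001))
picked = first_n_of_every_k(rows, k=2000, n=100)
print(len(picked), picked[0], picked[99], picked[100])
```

In SQL, the corresponding condition on a row-number column such as `rn` would be something like `MOD(rn - 1, 2000) < 100`, applied after the rows have been ordered and numbered.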

Oracle Analytic function for min value in grouping

独自空忆成欢 submitted on 2020-01-11 04:55:08
Question: I'm new to working with analytic functions.
    DEPT EMP   SALARY
    ---- ----- ------
    10   MARY  100000
    10   JOHN  200000
    10   SCOTT 300000
    20   BOB   100000
    20   BETTY 200000
    30   ALAN  100000
    30   TOM   200000
    30   JEFF  300000
I want the department and employee with the minimum salary. Results should look like:
    DEPT EMP   SALARY
    ---- ----- ------
    10   MARY  100000
    20   BOB   100000
    30   ALAN  100000
EDIT: Here's the SQL I have (but of course, it doesn't work, as it wants staff in the GROUP BY clause as well):
    SELECT dept, emp, MIN(salary
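One way to get the expected result without naming the employee in GROUP BY is to compute each department's minimum salary in a subquery and join it back to the table. The sketch below uses that plain aggregate-and-join technique, not an Oracle analytic function, and runs it in SQLite through Python's `sqlite3` module. The table name `salaries` is an assumption, since the question does not give one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `salaries` is a hypothetical table name; the columns follow the question.
conn.execute("CREATE TABLE salaries (dept INTEGER, emp TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?, ?)",
    [
        (10, "MARY", 100000), (10, "JOHN", 200000), (10, "SCOTT", 300000),
        (20, "BOB", 100000), (20, "BETTY", 200000),
        (30, "ALAN", 100000), (30, "TOM", 200000), (30, "JEFF", 300000),
    ],
)

# Find each department's minimum salary, then join back to fetch the
# employee(s) who earn it. Ties within a department return several rows.
lowest_paid = conn.execute(
    """
    SELECT s.dept, s.emp, s.salary
    FROM salaries AS s
    JOIN (SELECT dept, MIN(salary) AS min_salary
          FROM salaries
          GROUP BY dept) AS m
      ON m.dept = s.dept AND m.min_salary = s.salary
    ORDER BY s.dept
    """
).fetchall()
print(lowest_paid)
```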

Find names of top-n highest-value columns in each pandas dataframe row

冷暖自知 submitted on 2020-01-09 03:28:11
Question: I have the following dataframe:
    id p1 p2 p3 p4
    1  0  9  1  4
    2  0  2  3  4
    3  1  3  10 7
    4  1  5  3  1
    5  2  3  7  10
I need to reshape the data frame so that for each id it has the top 3 columns with the highest values. The result would look like this:
    id top1 top2 top3
    1  p2   p4   p3
    2  p4   p3   p2
    3  p3   p4   p2
    4  p2   p3   p4/p1
    5  p4   p3   p2
It shows the top 3 best sellers for every user_id. I have already done it using the dplyr package in R, but I am looking for the pandas equivalent.
Answer 1: You could use np
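The answer is cut off, so the following is only one possible approach and not a reconstruction of it: sort each row's column positions by descending value with `numpy.argsort` and map the first three positions back to column names. The tie-breaking rule (left-most column wins) is an assumption; the question leaves ties open ("p4/p1").

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "p1": [0, 0, 1, 1, 2],
        "p2": [9, 2, 3, 5, 3],
        "p3": [1, 3, 10, 3, 7],
        "p4": [4, 4, 7, 1, 10],
    }
)

n = 3
values = df.set_index("id")
# Sort each row's column positions by descending value; a stable sort keeps
# the left-most column first when values tie (p1 before p4 for id 4).
order = np.argsort(-values.to_numpy(), axis=1, kind="stable")[:, :n]
top = pd.DataFrame(
    values.columns.to_numpy()[order],
    index=values.index,
    columns=[f"top{i + 1}" for i in range(n)],
)
print(top)
```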

Tidyverse: filtering n largest groups in grouped dataframe

核能气质少年 submitted on 2020-01-02 01:10:20
Question: I want to filter the n largest groups based on count, and then do some calculations on the filtered dataframe. Here is some data:
    Brand <- c("A","B","C","A","A","B","A","A","B","C")
    Category <- c(1,2,1,1,2,1,2,1,2,1)
    Clicks <- c(10,11,12,13,14,15,14,13,12,11)
    df <- data.frame(Brand,Category,Clicks)
    |Brand | Category| Clicks|
    |:-----|--------:|------:|
    |A     |        1|     10|
    |B     |        2|     11|
    |C     |        1|     12|
    |A     |        1|     13|
    |A     |        2|     14|
    |B     |        1|     15|
    |A     |        2|     14|
    |A     |        1|     13|
    |B     |        2|     12|
    |C     |        1|     11|
This is my expected
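The question is about R and the tidyverse, and its expected output is cut off. As a rough illustration of the idea only, here is a pandas analogue in Python (not the dplyr solution): count rows per `Brand`, keep the n most frequent brands, and filter the frame to those brands. Grouping by `Brand` alone and the choice of follow-up calculation are assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Brand": ["A", "B", "C", "A", "A", "B", "A", "A", "B", "C"],
        "Category": [1, 2, 1, 1, 2, 1, 2, 1, 2, 1],
        "Clicks": [10, 11, 12, 13, 14, 15, 14, 13, 12, 11],
    }
)

n = 2
# Count rows per Brand and keep the labels of the n most frequent brands.
largest = df["Brand"].value_counts().nlargest(n).index
filtered = df[df["Brand"].isin(largest)]

# Example follow-up calculation on the filtered frame (an arbitrary choice).
clicks_per_brand = filtered.groupby("Brand")["Clicks"].sum()
print(clicks_per_brand)
```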

Oracle SQL Finding the 5 lowest salaries

南楼画角 submitted on 2019-12-31 05:10:47
Question: I am trying to answer the following question: show ID_Number and name for the five lowest paid employees. This is the table with employees:
    CREATE TABLE Employees (
        ID_No CHAR(4) NOT NULL,
        Name VARCHAR(50) NOT NULL,
        Hire_Date DATE NOT NULL,
        Position VARCHAR(20) CHECK(Position IN('CHAIRMAN','MANAGER','ANALYST','DESIGNER','PROGRAMMER','SALES REP','ADMIN','ACCOUNTANT')),
        Salary NUMERIC(8,2) NOT NULL,
        Mgr_ID_No CHAR(4) NULL,
        Dept_No SMALLINT NULL);
I will add that I've been trying a few methods
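The core of a "five lowest" query is to order by salary first and cut the result to five rows afterwards. The sketch below shows that order-then-cut step in SQLite through Python's `sqlite3` module, so it uses `LIMIT` rather than Oracle syntax. The table is trimmed to the columns needed, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down version of the Employees table; only the columns used here.
conn.execute(
    "CREATE TABLE Employees ("
    "ID_No CHAR(4) NOT NULL, Name VARCHAR(50) NOT NULL, "
    "Salary NUMERIC(8,2) NOT NULL)"
)
# Made-up sample rows, purely for illustration.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        ("E001", "Ann", 5200), ("E002", "Ben", 3100), ("E003", "Cat", 4700),
        ("E004", "Dan", 2900), ("E005", "Eve", 6100), ("E006", "Fay", 3500),
        ("E007", "Gus", 8000),
    ],
)

# Order by salary first, then cut the result to five rows.
five_lowest = conn.execute(
    "SELECT ID_No, Name FROM Employees ORDER BY Salary ASC LIMIT 5"
).fetchall()
print(five_lowest)
```

Oracle has no `LIMIT`; there the same idea is usually written as a `ROWNUM` filter around an ordered subquery, or with `FETCH FIRST 5 ROWS ONLY` on Oracle 12c and later.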

Finding top N columns for each row in data frame

故事扮演 submitted on 2019-12-30 06:15:07
Question: Given a data frame with one descriptive column and X numeric columns, for each row I'd like to identify the top N columns with the highest values and save them as rows in a new dataframe. For example, consider the following data frame:
    df = pd.DataFrame()
    df['index'] = ['A', 'B', 'C', 'D', 'E', 'F']
    df['option1'] = [1, 5, 3, 7, 9, 3]
    df['option2'] = [8, 4, 5, 6, 9, 2]
    df['option3'] = [9, 9, 1, 3, 9, 5]
    df['option4'] = [3, 8, 3, 5, 7, 0]
    df['option5'] = [2, 3, 4, 9, 4, 2]
I'd like to output (let's say N is 3, so I want the
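The desired output format is cut off in the question, so the sketch below assumes a long layout: one output row per (index, rank) pair. It walks the rows and uses `Series.nlargest` to pick the N largest values in each. The output column names `rank`, `option` and `value` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame()
df["index"] = ["A", "B", "C", "D", "E", "F"]
df["option1"] = [1, 5, 3, 7, 9, 3]
df["option2"] = [8, 4, 5, 6, 9, 2]
df["option3"] = [9, 9, 1, 3, 9, 5]
df["option4"] = [3, 8, 3, 5, 7, 0]
df["option5"] = [2, 3, 4, 9, 4, 2]

n = 3
wide = df.set_index("index")
# For every row, take the n largest values and emit one record per rank.
records = [
    {"index": label, "rank": rank, "option": option, "value": value}
    for label, row in wide.iterrows()
    for rank, (option, value) in enumerate(row.nlargest(n).items(), start=1)
]
top = pd.DataFrame(records)
print(top)
```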

Get the minimum employees with a given job

自古美人都是妖i submitted on 2019-12-27 04:00:23
Question: I have this table:
    Name                       Null?    Type
    -------------------------- -------- ------------
    EMPLOYEENO                 NOT NULL NUMBER(4)
    ENAME                               VARCHAR2(15)
    JOB                                 VARCHAR2(15)
    MGR                                 NUMBER(4)
    HIREDATE                            DATE
    SAL                                 NUMBER
    COMM                                NUMBER
    DEPTNO                              NUMBER(2)
I want to get the department with the fewest employees who have a given job (for example, all the employees with the 'Analyst' job). Can you please help me with the query?
Answer 1: Here the key is to get the count of employees doing a particular job in each department. In the query below, this is
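The counting step the answer describes can be sketched as a filtered GROUP BY: restrict to the given job, count employees per department, and take the smallest count. The example below runs in SQLite through Python's `sqlite3` module rather than Oracle. The table name `emp` and the sample rows are assumptions, and departments with no employee in the given job do not appear in the counts at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `emp` is a hypothetical table name; only the relevant columns are kept.
conn.execute(
    "CREATE TABLE emp (employeeno INTEGER, ename TEXT, job TEXT, deptno INTEGER)"
)
# Made-up sample rows, purely for illustration.
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        (1, "KIM", "ANALYST", 10), (2, "LEE", "ANALYST", 10),
        (3, "RAY", "CLERK", 10), (4, "SAM", "ANALYST", 20),
        (5, "TIA", "CLERK", 20), (6, "UMA", "ANALYST", 30),
        (7, "VIC", "ANALYST", 30), (8, "WES", "ANALYST", 30),
    ],
)

job = "ANALYST"
# Count employees with the given job per department, smallest count first.
per_dept = conn.execute(
    "SELECT deptno, COUNT(*) AS headcount FROM emp WHERE job = ? "
    "GROUP BY deptno ORDER BY headcount ASC, deptno ASC",
    (job,),
).fetchall()
smallest = per_dept[0]
print(per_dept, smallest)
```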

How can I select the “maximum” row from a table?

亡梦爱人 submitted on 2019-12-24 09:49:52
Question: How can I select the maximum row from a table? What does maximum mean? My table has two timestamp columns, TIME1 and TIME2. The maximum row is the one with the latest value for TIME1. If that is not a unique row, then the maximum is the one among those rows with the latest value for TIME2. This is on Oracle, if that matters.
Answer 1: What you need is a "Top-N" query:
    select * from ( select * from table order by time1 desc, time2 desc ) where rownum < 2;
if you properly index on time1,
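Ordering by `time1 desc, time2 desc` and keeping the first row amounts to taking a lexicographic maximum over the pair (TIME1, TIME2). A small Python sketch of that comparison rule, using invented rows:

```python
from datetime import datetime

# Made-up rows: (id, TIME1, TIME2)
rows = [
    (1, datetime(2019, 12, 1, 9, 0), datetime(2019, 12, 1, 9, 30)),
    (2, datetime(2019, 12, 2, 9, 0), datetime(2019, 12, 2, 9, 10)),
    (3, datetime(2019, 12, 2, 9, 0), datetime(2019, 12, 2, 9, 45)),
]

# Tuples compare element by element, so TIME2 only matters when TIME1 ties.
maximum = max(rows, key=lambda row: (row[1], row[2]))
print(maximum[0])
```

Rows 2 and 3 share the latest TIME1, so the later TIME2 decides between them.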

SQL Server: Why do these queries return different result sets?

China☆狼群 submitted on 2019-12-24 08:26:29
Question:
    Query 1 = select top 5 i.item_id from ITEMS i
    Query 2 = select top 5 i.item_id, i.category_id from ITEMS i
Even if I remove the top 5 clause they still return different rows. If I run "select top 5 i.* from ITEMS i", this returns a completely different result set!
Answer 1: Because the results of a "TOP N" qualified SELECT are indeterminate if you do not have an ORDER BY clause.
Answer 2: Without an ORDER BY clause, you cannot predict what order you will get results in. There is probably an interesting
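The flip side of both answers is that adding an ORDER BY pins down which rows a top-N query returns, no matter how the rows happen to be stored. The sketch below illustrates this in SQLite through Python's `sqlite3` module, where `LIMIT 5` plays the role of SQL Server's `TOP 5`. The helper name `top_five` and the data are made up for illustration.

```python
import random
import sqlite3


def top_five(item_ids):
    """Load ids in the given order and return the five smallest, ordered."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ITEMS (item_id INTEGER)")
    conn.executemany("INSERT INTO ITEMS VALUES (?)", [(i,) for i in item_ids])
    # The ORDER BY is what fixes which five rows come back.
    rows = conn.execute(
        "SELECT item_id FROM ITEMS ORDER BY item_id LIMIT 5"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


ids = list(range(1, 21))
shuffled = ids[:]
random.Random(0).shuffle(shuffled)  # same rows, different insertion order

print(top_five(ids) == top_five(shuffled))
```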

Pandas report top-n in group and pivot

自作多情 submitted on 2019-12-22 08:51:40
Question: I am trying to summarise a dataframe by grouping along a single dimension d1 and reporting summary statistics for each element of d1. In particular, I am interested in the top n (index and values) for a number of metrics. What I would like to produce is a row for each element of d1. Say I have two dimensions d1, d2 and 4 metrics m1, m2, m3, m4. 1) What is the suggested way of grouping by d1 and finding the top n d2 and metric value for each of the metrics m1 to m4? in Wes's book Python for Data
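One common way to get the top n d2 values per d1 for a single metric is to sort by the metric and keep the first n rows of each group. The sketch below shows this for `m1` only, on invented data; the same step could be repeated for the other metrics. It is an illustration under these assumptions, not the approach from the truncated question or from the book it mentions.

```python
import pandas as pd

# Made-up data: two dimensions (d1, d2) and one metric (m1).
df = pd.DataFrame(
    {
        "d1": ["x", "x", "x", "y", "y", "y"],
        "d2": ["a", "b", "c", "a", "b", "c"],
        "m1": [5, 9, 1, 7, 2, 8],
    }
)

n = 2
top_m1 = (
    df.sort_values("m1", ascending=False)  # largest metric values first
    .groupby("d1")
    .head(n)  # first n rows of each d1 group, i.e. its top n by m1
    .sort_values(["d1", "m1"], ascending=[True, False])
    .reset_index(drop=True)
)
print(top_m1)
```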