top-n

Oracle Analytic function for min value in grouping

孤人 submitted on 2019-12-01 04:22:45
I'm new to working with analytic functions.

DEPT  EMP    SALARY
----  -----  ------
10    MARY   100000
10    JOHN   200000
10    SCOTT  300000
20    BOB    100000
20    BETTY  200000
30    ALAN   100000
30    TOM    200000
30    JEFF   300000

I want the department and the employee with the minimum salary. Results should look like:

DEPT  EMP    SALARY
----  -----  ------
10    MARY   100000
20    BOB    100000
30    ALAN   100000

EDIT: Here's the SQL I have (but of course it doesn't work, as it wants emp in the GROUP BY clause as well):

SELECT dept, emp, MIN(salary) KEEP (DENSE_RANK FIRST ORDER BY salary)
FROM mytable
GROUP BY dept

David Aldridge: I think that the
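Oracle's KEEP (DENSE_RANK FIRST) is vendor-specific, but the same result can be obtained with a standard RANK() window function. A minimal sketch of that approach, run against SQLite through Python's sqlite3 module (window functions need SQLite 3.25+, bundled with Python 3.8 and later); the table and data here mirror the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (dept INTEGER, emp TEXT, salary INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?,?,?)", [
    (10, "MARY", 100000), (10, "JOHN", 200000), (10, "SCOTT", 300000),
    (20, "BOB", 100000), (20, "BETTY", 200000),
    (30, "ALAN", 100000), (30, "TOM", 200000), (30, "JEFF", 300000),
])
# rank rows within each dept by salary, then keep only rank 1
rows = conn.execute("""
    SELECT dept, emp, salary FROM (
        SELECT dept, emp, salary,
               RANK() OVER (PARTITION BY dept ORDER BY salary) AS rnk
        FROM mytable
    ) WHERE rnk = 1
    ORDER BY dept
""").fetchall()
# rows -> [(10, 'MARY', 100000), (20, 'BOB', 100000), (30, 'ALAN', 100000)]
```

RANK() (unlike ROW_NUMBER()) returns all tied employees if two share the same minimum salary, which matches the KEEP (DENSE_RANK FIRST) semantics.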

How to see top n entries of term-document matrix after tfidf in scikit-learn

吃可爱长大的小学妹 submitted on 2019-11-29 19:33:30
I am new to scikit-learn, and I was using TfidfVectorizer to find the tf-idf values of terms in a set of documents. I used the following code:

vectorizer = TfidfVectorizer(stop_words=u'english', ngram_range=(1,5), lowercase=True)
X = vectorizer.fit_transform(lectures)

Now if I print X, I can see all the entries in the matrix, but how can I find the top n entries by tf-idf score? In addition, is there any method that will help me find the top n entries per ngram size, i.e. the top entries among unigrams, bigrams, trigrams and so on?

YS-L: Since version 0.15, the

SQL Query to Select the 'Next' record (similar to First or Top N)

夙愿已清 submitted on 2019-11-28 12:39:33
I need a query that returns the next (or previous) record if a certain record is not present. For instance, consider the following table:

ID (primary key)  value
1                 John
3                 Bob
9                 Mike
10                Tom

I'd like to query the record that has id 7, or the next greater id if 7 is not present. My questions are: Are these types of queries possible with SQL? What are such queries called in the DB world? Thanks!

Adrian Carneiro: Yes, it's possible, but the implementation will depend on your RDBMS. Here's what it looks like in MySQL, PostgreSQL and SQLite:

select ID, value
from YourTable
where id >= 7
order by id
limit 1

In MS SQL-Server
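The accepted query can be checked end to end with SQLite through Python's sqlite3 module; a minimal sketch using the table from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?)",
                 [(1, "John"), (3, "Bob"), (9, "Mike"), (10, "Tom")])
# id 7 is absent, so the smallest id >= 7 is returned instead
row = conn.execute(
    "SELECT id, value FROM YourTable WHERE id >= 7 ORDER BY id LIMIT 1"
).fetchone()
# row -> (9, 'Mike')
```

For the "previous record" variant, flip both the comparison and the sort: `WHERE id <= 7 ORDER BY id DESC LIMIT 1`.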

Selecting top n elements of a group in Oracle

怎甘沉沦 submitted on 2019-11-28 11:46:33
I have an Oracle table which has name, value, and time columns. Basically the table is for logging: it stores what changes were made to a particular name, what the previous value was, and what time the change was made. I need to formulate a query to fetch the top n changes for each name, and the output should include all the names in the table. Any help/suggestions?

EDIT:

Name    Value     Time
Harish  Pass      1-Nov-2011
Ravi    Fail      2-Nov-2011
Harish  Absent    31-Oct-2011
Harish  Attended  31-Aug-2011
Harish  Present   31-Jul-2011

I need to select the details of Harish on 1st Nov, 31st Oct and 31st Aug, and of Ravi.
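The usual top-n-per-group approach is a ROW_NUMBER() window function partitioned by name. A sketch of that idea (with hypothetical table and column names, ISO dates for sortability, and SQLite via Python's sqlite3 standing in for Oracle; the same inner query works in Oracle):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE change_log (name TEXT, value TEXT, time TEXT)")
conn.executemany("INSERT INTO change_log VALUES (?,?,?)", [
    ("Harish", "Pass", "2011-11-01"), ("Ravi", "Fail", "2011-11-02"),
    ("Harish", "Absent", "2011-10-31"), ("Harish", "Attended", "2011-08-31"),
    ("Harish", "Present", "2011-07-31"),
])
n = 3
# number each name's changes from newest to oldest, keep the first n
rows = conn.execute("""
    SELECT name, value, time FROM (
        SELECT name, value, time,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY time DESC) AS rn
        FROM change_log
    ) WHERE rn <= ?
    ORDER BY name, time DESC
""", (n,)).fetchall()
```

With n=3 this returns Harish's three most recent changes plus Ravi's single row, so every name still appears in the output.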

Get top-n items of every row in a scipy sparse matrix

柔情痞子 submitted on 2019-11-28 05:46:41
Question: After reading this similar question, I still can't fully understand how to implement the solution I'm looking for. I have a sparse matrix, i.e.:

import numpy as np
from scipy import sparse
arr = np.array([[0,5,3,0,2],[6,0,4,9,0],[0,0,0,6,8]])
arr_csc = sparse.csc_matrix(arr)

I would like to efficiently get the top n items of each row, without converting the sparse matrix to dense. The end result should look like this (assuming n=2):

top_n_arr = np.array([[0,5,3,0,0],[6,0,0,9,0],[0
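One sketch of a fully sparse approach: use CSR format (cheap row slicing, unlike CSC), walk each row's slice of the data array via indptr, and zero out everything except the row's n largest values with np.argpartition:

```python
import numpy as np
from scipy import sparse

arr = np.array([[0, 5, 3, 0, 2], [6, 0, 4, 9, 0], [0, 0, 0, 6, 8]])
m = sparse.csr_matrix(arr)  # CSR: row i's values live in m.data[m.indptr[i]:m.indptr[i+1]]
n = 2
for i in range(m.shape[0]):
    row = m.data[m.indptr[i]:m.indptr[i + 1]]  # view, so writes hit m.data
    if row.size > n:
        # argpartition puts the n largest at the end; the rest get zeroed
        smallest = np.argpartition(row, -n)[:-n]
        row[smallest] = 0
m.eliminate_zeros()  # drop the explicit zeros from the sparse structure
```

This never materializes a dense matrix: only each row's nonzero values are touched, and argpartition avoids a full sort of the row.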

Find names of top-n highest-value columns in each pandas dataframe row

丶灬走出姿态 submitted on 2019-11-27 20:54:07
I have the following dataframe:

id  p1  p2  p3  p4
1   0   9   1   4
2   0   2   3   4
3   1   3   10  7
4   1   5   3   1
5   2   3   7   10

I need to reshape the dataframe so that, for each id, it lists the three columns with the highest values. The result would be like this:

id  top1  top2  top3
1   p2    p4    p3
2   p4    p3    p2
3   p3    p4    p2
4   p2    p3    p4/p1
5   p4    p3    p2

It shows the top 3 best sellers for every user_id. I have already done this using the dplyr package in R, but I am looking for the pandas equivalent.

unutbu: You could use np.argsort to find the indices of the n largest items for each row:

import numpy as np
import pandas as pd
df
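A sketch of the argsort idea end to end: negate the values so argsort runs descending, take the first n column indices per row, and map them back to column names (ties such as id 4's p1/p4 are broken by whichever order argsort happens to emit):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "p1": [0, 0, 1, 1, 2],
    "p2": [9, 2, 3, 5, 3],
    "p3": [1, 3, 10, 3, 7],
    "p4": [4, 4, 7, 1, 10],
}).set_index("id")

n = 3
order = np.argsort(-df.values, axis=1)[:, :n]       # per-row indices of the n largest
result = pd.DataFrame(df.columns.to_numpy()[order], # indices -> column names
                      index=df.index,
                      columns=[f"top{i + 1}" for i in range(n)])
```

Note the `.to_numpy()` before indexing: recent pandas versions reject 2-D fancy indexing directly on an Index object.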

Top n records per group sql in access

99封情书 submitted on 2019-11-27 09:22:45
I am making some software that tracks the scores of a test. There are multiple users, the details of which are stored in a user table. There is then a progress table which tracks a score with the date and the user whose score it is. I can already select the 3 most recent records for a chosen userID:

SELECT TOP 3 Progress.LoginID, Progress.Score, Progress.[Date Taken]
FROM Progress
WHERE (((Progress.LoginID)=[Enter LoginID:]))
ORDER BY Progress.[Date Taken] DESC;

And I can show all the records grouped by LoginID:

SELECT Progress.LoginID, Progress.Score, Progress.[Date Taken]
FROM Progress
GROUP
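Access has no window functions, so the classic per-group top-n pattern there is a correlated subquery: keep a row only if fewer than 3 rows for the same LoginID are more recent. A sketch of that pattern (sample data is hypothetical, and SQLite via Python's sqlite3 stands in for Access; the same shape works in Access SQL with its bracketed column names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Progress (LoginID TEXT, Score INTEGER, DateTaken TEXT)")
conn.executemany("INSERT INTO Progress VALUES (?,?,?)", [
    ("amy", 60, "2019-01-01"), ("amy", 70, "2019-02-01"),
    ("amy", 80, "2019-03-01"), ("amy", 90, "2019-04-01"),
    ("bob", 50, "2019-01-15"), ("bob", 55, "2019-02-15"),
])
# a row survives if fewer than 3 rows for the same user are newer than it
rows = conn.execute("""
    SELECT LoginID, Score, DateTaken
    FROM Progress AS p
    WHERE (SELECT COUNT(*) FROM Progress AS q
           WHERE q.LoginID = p.LoginID AND q.DateTaken > p.DateTaken) < 3
    ORDER BY LoginID, DateTaken DESC
""").fetchall()
```

With 4 rows for amy and 2 for bob, this returns amy's 3 most recent plus both of bob's; the correlated COUNT(*) makes it O(n^2) per group, which is fine at quiz-tracking scale.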

Spark sql top n per group

末鹿安然 submitted on 2019-11-27 02:01:44
How can I get the top n (let's say top 10 or top 3) per group in Spark SQL?

http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/ provides a tutorial for general SQL. However, Spark does not implement subqueries in the WHERE clause.

tyagi: You can use the window function feature that was added in Spark 1.4. Suppose that we have a productRevenue table as shown below. The answer to "What are the best-selling and the second best-selling products in every category?" is:

SELECT product, category, revenue
FROM (SELECT product, category, revenue, dense_rank() OVER
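The dense_rank() query shape the answer describes is standard SQL, so it can be sketched outside a Spark cluster; here SQLite via Python's sqlite3 stands in for Spark SQL (the same statement runs unchanged in Spark 1.4+), with hypothetical sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE productRevenue (product TEXT, category TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO productRevenue VALUES (?,?,?)", [
    ("thin", "cell phone", 6000), ("very thin", "cell phone", 6000),
    ("ultra thin", "cell phone", 5000), ("mini", "tablet", 5500),
    ("big", "tablet", 2500), ("normal", "tablet", 1500),
])
# dense_rank gives ties the same rank, so both 6000-revenue phones rank 1
rows = conn.execute("""
    SELECT product, category, revenue FROM (
        SELECT product, category, revenue,
               dense_rank() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk
        FROM productRevenue
    ) WHERE rnk <= 2
    ORDER BY category, revenue DESC
""").fetchall()
```

Because dense_rank() assigns tied rows the same rank, the two 6000-revenue phones both rank first and the 5000-revenue phone still makes rank 2, so "cell phone" contributes three rows; swap in row_number() for an exact top-2 cutoff.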