How to get top n companies from a data frame in decreasing order

Posted by 删除回忆录丶 on 2019-11-26 16:36:54

Question


I am trying to get the top 'n' companies from a data frame. Here is my code so far:

data("Forbes2000", package = "HSAUR")
sort(Forbes2000$profits,decreasing=TRUE)

Now I would like to get the top 50 observations from this sorted vector.


Answer 1:


head and tail are really useful functions!

head(sort(Forbes2000$profits,decreasing=TRUE), n = 50)

If you want the first 50 rows of the data.frame, you can use the arrange function from plyr to sort the data.frame and then use head:

library(plyr)

head(arrange(Forbes2000,desc(profits)), n = 50)

Notice that I wrapped profits in a call to desc which means it will sort in decreasing order.

To do the same without plyr:

head(Forbes2000[order(Forbes2000$profits, decreasing = TRUE), ], n = 50)
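For readers who do not have the HSAUR package installed, the difference between sorting the vector and sorting the data.frame can be tried on a small made-up data frame. This is only a sketch: the `companies` data frame, its values, and `n` are invented for illustration and are not part of Forbes2000.

```r
# Hypothetical stand-in for Forbes2000 (names and values are made up)
companies <- data.frame(
  name    = c("A", "B", "C", "D", "E"),
  profits = c(2.5, 10.1, -0.4, 7.3, 4.0)
)
n <- 3

# Sorting the column returns only the top n profit values
top_values <- head(sort(companies$profits, decreasing = TRUE), n = n)

# Ordering the rows keeps the other columns alongside the profits
top_rows <- head(companies[order(companies$profits, decreasing = TRUE), ], n = n)

print(top_values)
print(top_rows)
```

The first result is a plain numeric vector, while the second is still a data.frame, so the company names remain attached to their profits.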



Answer 2:


Use order to sort the data.frame, then use head to get only the first 50 rows.

data("Forbes2000", package = "HSAUR")
head(Forbes2000[order(Forbes2000$profits, decreasing=TRUE), ], 50)



Answer 3:


You can use rank together with filter and desc from dplyr.

    library(dplyr)
    top_fifty <- Forbes2000 %>%
         filter(rank(desc(profits))<=50)

This keeps only the rows whose profit rank, counted from the largest value, is less than or equal to 50 (i.e. the top 50). Note that filter does not reorder the rows; add arrange(desc(profits)) if the result should also be sorted.
dplyr is very useful. The commands and chaining syntax are very easy to understand. 10/10 would recommend.
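One thing worth checking with a rank-based filter is how ties at the cut-off are handled. The sketch below uses base R only, with `-profits` standing in for dplyr's desc() (for a numeric column the two should give the same ranking). The `companies` data frame is made up for illustration.

```r
# Hypothetical data with a tie at the cut-off (values are made up)
companies <- data.frame(
  name    = c("A", "B", "C", "D", "E"),
  profits = c(5, 9, 7, 7, 1)
)
n <- 2

# Default ties.method = "average": the two 7s both get rank 2.5,
# so neither of them passes the <= 2 test
by_average <- companies[rank(-companies$profits) <= n, ]

# ties.method = "min": both 7s get rank 2, so both are kept
by_min <- companies[rank(-companies$profits, ties.method = "min") <= n, ]

print(by_average)
print(by_min)
```

With tied values the result can therefore contain fewer or more than n rows, and the rows come back in their original order rather than sorted by profit.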




Answer 4:


Mnel is right that, in general, you want to use the head() and tail() functions along with a sorting function. I should mention, though, that for medium-sized data sets Vince's method works faster. If you didn't use head() or tail(), you could use the basic subsetting operator []:

 library(plyr)
 # Sort with plyr's arrange, then take the first 50 rows
 x = arrange(Forbes2000, desc(profits))
 x = x[1:50, ]
 # Or using base R's order
 x = Forbes2000[order(Forbes2000$profits, decreasing = TRUE), ]
 x = x[1:50, ]

However, I really do recommend the head(), tail(), or filter() functions, because the regular [] operator assumes your data is laid out in a simple array or matrix format. (Hopefully, this answers Teja's question.)
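One concrete difference between the two approaches shows up when the data has fewer rows than requested. This may or may not be what the answer had in mind, and the `small` data frame is made up for illustration.

```r
# Hypothetical data frame with only 3 rows (values are made up)
small <- data.frame(
  name    = c("A", "B", "C"),
  profits = c(2.5, 10.1, 7.3)
)
sorted <- small[order(small$profits, decreasing = TRUE), ]

# Asking for 5 rows with [] pads the result with rows full of NA
with_bracket <- sorted[1:5, ]

# head() returns only the rows that actually exist
with_head <- head(sorted, n = 5)

print(nrow(with_bracket))
print(nrow(with_head))
```

Here `with_bracket` has five rows, two of them entirely NA, while `with_head` has three.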

Which package you choose is largely subjective. However, from reading people's comments, I would say that the choice between plyr's arrange(), {base}'s order() with {utils}'s head() and tail(), or dplyr largely depends on the memory size and row count of your dataset. I could go into more detail about how plyr and sometimes dplyr have problems with large, complex datasets, but I don't want to get off topic.

P.S. This is one of my first times answering so feedback is appreciated.



Source: https://stackoverflow.com/questions/12187891/how-to-get-top-n-companies-from-a-data-frame-in-decreasing-order
