Subsetting DataFrame in R by duplicate values for Year by lowest value for Rating

Submitted by 我是研究僧i on 2019-12-12 19:23:00

Question


I have a data frame which looks like this:

> fitchRatings
               Country Month Year FitchLongTerm LongTermTransformed
1            Abu Dhabi     7 2007            AA                  22
2               Angola     5 2012           BB-                  12
3               Angola     5 2011           BB-                  12
4               Angola     5 2010            B+                  11
5            Argentina     7 2010             B                  10
6            Argentina    12 2008            RD                   3
7            Argentina     8 2006            RD                   3
8            Argentina    12 2005            RD                   3
9            Argentina     6 2005           DDD                   2
10           Argentina     1 2005             D                   0

As you can see, some countries have multiple observations for a single year. I want to subset the data frame so that it keeps only one observation per country-year: the one with the smallest value of "LongTermTransformed".

In this data set Country and LongTermTransformed are factors and Year is an integer.

Thanks in advance.


Answer 1:


There are several ways to keep, within each group, the row that has the minimum value of a column. One option is to get the index of the minimum with which.min after converting 'LongTermTransformed' from factor to numeric. The conversion goes through as.character so that the labels, not the internal level codes, are turned into numbers. slice then keeps the row at that index in each Country/Year group.

library(dplyr)

fitchRatings %>%
  group_by(Country, Year) %>%
  # convert the factor labels to numbers, then keep the row holding the minimum
  slice(which.min(as.numeric(as.character(LongTermTransformed))))
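The reason the conversion goes through as.character (or levels) can be seen on a small example. This is only a sketch: the factor `x` below is made up for illustration and is not part of the question's data.

```r
# Hypothetical factor whose labels are numbers stored as text
x <- factor(c("22", "12", "3", "0"))

# as.numeric() on a factor returns the internal level codes, not the labels.
# The levels are sorted as text: "0", "12", "22", "3"
codes <- as.numeric(x)                   # 3 2 4 1

# Converting the labels first recovers the real values
values <- as.numeric(as.character(x))    # 22 12 3 0

# Equivalent: convert each level once, then index by the factor
values2 <- as.numeric(levels(x))[x]      # 22 12 3 0
```

Calling which.min on `codes` would pick the wrong element, which is why both answers convert the labels before comparing.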

Or we can use a similar approach with data.table. The difference is that we convert the 'data.frame' to a 'data.table' by reference with setDT and subset .SD (the subset of data for each group) by the index.

library(data.table)  # v1.9.5+

setDT(fitchRatings)[,
  .SD[which.min(as.numeric(levels(LongTermTransformed))[LongTermTransformed])],
  by = .(Country, Year)]



Answer 2:


Another data.table option is to order by LongTermTransformed and take the first row per Country and Year using unique:

library(data.table)
unique(setDT(fitchRatings)[order(as.numeric(as.character(LongTermTransformed)))], by = c("Country", "Year"))

Or using dplyr:

library(dplyr)

fitchRatings %>%
  arrange(as.numeric(as.character(LongTermTransformed))) %>%
  # .keep_all = TRUE keeps the other columns (needed in dplyr 0.5.0 and later)
  distinct(Country, Year, .keep_all = TRUE)
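The same order-then-deduplicate idea can also be sketched in base R, without either package. This is not from the original answers. It assumes a small data frame built from a few rows of the question, with the column types the question describes, and uses order and duplicated in place of arrange and distinct.

```r
# A few rows copied from the question; Country and LongTermTransformed are factors
fitchRatings <- data.frame(
  Country = factor(c("Angola", "Angola", "Argentina", "Argentina", "Argentina")),
  Month = c(5L, 5L, 12L, 6L, 1L),
  Year = c(2012L, 2011L, 2005L, 2005L, 2005L),
  FitchLongTerm = c("BB-", "BB-", "RD", "DDD", "D"),
  LongTermTransformed = factor(c(12, 12, 3, 2, 0))
)

# Convert the factor labels to numbers and sort ascending by rating
rating <- as.numeric(as.character(fitchRatings$LongTermTransformed))
sorted <- fitchRatings[order(rating), ]

# duplicated() flags every repeat of a Country/Year pair after the first,
# so dropping the flagged rows keeps the lowest-rated row per pair
result <- sorted[!duplicated(sorted[c("Country", "Year")]), ]
result
```

For Argentina in 2005, only the row with rating 0 (Month 1) remains; the two Angola rows are in different years, so both are kept.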


Source: https://stackoverflow.com/questions/31462107/subsetting-dataframe-in-r-by-duplicate-values-for-year-by-lowest-value-for-ratin
