问题
I have a data frame which looks like this
> fitchRatings
Country Month Year FitchLongTerm LongTermTransformed
1 Abu Dhabi 7 2007 AA 22
2 Angola 5 2012 BB- 12
3 Angola 5 2011 BB- 12
4 Angola 5 2010 B+ 11
5 Argentina 7 2010 B 10
6 Argentina 12 2008 RD 3
7 Argentina 8 2006 RD 3
8 Argentina 12 2005 RD 3
9 Argentina 6 2005 DDD 2
10 Argentina 1 2005 D 0
As you can see, for some Countries, there are multiple observations for a single year. I want to subset the DF so that I keep only one observation for each country-year and the observation I want to keep is the one that has the smallest value for "LongTermTransformed".
In this data set Country and LongTermTransformed are factors and Year is an integer.
Thanks in advance.
回答1:
There are many ways to subset the rows based on the minimum value in a column with grouping columns. One option is to get the index of the 'min' value with which.min
after converting the 'LongTermTransformed' to 'numeric'. We can use slice
to subset the rows identified by the index.
library(dplyr)
fitchRatings %>%
group_by(Country, Year) %>%
slice(which.min(as.numeric(as.character(LongTermTransformed))))
Or we can use a similar option with data.table
. The difference is we convert the 'data.frame' to 'data.table' (setDT
) and for subsetting .SD
was used.
library(data.table)#v1.9.5+
setDT(fitchRatings)[,
.SD[which.min(as.numeric(levels(LongTermTransformed))[LongTermTransformed])],
by = .(Country, Year)]
回答2:
Another data.table
options is just to order by LongTermTransformed
and take the first value per Country
and Year
using unique
library(data.table)
unique(setDT(fitchRatings)[order(as.numeric(as.character(LongTermTransformed)))], by = c("Country", "Year"))
Or using dplyr
library(dplyr)
fitchRatings %>%
arrange(as.numeric(as.character(LongTermTransformed))) %>%
distinct(Country, Year)
来源:https://stackoverflow.com/questions/31462107/subsetting-dataframe-in-r-by-duplicate-values-for-year-by-lowest-value-for-ratin