Question
I've finally decided to put the sort.data.frame method that's floating around the internet into an R package. It just gets requested too much to be left to an ad hoc method of distribution.
However, it's written with arguments that make it incompatible with the generic sort function:
sort(x,decreasing,...)
sort.data.frame(form,dat)
If I change sort.data.frame to take decreasing as an argument, as in sort.data.frame(form, decreasing, dat), and discard decreasing, then it loses its simplicity because you'll always have to specify dat= and can't really use positional arguments. If I add it to the end, as in sort.data.frame(form, dat, decreasing), then the order doesn't match the generic function. If I hope that decreasing gets caught up in the dots, as in sort.data.frame(form, dat, ...), then when using position-based matching I believe the generic function will assign the second position to decreasing and it will get discarded. What's the best way to harmonize these two functions?
The full function is:
# Sort a data frame
sort.data.frame <- function(form,dat){
  # Author: Kevin Wright
  # http://tolstoy.newcastle.edu.au/R/help/04/09/4300.html
  # Some ideas from Andy Liaw
  # http://tolstoy.newcastle.edu.au/R/help/04/07/1076.html
  # Use + for ascending, - for descending.
  # Sorting is left to right in the formula
  # Usage is either of the following:
  # sort.data.frame(~Block-Variety,Oats)
  # sort.data.frame(Oats,~-Variety+Block)

  # If dat is the formula, then switch form and dat
  if(inherits(dat,"formula")){
    f=dat
    dat=form
    form=f
  }
  if(form[[1]] != "~") {
    stop("Formula must be one-sided.")
  }
  # Make the formula into character and remove spaces
  formc <- as.character(form[2])
  formc <- gsub(" ","",formc)
  # If the first character is not + or -, add +
  if(!is.element(substring(formc,1,1),c("+","-"))) {
    formc <- paste("+",formc,sep="")
  }
  # Extract the variables from the formula
  vars <- unlist(strsplit(formc, "[\\+\\-]"))
  vars <- vars[vars!=""] # Remove spurious "" terms
  # Build a list of arguments to pass to "order" function
  calllist <- list()
  pos=1 # Position of + or -
  for(i in 1:length(vars)){
    varsign <- substring(formc,pos,pos)
    pos <- pos+1+nchar(vars[i])
    if(is.factor(dat[,vars[i]])){
      if(varsign=="-")
        calllist[[i]] <- -rank(dat[,vars[i]])
      else
        calllist[[i]] <- rank(dat[,vars[i]])
    }
    else {
      if(varsign=="-")
        calllist[[i]] <- -dat[,vars[i]]
      else
        calllist[[i]] <- dat[,vars[i]]
    }
  }
  dat[do.call("order",calllist),]
}
Example:
library(datasets)
sort.data.frame(~len+dose,ToothGrowth)
Answer 1:
There are a few problems there. sort.data.frame needs to have the same arguments as the generic, so at a minimum it needs to be
sort.data.frame <- function(x, decreasing = FALSE, ...) {
  # ....
}
To have dispatch work, the first argument needs to be the object dispatched on. So I would start with:
sort.data.frame <- function(x, decreasing = FALSE, formula = ~ ., ...) {
  # ....
}
where x is your dat, formula is your form, and we provide a default for formula to include everything. (I haven't studied your code in detail to see exactly what form represents.)
Of course, you don't need to specify decreasing in the call, so:
sort(ToothGrowth, formula = ~ len + dose)
would be how to call the function using the above specifications.
Otherwise, if you don't want sort.data.frame to be an S3 method for the sort generic, call it something else and then you are free to have whatever arguments you want.
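A minimal sketch of what such a method could look like, assuming the signature suggested above. The formula parsing here is a simplified stand-in for the original function (it assumes plain, syntactic column names joined by + and -), and treating decreasing = TRUE as "flip every sign" is my assumption, not something the question specifies:

```r
# Sketch of an S3 method whose leading arguments match sort(x, decreasing, ...).
sort.data.frame <- function(x, decreasing = FALSE, formula = ~ ., ...) {
  stopifnot(inherits(formula, "formula"), length(formula) == 2L)
  if (identical(formula[[2L]], quote(.))) {
    # Default formula: sort by every column, ascending
    vars  <- names(x)
    signs <- rep("+", length(vars))
  } else {
    # Turn the right-hand side into text such as "+g-v"
    rhs <- gsub(" ", "", paste(deparse(formula[[2L]]), collapse = ""))
    if (!substring(rhs, 1, 1) %in% c("+", "-")) rhs <- paste0("+", rhs)
    # Split into signed terms: "+g", "-v"
    terms <- regmatches(rhs, gregexpr("[+-][^+-]+", rhs))[[1L]]
    signs <- substring(terms, 1, 1)
    vars  <- substring(terms, 2)
  }
  # One numeric sort key per variable; negate it for descending order
  keys <- Map(function(v, s) {
    k <- xtfrm(x[[v]])
    if (xor(s == "-", decreasing)) -k else k
  }, vars, signs)
  x[do.call(order, unname(keys)), , drop = FALSE]
}

d <- data.frame(g = c("b", "a", "b", "a"), v = c(1, 2, 3, 4))
res_formula <- sort(d, formula = ~ g - v)   # g ascending, v descending
res_default <- sort(d)                      # all columns, ascending
```

Because formula comes after decreasing, it has to be passed by name when calling through sort().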
Answer 2:
Use the arrange function in plyr. It allows you to individually pick which variables should be in ascending and descending order:
arrange(ToothGrowth, len, dose)
arrange(ToothGrowth, desc(len), dose)
arrange(ToothGrowth, len, desc(dose))
arrange(ToothGrowth, desc(len), desc(dose))
It also has an elegant implementation:
arrange <- function (df, ...) {
  ord <- eval(substitute(order(...)), df, parent.frame())
  unrowname(df[ord, ])
}
And desc is just an ordinary function:
desc <- function (x) -xtfrm(x)
Reading the help for xtfrm is highly recommended if you're writing this sort of function.
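To see why xtfrm matters, here is a small base-R sketch that does not load plyr. arrange_sketch is a hypothetical stand-in that resets row names itself instead of calling plyr's unrowname helper. Unary minus is not defined for character vectors, whereas xtfrm first maps them to numbers that sort the same way:

```r
# desc as quoted above: negate a numeric proxy that orders like x
desc <- function(x) -xtfrm(x)

# Hypothetical base-R imitation of the arrange idea shown above
arrange_sketch <- function(df, ...) {
  # Evaluate order(...) with the data frame's columns in scope
  ord <- eval(substitute(order(...)), df, parent.frame())
  out <- df[ord, , drop = FALSE]
  rownames(out) <- NULL
  out
}

d <- data.frame(g = c("b", "a", "c", "a"), v = c(1, 2, 3, 4),
                stringsAsFactors = FALSE)

# -d$g would fail for a character column; desc(d$g) works
ord <- order(desc(d$g), d$v)
res <- arrange_sketch(d, desc(g), v)   # g descending, then v ascending
```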
Answer 3:
Can you just mask the base definition of sort, i.e. something like this?
sort <- function(x,...) {
  if (inherits(x,"data.frame")) {
    sort.data.frame(x,...)
  } else {
    L <- list(...)
    nms <- names(L)
    if (is.null(nms)) nms <- rep("",length(L))
    decreasing <- FALSE  # default when decreasing is not supplied
    if ("decreasing" %in% nms) {
      # decreasing was passed by name
      decreasing <- L[["decreasing"]]
      L <- L[nms!="decreasing"]
    } else if (any(nms=="")) {
      # otherwise treat the first unnamed argument as decreasing
      dpos <- which(nms=="")[1]
      decreasing <- L[[dpos]]
      L <- L[-dpos]
    }
    arglist <- c(list(x=x,decreasing=decreasing),L)
    do.call(base::sort,arglist)
  }
}
Answer 4:
I agree with @Gavin that x must come first. I'd put the decreasing parameter after the formula though, since it probably isn't used that much, and hardly ever as a positional argument.
The formula argument would be used much more and therefore should be the second argument. I also strongly agree with @Gavin that it should be called formula, and not form.
sort.data.frame <- function(x, formula = ~ ., decreasing = FALSE, ...) {
  # ...
}
You might want to extend the decreasing argument to allow a logical vector where each TRUE/FALSE value corresponds to one column in the formula:
d <- data.frame(A=1:10, B=10:1)
sort(d, ~ A+B, decreasing=c(A=TRUE, B=FALSE)) # sort by decreasing A, increasing B
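A rough sketch of the per-column idea, written as a standalone function with the hypothetical name sort_df rather than as a sort method. As far as I know, base::sort checks that decreasing is a single logical value before it dispatches, so a named vector (or a formula in the second position) may be rejected by the generic itself. The sketch assumes the formula only lists plain column names:

```r
# Hypothetical helper: formula second, decreasing may be a named logical vector
sort_df <- function(x, formula = ~ ., decreasing = FALSE) {
  vars <- if (identical(formula[[2L]], quote(.))) names(x) else all.vars(formula)
  if (is.null(names(decreasing))) {
    # Unnamed value(s): recycle across all sort variables
    dec <- rep_len(decreasing, length(vars))
  } else {
    # Named values: look up each variable, defaulting to ascending
    dec <- unname(decreasing[vars])
    dec[is.na(dec)] <- FALSE
  }
  # One numeric key per variable, negated where descending order is requested
  keys <- Map(function(v, d) {
    k <- xtfrm(x[[v]])
    if (d) -k else k
  }, vars, dec)
  x[do.call(order, unname(keys)), , drop = FALSE]
}

d <- data.frame(A = c(1, 1, 2, 2), B = c(4, 3, 2, 1))
# Decreasing A, increasing B
res <- sort_df(d, ~ A + B, decreasing = c(A = TRUE, B = FALSE))
```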
Source: https://stackoverflow.com/questions/6836963/best-way-to-create-generic-method-consistency-for-sort-data-frame