Question
Drawing on the discussion of conditional dplyr evaluation, I would like to conditionally execute a step in a pipeline depending on whether the referenced column exists in the passed data frame.
Example
The results generated by 1) and 2) should be identical.
Existing column
# 1)
mtcars %>%
  filter(am == 1) %>%
  filter(cyl == 4)
# 2)
mtcars %>%
  filter(am == 1) %>%
  {
    if ("cyl" %in% names(.)) filter(cyl == 4) else .
  }
Unavailable column
# 1)
mtcars %>%
  filter(am == 1)
# 2)
mtcars %>%
  filter(am == 1) %>%
  {
    if ("absent_column" %in% names(.)) filter(absent_column == 4) else .
  }
Problem
For the existing column, the object passed to filter() does not correspond to the initial data frame. The original code returns the error message:
Error in filter(cyl == 4) : object 'cyl' not found
I have tried an alternative syntax (with no luck):
mtcars %>%
  filter(am == 1) %>%
  {
    if ("cyl" %in% names(.)) filter(.$cyl == 4) else .
  }
Error in UseMethod("filter_") :
no applicable method for 'filter_' applied to an object of class "logical"
Follow-up
I wanted to expand this question to account for evaluation on the right-hand side of the == in the filter call. For instance, the syntax below attempts to filter on the first available value.
mtcars %>%
  filter({
    if ("does_not_ex" %in% names(.))
      does_not_ex
    else
      NULL
  } == {
    if ("does_not_ex" %in% names(.))
      unique(.[['does_not_ex']])
    else
      NULL
  })
As expected, the call results in an error:
Error in filter_impl(.data, quo) : Result must have length 32, not 0
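The zero length in that error can be reproduced without filter(). When the column is missing, both braced blocks evaluate to NULL, and comparing NULL with NULL gives an empty logical vector rather than one value per row. A minimal base R check:

```r
# Both sides of == evaluate to NULL when the column is missing
lhs <- NULL
rhs <- NULL
cmp <- lhs == rhs

print(cmp)          # logical(0)
print(length(cmp))  # 0, whereas mtcars has 32 rows
```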
When applied to an existing column:
mtcars %>%
  filter({
    if ("mpg" %in% names(.))
      mpg
    else
      NULL
  } == {
    if ("mpg" %in% names(.))
      unique(.[['mpg']])
    else
      NULL
  })
It works, but with a warning message:
  mpg cyl disp  hp drat   wt  qsec vs am gear carb
1  21   6  160 110  3.9 2.62 16.46  0  1    4    4
Warning message:
In { : longer object length is not a multiple of shorter object length
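The warning comes from == itself. The left-hand side is the whole mpg column (32 values), while the right-hand side is unique(mpg) (25 values in mtcars), so R recycles the shorter vector and compares the two essentially position by position. A row survives only if its value happens to equal the unique value at the same recycled position, as in row 1, where both sides are 21. A small sketch of just the comparison:

```r
# Left-hand side: the whole mpg column; right-hand side: its distinct values
lhs <- mtcars$mpg
rhs <- unique(mtcars$mpg)

print(length(lhs))  # 32
print(length(rhs))  # 25

# == recycles rhs to length 32; 32 is not a multiple of 25, so R warns
msg <- tryCatch(lhs == rhs, warning = function(w) conditionMessage(w))
print(msg)
```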
Follow-up question
Is there a neat way of extending the existing syntax to get conditional evaluation on the right-hand side of the filter call, ideally staying within the dplyr workflow?
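One possible sketch, offered as an assumption rather than a canonical answer, moves the column check out of filter() and into a small wrapper. The helper name filter_first_value() is hypothetical, and "first available value" is interpreted here as the first element of the column. The column is referenced through dplyr's .data pronoun, so the same code works for any column name passed as a string.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): keep rows whose `col` equals the
# first value of that column, or return the data unchanged if `col` is absent
filter_first_value <- function(df, col) {
  if (!col %in% names(df)) {
    return(df)
  }
  first_val <- df[[col]][1]
  # .data[[col]] looks the column up by name inside the data mask
  filter(df, .data[[col]] == first_val)
}

res_present <- mtcars %>% filter_first_value("mpg")          # rows with mpg == 21
res_absent  <- mtcars %>% filter_first_value("does_not_ex")  # all 32 rows
```

Because the check happens before filter() is called, the missing-column case never builds a zero-length condition, and the right-hand side is a single value, so no recycling warning is raised.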
Answer 1:
Because of the way scoping works here, you cannot access the data frame from within your if statement. Fortunately, you don't need to.
Try:
mtcars %>%
  filter(am == 1) %>%
  filter({if ("cyl" %in% names(.)) cyl else NULL} == 4)
Here you can use the '.' object within the conditional, so you can check whether the column exists and, if it does, return the column to the filter function.
EDIT: As per docendo discimus' comment on the question, you can access the data frame, but not implicitly; that is, you have to reference it explicitly with '.'.
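A minimal sketch that checks this idea on both cases. It is a variant of the code above, not the answer's exact code: the whole comparison moves inside the if, and the fallback is TRUE (keep every row) instead of NULL, because NULL == 4 is a zero-length vector that filter() rejects, as the follow-up in the question shows. It assumes dplyr with the magrittr %>% pipe and the built-in mtcars data.

```r
library(dplyr)

# '.' inside the condition refers to the data frame coming down the pipe
with_cyl <- mtcars %>%
  filter(am == 1) %>%
  filter(if ("cyl" %in% names(.)) cyl == 4 else TRUE)

# Column missing: the condition is a single TRUE, so every row is kept;
# absent_column == 4 is never evaluated because if() skips that branch
without_col <- mtcars %>%
  filter(am == 1) %>%
  filter(if ("absent_column" %in% names(.)) absent_column == 4 else TRUE)

print(nrow(with_cyl))     # 8
print(nrow(without_col))  # 13
```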
Answer 2:
I know I'm late to the party, but here's an answer somewhat more in line with what you were originally thinking:
mtcars %>%
  filter(am == 1) %>%
  {
    if ("cyl" %in% names(.)) filter(., cyl == 4) else .
  }
Basically, you were missing the '.' in filter. This is because the pipeline doesn't insert '.' into filter(expr), since that call is inside an expression surrounded by {}.
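The same pattern can be packaged as a reusable step. The sketch below assumes a hypothetical helper named filter_if_has() (not part of dplyr) that calls filter() only when the named column exists. The conditions passed through ... are forwarded unevaluated, so dplyr still resolves cyl and absent_column as columns of the data frame, and in the missing-column case they are never evaluated at all.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): apply filter() only if `col` exists,
# otherwise return the data frame untouched
filter_if_has <- function(df, col, ...) {
  if (col %in% names(df)) filter(df, ...) else df
}

kept_cyl <- mtcars %>%
  filter(am == 1) %>%
  filter_if_has("cyl", cyl == 4)                      # 8 rows

kept_all <- mtcars %>%
  filter(am == 1) %>%
  filter_if_has("absent_column", absent_column == 4)  # 13 rows
```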
Answer 3:
Edit: Unfortunately, this was too good to be true.
I might be a bit late to the party, but is
mtcars %>%
  filter(am == 1) %>%
  try(filter(absent_column == 4))
a solution?
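A likely reason it is too good to be true: the %>% pipe inserts the data frame as the first argument of try(), so the call becomes try(., filter(absent_column == 4)). try() simply returns its first argument when that does not raise an error, and the filter() call lands in try()'s silent argument, which is only consulted after an error, so it is never evaluated. The sketch below, assuming dplyr and the built-in mtcars data, checks the consequence with an existing column: nothing gets filtered.

```r
library(dplyr)

# The pipe turns this into try(., filter(cyl == 4)); try() just returns `.`
res_try <- mtcars %>%
  filter(am == 1) %>%
  try(filter(cyl == 4))

print(nrow(res_try))  # 13: the cyl == 4 filter was never applied
```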
Answer 4:
This code does the trick and is pretty flexible. The ^ and $ are regex anchors used to perform an exact match.
mtcars %>%
  set_names(names(.) %>%
    str_replace("am", "1") %>%
    str_replace("^cyl$", "2") %>%
    str_replace("Doesn't Exist", "3")
  )
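To illustrate what the anchors change, the sketch below runs stringr's str_replace() on two names; "cyl_2" is a made-up column name used only for illustration. Without anchors the pattern matches anywhere inside a name, while ^cyl$ only matches a name that is exactly cyl.

```r
library(stringr)

# Unanchored: "cyl" is replaced wherever it occurs inside a name
unanchored <- str_replace(c("cyl", "cyl_2"), "cyl", "2")
print(unanchored)  # "2"   "2_2"

# Anchored: only a name that is exactly "cyl" is replaced
anchored <- str_replace(c("cyl", "cyl_2"), "^cyl$", "2")
print(anchored)    # "2"     "cyl_2"
```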
Source: https://stackoverflow.com/questions/45146688/execute-dplyr-operation-only-if-column-exists