Using data$variable inside aes(), as in aes(data$variable), is never good, never recommended, and should never be used. Sometimes it still works, but aes(variable) always works, so you should always use aes(variable).
More explanation:
ggplot uses nonstandard evaluation. A standard-evaluating R function can only see objects that exist in the environment it is called from (at the top level, that is the global environment). If I have data named mydata with a column named col1, and I do mean(col1), I get an error:
mydata = data.frame(col1 = 1:3)
mean(col1)
# Error in mean(col1) : object 'col1' not found
This error happens because col1 isn't in the global environment. It's just a column name of the mydata data frame.
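For comparison, here is a small base R sketch of the two styles. A standard-evaluating call needs the data frame spelled out, while base R's own nonstandard-evaluation helper with() puts the columns of mydata in scope, so the bare name col1 can be found. This only illustrates the idea; it is not how ggplot2 is implemented.

```r
mydata = data.frame(col1 = 1:3)

# Standard evaluation: the data frame has to be named explicitly.
m1 = mean(mydata$col1)

# Nonstandard evaluation: with() evaluates mean(col1) with the columns
# of mydata visible, so the bare column name works.
m2 = with(mydata, mean(col1))

m1  # 2
m2  # 2
```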
The aes function does extra work behind the scenes: it knows to look at the columns of the layer's data, in addition to checking the environment where aes() was called (usually the global environment).
ggplot(mydata, aes(x = col1)) + geom_bar()
# no error
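That lookup order ("columns of the data first, then the calling environment") can be modeled in a few lines of base R. This is a simplified sketch under stated assumptions: eval_mapping() is a hypothetical helper invented for illustration, and ggplot2 itself uses rlang quosures rather than substitute() and eval(), but the name lookup works in the same spirit.

```r
# Hypothetical helper, NOT part of ggplot2: a toy model of how aes()
# resolves a mapping. It captures the unevaluated expression, then
# evaluates it with the data frame's columns visible first, falling back
# to the caller's environment for names that are not columns.
eval_mapping = function(data, expr) {
  caller = parent.frame()        # environment where the mapping was written
  captured = substitute(expr)    # the expression itself, e.g. log(col1)
  eval(captured, envir = data, enclos = caller)
}

mydata = data.frame(col1 = 1:3)
offset = 10  # a global variable, not a column of mydata

eval_mapping(mydata, col1)           # 1 2 3 (found as a column)
eval_mapping(mydata, log(col1))      # a function of a column
eval_mapping(mydata, col1 + offset)  # 11 12 13 (a column plus a global)
eval_mapping(mydata, mydata$col1)    # 1 2 3, only because mydata is global
```

The last call shows why mydata$col1 can appear to work: mydata is not a column, so the name is found in the calling environment and the full vector comes back, regardless of which data the mapping was meant for.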
You don't have to use a bare column inside aes(), though. For flexibility, you can use a function of a column, or even some other vector that you define on the spot (as long as it has the right length):
# these work fine too
ggplot(mydata, aes(x = log(col1))) + geom_bar()
ggplot(mydata, aes(x = c(1, 8, 11))) + geom_bar()
So what's the difference between col1 and mydata$col1? Well, col1 is the name of a column, and mydata$col1 is the actual values. ggplot will look for a column named col1 in your data and use that. mydata$col1 is just a vector: the full column. The difference matters because ggplot often does data manipulation. Layers can have their own data, and whenever there are facets or aggregate functions (stats), ggplot splits your data into pieces and computes on each one. To do this reliably, it needs to know which data and which column names to use. When you give it mydata$col1, you're not giving it a column name, you're just giving it a fixed vector of values, whatever happens to be in that column of mydata, and things can break.
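One concrete case where this goes wrong is a layer with its own data. The sketch below assumes ggplot2 is installed; the grp column and the sub data frame are made up for illustration. With a bare column name, the layer looks up col1 in its own 2-row data; with mydata$col1 it always gets all 4 values, and building the plot fails with a length mismatch.

```r
library(ggplot2)

mydata = data.frame(col1 = 1:4, grp = c("a", "a", "b", "b"))

# A layer that draws only part of the data: the rows where grp == "a".
sub = mydata[mydata$grp == "a", ]

# Bare column name: y = col1 is looked up in the layer's own data (sub),
# so it yields 2 values for 2 rows.
p_ok = ggplot(mydata, aes(x = col1)) +
  geom_point(data = sub, aes(y = col1))

# Full vector: mydata$col1 always has 4 values, but this layer has 2 rows.
p_bad = ggplot(mydata, aes(x = col1)) +
  geom_point(data = sub, aes(y = mydata$col1))

built = ggplot_build(p_ok)   # builds without error
try(ggplot_build(p_bad))     # errors: aesthetic length does not match the data
```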
So, just use unquoted column names in aes() without data$, and everything will work as expected.