plotnine doesn't add legend

Submitted by 牧云@^-^@ on 2020-06-01 07:43:25

Question


I'm using plotnine to plot two graphs in the same plot. One graph uses the 'b' values from the data frame you'll see below, and the other uses the values from 'c'.

All I need is a simple legend that shows 'c' and 'b' with their corresponding colors.

from pandas import DataFrame
from plotnine import ggplot, aes, geom_point, geom_line, ggtitle, labs

def plot_log_detected():
    df = DataFrame({'x': [1, 2, 3, 4, 5],
                    'b': >>>SOME VALUES DOESNT MATTER<<<,
                    'c': >>>SOME VALUES DOESNT MATTER<<<
                   })
    return ggplot(aes(x='x', y='b'), data=df) + geom_point(size=1) + \
           geom_line(aes(y='b'), color='black') + \
           geom_line(aes(y='c'), color='blue') + \
           ggtitle("TITLE") + \
           labs(y="Y AXIS", x="X AXIS")

Answer 1:


This won't show a legend if you use ggplot2 in R either: a legend for color is only drawn when you map color= inside aes() for a geom, and here color is set as a fixed value outside aes(). The "fix" is the same in Python and in ggplot2 for R: organize your data so that it follows tidy data principles. In this case, the b and c columns (df$b and df$c) each contain two pieces of information: (1) the value of y and (2) the type of y. You should reorganize the data accordingly so that your column names become x, type_of_y, and value_of_y.

I'll explain by filling in a dataset like the one you presented, then show how to change it to a tidy format, and finally how to (properly) apply the code to produce the plot I believe you want.

The Basics

Here's a dataset and a plot like yours (again, it's in R, so I hope you can translate it into Python):

library(ggplot2)

df <- data.frame(
    x=c(1:5), b=c(10, 12, 14, 9, 8), c=c(9, 11, 11, 12, 14))

ggplot(df, aes(x=x)) +
    geom_line(aes(y=b), color='red') +
    geom_line(aes(y=c), color='blue')

No legend, but the colors are there and we plot what you would expect. The problem is that ggplot only draws a legend for aesthetics that are mapped inside the aes() call; here, color is set outside aes(). To see this clearly, let's do the same plot, but move the color=... inside aes():

ggplot(df, aes(x=x)) +
    geom_line(aes(y=b, color='red')) +
    geom_line(aes(y=c, color='blue'))

OK, that's... wait. What? It has a legend now (because we put color inside aes()), but the colors look reversed, and you'll notice they are not red and blue at all, but the default "reddish" and "teal" colors of ggplot2. What actually happened is that, inside aes(), 'red' and 'blue' are not treated as colors; they are treated as constant labels. The first geom_line call plots the correct data but labels it "red", and the second labels its data "blue". ggplot then picked the actual colors from its default palette, assigning them to the labels in alphabetical order, which is why the line labelled "blue" comes out reddish and the one labelled "red" comes out teal.

Getting Your Legend Without Tidy Data

If you don't want to mess with your data, there is actually a way to do this that will probably give you an output you're satisfied with. Inside aes(), just set color= to the name you want for that series:

ggplot(df, aes(x=x)) +
    geom_line(aes(y=b, color='b')) +
    geom_line(aes(y=c, color='c'))

What about also adding color='blue' outside aes() to get a "blue" line while keeping the label inside aes()? Well... that doesn't work. If you do this, the result is identical to the original plot (no legend, but the correct colors), because a color set outside aes() overrides the color mapping inside aes() in each geom_line call:

# this doesn't work to keep legend and desired color, the second
# color outside aes() overwrites the one inside aes()
ggplot(df, aes(x=x)) +
    geom_line(aes(y=b, color='b'), color='red') +
    geom_line(aes(y=c, color='c'), color='blue')

The Tidy Data Way (The "correct" way)

While the above method works, it goes against the general principles of tidy data, i.e. how to organize your data so that it's easy to analyze... in ANY WAY you want to. Trust me: it's definitely the best practice going forward for working with any dataset, because it keeps your analysis versatile, and it's almost always worth the effort to organize your data that way.

ggplot expects aes() mappings to refer to columns in your dataset. That means each column should serve one specific purpose:

  • x: the same x as in the original dataset. It represents only the x-axis value.

  • type_of_y: contains either 'b' or 'c', indicating which data series the row belongs to.

  • value_of_y: contains the value you would plot on y.

Using tidyr's gather() (newer tidyr versions recommend pivot_longer() instead), we can reorganize the data this way pretty simply:

library(tidyr)
df <- df %>% gather('type_of_y', 'value_of_y', -x)

Giving you:

   x type_of_y value_of_y
1  1         b         10
2  2         b         12
3  3         b         14
4  4         b          9
5  5         b          8
6  1         c          9
7  2         c         11
8  3         c         11
9  4         c         12
10 5         c         14
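In pandas, DataFrame.melt does the same wide-to-long reshape as gather() here. A minimal sketch using the same toy values:

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "b": [10, 12, 14, 9, 8],
    "c": [9, 11, 11, 12, 14],
})

# Keep "x" as the identifier; every other column ("b", "c") is stacked,
# its name going into type_of_y and its values into value_of_y
tidy = df.melt(id_vars="x", var_name="type_of_y", value_name="value_of_y")
print(tidy)
```

The rows come out in the same order as the gather() result: all of b, then all of c.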

Then you plot accordingly, using only one geom_line call and mapping the color aesthetic to type_of_y. Something like this:

ggplot(df, aes(x=x, y=value_of_y)) +
    geom_line(aes(color=type_of_y))

In this way, you only have to specify one geom_line call. It might not seem too different here, but what if you had multiple columns in your original dataset? Take the case, for example, of having "x", then y values for "a", "b", "c"... "z"! You would have to specify all those lines in separate calls to geom_line! With the tidy approach, no matter how many different y value columns you had, you would still need only the same two lines of code and one call to geom_line. Make sense? For more information, I would suggest the link from above. Also, this article is a great read.

You can then assign specific colors by adding scale_color_manual and specifying the colors there (there are a few other ways too), but if you need help with that, I would ask in a separate question. Also... I'm not sure how the code differs in Python. Similarly, you can change the title of the legend via labs(color="your new legend title"), among other theme changes.

I know it's not quite the same code in Python, but that should be enough for you to figure out how to do it similarly there.




Answer 2:


You can melt your data frame to combine columns 'b' and 'c' into one column and create an aesthetic column, 'color', used for coloring and the legend. Here is the code and output. Note that I used the original data frame for the point plot (since you only plot column 'b' there) and the melted data frame for the line plot:

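from pandas import DataFrame
from plotnine import ggplot, aes, geom_point, geom_line, ggtitle, labs

def plot_log_detected():
    df = DataFrame({'x': [1, 2, 3, 4, 5],
                    'b': [1, 2, 3, 4, 5],
                    'c': [1, 3, 2, 5, 4]
                   })

    # Long format: 'color' holds the series name ('b' or 'c'), 'b_and_c' its value
    df_melt = df.melt(id_vars=['x'], value_vars=['b', 'c'], var_name='color', value_name='b_and_c')

    # Points use the original df; the lines use df_melt with color mapped inside aes()
    return ggplot(aes(x='x', y='b'), data=df) + geom_point(size=1) + \
           geom_line(aes(y='b_and_c', color='color'), data=df_melt) + \
           ggtitle("TITLE") + \
           labs(y="Y AXIS", x="X AXIS")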

The example data frame looks like this:

   x  b  c
0  1  1  1
1  2  2  3
2  3  3  2
3  4  4  5
4  5  5  4

And your melted dataframe is:

   x color  b_and_c
0  1     b        1
1  2     b        2
2  3     b        3
3  4     b        4
4  5     b        5
5  1     c        1
6  2     c        3
7  3     c        2
8  4     c        5
9  5     c        4


Source: https://stackoverflow.com/questions/61175495/plotnine-doesnt-add-legend
