heatmap based on ratios in Python's seaborn

Posted by 梦想的初衷 on 2019-12-24 09:16:20

Question


I have data in Cartesian coordinates, and each coordinate also carries a binary variable. I want to make a heatmap in which the color intensity of each polygon (hexagon, rectangle, etc.) is the ratio of the number of occurrences where the boolean is True to the total number of occurrences in that polygon.

The data can for example look like this:

df = pd.DataFrame([[1,2,False],[-1,5,True], [51,52,False]])

I know that seaborn can generate heatmaps via seaborn.heatmap, but the color strength is based by default on the total occurrences in each polygon, not the above ratio. Is there perhaps another plotting tool that would be more suitable?


Answer 1:


You could also use the pandas groupby functionality to compute the ratios and then pass the result to seaborn.heatmap. With the example data borrowed from @ImportanceOfBeingErnest it would look like this:

import numpy as np
import pandas as pd
import seaborn as sns

np.random.seed(0)
x = np.random.poisson(5, size=200)
y = np.random.poisson(7, size=200)
z = np.random.choice([True, False], size=200, p=[0.3, 0.7])

df = pd.DataFrame({"x" : x, "y" : y, "z":z})
res = df.groupby(['y','x'])['z'].mean().unstack()

ax = sns.heatmap(res)
ax.axis('equal')
ax.invert_yaxis()

[Figure: the resulting plot]
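To see why `mean()` yields the ratio asked for, here is a minimal sketch on a hand-made frame. The `toy` data and the variable names are made up for illustration and are not part of the original answer. Because True counts as 1 and False as 0, the group mean should match an explicit "True count divided by total count".

```python
import pandas as pd

# Hypothetical toy data: cell (x=1, y=2) holds two points, cell (x=3, y=4) holds three.
toy = pd.DataFrame({
    "x": [1, 1, 3, 3, 3],
    "y": [2, 2, 4, 4, 4],
    "z": [True, False, True, True, False],
})

# The mean of a boolean column is the fraction of True values in each group.
ratio = toy.groupby(["y", "x"])["z"].mean()

# Cross-check: number of True values divided by the number of rows per group.
grouped = toy.groupby(["y", "x"])["z"]
explicit = grouped.sum() / grouped.count()

print(ratio)
```

Calling `.unstack()` on `ratio` then only reshapes these per-cell values into the y-by-x grid that `seaborn.heatmap` expects.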

If your x and y values aren't integers you can cut them into the desired number of categories for grouping:

bins = 10
res = df.groupby([pd.cut(df.y, bins),pd.cut(df.x,bins)])['z'].mean().unstack()
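For completeness, a self-contained sketch of the binned variant on non-integer coordinates. The random data, the name `pts` and the choice of 5 bins are assumptions for illustration only. Passing `observed=False` is my addition as well: it asks pandas to keep bin combinations that contain no points, so the grid stays rectangular and empty cells come out as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical float-valued data, not taken from the question.
rng = np.random.default_rng(0)
pts = pd.DataFrame({
    "x": rng.normal(0, 1, size=500),
    "y": rng.normal(0, 1, size=500),
    "z": rng.random(500) < 0.3,  # roughly 30% True
})

bins = 5  # number of equal-width bins per axis

# Cut both coordinates into intervals, then take the share of True per cell.
grid = (
    pts.groupby([pd.cut(pts["y"], bins), pd.cut(pts["x"], bins)], observed=False)["z"]
    .mean()
    .unstack()
)

print(grid.shape)
```

The resulting `grid` can be passed to `sns.heatmap` in the same way as `res` above; seaborn leaves NaN cells blank.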



Answer 2:


One option is to compute two histograms, one for the complete dataframe and one for the dataframe filtered to the True values. Dividing the latter by the former gives the ratio you're after.

from __future__ import division
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = np.random.poisson(5, size=200)
y = np.random.poisson(7, size=200)
z = np.random.choice([True, False], size=200, p=[0.3, 0.7])

df = pd.DataFrame({"x" : x, "y" : y, "z":z})
# z is already boolean, so it can be used directly as a mask
dftrue = df[df["z"]]

bins = np.arange(0,22)
hist, xbins, ybins = np.histogram2d(df.x, df.y, bins=bins)
histtrue, _ ,__ = np.histogram2d(dftrue.x, dftrue.y, bins=bins)

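# histogram2d bins x along the first axis, so transpose to put x on the horizontal axis;
# bins without any data divide 0/0 and show up as NaN (blank) cells
plt.imshow((histtrue/hist).T, origin="lower", cmap=plt.cm.Reds)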
plt.colorbar()
plt.show()
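Bins that contain no points make the division above 0/0, which NumPy reports with a runtime warning and fills with NaN. A possible way to make that explicit, sketched here with NumPy only, is to divide only where the total count is positive. The names `total`, `true_counts` and `ratio` are mine, and the data is regenerated with a seeded generator rather than taken from the answer.

```python
import numpy as np

# Hypothetical data in the same spirit as the example above.
rng = np.random.default_rng(0)
x = rng.poisson(5, size=200)
y = rng.poisson(7, size=200)
z = rng.random(200) < 0.3  # roughly 30% True

edges = np.arange(0, 22)
total, _, _ = np.histogram2d(x, y, bins=edges)
true_counts, _, _ = np.histogram2d(x[z], y[z], bins=edges)

# Start from all-NaN and fill in the ratio only for non-empty bins,
# so no 0/0 division is ever evaluated.
ratio = np.full_like(total, np.nan)
np.divide(true_counts, total, out=ratio, where=total > 0)

print(np.nanmax(ratio))
```

`ratio` can then be plotted with `plt.imshow` exactly as before; the NaN cells mark bins where the ratio is undefined rather than zero.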



Source: https://stackoverflow.com/questions/42945907/heatmap-based-on-ratios-in-pythons-seaborn
