Question
I am trying to visualize data of this form:
   timestamp               senderId
0     735217  106758968942084595234
1     735217  114647222927547413607
2     735217  106758968942084595234
3     735217  106758968942084595234
4     735217  114647222927547413607
5 etc...
geom_density works if I don't separate the senderIds:
import pandas as pd
from ggplot import *

df = pd.read_pickle('data.pkl')
df.columns = ['timestamp', 'senderId']
plot = ggplot(aes(x='timestamp'), data=df) + geom_density()
print(plot)
The result looks as expected:
However, if I want to show the senderIds separately, as is done in the docs, it fails:
> plot = ggplot(aes(x='timestamp', color='senderId'), data=df) + geom_density()
ValueError: `dataset` input should have multiple elements.
Trying out with a larger dataset (~40K events):
> plot = ggplot(aes(x='timestamp', color='senderId'), data=df) + geom_density()
numpy.linalg.linalg.LinAlgError: singular matrix
Any ideas? There are some answers on SO for these errors, but none seem relevant.
This is the kind of graph I want (from ggplot's doc):
Answer 1:
With the smaller dataset:
> plot = ggplot(aes(x='timestamp', color='senderId'), data=df) + geom_density()
ValueError: `dataset` input should have multiple elements.
This was because some senderIds had only one row, and a density cannot be estimated from a single point.
With the bigger dataset:
> plot = ggplot(aes(x='timestamp', color='senderId'), data=df) + geom_density()
numpy.linalg.linalg.LinAlgError: singular matrix
This was because some senderIds had multiple rows at exactly the same timestamp, which ggplot's geom_density does not handle. I was able to solve it by using finer-grained timestamps.
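If finer timestamps are not available, another option is to drop the problematic senderIds before plotting. The following is only a minimal sketch using pandas: the helper name drop_degenerate_senders and the sample data are made up for illustration, and it assumes that both errors come from groups with fewer than two distinct timestamp values (a single row, or several rows sharing one timestamp).

```python
import pandas as pd


def drop_degenerate_senders(df, group_col="senderId", value_col="timestamp"):
    """Keep only rows whose group has at least two distinct values.

    Hypothetical helper: it removes groups with a single row and groups
    whose rows all share the same timestamp.
    """
    # Number of distinct timestamps per senderId, broadcast back to each row.
    distinct = df.groupby(group_col)[value_col].transform("nunique")
    return df[distinct >= 2]


# Made-up sample data: "a" is fine, "b" repeats one timestamp, "c" has one row.
df = pd.DataFrame({
    "timestamp": [735217, 735218, 735219, 735217, 735217, 735220],
    "senderId": ["a", "a", "a", "b", "b", "c"],
})

clean = drop_degenerate_senders(df)
print(sorted(clean["senderId"].unique()))
```

The filtered frame can then be passed to ggplot in place of df. Note that this silently hides the affected senders, so it is worth checking how many rows were dropped before relying on the plot.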
Source: https://stackoverflow.com/questions/40101519/plotting-event-density-in-python-with-ggplot-and-pandas