Question
I have some data I would like to plot. It consists of two columns: one is a count and the other is the date on which it was recorded. Since I have over 2000 dates, it makes more sense not to show every single date as a tick on the x-axis, because otherwise the axis is unreadable. However, I am having a hard time getting the dates on the x-axis to follow any sensible logic. I have tried matplotlib's built-in tick locators, but they don't behave the way I expect. Here is a preview of the data:
PatientTraffic = pd.DataFrame({'count' : CleanData.groupby("TimeStamp").size()}).reset_index()
display(PatientTraffic.head(3000))
                TimeStamp  count
0     2016-03-13 12:20:00      1
1     2016-03-13 13:39:00      1
2     2016-03-13 13:43:00      1
3     2016-03-13 16:00:00      1
4     2016-03-14 13:27:00      1
...                   ...    ...
2088  2020-02-18 16:00:00      8
2089  2020-02-19 16:00:00      8
2090  2020-02-20 16:00:00      8
2091  2020-02-21 16:00:00      8
2092  2020-02-22 16:00:00      8
2093 rows × 2 columns
and when I go to plot it with these settings:
PatientTrafficPerTimeStamp = PatientTraffic.plot.bar(
x='TimeStamp',
y='count',
figsize=(20,3),
title = "Patient Traffic over Time"
)
PatientTrafficPerTimeStamp.xaxis.set_major_locator(plt.MaxNLocator(3))
I expected a bar chart whose x-axis has three ticks, one at the beginning, one in the middle and one at the end, but maybe I'm using this wrong. Also, the ticks that do appear are simply labeled with the first three timestamps in the column, which is not what I want. Any help would be appreciated!
Answer 1:
You probably think that you have one problem, but you actually have two, and both stem from using convenience functions. The one you are most likely not aware of is that pandas plots bars as categorical data: bar i is drawn at x = i, and the timestamps are attached only as tick labels. That makes sense in most cases, but not when the x-axis holds TimeStamp data. It is also why changing the locator showed you the first three dates: the labels are handed out in order to whatever ticks remain. Let's check that I didn't make this up:
import matplotlib.pyplot as plt
import pandas as pd
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
# columns in test.txt are separated by two or more spaces; raw string avoids an invalid "\s" escape
df = pd.read_csv("test.txt", sep=r"\s{2,}", engine="python")
# convert TS from strings into datetime objects
df.TS = pd.to_datetime(df.TS, format="%Y-%m-%d %H:%M:%S")
# plot it the way you did, through pandas' plotting wrapper around matplotlib
df.plot.bar(
x="TS",
y="Val",
ax=ax1,
title="pandas version"
)
#now plot the same data using matplotlib
ax2.bar(df.TS, df.Val, width=22)  # on a date axis the bar width is measured in days
ax2.tick_params(axis="x", labelrotation=90)
ax2.set_title("matplotlib version")
plt.tight_layout()
plt.show()
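Sample output (image not preserved): two bar charts side by side, titled "pandas version" and "matplotlib version".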
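The same point can be checked numerically, without looking at the figure. This is a small sketch on four made-up timestamps (illustrative values, not the question's data): it reads back the tick positions pandas assigned to the bars and the x-limits matplotlib chose for the date axis. The `matplotlib.use("Agg")` line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data: four irregularly spaced timestamps
df = pd.DataFrame({
    "TS": pd.to_datetime(["2016-03-13 12:20:00", "2016-06-17 16:00:00",
                          "2018-02-20 16:00:00", "2020-02-22 16:00:00"]),
    "Val": [1, 1, 8, 8],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# pandas: bar i is drawn at x = i; the timestamps are only text labels
df.plot.bar(x="TS", y="Val", ax=ax1)
pandas_ticks = [float(t) for t in ax1.get_xticks()]

# matplotlib: bars are drawn at the timestamps themselves (x measured in days)
ax2.bar(df.TS, df.Val, width=22)
lo, hi = ax2.get_xlim()

print(pandas_ticks)  # evenly spaced integers, regardless of the gaps between the dates
print(hi - lo)       # span in days: roughly four years plus bar widths and margins
```

The pandas positions ignore the actual time gaps entirely, which is exactly what a date locator cannot work with.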
So we should plot directly with matplotlib to keep the TimeStamp information. Obviously, we lose some of the comfort pandas provides; for example, we have to set the width of the bars and label the axes ourselves. Now, you could use the other convenience function, MaxNLocator, but as you noticed, it is written to work well under most conditions, and in exchange you give up control over the exact positions of the ticks. Why not compute the positions ourselves and hand them to a FixedLocator?
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import FixedLocator
import pandas as pd
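def myownMaxNLocator(datacol, n):
    # n evenly spaced tick positions (Matplotlib date numbers, in days)
    # from the earliest to the latest date in datacol
    datemin = mdates.date2num(datacol.min())
    datemax = mdates.date2num(datacol.max())
    xticks = np.linspace(datemin, datemax, n)
    return xticks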
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
df = pd.read_csv("test.txt", sep=r"\s{2,}", engine="python")
df.TS = pd.to_datetime(df.TS, format="%Y-%m-%d %H:%M:%S")
df.plot.bar(
x="TS",
y="Val",
ax=ax1,
title="pandas version"
)
ax2.bar(df.TS, df.Val, width=22)
ax2.set_title("matplotlib version")
dateticks = myownMaxNLocator(df.TS, 5)
ax2.xaxis.set_major_locator(FixedLocator(dateticks))
ax2.tick_params(axis="x", labelrotation=90)
plt.tight_layout()
plt.show()
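Sample output (image not preserved): "pandas version" and "matplotlib version" side by side, with five date ticks on the matplotlib panel.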
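For the question's case (beginning, middle, end), here is a self-contained variant of the FixedLocator approach that uses generated daily data instead of test.txt. Two things are my own assumptions rather than part of the answer: the helper name `evenly_spaced_date_ticks` (a hypothetical rename of `myownMaxNLocator`) and the explicit `DateFormatter`, added so every tick is labeled with a full date in a predictable format.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; leave this out in a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FixedLocator

def evenly_spaced_date_ticks(dates, n):
    """Hypothetical helper: n tick positions (date numbers) from min(dates) to max(dates)."""
    start = mdates.date2num(dates.min())
    end = mdates.date2num(dates.max())
    return np.linspace(start, end, n)

# Illustrative data: one made-up value per day over roughly four years
df = pd.DataFrame({"TS": pd.date_range("2016-03-13 16:00", "2020-02-22 16:00", freq="D")})
df["Val"] = np.arange(len(df)) % 9

fig, ax = plt.subplots(figsize=(12, 3))
ax.bar(df.TS, df.Val, width=1)  # width is in days on a date axis

ticks = evenly_spaced_date_ticks(df.TS, 3)  # beginning, middle, end
ax.xaxis.set_major_locator(FixedLocator(ticks))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))

# Convert the tick positions back to dates to see what will be labeled
labels = [mdates.num2date(t).strftime("%Y-%m-%d") for t in ticks]
print(labels)
```

Because the positions come from the data's minimum and maximum, the outer ticks sit exactly on the first and last bar.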
Here, the ticks start at the lowest value and end at the highest value. Alternatively, you could use LinearLocator, which distributes the ticks evenly over the entire view interval (including the margins around the bars):
from matplotlib.ticker import LinearLocator
...
ax2.bar(df.TS, df.Val, width=22)
ax2.set_title("matplotlib version")
ax2.xaxis.set_major_locator(LinearLocator(numticks=5))
ax2.tick_params(axis="x", labelrotation=90)
...
Sample output (image not preserved).
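To make the difference between the two locators concrete, a small sketch on made-up dates: it reads back the LinearLocator ticks and compares them with the axis view limits and with the first data point. The `DateFormatter` line is an assumption of mine, not part of the answer, added only to fix the label format.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; leave this out in a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import LinearLocator

# Illustrative data: a few dates spread over four years
ts = pd.to_datetime(["2016-03-13", "2017-02-08", "2018-02-20", "2020-02-22"])
vals = [1, 7, 8, 8]

fig, ax = plt.subplots()
ax.bar(ts, vals, width=22)
ax.xaxis.set_major_locator(LinearLocator(numticks=5))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))

ticks = ax.get_xticks()
lo, hi = ax.get_xlim()
# LinearLocator spans the whole view interval (bars plus margins),
# so the outer ticks fall outside the first and last data points
print(np.isclose(ticks[0], lo), np.isclose(ticks[-1], hi))
print(ticks[0] < mdates.date2num(ts.min()))
```

So LinearLocator gives evenly spaced ticks across the visible axis, while the FixedLocator version pins the first and last ticks to actual data.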
The sample data were stored in a file with the following structure:
                       TS  Val
0     2016-03-13 12:20:00    1
1     2016-04-13 13:39:00    3
2     2016-04-03 13:43:00    5
3     2016-06-17 16:00:00    1
4     2016-09-14 13:27:00    2
2088  2017-02-08 16:00:00    7
2089  2017-02-25 16:00:00    2
2090  2018-02-20 16:00:00    8
2091  2019-02-21 16:00:00    9
2092  2020-02-22 16:00:00    8
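If you would rather not create test.txt, a minimal sketch that builds the same ten sample rows in memory; this `df` would take the place of the `pd.read_csv` and `pd.to_datetime` lines in the snippets of this answer. Note that the rows are not in chronological order (row 1 is later than row 2). The matplotlib version places bars by date, so that does not matter there, whereas the pandas version draws them in row order.

```python
import pandas as pd

# The ten sample rows listed above, with their original index labels
df = pd.DataFrame(
    {
        "TS": pd.to_datetime(
            [
                "2016-03-13 12:20:00", "2016-04-13 13:39:00", "2016-04-03 13:43:00",
                "2016-06-17 16:00:00", "2016-09-14 13:27:00", "2017-02-08 16:00:00",
                "2017-02-25 16:00:00", "2018-02-20 16:00:00", "2019-02-21 16:00:00",
                "2020-02-22 16:00:00",
            ],
            format="%Y-%m-%d %H:%M:%S",
        ),
        "Val": [1, 3, 5, 1, 2, 7, 2, 8, 9, 8],
    },
    index=[0, 1, 2, 3, 4, 2088, 2089, 2090, 2091, 2092],
)
print(df)
```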
Answer 2:
Have you considered grouping by date if you don't need that many xticks? To answer your question directly, you can set custom ticks with:
plt.xticks(ticks=[ any list ], labels=[ list of labels ])
(See the matplotlib documentation for pyplot.xticks.)
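Both suggestions can be combined, sketched here on made-up data: group the timestamps by calendar day (one possible reading of "grouping by date"; resampling would be another), keep the pandas bar plot, and pass plt.xticks explicit positions for the beginning, middle and end. Because pandas draws bar i at x = i, the tick positions are row numbers, not dates. `traffic`, `daily` and `positions` are hypothetical names, not taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; leave this out in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for PatientTraffic: four timestamps per day for ten days
stamps = pd.date_range("2016-03-13", periods=40, freq=pd.Timedelta(hours=6))
traffic = pd.DataFrame({"TimeStamp": stamps, "count": 1})

# One bar per calendar day instead of one per timestamp
daily = traffic.groupby(traffic["TimeStamp"].dt.date)["count"].sum().reset_index()

ax = daily.plot.bar(x="TimeStamp", y="count", figsize=(10, 3), legend=False)

# pandas draws bar i at x = i, so pick the first, middle and last positions
positions = [0, len(daily) // 2, len(daily) - 1]
plt.xticks(ticks=positions, labels=[str(daily["TimeStamp"][i]) for i in positions], rotation=0)
```

This keeps pandas' categorical bars but avoids the label mix-up from the question, since each label is taken from the same row as its tick position.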
Source: https://stackoverflow.com/questions/64952301/x-axis-ticks-as-dates