Question
I'm very new to pandas. I have a data frame that has a datetime column and a column that contains a string of text (headlines). Each headline is a new row.
I need to plot the date on the x-axis, and the y-axis needs to contain how many times a headline occurs on each date.
So for example, one date may contain 3 headlines.
What's the simplest way to do this? I can't figure out how to do it at all. Maybe add another column with a '1' for each row? If so, how would you do this?
Please point me in the direction of anything that may help!
Thank you!
I have tried plotting the count on the y-axis but keep getting errors. I also tried creating a variable that counts the number of rows, but that didn't return anything useful either.
I tried adding a column with the count of headlines:
df_data['headline_count'] = df_data['headlines'].count
and I tried the group by method
df_data['count'] = df.groupby('headlines')['headlines'].transform('count')
When I use groupby, I get this error:
KeyError: 'headlines'
The output should simply be a plot with how many times a date is repeated in the dataframe (which signals that there are multiple headlines) in the rows plotted on the y-axis. And the x-axis should be the date that the observations occurred.
Answer 1:
Try this:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
A = pd.DataFrame(columns=["Date", "Headlines"], data=[["01/03/2018","Cricket"],["01/03/2018","Football"],
["02/03/2018","Football"],["01/03/2018","Football"],
["02/03/2018","Cricket"],["02/03/2018","Cricket"]] )
Your data looks like this:
print (A)
         Date Headlines
0  01/03/2018   Cricket
1  01/03/2018  Football
2  02/03/2018  Football
3  01/03/2018  Football
4  02/03/2018   Cricket
5  02/03/2018   Cricket
Now do a group by operation on it:
data = A.groupby(["Date","Headlines"]).size()
print(data)
Date        Headlines
01/03/2018  Cricket      1
            Football     2
02/03/2018  Cricket      2
            Football     1
dtype: int64
You can now plot it using the below code:
# set width of bar
barWidth = 0.25
# set height of bar
bars1 = data.loc[(data.index.get_level_values('Headlines') =="Cricket")].values
bars2 = data.loc[(data.index.get_level_values('Headlines') =="Football")].values
# Set position of bar on X axis
r1 = np.arange(len(bars1))
r2 = [x + barWidth for x in r1]
# Make the plot
plt.bar(r1, bars1, color='#7f6d5f', width=barWidth, edgecolor='white', label='Cricket')
plt.bar(r2, bars2, color='#557f2d', width=barWidth, edgecolor='white', label='Football')
# Put the xticks in the middle of each group of bars
plt.xticks([r + barWidth / 2 for r in range(len(bars1))], data.index.get_level_values('Date').unique())
# Create legend & Show graphic
plt.legend()
plt.xlabel("Date")
plt.ylabel("Count")
plt.show()
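The manual bar positioning above assumes every date contains both headlines. As a shorter sketch of the same idea, the grouped counts can be pivoted with `unstack` and handed to pandas' own plotting, which lays out the grouped bars itself. This is one possible alternative, not the answer's original method; it reuses the sample frame `A` from above, and `table` is a name introduced only for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same sample data as in the answer above.
A = pd.DataFrame(
    columns=["Date", "Headlines"],
    data=[["01/03/2018", "Cricket"], ["01/03/2018", "Football"],
          ["02/03/2018", "Football"], ["01/03/2018", "Football"],
          ["02/03/2018", "Cricket"], ["02/03/2018", "Cricket"]],
)

# Rows are dates, columns are headlines; a date/headline pair that
# never occurs becomes 0 instead of being dropped.
table = A.groupby(["Date", "Headlines"]).size().unstack(fill_value=0)
print(table)

# One group of bars per date, one bar per headline.
ax = table.plot.bar(rot=0)
ax.set_xlabel("Date")
ax.set_ylabel("Count")
plt.show()
```

Because `fill_value=0` fills in missing combinations, this version should also work when some headline does not appear on every date.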
Hope this helps!
Answer 2:
Use Series.value_counts on the date column to get a Series of counts, followed by Series.sort_index, or use GroupBy.size:
import pandas as pd
df = pd.DataFrame({'date': pd.to_datetime(['2019-10-10', '2019-10-10', '2019-10-09']),
                   'col1': ['a', 'b', 'c']})
s = df['date'].value_counts().sort_index()
#alternative
#s = df.groupby('date').size()
print (s)
2019-10-09 1
2019-10-10 2
Name: date, dtype: int64
Finally, use Series.plot:
s.plot()
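If the datetime column also carries a time of day, counting the raw values would treat each timestamp as its own group. The following is a minimal sketch of one way to handle that, assuming a column named `date` as in the example above; the timestamps are made-up sample data and `per_day` is a name introduced here. It strips the time with `dt.normalize()` before counting, and optionally uses `asfreq` so that days without any headline show up as zero.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample data: timestamps that include a time of day,
# with no rows at all on 2019-10-11.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-10-10 08:15", "2019-10-10 17:40",
        "2019-10-09 12:00", "2019-10-12 09:30",
    ]),
    "col1": ["a", "b", "c", "d"],
})

# Set the time part to midnight so rows from the same day share a key,
# then count the rows per day (groupby sorts the days).
per_day = df.groupby(df["date"].dt.normalize()).size()

# Optional: add the days that have no rows, with a count of 0.
per_day = per_day.asfreq("D", fill_value=0)
print(per_day)

ax = per_day.plot()
ax.set_xlabel("Date")
ax.set_ylabel("Count")
plt.show()
```

Whether empty days should appear as zeros or be left out depends on what the plot is meant to show; dropping the `asfreq` line keeps only the dates that occur in the data.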
Answer 3:
Have you tried this:
df2 = df_data.groupby(['headlines']).count()
You should save the result in a new data frame (df2) rather than in another column, because the result of the groupby won't have the same dimensions as the original data frame.
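To illustrate the point about dimensions, here is a small sketch. The frame below is a hypothetical stand-in for `df_data`, and the column names `date` and `headlines` are assumptions; it groups by the date column because the question asks for counts per date. It also prints the column names, which is a quick way to see why a `KeyError: 'headlines'` might occur (for example a different capitalisation or stray whitespace in the real name).

```python
import pandas as pd

# Hypothetical stand-in for df_data; the column names are assumptions.
df_data = pd.DataFrame({
    "date": ["2019-10-09", "2019-10-10", "2019-10-10"],
    "headlines": ["c", "a", "b"],
})

# A KeyError: 'headlines' means no column has exactly that name,
# so inspect the actual names first.
print(df_data.columns.tolist())

# Aggregation: one entry per date, so it is shorter than df_data
# and has to live in its own object.
df2 = df_data.groupby("date").size()
print(df2)

# transform returns one value per original row, so this result
# can be assigned back as a new column.
df_data["count"] = df_data.groupby("date")["headlines"].transform("count")
print(df_data)
```

The aggregated `df2` is the form that suits plotting, while the `transform` variant is only useful when the per-date count is needed next to every row.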
Source: https://stackoverflow.com/questions/58320398/plotting-the-count-of-occurrences-per-date