Question
I have a lot of data in dictionary format and I am trying to use pandas to print a string based on an if/else statement. For my example, I'll make up some data in a dict and convert it to a pandas DataFrame:
import pandas as pd
df = pd.DataFrame(dict(a=[1.5,2.8,9.3],b=[7.2,3.3,4.9],c=[13.1,4.9,15.9],d=[1.1,1.9,2.9]))
df
This returns:
     a    b     c    d
0  1.5  7.2  13.1  1.1
1  2.8  3.3   4.9  1.9
2  9.3  4.9  15.9  2.9
My if/else statement:
for col in df.columns:
    if (df[col] < 4).any():
        print('Zone %s does not make setpoint' % col)
    else:
        print('Zone %s is Normal' % col)
Returns:
Zone a does not make setpoint
Zone b does not make setpoint
Zone c is Normal
Zone d does not make setpoint
Now I want to go one step further: create a box plot for each zone that is not making setpoint, and print the average for each zone that is making setpoint. I know each column is a pandas Series, but can pandas.Series.plot.box() be used?
Below is the if/else statement I am using in a function with df.apply(lambda x: ...). I am stuck trying to get the box plot to work on a pandas Series. Any advice is greatly appreciated!
import matplotlib.pyplot as plt
def _print(x):
    if (x < 4).any():
        print('Zone %s does not make setpoint' % x.name)
        df.boxplot()
        plt.show()
    else:
        print('Zone %s is Normal' % x.name)
        print('The average is %s' % x.mean())
I'm getting an error when I call df.apply(lambda x: _print(x)):
module 'matplotlib' has no attribute 'show'
Answer 1:
Sure, you can call pandas.Series.plot.box(), for example df['a'].plot.box(), to get the box plot of your column a.
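As a minimal sketch of that call on the question's data: the `Agg` backend, the explicit `fig`/`ax` pair, and the title are assumptions added here so the snippet runs without a display; in an interactive session you would call `plt.show()` instead of closing the figure.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(dict(a=[1.5, 2.8, 9.3], b=[7.2, 3.3, 4.9],
                       c=[13.1, 4.9, 15.9], d=[1.1, 1.9, 2.9]))

fig, ax = plt.subplots()
df["a"].plot.box(ax=ax)   # draw the box plot of column a onto our Axes
ax.set_title("Zone a")    # illustrative title, not from the original answer
plt.close(fig)            # interactively, use plt.show() here instead
```

Passing `ax` explicitly keeps each zone on its own figure, which matters once the call sits inside a loop or `apply`.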
To fit your question, I would have done this:
def _print(x):
    if (x < 4).any():
        print('Zone %s does not make setpoint' % x.name)
        df[x.name].plot.box()  # x.name is the column name
        plt.show()
        print(df[x.name].describe())
    else:
        print('Zone %s is Normal' % x.name)
        print('The average is %s' % x.mean())
    print('---')
df.apply(lambda x: _print(x))
[Figures: extract of the output for zone b and zone c]
Note that you can add .describe() to get the numbers behind the box plot (quartiles, min, max) along with other summary statistics (see the documentation).
Nevertheless, I would have approached the problem differently, following the solution proposed here.
Another solution
You can build a boolean mask that splits the DataFrame's columns into zones that make setpoint and zones that do not:
s = df.apply(lambda x: not (x < 4).any())
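For reference, here is a small sketch of what that mask contains for the example data. The vectorized form `~(df < 4).any()` is added here for comparison and is not part of the original answer; the names `s_vec`, `failing`, and `normal` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(dict(a=[1.5, 2.8, 9.3], b=[7.2, 3.3, 4.9],
                       c=[13.1, 4.9, 15.9], d=[1.1, 1.9, 2.9]))

# True where a zone never drops below 4, i.e. it makes setpoint
s = df.apply(lambda x: not (x < 4).any())

# Assumed equivalent without apply: column-wise any(), then negate
s_vec = ~(df < 4).any()

failing = list(s[~s].index)  # zones that do not make setpoint
normal = list(s[s].index)    # zones that make setpoint
print(failing, normal)       # ['a', 'b', 'd'] ['c']
```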
Then draw box plots for the zones that do not make setpoint.
Plot them all in one figure if the values do not vary too much and there are not too many zones:
df[s[~s].index].boxplot()
plt.show()
Or plot them separately:
for col in s[~s].index:
    df[col].plot.box()
    plt.show()
In both cases, collect the statistics in a DataFrame:
statdf = df[s[~s].index].describe()
print(statdf)
              a         b         d
count  3.000000  3.000000  3.000000
mean   4.533333  5.133333  1.966667
std    4.178915  1.960442  0.901850
min    1.500000  3.300000  1.100000
25%    2.150000  4.100000  1.500000
50%    2.800000  4.900000  1.900000
75%    6.050000  6.050000  2.400000
max    9.300000  7.200000  2.900000
This way you can get a single statistic (say 'mean', for instance) with statdf.loc['mean'].
If you want to print the mean of the zones that do make setpoint:
print(df[s[s].index].mean())
c    11.3
dtype: float64
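To tie the pieces together, here is a sketch that produces one message per zone from a single mask and a single `describe()` call, mirroring the prints in the question. The function name `report`, the `threshold` parameter, and the message wording are hypothetical; plotting is left out on purpose.

```python
import pandas as pd

def report(df, threshold=4):
    """Return one message per zone (hypothetical helper, not from the answer)."""
    makes_setpoint = ~(df < threshold).any()  # True if no value is below threshold
    stats = df.describe()                     # rows: count, mean, std, min, ...
    lines = []
    for zone, ok in makes_setpoint.items():
        if ok:
            lines.append("Zone %s is Normal, average %.1f"
                         % (zone, stats.loc["mean", zone]))
        else:
            lines.append("Zone %s does not make setpoint, min %.1f"
                         % (zone, stats.loc["min", zone]))
    return lines

df = pd.DataFrame(dict(a=[1.5, 2.8, 9.3], b=[7.2, 3.3, 4.9],
                       c=[13.1, 4.9, 15.9], d=[1.1, 1.9, 2.9]))
for line in report(df):
    print(line)
```

Returning the messages instead of printing inside `apply` keeps the reporting separate from any plotting and makes the result easy to check.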
Answer 2:
I don't really know if this is what you are looking for, but you are asking:
I want to add in an extra to create a box plot
You are trying to do this with df.Series.plot.box(), which raises AttributeError: 'DataFrame' object has no attribute 'Series'.
Try df.boxplot() instead; the plot will then be shown at each plt.show() call.
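As for the "module 'matplotlib' has no attribute 'show'" message quoted in the question, one common cause is importing the top-level package under the name plt (`import matplotlib as plt`) instead of the pyplot submodule. This is an assumption, since the session that produced the error is not shown. A quick check, using the `Agg` backend only so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for a headless run
import matplotlib.pyplot as plt

# The top-level package has no show(); only the pyplot submodule provides it.
print(hasattr(matplotlib, "show"))
print(hasattr(plt, "show"))
```

If the first check matches what `plt` refers to in your session, re-running `import matplotlib.pyplot as plt` should resolve the error.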

Source: https://stackoverflow.com/questions/48908828/python-pandas-series-if-else-box-plot