问题
I've been trying to get some posted similar solutions to work with no luck.
I am trying to get histograms for Cost
for all the Step No
in our manufacturing process. There are a different number of steps for each part, so I want to have a set of histograms on one plot/image for each part.
In my real data there are many parts so if this could loop through many parts and save the graphs that would be ideal.
Additionally we have a target cost for each step that I want to overlay on the histogram. This is represented in a separate dataframe. I got stuck on the loop for the subplots so I didn't try this yet.
Here's as close as to what I can find for what each step histogram should look like:
Here is my code so far:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel('Dist_Example.xlsx')
df1 = df[~df['Cost Type'].isin(['Material'])]
number_of_subplots = len(df1['Step No'].unique())
steps = df1['Step No'].unique()
fig, axs = plt.subplots(1, number_of_subplots, sharey = True, tight_layout=True)
for step in steps:
df2 = df1[df1['Step No'].isin([step])]
axs[step].hist(df2['Cost'])
plt.show()
Thank you in advance for helping me!
Here is the Target Cost
I'd like to be shown as vertical line on the histogram:
PartNo StepNo TargetCost
ABC 10 12
ABC 20 20
ABC 30 13
Here's some sample historical data which should be in bins in the histogram:
PartNo SerialNo StepNo CostType Cost
ABC 123 10 Labor 11
ABC 123 10 Material 16
ABC 456 10 Labor 21
ABC 456 10 Material 26
ABC 789 10 Labor 21
ABC 789 10 Material 16
ABC 1011 10 Labor 11
ABC 1011 10 Material 6
ABC 1112 10 Labor 1
ABC 1112 10 Material -4
ABC 123 20 Labor 11
ABC 123 20 Material 19
ABC 456 20 Labor 24
ABC 456 20 Material 29
ABC 789 20 Labor 24
ABC 789 20 Material 19
ABC 1011 20 Labor 14
ABC 1011 20 Material 9
ABC 1112 20 Labor 4
ABC 1112 20 Material -1
ABC 123 30 Labor 11
ABC 123 30 Material 13
ABC 456 30 Labor 18
ABC 456 30 Material 23
ABC 789 30 Labor 18
ABC 789 30 Material 13
ABC 1011 30 Labor 8
ABC 1011 30 Material 3
ABC 1112 30 Labor -2
ABC 1112 30 Material -7
And a second sample dataset:
PartNo SerialNo StepNo CostType Cost
DEF Aplha 10 Labor 2
DEF Zed 10 Labor 3
DEF Kelly 10 Labor 4
DEF Aplha 20 Labor 3
DEF Zed 20 Labor 2
DEF Kelly 20 Labor 5
DEF Aplha 30 Labor 6
DEF Zed 30 Labor 7
DEF Kelly 30 Labor 5
DEF Aplha 40 Labor 3
DEF Zed 40 Labor 4
DEF Kelly 40 Labor 2
DEF Aplha 50 Labor 8
DEF Zed 50 Labor 9
DEF Kelly 50 Labor 7
回答1:
You won't find a histogram function that solves this directly for your dataset. You'll need to aggregate the data in a way that suits your needs, and then represent your findings with a bar plot.
I find your objective and data a bit confusing, but I think I've figured out what you're after given these assumptions:
- You want to aggregate the cost per StepNo
- Cost type is irrelevant
- Total Target cost will have to be calculated since you are aggregating all costs within each StepNo.
The plot
EDIT
This is not what OP was looking for. After a bit of back and forth we found a solution that seemed to work
(from the question) I am trying to get histograms for Cost for all the Step No
(from a comment) I actually want to have a historgram for the sum of the cost per serial no in each step.
Since you've got to have count
or frequency on the y-axis in a histogram, you will have to aggregate the data in some way that makes sense. Below you'll see the count for a bin number of choice for aggregated costs of each SerialNO at each step.
Result:
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.pyplot as plt
import pylab
# Load data in two steps:
# df1 = pd.read_clipboard(sep='\\s+')
# Part No Serial No Step No Cost Type Cost
# ABC 123 10 Labor 11
# ABC 123 10 Material 16
# ABC 456 10 Labor 21
# ABC 456 10 Material 26
# ...
# df2 = pd.read_clipboard(sep='\\s+')
# Part No Step No Target Cost
# ABC 10 12
# ABC 20 20
# ABC 30 13
# Cost type and SerialNo irrelevant
df11 = df1.drop(['CostType'] , axis = 1)
# Aggregate by StepNo, find total cost and count
##df12 = df11.groupby(['PartNo', 'StepNo']).agg(['sum', 'count']).reset_index()
df12 = df11.groupby(['PartNo', 'StepNo', 'SerialNo']).agg(['sum', 'count']).reset_index()
df12.columns = ['PartNo', 'StepNo', 'SerialNo', 'Cost', 'Count']
df3 = pd.merge(df2, df12, how = 'left', on = ['PartNo', 'StepNo'])
# Calculate total target cost
df3['TargetTotal'] = df3['TargetCost']*df3['Count']
# pylab.rcParams['figure.figsize'] = (2, 1)
def multiHist(x_data, x_label, bins):
# Hisrogram setup
fig, ax = plt.subplots()
ax.hist(x_data, bins=bins, color='blue', alpha=0.5, histtype='stepfilled')
# Horizontal line
x0 = dfs['TargetTotal'].iloc[0]
ax.axvline(x0, color='red', linewidth=2)
# Annotation
ax.annotate('Target: {:0.2f}'.format(x0), xy=(x0, 1), xytext=(-15, 15),
xycoords=('data', 'axes fraction'), textcoords='offset points',
horizontalalignment='left', verticalalignment='center',
arrowprops=dict(arrowstyle='-|>', fc='white', shrinkA=0, shrinkB=0,
connectionstyle='angle,angleA=0,angleB=90,rad=10'),)
# Labels
ax.set_xlabel(x_label, color = 'grey')
ax.legend(loc='upper left')
plt.show()
# Identify and plot data for each StepNo
for step in df3['StepNo'].unique():
dfs = df3[df3['StepNo']==step]
# Data to plot
cost = dfs['Cost']
labels = 'Part: ' + dfs['PartNo'].iloc[0] + ', ' 'Step:' + str(dfs['StepNo'].iloc[0])
# Plot
multiHist(x_data = cost, x_label = labels, bins = 4)
来源:https://stackoverflow.com/questions/52112092/dynamic-histogram-subplots-with-line-to-mark-target