Question
I've got data from a simulation which gives me some values stored in a DataFrame (100 rows x 6 columns). For varying starting values I saved my data in a Panel (2 DataFrames x 100 rows x 6 columns).
Now I want to compare the column named 'A' across both simulations (DataFrames named 'Sim1' and 'Sim2'). One way to do that is with the DataFrame.plot command:
Panel['Sim1'].plot(x = 'xvalues', y='A')
Panel['Sim2'].plot(x = 'xvalues', y='A')
plt.show()
This works, but I feel it should be possible to plot as if the data were in a single DataFrame, where I could write:
DataFrame.plot(x = 'xvalues', y = ['A1', 'A2'])
Am I missing something, or is it simply impossible to plot the two graphs into one figure with one command when the data is stored in a Panel?
Answer 1:
Consider the following example:
In [77]: import pandas_datareader.data as web
In [78]: p = web.DataReader(['AAPL','GOOGL'], 'yahoo', '2017-01-01')
In [79]: p.axes
Out[79]:
[Index(['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'], dtype='object'),
DatetimeIndex(['2017-01-03', '2017-01-04', '2017-01-05', '2017-01-06', '2017-01-09', '2017-01-10', '2017-01-11', '2017-01-12',
'2017-01-13', '2017-01-17', '2017-01-18', '2017-01-19', '2017-01-20', '2017-01-23', '2017-01-24', '2017-01-25',
'2017-01-26', '2017-01-27', '2017-01-30', '2017-01-31', '2017-02-01', '2017-02-02', '2017-02-03', '2017-02-06',
'2017-02-07', '2017-02-08', '2017-02-09', '2017-02-10', '2017-02-13', '2017-02-14', '2017-02-15', '2017-02-16',
'2017-02-17', '2017-02-21', '2017-02-22', '2017-02-23', '2017-02-24', '2017-02-27', '2017-02-28', '2017-03-01',
'2017-03-02', '2017-03-03', '2017-03-06', '2017-03-07', '2017-03-08', '2017-03-09', '2017-03-10', '2017-03-13',
'2017-03-14', '2017-03-15', '2017-03-16', '2017-03-17', '2017-03-20', '2017-03-21', '2017-03-22', '2017-03-23',
'2017-03-24', '2017-03-27', '2017-03-28', '2017-03-29', '2017-03-30', '2017-03-31', '2017-04-03', '2017-04-04',
'2017-04-05', '2017-04-06', '2017-04-07', '2017-04-10', '2017-04-11', '2017-04-12', '2017-04-13', '2017-04-17',
'2017-04-18', '2017-04-19', '2017-04-20', '2017-04-21'],
dtype='datetime64[ns]', name='Date', freq=None),
Index(['AAPL', 'GOOGL'], dtype='object')]
In [80]: p.loc['Adj Close']
Out[80]:
AAPL GOOGL
Date
2017-01-03 115.648597 808.010010
2017-01-04 115.519154 807.770020
2017-01-05 116.106611 813.020020
2017-01-06 117.401002 825.210022
2017-01-09 118.476334 827.179993
2017-01-10 118.595819 826.010010
2017-01-11 119.233055 829.859985
2017-01-12 118.735214 829.530029
2017-01-13 118.526121 830.940002
2017-01-17 119.481976 827.460022
... ... ...
2017-04-07 143.339996 842.099976
2017-04-10 143.169998 841.700012
2017-04-11 141.630005 839.880005
2017-04-12 141.800003 841.460022
2017-04-13 141.050003 840.179993
2017-04-17 141.830002 855.130005
2017-04-18 141.199997 853.989990
2017-04-19 140.679993 856.510010
2017-04-20 142.440002 860.080017
2017-04-21 142.270004 858.950012
[76 rows x 2 columns]
Plot it:
In [81]: p.loc['Adj Close'].plot()
Out[81]: <matplotlib.axes._subplots.AxesSubplot at 0xdabfda0>
Examples of different slicing/indexing/selecting for the sample Panel:
In [118]: p
Out[118]:
<class 'pandas.core.panel.Panel'>
Dimensions: 6 (items) x 76 (major_axis) x 2 (minor_axis)
Items axis: Open to Adj Close
Major_axis axis: 2017-01-03 00:00:00 to 2017-04-21 00:00:00
Minor_axis axis: AAPL to GOOGL
By items axis (index):
In [119]: p.loc['Adj Close']
Out[119]:
AAPL GOOGL
Date
2017-01-03 115.648597 808.010010
2017-01-04 115.519154 807.770020
2017-01-05 116.106611 813.020020
2017-01-06 117.401002 825.210022
2017-01-09 118.476334 827.179993
2017-01-10 118.595819 826.010010
2017-01-11 119.233055 829.859985
2017-01-12 118.735214 829.530029
2017-01-13 118.526121 830.940002
2017-01-17 119.481976 827.460022
... ... ...
2017-04-07 143.339996 842.099976
2017-04-10 143.169998 841.700012
2017-04-11 141.630005 839.880005
2017-04-12 141.800003 841.460022
2017-04-13 141.050003 840.179993
2017-04-17 141.830002 855.130005
2017-04-18 141.199997 853.989990
2017-04-19 140.679993 856.510010
2017-04-20 142.440002 860.080017
2017-04-21 142.270004 858.950012
[76 rows x 2 columns]
By major axis:
In [120]: p.loc[:, '2017-01-03']
Out[120]:
             Open        High         Low       Close      Volume   Adj Close
AAPL   115.800003  116.330002  114.760002  116.150002  28781900.0  115.648597
GOOGL  800.619995  811.440002  796.890015  808.010010   1959000.0  808.010010
By minor axis:
In [121]: p.loc[:, :, 'GOOGL']
Out[121]:
Open High Low Close Volume Adj Close
Date
2017-01-03 800.619995 811.440002 796.890015 808.010010 1959000.0 808.010010
2017-01-04 809.890015 813.429993 804.109985 807.770020 1515300.0 807.770020
2017-01-05 807.500000 813.739990 805.919983 813.020020 1340500.0 813.020020
2017-01-06 814.989990 828.960022 811.500000 825.210022 2017100.0 825.210022
2017-01-09 826.369995 830.429993 821.619995 827.179993 1406800.0 827.179993
2017-01-10 827.070007 829.409973 823.140015 826.010010 1194500.0 826.010010
2017-01-11 826.619995 829.900024 821.469971 829.859985 1320200.0 829.859985
2017-01-12 828.380005 830.380005 821.010010 829.530029 1349500.0 829.530029
2017-01-13 831.000000 834.650024 829.520020 830.940002 1288000.0 830.940002
2017-01-17 830.000000 830.179993 823.200012 827.460022 1439700.0 827.460022
... ... ... ... ... ... ...
2017-04-07 845.000000 845.880005 837.299988 842.099976 1110000.0 842.099976
2017-04-10 841.539978 846.739990 840.789978 841.700012 1021200.0 841.700012
2017-04-11 841.700012 844.630005 834.599976 839.880005 971900.0 839.880005
2017-04-12 838.460022 843.719971 837.590027 841.460022 1126100.0 841.460022
2017-04-13 841.039978 843.729980 837.849976 840.179993 1067200.0 840.179993
2017-04-17 841.380005 855.640015 841.030029 855.130005 1044800.0 855.130005
2017-04-18 852.539978 857.390015 851.250000 853.989990 935200.0 853.989990
2017-04-19 857.390015 860.200012 853.530029 856.510010 1077500.0 856.510010
2017-04-20 859.739990 863.929993 857.500000 860.080017 1186900.0 860.080017
2017-04-21 860.619995 862.440002 857.729980 858.950012 1168200.0 858.950012
[76 rows x 6 columns]
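In your case (depending on how your axes are laid out) you may want to slice your Panel differently. For example, if 'A' is a label on the minor axis, select it across all items:
Panel.loc[:, :, 'A'].plot()
This slice is a DataFrame with one column per item ('Sim1' and 'Sim2'), so a single plot() call draws both curves. Note that they are plotted against the row index rather than against 'xvalues'.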
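pd.Panel was deprecated in pandas 0.20 and removed in 0.25, so on a current installation the same idea needs a different container. Below is a minimal sketch, assuming the two simulations are kept in a plain dict of DataFrames. The random sample data and the names `sims`, `wide` and `a_by_sim` are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two simulation runs (100 rows each).
rng = np.random.default_rng(0)
xvalues = np.linspace(0.0, 1.0, 100)
sims = {
    name: pd.DataFrame({
        "xvalues": xvalues,
        "A": rng.normal(size=100),
        "B": rng.normal(size=100),
    })
    for name in ("Sim1", "Sim2")
}

# Put the frames side by side. The dict keys become the outer level of a
# (simulation, column) MultiIndex on the columns; 'xvalues' becomes the index.
wide = pd.concat(
    {name: frame.set_index("xvalues") for name, frame in sims.items()},
    axis=1,
)

# Cross-section: column 'A' from every simulation, one column per simulation.
a_by_sim = wide.xs("A", axis=1, level=1)
print(a_by_sim.head())
```

With matplotlib installed, a_by_sim.plot() then draws 'Sim1' and 'Sim2' in one figure with a single call, using 'xvalues' as the x axis.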
Answer 2:
Here's one approach, using Panel.apply(). The output of apply(plt.plot) is a minor_axis-by-items DataFrame of Line2D objects. plot() tries to plot an additional dimension that doesn't really make sense for our purposes, but we can use lines.pop() to remove the offending lines. Hope this helps.
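import numpy as np
import pandas as pd  # pd.Panel was removed in pandas 0.25
import matplotlib.pyplot as plt
# generate sample data
x = np.arange(20)
y1 = np.random.randint(100, size=20)
y2 = np.random.randint(100, size=20)
data = {'A1': pd.DataFrame({'y': y1, 'x': x}),
        'A2': pd.DataFrame({'y': y2, 'x': x})}
p = pd.Panel(data)
# plot panels; apply() returns a DataFrame of Line2D objects
df = p.apply(plt.plot)
# all lines share one Axes; drop the unwanted lines by position
ax = df.iloc[0, 0].axes  # .iloc replaces the deprecated .ix
ax.lines.pop(2)
ax.lines.pop(0)
ax.legend(loc="lower right")
plt.show()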
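An alternative that avoids drawing unwanted lines and removing them afterwards is to loop over the items and draw each one onto the same Axes. This is only a sketch, not the approach of the answer above: it assumes a plain dict of DataFrames in place of the Panel (which no longer exists in current pandas), and the name `frames` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same kind of sample data as above; a plain dict replaces the Panel.
x = np.arange(20)
frames = {
    "A1": pd.DataFrame({"x": x, "y": np.random.randint(100, size=20)}),
    "A2": pd.DataFrame({"x": x, "y": np.random.randint(100, size=20)}),
}

fig, ax = plt.subplots()
for name, frame in frames.items():
    # Rename 'y' to the item name so the legend entry identifies the data set,
    # and pass ax= so every item is drawn on the same Axes.
    frame.rename(columns={"y": name}).plot(x="x", y=name, ax=ax)
ax.legend(loc="lower right")
# In an interactive session, call plt.show() here to display the figure.
```

Passing the same ax to each DataFrame.plot call is what keeps the curves in one figure; without it, every call opens a new figure.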
Source: https://stackoverflow.com/questions/43519643/plotting-the-same-column-from-various-dataframes-in-a-panel