Question
I have two issues, and I believe both are related to the date format.
I have a CSV file with dates and values:
2012-01-03 00:00:00 95812
2012-01-04 00:00:00 101265
...
2016-10-21 00:00:00 93594
After loading it with read_csv, I'm trying to parse the dates with:
X.Dated = pd.to_datetime(X.Dated, format='%Y-%m-%d %H:%M:%S', errors='raise')
I also tried with:
dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
X = pd.read_csv('sales.csv', parse_dates=['Dated'], date_parser=dateparse)
as well as the infer_datetime_format argument.
All of them seem to work, because when I print the data the dates look like: 2012-01-03.
The issue appears when I try to plot the data on a chart. This line:
ax.scatter(X.Dated, X.Val, c='green', marker='.')
gives me an error:
TypeError: invalid type promotion
Also, when I use it with the LinearRegression() algorithm, fit works fine, but score and predict give me this error:
TypeError: Cannot cast array data from dtype('<M8[ns]') to dtype('float64') according to the rule 'safe'
I've tried many things to fix it, without luck. Any help would be appreciated.
Answer 1:
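The second error message names NumPy's 'safe' casting rule. The sketch below, which uses NumPy only (no scikit-learn), reproduces that refusal on a datetime64[ns] array and shows one common workaround: turning the dates into an explicit number (here, days since 1970-01-01) before fitting. The epoch and the day unit are arbitrary choices for illustration, not something the question prescribes.

```python
import numpy as np

dates = np.array(["2012-01-03", "2012-01-04", "2016-10-21"], dtype="datetime64[ns]")

# NumPy's 'safe' casting rule does not allow datetime64 -> float64,
# which is the cast the error message refers to.
safe_ok = np.can_cast(dates.dtype, np.float64, casting="safe")
print(safe_ok)  # False

try:
    dates.astype(np.float64, casting="safe")
    cast_failed = False
except TypeError as exc:
    cast_failed = True
    print(exc)

# Workaround: express each date as a float number of days since the Unix epoch.
days = (dates - np.datetime64("1970-01-01", "ns")) / np.timedelta64(1, "D")
# scikit-learn estimators expect a 2-D feature matrix, so make it a single column.
features = days.reshape(-1, 1)
print(features.dtype, features.shape)
```

A float column like this can be passed to fit, score and predict consistently, as long as the same conversion is applied to every input.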
ax.scatter (at the moment) does not accept a Pandas Series, but it can accept a list of Pandas Timestamps (e.g. X['Dated'].tolist()) or a NumPy array of dtype datetime64[ns] (e.g. X['Dated'].values):
import pandas as pd
import matplotlib.pyplot as plt
X = pd.DataFrame({'Dated': [pd.Timestamp('2012-01-03 00:00:00'),
                            pd.Timestamp('2012-01-04 00:00:00'),
                            pd.Timestamp('2016-10-21 00:00:00')],
                  'Val': [95812, 101265, 93594]})
fig, ax = plt.subplots()
# Alternative: pass a list of Timestamps
# ax.scatter(X['Dated'].tolist(), X['Val'], c='green', marker='.', s=200)
# Pass the underlying NumPy datetime64 array instead of the Series itself
ax.scatter(X['Dated'].values, X['Val'], c='green', marker='.', s=200)
plt.show()
Under the hood, the ax.scatter method calls
x = self.convert_xunits(x)
y = self.convert_yunits(y)
to handle date-like inputs. convert_xunits converts NumPy datetime64 arrays to Matplotlib datenums, but it converts a Pandas time series to a NumPy datetime64 array.
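A minimal sketch of the two conversions described above. It calls matplotlib.dates.date2num directly rather than convert_xunits, whose internals vary between Matplotlib versions, and assumes a Matplotlib recent enough for date2num to accept datetime64 arrays. Datenums are floats counting days, so consecutive dates differ by 1.0.

```python
import numpy as np
import pandas as pd
import matplotlib.dates as mdates

X = pd.DataFrame({"Dated": pd.to_datetime(["2012-01-03", "2012-01-04", "2016-10-21"]),
                  "Val": [95812, 101265, 93594]})

# Converting the Series to NumPy keeps the datetime64 dtype: not yet plot coordinates.
raw = np.asarray(X["Dated"])
print(raw.dtype.kind)  # M  (the datetime64 kind code)

# date2num turns datetime64 values into Matplotlib datenums (float days).
nums = mdates.date2num(X["Dated"].to_numpy())
print(nums.dtype)         # float64
print(nums[1] - nums[0])  # 1.0
```

The float datenums can be stacked with float y values, whereas the raw datetime64 array cannot.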
So, when a Pandas time series is passed as input to ax.scatter, the code ends up failing when this line is reached:
offsets = np.dstack((x, y))
np.dstack tries to promote the dtypes of its inputs to one common dtype. If x has dtype datetime64[ns] and y has dtype float64, then
TypeError: invalid type promotion
is raised, since there is no native NumPy dtype that is compatible with both.
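The promotion failure can be reproduced with NumPy alone. In this sketch, x and y are hypothetical arrays shaped like the question's data. Converting the dates to floats first (days since 1970-01-01, an arbitrary choice here) gives np.dstack two float64 inputs. Recent NumPy versions raise DTypePromotionError, a TypeError subclass with a longer message, rather than the older "invalid type promotion".

```python
import numpy as np

# Hypothetical inputs mirroring the question: datetime64[ns] x values, float64 y values.
x = np.array(["2012-01-03", "2012-01-04", "2016-10-21"], dtype="datetime64[ns]")
y = np.array([95812, 101265, 93594], dtype=np.float64)

# No common dtype exists for datetime64 and float64, so stacking fails.
try:
    np.dstack((x, y))
    stack_failed = False
except TypeError as exc:  # DTypePromotionError on newer NumPy subclasses TypeError
    stack_failed = True
    print(type(exc).__name__, exc)

# Converting the dates to plain numbers first gives both inputs a float64 dtype.
x_days = (x - np.datetime64("1970-01-01", "ns")) / np.timedelta64(1, "D")
offsets = np.dstack((x_days, y))
print(offsets.shape)  # (1, 3, 2)
```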
Source: https://stackoverflow.com/questions/40220076/date-issue-with-scatter-and-linearregression