Date issue with scatter and LinearRegression

ⅰ亾dé卋堺 posted on 2021-02-08 07:29:54

Question


I have two issues and I believe both are related to the date format.

I have a CSV file with dates and values:

2012-01-03 00:00:00     95812    
2012-01-04 00:00:00    101265 
... 
2016-10-21 00:00:00     93594

After I load it with read_csv, I try to parse the date with:

X.Dated = pd.to_datetime(X.Dated, format='%Y-%m-%d %H:%M:%S', errors='raise')

I also tried with:

dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
X = pd.read_csv('sales.csv', parse_dates=['Dated'], date_parser=dateparse)

and with the infer_datetime_format argument.

All of them seem to work fine, because when I print the column the dates look like: 2012-01-03.

The issue appears when I try to plot the data on a chart. This line:

ax.scatter(X.Dated, X.Val, c='green', marker='.')

gives me an error:

TypeError: invalid type promotion

Also, when I use the data with the LinearRegression() algorithm, the fit call works fine, but score and predict give me this error:

TypeError: Cannot cast array data from dtype('<M8[ns]') to dtype('float64') according to the rule 'safe'

I tried many things to fix it, but with no luck. Any help would be appreciated.


Answer 1:


ax.scatter (at the moment) does not accept a Pandas Series of datetimes, but it does accept a list of Pandas Timestamps (e.g. X['Dated'].tolist()) or a NumPy array of dtype datetime64[ns] (e.g. X['Dated'].values):

import pandas as pd
import matplotlib.pyplot as plt

X = pd.DataFrame({'Dated': [pd.Timestamp('2012-01-03 00:00:00'),
                            pd.Timestamp('2012-01-04 00:00:00'),
                            pd.Timestamp('2016-10-21 00:00:00')],
                  'Val': [95812, 101265, 93594]})

fig, ax = plt.subplots()
# ax.scatter(X['Dated'].tolist(), X['Val'], c='green', marker='.', s=200)
ax.scatter(X['Dated'].values, X['Val'], c='green', marker='.', s=200)
plt.show()


Under the hood, the ax.scatter method calls

x = self.convert_xunits(x)
y = self.convert_yunits(y)

to handle date-like inputs. convert_xunits converts NumPy datetime64 arrays to Matplotlib datenums (floats), but a Pandas time series is only converted to a NumPy datetime64 array, not all the way to datenums.
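
To make the datenum conversion concrete, here is a minimal sketch that calls matplotlib.dates.date2num directly on a datetime64 array. This is only an illustration of the kind of conversion described above, not a trace of what ax.scatter does internally. The variable names are made up for the example, and the absolute datenum values depend on the Matplotlib version and its epoch setting, so only the spacing between them is shown.

```python
import numpy as np
from matplotlib.dates import date2num

# Sample dates taken from the question, stored as a NumPy datetime64 array.
dates = np.array(['2012-01-03', '2012-01-04', '2016-10-21'],
                 dtype='datetime64[ns]')

# date2num turns date-like values into floats (days since Matplotlib's epoch).
datenums = date2num(dates)

print(datenums.dtype)               # a floating-point dtype
print(datenums[1] - datenums[0])    # gap between the first two dates, in days
```

Because the result is an ordinary float array, it can be stacked with the float y values without any dtype conflict.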

So, when a Pandas time series is passed as input to ax.scatter, the code fails when it reaches this line:

offsets = np.dstack((x, y))

np.dstack tries to promote the dtypes of its inputs to one common dtype. If x has dtype datetime64[ns] and y has dtype float64, then

TypeError: invalid type promotion

is raised since there is no native NumPy dtype which is compatible with both.
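
The failed promotion can be reproduced outside Matplotlib. The sketch below stacks a datetime64 array with a float64 array, catches the resulting TypeError, and then shows one possible workaround: converting the dates to floats (days elapsed since the earliest date) before stacking. The "days since the first date" encoding and the names days, offsets and features are assumptions made for this example, not something the original answer prescribes.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({'Dated': pd.to_datetime(['2012-01-03',
                                           '2012-01-04',
                                           '2016-10-21']),
                  'Val': [95812.0, 101265.0, 93594.0]})

x = X['Dated'].values   # NumPy datetime64 array
y = X['Val'].values     # NumPy float64 array

# Stacking datetime64 with float64 has no common dtype, so NumPy raises.
try:
    np.dstack((x, y))
    promotion_failed = False
except TypeError as exc:
    promotion_failed = True
    print('dstack failed:', exc)

# Assumed workaround: express each date as float days since the first date.
days = (x - x.min()) / np.timedelta64(1, 'D')

# Both inputs are now float64, so stacking succeeds.
offsets = np.dstack((days, y))

# The same floats, as a 2-D column, are a numeric feature matrix.
features = days.reshape(-1, 1)
print(offsets.shape, features.shape)
```

A float column like features is also the kind of input that should avoid the "Cannot cast array data from dtype('<M8[ns]') to dtype('float64')" error from the question, since the estimator no longer has to cast datetimes to floats itself.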



Source: https://stackoverflow.com/questions/40220076/date-issue-with-scatter-and-linearregression
