Pandas Series not plotting to timeseries chart

Submitted by 风格不统一 on 2019-12-13 04:14:08

Question


I have a data set of house prices - House Price Data. When I use a subset of the data in a NumPy array, I can plot it as this nice time series chart:

However, when I use the same data in a pandas Series, the chart goes all lumpy like this:

How can I create a smooth time series line graph (like the first image) using a pandas Series?

Here is what I am doing to get the nice-looking time series chart using a NumPy array (after importing numpy as np, pandas as pd and matplotlib.pyplot as plt):

# Read the CSV file, use the Date column as the index and parse the dates
data = pd.read_csv('HPI.csv', index_col='Date', parse_dates=True)
brixton = data[data['RegionName'] == 'Lambeth']  # subset for the region Lambeth
prices = brixton['AveragePrice'].values  # NumPy array of the average price values
plt.plot(prices)  # plot against position 0, 1, 2, ... (the dates are not used)
plt.show()

Here is what I am doing to get the lumpy one using a pandas Series:

data = pd.read_csv('HPI.csv', index_col='Date', parse_dates=True)
brixton = data[data['RegionName'] == 'Lambeth']
prices_panda = brixton['AveragePrice'] 
plt.plot(prices_panda)
plt.show()

How do I make this second graph show as a nice smooth proper time series?

* This is my first StackOverflow question so please shout if I have left anything out or not been clear *

Any help greatly appreciated


Answer 1:


The date format in your file is Day/Month/Year. For pandas to interpret this format correctly, pass the option dayfirst=True to the read_csv call.

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('data/UK-HPI-full-file-2017-08.csv', 
                   index_col='Date', parse_dates=True, dayfirst=True)
brixton = data[data['RegionName'] == 'Lambeth']
prices_panda = brixton['AveragePrice'] 
plt.plot(prices_panda)
plt.show()
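To see the effect of dayfirst=True without the full file, here is a minimal sketch on a made-up three-row sample. The sample rows and the names csv_text, month_first and day_first are assumptions for illustration only; the real file has many more columns and rows.

```python
import io
import pandas as pd

# Hypothetical sample in the same Day/Month/Year layout as the real file.
csv_text = """Date,RegionName,AveragePrice
01/01/2004,Lambeth,200000
01/02/2004,Lambeth,201000
01/03/2004,Lambeth,202000
"""

# Without dayfirst, ambiguous dates are read month-first.
month_first = pd.read_csv(io.StringIO(csv_text), index_col='Date',
                          parse_dates=True)

# With dayfirst=True, the first number is treated as the day.
day_first = pd.read_csv(io.StringIO(csv_text), index_col='Date',
                        parse_dates=True, dayfirst=True)

print(month_first.index.month.tolist())  # months as parsed month-first
print(day_first.index.month.tolist())    # months as parsed day-first
```

In the month-first result every row should land in January, while the day-first result should give one row each for January, February and March, which is what a monthly series needs.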




Answer 2:


When you passed parse_dates=True, pandas read the dates using its default order, which is month-day-year. Your data is formatted according to the British convention, which is day-month-year. As a result, instead of having a data point for the first of every month, your plot is showing data points for the first 12 days of January, and a straight line across the rest of each year. You need to swap the day and month back in the index, for example:

data.index = pd.to_datetime({'year':data.index.year,'month':data.index.day,'day':data.index.month})
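As a small check of that swap, the sketch below applies it to a made-up index of three misparsed dates. The index values and the names wrong and fixed are assumptions for illustration. Note that the swap is only safe when every date was misparsed the same way, as is the case here where each date is the first of a month.

```python
import pandas as pd

# Hypothetical index: 01/01/2004, 01/02/2004 and 01/03/2004 (day-first)
# as they would look after being read month-first.
wrong = pd.DatetimeIndex(['2004-01-01', '2004-01-02', '2004-01-03'])

# Rebuild each date with the day and month components exchanged.
swapped = pd.to_datetime({'year': wrong.year,
                          'month': wrong.day,
                          'day': wrong.month})
fixed = pd.DatetimeIndex(swapped)

print(fixed)
```

The resulting index should hold the first of January, February and March 2004. Passing dayfirst=True when reading the file avoids the need for this repair.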



Source: https://stackoverflow.com/questions/47355526/pandas-series-not-plotting-to-timeseries-chart
