filter dataframe in pandas by a date column

Submitted by 爷,独闯天下 on 2020-01-06 08:14:27

Question


The data is at the following link: http://www.fdic.gov/bank/individual/failed/banklist.html

I want only the banks that closed in 2017. How can I do this in pandas?

import pandas as pd

# read_html returns a list of DataFrames, one per table found on the page
failed_banks = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')
failed_banks[0]

What should I do after these lines of code to extract the desired result?


Answer 1:


Ideally you would use

# assuming pandas successfully parsed this column as a datetime column
# and pandas version >= 0.16
failed_banks = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')[0]
failed_banks = failed_banks[failed_banks['Closing Date'].dt.year == 2017]

But pandas doesn't parse the Closing Date column as dates here (the values stay as strings), so we need to parse it ourselves:

failed_banks = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')[0]

def closed_in_2017(date_str):
    # dates look like 'Month day, year'; the text after ', ' is the year
    return int(date_str.split(', ')[-1]) == 2017

failed_banks = failed_banks[failed_banks['Closing Date'].apply(closed_in_2017)]
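Another way to get to the "ideal" version above is to convert the column yourself and then use the .dt accessor. This is only a minimal sketch: the rows below are made-up stand-ins for the FDIC table (not real data), and it assumes the dates are English strings in 'Month day, year' form.

```python
import pandas as pd

# Hypothetical sample rows standing in for the FDIC table (not real data).
sample = pd.DataFrame({
    "Bank Name": ["Bank A", "Bank B", "Bank C"],
    "Closing Date": ["December 15, 2017", "March 3, 2016", "January 27, 2017"],
})

# Convert the text column to datetime64, assuming 'Month day, year' strings.
sample["Closing Date"] = pd.to_datetime(sample["Closing Date"], format="%B %d, %Y")

# With a real datetime column, the .dt accessor exposes the year directly.
closed_2017 = sample[sample["Closing Date"].dt.year == 2017]
print(closed_2017)
```

Once the column is a real datetime, the same mask style also works for ranges, months, and so on, which is harder to do with string splitting.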



Answer 2:


Something like this should work

Extract closing year.

# using pd.to_datetime (gives the year as an integer)
closing_year = pd.to_datetime(failed_banks[0]['Closing Date']).dt.year
# or by splitting the string (cast to int so both variants compare the same way)
closing_year = failed_banks[0]['Closing Date'].apply(lambda x: int(x.split(', ')[1]))

And select.

failed_banks[0][closing_year == 2017]
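One thing to watch with the two extraction approaches: pd.to_datetime gives the year as an integer, while splitting the string gives it as text unless you cast it, so the value you compare against has to match the type. Comparing integer years against the string '2017' would typically select nothing. A small sketch with made-up dates (an assumption, not the real table):

```python
import pandas as pd

# Hypothetical date strings in 'Month day, year' form.
dates = pd.Series(["December 15, 2017", "March 3, 2016"])

# Approach 1: parse to datetime, year comes out as an integer.
years_as_int = pd.to_datetime(dates, format="%B %d, %Y").dt.year

# Approach 2: split the string, year comes out as text.
years_as_str = dates.apply(lambda s: s.split(", ")[1])

# Each mask compares against a value of the matching type.
int_mask = years_as_int == 2017
str_mask = years_as_str == "2017"
print(int_mask.tolist(), str_mask.tolist())
```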


Source: https://stackoverflow.com/questions/45613501/filter-dataframe-in-pandas-by-a-date-column
