问题
So this has been answered here: making output of datefinder into list Unfortunately my rep is too low so I can't comment to get clarity on why it's not functioning as expected. I want to take filename strings and turn them into a list of dates so that I can then use them as a fill for the date column. All of the filenames include event dates, but they are not on the sheets themselves. The format is: CompanyNameEventLocationDDMMYYYY.xlsx
import glob
import datefinder
#get all Excel files within folder
path = r"C:\Users\me\Documents\Events\Spreadsheets\Consolidated\*.xlsx"
filename = glob.glob(path)
#get dates from filenames
event_dates = (datefinder.find_dates(f) for f in filename)
#check output
for days in event_dates:
print(days.strftime("%Y-%m-%d %H:%M:%S"))
Instead of getting converted objects I get the following error: AttributeError:
'generator' object has no attribute 'strftime'
When I try converting it directly to a list with
date_list = list(event_dates)
print(event_dates)
I still get output saying that it's still a generator object:
<generator object <genexpr> at 0x00000230571AE660>
What do I need to change so that it actually becomes a list of datetimes that I can use to populate an appended 'Date' column by forward filling?
回答1:
Your code does not work because the find_dates
returns you a generator and events date contains a list of generator. It could work with the chain.from_iterable
like following
import glob
import datefinder
from itertools import chain
#get all Excel files within folder
path = r"C:\Users\me\Documents\Events\Spreadsheets\Consolidated\*.xlsx"
filename = glob.glob(path)
#get dates from filenames
event_dates = (datefinder.find_dates(f) for f in filename)
#check output
for days in chain.from_iterable(event_dates):
print(days.strftime("%Y-%m-%d %H:%M:%S"))
But I try to use datefinder on a filename with your format and it does not work well. I think it is not designed for this.
event_dates = datefinder.find_dates('afilename_01-08-2019.xls')
for match in event_dates:
print(match)
event_dates = datefinder.find_dates('afilename01082019.xls')
for match in event_dates:
print(match)
this previous code prints:
2019-01-08 00:00:00
If your format is always CompanyNameEventLocationDDMMYYYY.xlsx
, you should parse day, month and year number and give them to a date
object:
import datetime
import glob
#get all Excel files within folder
path = r"C:\Users\me\Documents\Events\Spreadsheets\Consolidated\*.xlsx"
filenames = glob.glob(path)
for filename in filenames:
full_date = filename.split('.')[0]
day, month, year = full_date[-8:-6], full_date[-6:-4], full_date[-4:]
real_date = datetime.date(int(year), int(month), int(day))
print(real_date)
回答2:
Ended up getting it working by making these changes:
import pandas as pd
import glob
import datefinder
from datetime import datetime
#get all Excel files within folder
path = r"C:\Users\me\Documents\Events\Spreadsheets\Consolidated\*.xlsx"
filename = glob.glob(path)
#get dates from filenames and convert to datetime objects
event_dates = (datefinder.find_dates(f) for f in filename)
event_dates_dto = []
for dates in event_dates:
event_dates_dto.append(pd.to_datetime(list(dates)))
#check output
print(event_dates_dto)
This now creates a list of DateTimeIndexes extracted from each filename which can be used as needed for the forward date column I need to construct, for indexing, etc.
来源:https://stackoverflow.com/questions/57794275/how-do-i-turn-datefinder-output-into-a-list