Question
I have a data set that has employees clocking in and out. It looks like this (note two entries per employee):
Employee Date Time
Emp1 1/1/16 06:00
Emp1 1/1/16 13:00
Emp2 1/1/16 09:00
Emp2 1/1/16 17:00
Emp3 1/1/16 11:00
Emp3 1/1/16 18:00
I want to get the data to look like this:
Employee Date Start End
Emp1 1/1/16 06:00 13:00
Emp2 1/1/16 09:00 17:00
Emp3 1/1/16 11:00 18:00
I would like to get it into a data frame format so that I can do some calculations.
So far I have tried:
df['start'] = np.where((df['employee']==df['employee']&df['date']==df['date']),df['time'].min())
I also tried:
df.groupby(['employee','date'])['time'].max()
How do I get two columns out of one?
Answer 1:
I would recommend merging Date and Time into a single DateTime column, which greatly simplifies the work. You can do something like this:
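# Combine the two string columns and parse them as timestamps
df['DateTime'] = pd.to_datetime(df['Date'] + " " + df['Time'])
# Earliest and latest timestamp per employee
df.groupby('Employee')['DateTime'].agg(['min', 'max'])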
There are other options, depending on the content of your data. If you know that all the entries fall on the same day, you can simply do:
# First convert the Date and Time columns to date and time objects
df['Date'] = pd.to_datetime(df['Date']).dt.date
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M').dt.time
df.groupby('Employee').agg(['min', 'max'])
There is no need to create a DateTime column in this case.
If you want the start and end times for each day, you can do:
# First convert the Date and Time columns to date and time objects
df['Date'] = pd.to_datetime(df['Date']).dt.date
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M').dt.time
# Earliest and latest time per employee per day
df.groupby(['Employee', 'Date'])['Time'].agg(['min', 'max'])
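To get columns actually named Start and End in a flat data frame, as in the desired output, named aggregation followed by reset_index() is one option. The following is a minimal sketch, not the only way to do it. It rebuilds the sample data from the question, assumes the dates are in month/day/year order (the sample is ambiguous), and adds an Hours column purely as an illustration of the "calculations" mentioned in the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Employee": ["Emp1", "Emp1", "Emp2", "Emp2", "Emp3", "Emp3"],
    "Date": ["1/1/16"] * 6,
    "Time": ["06:00", "13:00", "09:00", "17:00", "11:00", "18:00"],
})

# Parse date + time into one timestamp so min/max compare chronologically.
# Assumption: dates are month/day/year.
df["DateTime"] = pd.to_datetime(
    df["Date"] + " " + df["Time"], format="%m/%d/%y %H:%M"
)

# Named aggregation gives the output columns their final names;
# reset_index() turns the group keys back into ordinary columns.
result = (
    df.groupby(["Employee", "Date"])
      .agg(Start=("DateTime", "min"), End=("DateTime", "max"))
      .reset_index()
)

# Illustrative follow-up calculation: hours worked per shift
result["Hours"] = (result["End"] - result["Start"]).dt.total_seconds() / 3600

print(result)
```

Because Start and End are full timestamps, subtracting them also works for shifts that cross midnight, provided the clock-out row carries the next day's date and you group by Employee only.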
Source: https://stackoverflow.com/questions/40747795/finding-start-time-and-end-time-in-a-column