Convert a column in pandas of HH:MM to minutes

落爺英雄遲暮 submitted on 2019-12-23 12:55:22

Question


I want to convert a column of a dataset from hh:mm format to minutes. I tried the code below, but it raises "AttributeError: 'Series' object has no attribute 'split'". The data is in the following format. The dataset also contains NaN values; the plan is to compute the median of the values and then fill the rows that have NaN with that median.

02:32
02:14
02:31
02:15
02:28
02:15
02:22
02:16
02:22
02:14

I have tried this so far:

s = dataset['Enroute_time_(hh mm)']

hours, minutes = s.split(':')
int(hours) * 60 + int(minutes)

Answer 1:


I suggest you avoid row-wise calculations. You can use a vectorised approach with Pandas / NumPy:

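import numpy as np
import pandas as pd

df = pd.DataFrame({'time': ['02:32', '02:14', '02:31', '02:15', '02:28', '02:15', 
                            '02:22', '02:16', '02:22', '02:14', np.nan]})

# Treat missing values as '00:00', then split into hour and minute columns
values = df['time'].fillna('00:00').str.split(':', expand=True).astype(int)
factors = np.array([60, 1])

# hours * 60 + minutes for each row
df['mins'] = (values * factors).sum(axis=1)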

print(df)

     time  mins
0   02:32   152
1   02:14   134
2   02:31   151
3   02:15   135
4   02:28   148
5   02:15   135
6   02:22   142
7   02:16   136
8   02:22   142
9   02:14   134
10    NaN     0
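
Note that the snippet above turns NaN into 0 minutes, which would pull the median down. If the goal is the median fill described in the question, a minimal sketch (assuming a shortened sample column and that missing rows should stay NaN until the fill step) could look like this:

```python
import numpy as np
import pandas as pd

# Shortened sample data; the column name 'time' follows the answer above
df = pd.DataFrame({'time': ['02:32', '02:14', np.nan, '02:15']})

# Split into hour and minute columns; a missing row stays NaN in both
parts = df['time'].str.split(':', expand=True).astype(float)

# hours * 60 + minutes; skipna=False keeps NaN rows as NaN instead of 0
mins = (parts * np.array([60, 1])).sum(axis=1, skipna=False)

# Fill the missing rows with the median of the observed values
df['mins'] = mins.fillna(mins.median())
print(df)
```

The result is a float column, because NaN cannot be stored in a plain integer column before the fill.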



Answer 2:


If you want to use split on a Series, you need to go through the str accessor, i.e. s.str.split(':').
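
As a rough illustration of the str accessor route (the two-row Series here is made up for the example):

```python
import pandas as pd

s = pd.Series(['02:32', '02:14'])

# .str.split returns a Series of lists, one list per row
parts = s.str.split(':')

# .str[i] picks the i-th element of each list
hours = parts.str[0].astype(int)
minutes = parts.str[1].astype(int)

total = hours * 60 + minutes
print(total)
```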

However, I think that in this case it makes more sense to use apply:

import pandas as pd

df = pd.DataFrame({'Enroute_time_(hh mm)': ['02:32', '02:14', '02:31', 
                                            '02:15', '02:28', '02:15', 
                                            '02:22', '02:16', '02:22', '02:14']})

def convert_to_minutes(value):
    hours, minutes = value.split(':')
    return int(hours) * 60 + int(minutes)

df['Enroute_time_(hh mm)'] = df['Enroute_time_(hh mm)'].apply(convert_to_minutes)
print(df)

#       Enroute_time_(hh mm)
#    0                   152
#    1                   134
#    2                   151
#    3                   135
#    4                   148
#    5                   135
#    6                   142
#    7                   136
#    8                   142
#    9                   134
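
The function above calls value.split on every row, so it fails with an AttributeError on NaN rows (NaN is a float). A possible NaN-safe variant, sketched here with a made-up three-row Series and followed by the median fill from the question:

```python
import numpy as np
import pandas as pd

def convert_to_minutes(value):
    # Pass missing values through unchanged so they can be filled later
    if pd.isna(value):
        return np.nan
    hours, minutes = value.split(':')
    return int(hours) * 60 + int(minutes)

s = pd.Series(['02:32', np.nan, '02:14'])

mins = s.apply(convert_to_minutes)
filled = mins.fillna(mins.median())
print(filled)
```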



Answer 3:


As I understand it, you have a DataFrame column that holds durations as strings. You want to extract the total minutes from each duration, and then fill the NaN values with the median of those total minutes.

import pandas as pd
df = pd.DataFrame(
     {'hhmm' : ['02:32',
                '02:14',
                '02:31',
                '02:15',
                '02:28',
                '02:15',
                '02:22',
                '02:16',
                '02:22',
                '02:14']})

  1. Your values are not Timedeltas yet. They are strings, so you need to convert them first.

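    # Parsing with format='%H:%M' yields timestamps dated 1900-01-01
    df.hhmm = pd.to_datetime(df.hhmm, format='%H:%M')
    # Subtract that date to get timedeltas (pd.datetime is deprecated and
    # unavailable in recent pandas versions, so pd.Timestamp is used instead)
    df.hhmm = df.hhmm - pd.Timestamp(1900, 1, 1)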
    

    This gives you the following values (note the dtype, timedelta64[ns]):

    0   02:32:00
    1   02:14:00
    2   02:31:00
    3   02:15:00
    4   02:28:00
    5   02:15:00
    6   02:22:00
    7   02:16:00
    8   02:22:00
    9   02:14:00
    Name: hhmm, dtype: timedelta64[ns]
    
  2. Now that you have true timedeltas, you can use some cool functions like total_seconds() and then calculate the minutes.

    df.hhmm.dt.total_seconds() / 60
    
  3. If that is not what you wanted, you can also use the following.

    df.hhmm.dt.components.minutes
    

    This gives you only the minutes part of the HH:MM string, as if you had split it (for example, 32 for 02:32, not 152).

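  4. Fill the NaN values with the median. Do the fill on the minutes Series rather than on the timedelta column, so that the fill value and the column have the same type.

     mins = df.hhmm.dt.total_seconds() / 60
     mins = mins.fillna(mins.median())
    

    or, for the minutes part only,

    mins = df.hhmm.dt.components.minutes
    mins = mins.fillna(mins.median())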
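
The steps above can also be condensed by skipping the to_datetime round trip. The following is only a sketch, assuming every non-missing value has the HH:MM shape, so that appending ':00' produces an HH:MM:SS string that pd.to_timedelta can parse:

```python
import numpy as np
import pandas as pd

# Shortened sample data with one missing value
df = pd.DataFrame({'hhmm': ['02:32', '02:14', np.nan, '02:15']})

# 'HH:MM' + ':00' -> 'HH:MM:SS'; NaN stays missing and becomes NaT
td = pd.to_timedelta(df['hhmm'] + ':00')

# Total minutes per row (NaT becomes NaN)
mins = td.dt.total_seconds() / 60

# Minutes part only, for comparison with the total
minute_part = td.dt.components['minutes']

# Fill the missing rows with the median of the total minutes
df['mins'] = mins.fillna(mins.median())
print(df)
```

For the first row, mins holds 152.0 while minute_part holds 32, which is the difference between steps 2 and 3.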
    


Source: https://stackoverflow.com/questions/53098627/convert-a-column-in-pandas-of-hhmm-to-minutes
