Pandas: Change day

轮回少年 2020-12-10 02:36

I have a series in datetime format, and need to change the day to 1 for each entry. I have thought of numerous simple solutions, but none of them w

2 Answers
  • 2020-12-10 02:47

    You can use .apply with datetime.replace, e.g.:

    import pandas as pd
    from datetime import datetime
    
    ps = pd.Series([datetime(2014, 1, 7), datetime(2014, 3, 13), datetime(2014, 6, 12)])
    new = ps.apply(lambda dt: dt.replace(day=1))
    

    Gives:

    0   2014-01-01
    1   2014-03-01
    2   2014-06-01
    dtype: datetime64[ns]
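
    For the specific case in the question (setting the day to 1), a round trip through monthly periods is another option that avoids apply. This is a different technique from the one above, shown only as a minimal sketch; note that it also resets any time-of-day component to midnight.

```python
import pandas as pd

ps = pd.Series(pd.to_datetime(["2014-01-07", "2014-03-13", "2014-06-12"]))

# Convert each timestamp to its monthly period, then back to a timestamp.
# to_timestamp() defaults to the start of the period, i.e. the first of the month.
first_of_month = ps.dt.to_period("M").dt.to_timestamp()
print(first_of_month)
```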
    
  • 2020-12-10 03:03

    The other answer works, but apply calls a Python function once per element, which slows things down a lot on a large series. I was able to get an 8.5x speedup by writing a quick vectorized datetime replace for a series.

    def vec_dt_replace(series, year=None, month=None, day=None):
        return pd.to_datetime(
            {'year': series.dt.year if year is None else year,
             'month': series.dt.month if month is None else month,
             'day': series.dt.day if day is None else day})
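
    As a quick sanity check, the function can be run on the small series from the other answer and compared against the apply version. This is only a sketch: the variable names are made up for illustration. One thing to keep in mind is that only year, month, and day are passed to pd.to_datetime, so any time-of-day component is dropped, whereas datetime.replace would keep it.

```python
import pandas as pd
from datetime import datetime

def vec_dt_replace(series, year=None, month=None, day=None):
    return pd.to_datetime(
        {'year': series.dt.year if year is None else year,
         'month': series.dt.month if month is None else month,
         'day': series.dt.day if day is None else day})

ps = pd.Series([datetime(2014, 1, 7), datetime(2014, 3, 13), datetime(2014, 6, 12)])

vectorized = vec_dt_replace(ps, day=1)
applied = ps.apply(lambda dt: dt.replace(day=1))

# Element-wise comparison; True when both approaches agree on every row
same = bool((vectorized == applied).all())
print(same)
```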
    

    Apply:

    %timeit dtseries.apply(lambda dt: dt.replace(day=1))
    # 4.17 s ± 38.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

    Vectorized:

    %timeit vec_dt_replace(dtseries, day=1)
    # 491 ms ± 6.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

    Note that you can get errors when the replacement produces a date that doesn't exist, such as changing 2012-02-29 to 2013-02-29. Use the errors argument of pd.to_datetime to handle them: errors='coerce' turns invalid dates into NaT. (errors='ignore' is deprecated in recent pandas versions.)
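
    One way to wire this in is to forward an errors argument through the helper. The extra errors parameter below is an assumption added for illustration; it is not part of the function as written above.

```python
import pandas as pd

def vec_dt_replace(series, year=None, month=None, day=None, errors="raise"):
    # `errors` is a hypothetical extra parameter, passed straight to pd.to_datetime
    return pd.to_datetime(
        {'year': series.dt.year if year is None else year,
         'month': series.dt.month if month is None else month,
         'day': series.dt.day if day is None else day},
        errors=errors)

leap = pd.Series(pd.to_datetime(["2012-02-29", "2012-03-15"]))

# 2013-02-29 does not exist, so with errors="coerce" that row becomes NaT
shifted = vec_dt_replace(leap, year=2013, errors="coerce")
print(shifted)
```

    With the default errors="raise", the same call raises a ValueError instead.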

    Data generation: generate a series with 1 million random dates:

    import pandas as pd
    import numpy as np
    
    # Generate random dates. Modified from: https://stackoverflow.com/a/50668285
    def pp(start, end, n):
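        # Convert the nanosecond timestamps to whole seconds since the epoch
        start_u = start.value // 10 ** 9
        end_u = end.value // 10 ** 9
    
        # Draw n random second values (int64, so scaling to nanoseconds cannot
        # overflow on platforms where the default int is 32-bit), then
        # reinterpret the integers as datetime64[ns]
        return pd.Series(
            (10 ** 9 * np.random.randint(start_u, end_u, n, dtype=np.int64)).view('M8[ns]'))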
    
    start = pd.to_datetime('2015-01-01')
    end = pd.to_datetime('2018-01-01')
    dtseries = pp(start, end, 1000000)
    # Remove time component
    dtseries = dtseries.dt.normalize()
    