Parse a Pandas column to Datetime when importing table from SQL database and filtering rows by date

执念已碎 2020-12-01 09:22

I have a DataFrame with a column named date. How can I convert/parse the 'date' column to a datetime object?

I loaded the dat

5 Answers
  • 2020-12-01 09:36

    Pandas already reads that column as datetime, so what you want is to select the rows between two dates. You can do that with a Boolean mask:

    df_masked = df[(df.date > '2012-04-01') & (df.date < '2012-04-04')]
    

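    Because you said the string comparison raised an error for some reason, try comparing against datetime.date objects instead. Note that this only works if the column holds datetime.date objects; recent pandas versions raise TypeError when a datetime64 column is compared with datetime.date:

    import datetime
    df_masked = df[(df.date > datetime.date(2012,4,1)) & (df.date < datetime.date(2012,4,4))]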
    
  • 2020-12-01 09:36

    Don't confuse datetime.date with Pandas pd.Timestamp

    A "Pandas datetime series" contains pd.Timestamp elements, not datetime.date elements. The recommended solution for Pandas:

    s = pd.to_datetime(s)    # convert series to a Pandas datetime series
    mask = s > '2018-03-10'  # calculate Boolean mask against Pandas-compatible object
    

    The top answers have issues:

    • @RyanSaxe's accepted answer's first attempt doesn't work; the second answer is inefficient.
    • As of Pandas v0.23.0, @Keith's highly upvoted answer doesn't work; it gives TypeError.

    Any good Pandas solution must ensure:

    1. The series is a Pandas datetime series, not object dtype.
    2. The datetime series is compared to a compatible object, e.g. pd.Timestamp, or string in the correct format.
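
The two requirements above can be checked in a few lines. This is only a minimal sketch: the sample values are made up for illustration, and it assumes the dates arrive as plain strings.

```python
import pandas as pd

# Hypothetical sample data: dates stored as plain strings, not datetimes.
s = pd.Series(["2018-01-10", "2018-05-20", "2018-10-30"])
was_datetime = pd.api.types.is_datetime64_any_dtype(s)

# Requirement 1: make sure the series has a datetime dtype.
s = pd.to_datetime(s)
is_datetime = pd.api.types.is_datetime64_any_dtype(s)

# Requirement 2: compare against a compatible object such as pd.Timestamp.
mask = s > pd.Timestamp("2018-03-10")
print(s[mask])
```

Checking the dtype before filtering is cheap and makes it obvious whether the conversion step is still needed.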

    Here's a demo with benchmarking, demonstrating that the one-off cost of conversion can be immediately offset by a single operation:

    from datetime import date
    import pandas as pd
    
    L = [date(2018, 1, 10), date(2018, 5, 20), date(2018, 10, 30), date(2018, 11, 11)]
    s = pd.Series(L*10**5)
    
    a = s > date(2018, 3, 10)             # accepted solution #2, inefficient
    b = pd.to_datetime(s) > '2018-03-10'  # more efficient, including datetime conversion
    
    assert a.equals(b)                    # check solutions give same result
    
    %timeit s > date(2018, 3, 10)                  # 40.5 ms
    %timeit pd.to_datetime(s) > '2018-03-10'       # 33.7 ms
    
    s = pd.to_datetime(s)
    
    %timeit s > '2018-03-10'                       # 2.85 ms
    
  • 2020-12-01 09:50

    You can iterate over the items, parse each one independently, and assign the resulting list back to the column.

    import dateutil.parser
    df['date'] = [dateutil.parser.parse(x) for x in df['date']]
    
  • 2020-12-01 09:56

    Pandas supports datetime values, but some of the import functions read such a column in as strings. So make sure the column has a datetime dtype rather than a string dtype, and then you can run your query.

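    import pandas as pd
    df['date'] = pd.to_datetime(df['date'])
    # compare against pd.Timestamp; datetime.date raises TypeError on recent pandas
    df_masked = df[(df['date'] > pd.Timestamp(2012, 4, 1)) & (df['date'] < pd.Timestamp(2012, 4, 4))]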
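
If the table comes from a SQL database, the conversion can also happen at import time through the parse_dates argument of pd.read_sql_query. The sketch below uses an in-memory SQLite database so that it is self-contained; the table name events, its columns, and the rows are hypothetical stand-ins for your own data.

```python
import sqlite3

import pandas as pd

# Hypothetical table with dates stored as text, built in memory for the demo.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (date TEXT, value INTEGER)")
con.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("2012-03-31", 1), ("2012-04-02", 2), ("2012-04-03", 3), ("2012-04-05", 4)],
)
con.commit()

# parse_dates asks pandas to convert the listed columns while reading.
df = pd.read_sql_query("SELECT date, value FROM events", con, parse_dates=["date"])
con.close()

# The column is now datetime, so it can be compared with date strings.
df_masked = df[(df["date"] > "2012-04-01") & (df["date"] < "2012-04-04")]
print(df_masked)
```

With a different database driver the connection object would change, but the parse_dates argument should be used the same way.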
    
  • 2020-12-01 09:59

    You probably need apply, so something like:

    import dateutil.parser
    df['date'] = df['date'].apply(dateutil.parser.parse)
    

    Without an example of the column I can't guarantee this will work, but something in that direction should help you to carry on.
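
When you are not sure every value in the column can be parsed, one option is to let pd.to_datetime turn the failures into NaT with errors="coerce" and then inspect those rows. This is a sketch on made-up values, not a statement about what your column contains.

```python
import pandas as pd

# Hypothetical raw column containing one value that is not a date.
raw = pd.Series(["2012-04-01", "2012-04-03", "not a date"])

# errors="coerce" replaces unparseable entries with NaT instead of raising.
parsed = pd.to_datetime(raw, errors="coerce")

# Show the original strings that failed to parse.
bad = raw[parsed.isna()]
print(bad)
```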
