Pandas - Fast way of accessing a column of objects' attribute

囚心锁ツ 2021-01-13 12:29

Let's say I have a custom class in Python that has the attribute val. If I have a pandas dataframe with a column of these objects, how can I access this attri

2 Answers
  • 2021-01-13 12:42

    You could use a list comprehension:

    df['custom_val'] = [foo.val for foo in df['custom_object']]
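
For a self-contained illustration, here is a minimal sketch of the same idea. The `Foo` class and the sample values are assumptions standing in for the question's custom class, which is not shown.

```python
import pandas as pd


class Foo:
    """Hypothetical stand-in for the custom class in the question."""

    def __init__(self, val):
        self.val = val


df = pd.DataFrame({"custom_object": [Foo(1.5), Foo(2.5), Foo(3.5)]})

# Pull the attribute out of every object with a plain Python loop,
# then assign the resulting list back as a new column.
df["custom_val"] = [foo.val for foo in df["custom_object"]]

print(df["custom_val"].tolist())
```

Assigning a list to a column is positional, so the list must have exactly one entry per row.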
    

    Timings

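    # Set-up 100k Foo objects.
    import numpy as np
    import pandas as pd
    
    class Foo:  # assumed minimal stand-in; the question's class is not shown
        def __init__(self, val):
            self.val = val
    
    vals = [np.random.randn() for _ in range(100000)]
    foos = [Foo(val) for val in vals]
    df = pd.DataFrame(foos, columns=['custom_object'])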
    
    # 1) OP's apply method.
    %timeit df['custom_object'].apply(lambda x: x.val)
    # 10 loops, best of 3: 26.7 ms per loop
    
    # 2) Using a list comprehension instead.
    %timeit [foo.val for foo in df['custom_object']]
    # 100 loops, best of 3: 11.7 ms per loop
    
    # 3) For reference with the original list of objects (slightly faster than 2) above).
    %timeit [foo.val for foo in foos]
    # 100 loops, best of 3: 9.79 ms per loop
    
    # 4) And just on the original list of raw values themselves.
    %timeit [val for val in vals]
    # 100 loops, best of 3: 4.91 ms per loop
    

    If you had the original list of values, you could just assign them directly:

    # 5) Direct assignment to list of values.
    %timeit df['v'] = vals
    # 100 loops, best of 3: 5.88 ms per loop
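
The `%timeit` lines above only work inside IPython or Jupyter. As a rough sketch, the first two measurements could be reproduced in a plain script with the standard `timeit` module. The `Foo` class, the helper function names, and the smaller object count are assumptions made for illustration, and the numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


class Foo:
    """Hypothetical stand-in for the custom class used in the timings."""

    def __init__(self, val):
        self.val = val


# Smaller than the 100k objects above so the script finishes quickly.
foos = [Foo(float(v)) for v in np.random.randn(10_000)]
df = pd.DataFrame({"custom_object": foos})


def with_apply():
    return df["custom_object"].apply(lambda x: x.val)


def with_comprehension():
    return [foo.val for foo in df["custom_object"]]


# Both approaches must agree before their speed is worth comparing.
assert with_apply().tolist() == with_comprehension()

for fn in (with_apply, with_comprehension):
    # Best of 5 repeats, 10 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(fn, number=10, repeat=5)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```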
    
  • 2021-01-13 12:44

    Setup code:

    import operator
    import random
    from dataclasses import dataclass
    
    import numpy as np
    import pandas as pd
    
    
    @dataclass
    class SomeObj:
        val: int
    
    
    df = pd.DataFrame(data={"col_1": [SomeObj(random.randint(0, 10000)) for _ in range(10_000_000)]})
    

    Solution 1

    df['col_1'].map(lambda elem: elem.val)
    

    Time: ~3.2 seconds

    Solution 2

    df['col_1'].map(operator.attrgetter('val'))
    

    Time: ~2.7 seconds

    Solution 3

    [elem.val for elem in df['col_1']]
    

    Time: ~1.4 seconds

    Note: Solution 3 returns a plain Python list rather than a pandas Series, so the index is not carried over. This may be an issue if the result is used in further pandas operations.
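
To make the difference in result type concrete, the following sketch compares Solution 2 and Solution 3 on a tiny frame and shows one possible way to turn the list back into a Series. The three sample objects and the string index are assumptions chosen only to make the index visible.

```python
import operator
from dataclasses import dataclass

import pandas as pd


@dataclass
class SomeObj:
    val: int


# A non-default index makes the difference between the two results visible.
df = pd.DataFrame(
    {"col_1": [SomeObj(10), SomeObj(20), SomeObj(30)]},
    index=["a", "b", "c"],
)

# Series.map returns a pandas Series that keeps the original index.
as_series = df["col_1"].map(operator.attrgetter("val"))

# The list comprehension returns a plain Python list with no index.
as_list = [elem.val for elem in df["col_1"]]

# Rebuild a Series from the list, reusing the original index.
rebuilt = pd.Series(as_list, index=df.index, name="col_1")

print(type(as_series).__name__, type(as_list).__name__)
print(rebuilt)
```

Wrapping the list in `pd.Series` adds a small cost, so whether the list comprehension still comes out ahead is worth measuring on your own data.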

