Python equivalent of R c() function, for dataframe column indices?

面向向阳花 2021-01-13 14:56

I would like to select specific columns from a pandas DataFrame by column index.

In particular, I would like to select columns index by the column index generate

3 answers
  •  佛祖请我去吃肉
    2021-01-13 15:04

    To answer the actual question,

    Python equivalent of R c() function, for dataframe column indices?

    I'm using this definition of c()

    import numpy as np  # needed by the eval'd np.r_ expression
    c = lambda v: v.split(',') if ":" not in v else eval(f'np.r_[{v}]')
    

    Then we can do things like:

    import pandas as pd

    df = pd.DataFrame({'x': np.random.randn(1000),
                       'y': np.random.randn(1000)})
    # row selection by position
    df.iloc[c('2:4,7:11,21:25')]

    # columns by name
    df[c('x,y')]

    # columns by range: df.T has one column per original row (labels 0..999)
    df.T[c('12:15,17:25,500:750')]
    

    That's pretty much as close as it gets in terms of R-like syntax.
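
    For the column-index case in the question, np.r_ can also be passed straight to df.iloc without going through a string. A minimal sketch, assuming a made-up 10-column frame (the col0..col9 names are illustrative only); note that Python positions are 0-based and slices exclude the end, unlike R's 1-based inclusive ranges:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x10 frame, used only for illustration.
df = pd.DataFrame(np.arange(30).reshape(3, 10),
                  columns=[f"col{i}" for i in range(10)])

# R's c(1:3, 6, 8:9) (1-based, inclusive) corresponds to these
# 0-based, end-exclusive positions in Python.
idx = np.r_[0:3, 5, 7:9]

# Select columns by position; ":" keeps every row.
subset = df.iloc[:, idx]
print(list(subset.columns))
```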

    To the curious mind

    Note that there is a performance penalty in using c() as defined above versus calling np.r_ directly. To paraphrase Knuth, let's not optimize prematurely ;-)

    %timeit np.r_[2:4, 7:11, 21:25]
    27.3 µs ± 786 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    
    %timeit c("2:4, 7:11, 21:25")
    53.7 µs ± 977 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
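
    Because c() hands its argument to eval, it should only be given trusted strings. One possible eval-free variant is sketched below; c_safe is a hypothetical name, not part of the answer above, and it assumes every token is either an integer or a start:stop pair (no column names, no steps, no open-ended slices):

```python
import numpy as np

def c_safe(spec):
    """Hypothetical eval-free variant of c(): parse 'a:b,c,d:e' into positions."""
    parts = []
    for token in spec.split(","):
        token = token.strip()
        if ":" in token:
            # start:stop pair, end-exclusive like np.r_
            start, stop = (int(x) for x in token.split(":"))
            parts.append(np.arange(start, stop))
        else:
            # single integer position
            parts.append(np.array([int(token)]))
    return np.concatenate(parts)

print(c_safe("2:4, 7:11, 21"))
```

    The result can be passed to df.iloc in the same way as the output of np.r_.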
    
