How can I slice each element of a numpy array of strings?

面向向阳花 2020-12-01 16:25

Numpy has some very useful string operations, which vectorize the usual Python string operations.
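
For readers unfamiliar with these routines, here is a minimal sketch of what the vectorized string operations look like, using `np.char.upper` and `np.char.replace`. The sample array and variable names are assumptions chosen for illustration, not part of the question.

```python
import numpy as np

# Hypothetical sample data, not taken from the question
words = np.array(['hello', 'how', 'are', 'you'])

# Each call applies the matching str method to every element of the array
upper = np.char.upper(words)
replaced = np.char.replace(words, 'o', '0')

print(upper)
print(replaced)
```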

Compared to these operations and to pandas.str, the num

4 answers
  •  被撕碎了的回忆
    2020-12-01 17:02

    Here's a vectorized approach -

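    import numpy as np

    def slicer_vectorized(a, start, end):
        # View each string as a row of single characters, slice the columns, rebuild strings
        b = a.view((str, 1)).reshape(len(a), -1)[:, start:end]
        return np.frombuffer(b.tobytes(), dtype=(str, end - start))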
    

    Sample run -

    In [68]: a = np.array(['hello', 'how', 'are', 'you'])
    
    In [69]: slicer_vectorized(a,1,3)
    Out[69]: 
    array(['el', 'ow', 're', 'ou'], 
          dtype='|S2')
    
    In [70]: slicer_vectorized(a,0,3)
    Out[70]: 
    array(['hel', 'how', 'are', 'you'], 
          dtype='|S3')
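
To make the trick less opaque, here is a step-by-step sketch of what happens inside the function for the `1:3` slice. It assumes Python 3, where the strings are unicode and the result dtype prints as `<U2` rather than the `|S2` shown in the sample run. It also swaps in `np.ascontiguousarray(...).view(...)` for the final step instead of the bytes round trip, purely to show the intermediate arrays. The variable names are mine.

```python
import numpy as np

a = np.array(['hello', 'how', 'are', 'you'])

# Fixed-width layout: every element occupies 5 characters, shorter ones are NUL-padded
print(a.dtype.itemsize // np.dtype((str, 1)).itemsize)  # width in characters

# Reinterpret the same buffer as single characters, one row per string
chars = a.view((str, 1)).reshape(len(a), -1)
print(chars.shape)

# Taking columns 1:3 keeps characters 1 and 2 of every string
cols = chars[:, 1:3]

# The column slice is not contiguous, so copy it before viewing it as width-2 strings
sliced = np.ascontiguousarray(cols).view((str, 2)).ravel()
print(sliced)
```

Note that this only works when `0 <= start < end <= ` the array's string width, since the output width is taken to be `end - start`.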
    

    Runtime test -

    I tested all the approaches posted by other authors that I could run on my end, along with the vectorized approach from earlier in this post.

    Here are the timings -

    In [53]: # Setup input array
        ...: a = np.array(['hello', 'how', 'are', 'you'])
        ...: a = np.repeat(a,10000)
        ...: 
    
    # @Alberto Garcia-Raboso's answer
    In [54]: %timeit slicer(1, 3)(a)
    10 loops, best of 3: 23.5 ms per loop
    
    # @hapaulj's answer
    In [55]: %timeit np.frompyfunc(lambda x:x[1:3],1,1)(a)
    100 loops, best of 3: 11.6 ms per loop
    
    # Using list comprehension
    In [56]: %timeit np.array([i[1:3] for i in a])
    100 loops, best of 3: 12.1 ms per loop
    
    # From this post
    In [57]: %timeit slicer_vectorized(a,1,3)
    1000 loops, best of 3: 787 µs per loop
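
If you want to rerun the comparison outside IPython, a rough sketch with the standard `timeit` module could look like the following. It assumes Python 3 and a recent NumPy, so the function uses `np.frombuffer`/`tobytes`. The `slicer` function from the other answer is left out because its definition is not shown in this post, and the `candidates` dict is just a helper name I made up. Absolute numbers will differ from the ones above depending on machine and versions.

```python
import timeit
import numpy as np

def slicer_vectorized(a, start, end):
    # View as single characters, slice the columns, rebuild fixed-width strings
    b = a.view((str, 1)).reshape(len(a), -1)[:, start:end]
    return np.frombuffer(b.tobytes(), dtype=(str, end - start))

# Same setup as in the timings above
a = np.repeat(np.array(['hello', 'how', 'are', 'you']), 10000)

# Hypothetical helper: label -> zero-argument callable to time
candidates = {
    'frompyfunc': lambda: np.frompyfunc(lambda x: x[1:3], 1, 1)(a),
    'list comprehension': lambda: np.array([i[1:3] for i in a]),
    'slicer_vectorized': lambda: slicer_vectorized(a, 1, 3),
}

# Check that all approaches agree before comparing their speed
expected = candidates['list comprehension']()
for name, func in candidates.items():
    assert (func().astype(str) == expected).all(), name
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f'{name}: {best * 1e3:.3f} ms per call')
```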
    
