How to find the first index of any of a set of characters in a string

后端 未结 4 1449
梦毁少年i
梦毁少年i 2020-12-20 11:44

I\'d like to find the index of the first occurrence of any “special” character in a string, like so:

>>> \"Hello world!\".index([\' \', \'!\'])
5


        
相关标签:
4条回答
  • 2020-12-20 12:20

    Use a gen-exp and the find method.

    >>> a = [' ', '!']
    >>> s = "Hello World!"
    >>> min(s.find(i) for i in a)
    5
    

    To remove -1s if they occur, you can have a filter within the list comp

    >>> a = [' ', '!','$']
    >>> s = "Hello World!"
    >>> min(s.find(i) for i in a if i in s)
    5
    

    or you can substitute None

    >>> min(s.find(i) if i in s else None for i in a)
    5
    

    Adding the timeit results

    $ python -m timeit "a = [' ', '\!'];s = 'Hello World\!';min(s.find(i) for i in a if i in s)"
    1000000 loops, best of 3: 0.902 usec per loop
    $ python -m timeit "a = [' ', '\!'];s = 'Hello World\!';next((i for i, ch  in enumerate(s) if ch in a),None)"
    1000000 loops, best of 3: 1.25 usec per loop
    $ python -m timeit "a = [' ', '\!'];s = 'Hello World\!';min(map(lambda x: (s.index(x) if (x in s) else len(s)), a))"
    1000000 loops, best of 3: 1.12 usec per loop
    

    In your Example case, Padraic's beautiful solution is a little slow. However in large test cases, It is definitely a winner. (It's a little surprising that alfasin's "Not as optimized" is also faster here)

    Added the implementation details

    >>> def take1(s,a):
    ...     min(s.find(i) for i in a if i in s)
    ... 
    >>> import dis
    >>> dis.dis(take1)
      2           0 LOAD_GLOBAL              0 (min)
                  3 LOAD_CLOSURE             0 (s)
                  6 BUILD_TUPLE              1
                  9 LOAD_CONST               1 (<code object <genexpr> at 0x7fa622e961b0, file "<stdin>", line 2>)
                 12 MAKE_CLOSURE             0
                 15 LOAD_FAST                1 (a)
                 18 GET_ITER            
                 19 CALL_FUNCTION            1
                 22 CALL_FUNCTION            1
                 25 POP_TOP             
                 26 LOAD_CONST               0 (None)
                 29 RETURN_VALUE        
    >>> def take2(s,a):
    ...     next((i for i, ch  in enumerate(s) if ch in a),None)
    ... 
    >>> dis.dis(take2)
      2           0 LOAD_GLOBAL              0 (next)
                  3 LOAD_CLOSURE             0 (a)
                  6 BUILD_TUPLE              1
                  9 LOAD_CONST               1 (<code object <genexpr> at 0x7fa622e96e30, file "<stdin>", line 2>)
                 12 MAKE_CLOSURE             0
                 15 LOAD_GLOBAL              1 (enumerate)
                 18 LOAD_FAST                0 (s)
                 21 CALL_FUNCTION            1
                 24 GET_ITER            
                 25 CALL_FUNCTION            1
                 28 LOAD_CONST               0 (None)
                 31 CALL_FUNCTION            2
                 34 POP_TOP             
                 35 LOAD_CONST               0 (None)
                 38 RETURN_VALUE        
    >>> def take3(s,a):
    ...     min(map(lambda x: (s.index(x) if (x in s) else len(s)), a))
    ... 
    >>> dis.dis(take3)
      2           0 LOAD_GLOBAL              0 (min)
                  3 LOAD_GLOBAL              1 (map)
                  6 LOAD_CLOSURE             0 (s)
                  9 BUILD_TUPLE              1
                 12 LOAD_CONST               1 (<code object <lambda> at 0x7fa622e44eb0, file "<stdin>", line 2>)
                 15 MAKE_CLOSURE             0
                 18 LOAD_FAST                1 (a)
                 21 CALL_FUNCTION            2
                 24 CALL_FUNCTION            1
                 27 POP_TOP             
                 28 LOAD_CONST               0 (None)
                 31 RETURN_VALUE        
    

    As you can clearly see in Padraic's case the loading of the global functions next and enumerate are the ones which are killing the time along with None at the end. In alfasin's solution the primary slowing down is the lambda function.

    0 讨论(0)
  • 2020-12-20 12:22

    Not as optimized as Padraic Cunningham's solution, but still a one liner:

    string = "Hello world!"
    specials = [' ', '!', 'x']
    min(map(lambda x: (string.index(x) if (x in string) else len(string)), specials))
    
    0 讨论(0)
  • 2020-12-20 12:26

    I would favour the re module as it's built in and already tested. It's also optimised for exactly this kind of thing.

    >>> import re
    >>> re.search(r'[ !]', 'Hello World!').start()
    5
    

    You probably want to check that a match was found though or catch the exception when it's not.

    There are reasons for not using re, but I would want to see a good comment justifying the rational. Thinking that you can "do it better" is generally unnecessary, makes it harder for others to read the code and is less maintainable.

    0 讨论(0)
  • 2020-12-20 12:27

    You can use enumerate and next with a generator expression, getting the first match or returning None if no character appears in s:

    s = "Hello world!"
    
    st = {"!"," "}
    ind = next((i for i, ch  in enumerate(s) if ch in st),None)
    print(ind)
    

    You can pass any value you want to next as a default return value if there is no match.

    If you want to use a function and raise a ValueError:

    def first_index(s, characters):
        st = set(characters)
        ind = next((i for i, ch in enumerate(s) if ch in st), None)
        if ind is not None:
            return ind
        raise ValueError
    

    For smaller inputs using a set won't make much if any difference but for large strings it will be a more efficient.

    Some timings:

    In the string, last character of character set:

    In [40]: s = "Hello world!" * 100    
    In [41]: string = s    
    In [42]: %%timeit
    st = {"x","y","!"}
    next((i for i, ch in enumerate(s) if ch in st), None)
       ....: 
    1000000 loops, best of 3: 1.71 µs per loop    
    In [43]: %%timeit
    specials = ['x', 'y', '!']
    min(map(lambda x: (string.index(x) if (x in string) else len(string)), specials))
       ....: 
    100000 loops, best of 3: 2.64 µs per loop
    

    Not in the string, larger character set:

    In [44]: %%timeit
    st = {"u","v","w","x","y","z"}
    next((i for i, ch in enumerate(s) if ch in st), None)
       ....: 
    1000000 loops, best of 3: 1.49 µs per loop
    
    In [45]: %%timeit
    specials = ["u","v","w","x","y","z"]
    min(map(lambda x: (string.index(x) if (x in string) else len(string)), specials))
       ....: 
    100000 loops, best of 3: 5.48 µs per loop
    

    In string an very first character of character set:

    In [47]: %%timeit
    specials = ['H', 'y', '!']
    min(map(lambda x: (string.index(x) if (x in string) else len(string)), specials))
       ....: 
    100000 loops, best of 3: 2.02 µs per loop
    
    In [48]: %%timeit
    st = {"H","y","!"}
    next((i for i, ch in enumerate(s) if ch in st), None)
       ....: 
    1000000 loops, best of 3: 903 ns per loop
    
    0 讨论(0)
提交回复
热议问题