How would I get everything before a : in a string in Python?

囚心锁ツ 2020-11-30 20:45

I am looking for a way to get all of the letters in a string before a : but I have no idea where to start. Would I use regex? If so, how?

string = "Userna

5 answers
  • 2020-11-30 21:31

    You don't need regex for this

    >>> s = "Username: How are you today?"
    

    You can use the split method to split the string on the ':' character:

    >>> s.split(':')
    ['Username', ' How are you today?']
    

    Then take element [0] to get the first part of the string:

    >>> s.split(':')[0]
    'Username'
    
  • 2020-11-30 21:34

    partition() may be better than split() for this purpose, as it gives more predictable results when the string contains no delimiter or more than one.
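
    As a minimal sketch of that difference (the sample strings are made up for illustration), str.partition always returns a 3-tuple and only ever splits at the first delimiter:

```python
s = "Username: How are you today?"

# partition always returns a 3-tuple: (head, separator, tail).
head, sep, tail = s.partition(":")
print(head)  # Username

# No delimiter: head is the whole string, separator and tail are empty.
no_colon = "no delimiter here"
print(no_colon.partition(":"))  # ('no delimiter here', '', '')

# Several delimiters: partition stops at the first one, while an
# unlimited split breaks the string at every occurrence.
multi = "a:b:c"
print(multi.partition(":"))  # ('a', ':', 'b:c')
print(multi.split(":"))      # ['a', 'b', 'c']
```

    An empty separator in the result is a cheap way to tell "no delimiter found" apart from "delimiter found at the very end".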

  • 2020-11-30 21:39

    Using index:

    >>> string = "Username: How are you today?"
    >>> string[:string.index(":")]
    'Username'
    

    index gives you the position of the first : in the string, which you can then use to slice it.
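
    One thing to watch for: str.index raises ValueError when the delimiter is missing. A small sketch of handling that (before_colon and its default parameter are hypothetical names, not part of the answer above):

```python
def before_colon(text, default=None):
    """Return the text before the first ':' or `default` if there is none."""
    try:
        return text[:text.index(":")]
    except ValueError:  # str.index raises ValueError when ':' is absent
        return default

print(before_colon("Username: How are you today?"))  # Username
print(before_colon("no colon here"))                 # None

# str.find returns -1 instead of raising, which makes the slice
# silently drop the last character when there is no delimiter.
text = "no colon here"
print(text[:text.find(":")])  # no colon her
```

    The try/except form is probably the safer of the two, since a missing delimiter cannot pass unnoticed.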

    If you want to use regex:

    >>> import re
    >>> re.match("(.*?):", string).group(1)
    'Username'
    

    re.match only matches at the start of the string.
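
    Two details are easy to miss here: re.match returns None when nothing matches, and group() without an argument returns the whole match, delimiter included. A hedged sketch that guards against both (before_delimiter is a hypothetical helper; re.escape is added so that delimiters such as "." are taken literally):

```python
import re

def before_delimiter(text, delimiter=":"):
    """Return the text before the first delimiter, or None if it is absent."""
    pattern = re.compile("(.*?)" + re.escape(delimiter))
    m = pattern.match(text)  # anchored at the start of text
    return m.group(1) if m else None

m = re.match("(.*?):", "Username: How are you today?")
print(m.group())   # Username:  (whole match, colon included)
print(m.group(1))  # Username   (first capture group only)

print(before_delimiter("Username: How are you today?"))  # Username
print(before_delimiter("no colon here"))                 # None
print(before_delimiter("a.b.c", "."))                    # a
```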

    You can also use itertools.takewhile:

    >>> import itertools
    >>> "".join(itertools.takewhile(lambda x: x!=":", string))
    'Username'
    
  • 2020-11-30 21:42

    Just use the split method. It returns a list, so you can keep the first element:

    >>> s1 = "Username: How are you today?"
    >>> s1.split(':')
    ['Username', ' How are you today?']
    >>> s1.split(':')[0]
    'Username'
    
  • 2020-11-30 21:46

    I have benchmarked these various techniques under Python 3.7.0 (IPython).

    TLDR

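    • fastest in the numbers below (when the split symbol c is known): pre-compiled regex. Treat that figure with caution: test_set_for_regex is a generator, so the first timing pass exhausts it and the remaining passes time an empty list comprehension.
    • fastest (otherwise): s.partition(c)[0].
    • safe (i.e., when c may not be in s): partition, split.
    • unsafe (they raise an exception when c is not in s): index, regex.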

    Code

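    import random
    import re
    import string
    
    SYMBOLS = string.ascii_uppercase + string.digits
    SIZE = 100  # number of (delimiter, string) pairs per test set
    
    def create_test_set(string_length):
        """Yield SIZE pairs (c, s), where c is a character known to occur in s."""
        for _ in range(SIZE):
            random_string = ''.join(random.choices(SYMBOLS, k=string_length))
            yield (random.choice(random_string), random_string)
    
    # %timeit is an IPython magic, so this loop must be run in IPython or Jupyter.
    for string_length in (2**4, 2**8, 2**16, 2**32):
        print("\nString length:", string_length)
        # Build the test set before anything reads it. The strings are always
        # 16 characters long, whatever string_length says, so every group of
        # timings measures the same workload.
        test_set = list(create_test_set(16))
        print("  regex (compiled):", end=" ")
        # Caution: this is a generator, so the first timing pass exhausts it and
        # every later pass times an empty list comprehension.
        test_set_for_regex = ((re.compile("(.*?)" + c).match, s) for (c, s) in test_set)
        %timeit [re_match(s).group(1) for (re_match, s) in test_set_for_regex]
        print("  partition:       ", end=" ")
        %timeit [s.partition(c)[0] for (c, s) in test_set]
        print("  index:           ", end=" ")
        %timeit [s[:s.index(c)] for (c, s) in test_set]
        print("  split (limited): ", end=" ")
        %timeit [s.split(c, 1)[0] for (c, s) in test_set]
        print("  split:           ", end=" ")
        %timeit [s.split(c)[0] for (c, s) in test_set]
        print("  regex:           ", end=" ")
        %timeit [re.match("(.*?)" + c, s).group(1) for (c, s) in test_set]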
    

    Results

    String length: 16
      regex (compiled): 156 ns ± 4.41 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
      partition:        19.3 µs ± 430 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
      index:            26.1 µs ± 341 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
      split (limited):  26.8 µs ± 1.26 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
      split:            26.3 µs ± 835 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
      regex:            128 µs ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    
    String length: 256
      regex (compiled): 167 ns ± 2.7 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
      partition:        20.9 µs ± 694 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
      index:            28.6 µs ± 2.73 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
      split (limited):  27.4 µs ± 979 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
      split:            31.5 µs ± 4.86 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
      regex:            148 µs ± 7.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    String length: 65536
      regex (compiled): 173 ns ± 3.95 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
      partition:        20.9 µs ± 613 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
      index:            27.7 µs ± 515 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
      split (limited):  27.2 µs ± 796 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
      split:            26.5 µs ± 377 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
      regex:            128 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    
    String length: 4294967296
      regex (compiled): 165 ns ± 1.2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
      partition:        19.9 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
      index:            27.7 µs ± 571 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
      split (limited):  26.1 µs ± 472 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
      split:            28.1 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
      regex:            137 µs ± 6.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
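
    For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. It is not the code that produced the numbers above: make_test_set and run_benchmark are hypothetical names, the compiled patterns are stored in a list so that every pass does real work, and all candidates are checked against each other before timing.

```python
import random
import re
import string
import timeit

SYMBOLS = string.ascii_uppercase + string.digits

def make_test_set(string_length, size=100, seed=0):
    """Build `size` pairs (c, s) where the delimiter c is known to occur in s."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    pairs = []
    for _ in range(size):
        s = "".join(rng.choices(SYMBOLS, k=string_length))
        pairs.append((rng.choice(s), s))
    return pairs

def run_benchmark(string_length=16, number=200):
    """Return the mean seconds per pass over the test set for each technique."""
    test_set = make_test_set(string_length)
    # A list, not a generator, so it can be iterated on every timing pass.
    compiled = [(re.compile("(.*?)" + re.escape(c)).match, s) for c, s in test_set]
    candidates = {
        "regex (compiled)": lambda: [m(s).group(1) for m, s in compiled],
        "partition": lambda: [s.partition(c)[0] for c, s in test_set],
        "index": lambda: [s[:s.index(c)] for c, s in test_set],
        "split (limited)": lambda: [s.split(c, 1)[0] for c, s in test_set],
        "split": lambda: [s.split(c)[0] for c, s in test_set],
    }
    # All candidates must agree before their timings are worth comparing.
    expected = candidates["partition"]()
    assert all(f() == expected for f in candidates.values())
    return {name: timeit.timeit(f, number=number) / number
            for name, f in candidates.items()}

if __name__ == "__main__":
    for name, seconds in run_benchmark().items():
        print(f"{name:18s} {seconds * 1e6:8.1f} µs per pass")
```

    Absolute timings will depend on the machine and Python version, so only the relative ordering is worth reading off.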
    