Efficiently checking that a string consists of one character in Python

太阳男子 2020-11-29 01:03

What is an efficient way to check that a string s in Python consists of just one character, say 'A'? Something like all_equal(s, 'A').

8 answers
  •  独厮守ぢ
    2020-11-29 01:53

    >>> s = 'AAAAAAAAAAAAAAAAAAA'
    >>> s.count(s[0]) == len(s)
    True
    

    This doesn't short circuit. A version which does short-circuit would be:

    >>> all(x == s[0] for x in s)
    True
    

    However, I have a feeling that due to the optimized C implementation, the non-short-circuiting version will probably perform better on some strings (depending on size, etc.).
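
    For the all_equal(s, 'A') shape asked about in the question, the two variants could be wrapped roughly as follows. This is only a minimal sketch: the helper names are made up for illustration, ch is assumed to be a single character, and returning False for an empty string is an assumption. (As written above, s.count(s[0]) raises IndexError on an empty string, while the all(...) form returns True for it.)

```python
def all_equal_count(s, ch):
    """Return True if s is non-empty and consists only of the character ch.

    Always scans the whole string, but the scan runs inside str.count.
    """
    # Assumption: an empty string is treated as "not all ch".
    return bool(s) and s.count(ch) == len(s)


def all_equal_short_circuit(s, ch):
    """Same check, but stops at the first character that differs from ch."""
    return bool(s) and all(c == ch for c in s)


print(all_equal_count("AAAA", "A"))          # True
print(all_equal_short_circuit("AABA", "A"))  # False
```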


    Here's a simple timeit script to test some of the other options posted:

    import timeit
    import re
    
    def test_regex(s,regex=re.compile(r'^(.)\1*$')):
        return bool(regex.match(s))
    
    def test_all(s):
        return all(x == s[0] for x in s)
    
    def test_count(s):
        return s.count(s[0]) == len(s)
    
    def test_set(s):
        return len(set(s)) == 1
    
    def test_replace(s):
        return not s.replace(s[0],'')
    
    def test_translate(s):
        return not s.translate(None,s[0])
    
    def test_strmul(s):
        return s == s[0]*len(s)
    
    tests = ('test_all','test_count','test_set','test_replace','test_translate','test_strmul','test_regex')
    
    print "WITH ALL EQUAL"
    for test in tests:
        print test, timeit.timeit('%s(s)'%test,'from __main__ import %s; s="AAAAAAAAAAAAAAAAA"'%test)
        if globals()[test]("AAAAAAAAAAAAAAAAA") != True:
            print globals()[test]("AAAAAAAAAAAAAAAAA")
            raise AssertionError
    
    print
    print "WITH FIRST NON-EQUAL"
    for test in tests:
        print test, timeit.timeit('%s(s)'%test,'from __main__ import %s; s="FAAAAAAAAAAAAAAAA"'%test)
        if globals()[test]("FAAAAAAAAAAAAAAAA") != False:
            print globals()[test]("FAAAAAAAAAAAAAAAA")
            raise AssertionError
    

    On my machine (OS X 10.5.8, Core 2 Duo, Python 2.7.3) with these contrived (short) strings, str.count smokes set and all and beats str.replace by a little, but it is edged out by str.translate. strmul is currently in the lead by a good margin:

    WITH ALL EQUAL
    test_all 5.83863711357
    test_count 0.947771072388
    test_set 2.01028490067
    test_replace 1.24682998657
    test_translate 0.941282987595
    test_strmul 0.629556179047
    test_regex 2.52913498878
    
    WITH FIRST NON-EQUAL
    test_all 2.41147494316
    test_count 0.942595005035
    test_set 2.00480484962
    test_replace 0.960338115692
    test_translate 0.924381017685
    test_strmul 0.622269153595
    test_regex 1.36632800102
    

    The timings could be slightly (or even significantly) different on other systems and with other strings, so it would be worth measuring with the actual strings you're planning on passing.

    Eventually, if you hit the best case for all (a mismatch near the start of the string) often enough, and your strings are long enough, you might want to consider that one. It's a better algorithm, since it can stop at the first mismatch. I would avoid the set solution though, as I don't see any case where it could possibly beat out the count solution.
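
    To see when short-circuiting starts to pay off, one could time both approaches on a long string whose mismatch comes right at the start. The sketch below assumes Python 3; the one-million-character input and the function names are made up for illustration, and no timings are claimed here since they depend on the machine.

```python
import timeit


def check_all(s):
    # Stops as soon as one character differs from s[0].
    return all(x == s[0] for x in s)


def check_count(s):
    # Always scans the whole string.
    return s.count(s[0]) == len(s)


# Hypothetical input: one million characters with a mismatch at index 1.
long_s = "F" + "A" * (10**6 - 1)

for func in (check_all, check_count):
    elapsed = timeit.timeit(lambda: func(long_s), number=100)
    print(func.__name__, func(long_s), elapsed)
```

    Moving the mismatch to the end of the string (or removing it) gives the worst case for all, which is the situation the short benchmark strings above are closer to.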

    If memory could be an issue, you'll need to avoid str.translate, str.replace and strmul as those create a second string, but this isn't usually a concern these days.
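
    If memory does matter, the extra string can be made visible with tracemalloc from the standard library. This is a rough sketch assuming Python 3 on CPython; peak_bytes and the check_* names are hypothetical helpers, and the exact byte counts will vary by interpreter version.

```python
import tracemalloc


def peak_bytes(func, s):
    """Return the peak number of bytes Python allocated while func(s) ran."""
    tracemalloc.start()
    try:
        func(s)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def check_count(s):
    return s.count(s[0]) == len(s)


def check_strmul(s):
    # Builds a second string as long as s before comparing.
    return s == s[0] * len(s)


# Built before tracing starts, so the input itself is not counted.
s = "A" * 10**6

print("count :", peak_bytes(check_count, s))
print("strmul:", peak_bytes(check_strmul, s))
```

    The strmul figure should come out at roughly the size of the input, while the count figure stays small.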
