What is an efficient way to check that a string s
in Python consists of just one character, say \'A\'
? Something like all_equal(s, \'A\')
>>> s = 'AAAAAAAAAAAAAAAAAAA'
>>> s.count(s[0]) == len(s)
True
This doesn't short circuit. A version which does short-circuit would be:
>>> all(x == s[0] for x in s)
True
However, I have a feeling that due the the optimized C implementation, the non-short circuiting version will probably perform better on some strings (depending on size, etc)
Here's a simple timeit
script to test some of the other options posted:
import timeit
import re
def test_regex(s,regex=re.compile(r'^(.)\1*$')):
return bool(regex.match(s))
def test_all(s):
return all(x == s[0] for x in s)
def test_count(s):
return s.count(s[0]) == len(s)
def test_set(s):
return len(set(s)) == 1
def test_replace(s):
return not s.replace(s[0],'')
def test_translate(s):
return not s.translate(None,s[0])
def test_strmul(s):
return s == s[0]*len(s)
tests = ('test_all','test_count','test_set','test_replace','test_translate','test_strmul','test_regex')
print "WITH ALL EQUAL"
for test in tests:
print test, timeit.timeit('%s(s)'%test,'from __main__ import %s; s="AAAAAAAAAAAAAAAAA"'%test)
if globals()[test]("AAAAAAAAAAAAAAAAA") != True:
print globals()[test]("AAAAAAAAAAAAAAAAA")
raise AssertionError
print
print "WITH FIRST NON-EQUAL"
for test in tests:
print test, timeit.timeit('%s(s)'%test,'from __main__ import %s; s="FAAAAAAAAAAAAAAAA"'%test)
if globals()[test]("FAAAAAAAAAAAAAAAA") != False:
print globals()[test]("FAAAAAAAAAAAAAAAA")
raise AssertionError
On my machine (OS-X 10.5.8, core2duo, python2.7.3) with these contrived (short) strings, str.count
smokes set
and all
, and beats str.replace
by a little, but is edged out by str.translate
and strmul
is currently in the lead by a good margin:
WITH ALL EQUAL
test_all 5.83863711357
test_count 0.947771072388
test_set 2.01028490067
test_replace 1.24682998657
test_translate 0.941282987595
test_strmul 0.629556179047
test_regex 2.52913498878
WITH FIRST NON-EQUAL
test_all 2.41147494316
test_count 0.942595005035
test_set 2.00480484962
test_replace 0.960338115692
test_translate 0.924381017685
test_strmul 0.622269153595
test_regex 1.36632800102
The timings could be slightly (or even significantly?) different between different systems and with different strings, so that would be worth looking into with an actual string you're planning on passing.
Eventually, if you hit the best case for all
enough, and your strings are long enough, you might want to consider that one. It's a better algorithm ... I would avoid the set
solution though as I don't see any case where it could possibly beat out the count
solution.
If memory could be an issue, you'll need to avoid str.translate
, str.replace
and strmul
as those create a second string, but this isn't usually a concern these days.