Basically, if I have a line of text which starts with indention, what\'s the best way to grab that indention and put it into a variable in Python? For example, if the line i
A sneaky way: abuse lstrip
!
fullstr = "\t\tthis line has two tabs of indentation"
startwhites = fullstr[:len(fullstr)-len(fullstr.lstrip())]
This way you don't have to work through all the details of whitespace!
(Thanks Adam for the correction)
How about using the regex \s*
which matches any whitespace characters. You only want the whitespace at the beginning of the line so either search
with the regex ^\s*
or simply match
with \s*
.
If you're interested in using regular expressions you can use that. /\s/
usually matches one whitespace character, so /^\s+/
would match the whitespace starting a line.
This can also be done with str.isspace
and itertools.takewhile
instead of regex.
import itertools
tests=['\t\tthis line has two tabs of indention',
' this line has four spaces of indention']
def indention(astr):
# Using itertools.takewhile is efficient -- the looping stops immediately after the first
# non-space character.
return ''.join(itertools.takewhile(str.isspace,astr))
for test_string in tests:
print(indention(test_string))
def whites(a):
return a[0:a.find(a.strip())]
Basically, the my idea is:
import re
s = "\t\tthis line has two tabs of indention"
re.match(r"\s*", s).group()
// "\t\t"
s = " this line has four spaces of indention"
re.match(r"\s*", s).group()
// " "
And to strip leading spaces, use lstrip.
As there are down votes probably questioning the efficiency of regex, I've done some profiling to check the efficiency of each cases.
RegEx > Itertools >> lstrip
>>> timeit.timeit('r.match(s).group()', 'import re;r=re.compile(r"\s*")s=" hello world!"*10000', number=100000)
0.10037684440612793
>>> timeit.timeit('"".join(itertools.takewhile(lambda x:x.isspace(),s))', 'import itertools;s=" hello world!"*10000', number=100000)
0.7092740535736084
>>> timeit.timeit('"".join(itertools.takewhile(str.isspace,s))', 'import itertools;s=" hello world!"*10000', number=100000)
0.51730513572692871
>>> timeit.timeit('s[:-len(s.lstrip())]', 's=" hello world!"*10000', number=100000)
2.6478431224822998
lstrip > RegEx > Itertools
If you can limit the string's length to thousounds of chars or less, the lstrip trick maybe better.
>>> timeit.timeit('r.match(s).group()', 'import re;r=re.compile(r"\s*");s=" hello world!"*100', number=100000)
0.099548101425170898
>>> timeit.timeit('"".join(itertools.takewhile(str.isspace,s))', 'import itertools;s=" hello world!"*100', number=100000)
0.53602385520935059
>>> timeit.timeit('s[:-len(s.lstrip())]', 's=" hello world!"*100', number=100000)
0.064291000366210938
This shows the lstrip trick scales roughly as O(√n) and the RegEx and itertool methods are O(1) if the number of leading spaces is not a lot.
lstrip >> RegEx >>> Itertools
If there are a lot of leading spaces, don't use RegEx.
>>> timeit.timeit('s[:-len(s.lstrip())]', 's=" "*2000', number=10000)
0.047424077987670898
>>> timeit.timeit('r.match(s).group()', 'import re;r=re.compile(r"\s*");s=" "*2000', number=10000)
0.2433168888092041
>>> timeit.timeit('"".join(itertools.takewhile(str.isspace,s))', 'import itertools;s=" "*2000', number=10000)
3.9949162006378174
lstrip >>> RegEx >>>>>>>> Itertools
>>> timeit.timeit('s[:-len(s.lstrip())]', 's=" "*200000', number=10000)
4.2374031543731689
>>> timeit.timeit('r.match(s).group()', 'import re;r=re.compile(r"\s*");s=" "*200000', number=10000)
23.877214908599854
>>> timeit.timeit('"".join(itertools.takewhile(str.isspace,s))', 'import itertools;s=" "*200000', number=100)*100
415.72158336639404
This shows all methods scales roughly as O(m) if the non-space part is not a lot.