I'm running into an issue that I hope is simple; however, I've run into a wall trying to figure it out. I'm attempting to strip the DateTime timestamp from the beginning
If I understand correctly what you're attempting to do, you can just use a regex to extract the word or sentence that follows the timestamp:
import re
regex = re.compile(r'(?:\s*\[.*?\])(.*)')
sentence = regex.findall(line)[0].strip()
Note that I have omitted the verification that you had in your regex; you can still add it back if you need it.
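For reference, here is a self-contained sketch of the same idea. The helper name strip_timestamp and the sample line are my own assumptions for illustration, not from the question; the pattern is the one above with the non-capturing group dropped, and lines without a leading bracketed block are returned unchanged.

```python
import re

# One optional run of whitespace, one non-greedy "[...]" block, then capture the rest.
TIMESTAMP_PREFIX = re.compile(r'\s*\[.*?\](.*)')

def strip_timestamp(line):
    """Return the text after a leading [...] block, or the line unchanged (hypothetical helper)."""
    m = TIMESTAMP_PREFIX.match(line)
    if m is None:
        return line
    return m.group(1).strip()

print(strip_timestamp('[Wed Dec 01 10:24:24 2010] testc'))
```

Using match rather than findall anchors the pattern at the start of the line and avoids an IndexError when nothing matches.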
The method str.strip will remove all characters from the beginning and end of the string that are in the argument. You probably want to use str.replace instead.
>>> line = '[Wed Dec 01 10:24:24 2010] testc'
>>> line.replace('[Wed Dec 01 10:24:24 2010]', '')
' testc'
You can get rid of the leading white space by using str.lstrip, or use str.strip
if you want to get rid of trailing white space too (the default arguments are white space).
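To see the difference side by side, here is a small sketch. The variable names are my own, and I am assuming the timestamp string is known exactly; passing 1 as the count to replace limits it to the first occurrence.

```python
line = '[Wed Dec 01 10:24:24 2010] ceeeeest'
stamp = '[Wed Dec 01 10:24:24 2010]'

# strip treats its argument as a set of characters, not as a prefix,
# so it also eats the leading "ceeeee" (c and e both occur in the stamp).
stripped = line.strip(stamp)

# replace removes the exact substring; lstrip then drops the leftover space.
replaced = line.replace(stamp, '', 1).lstrip()

print(repr(stripped))  # 'st'
print(repr(replaced))  # 'ceeeeest'
```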
b is '[Wed Dec 01 10:24:24 2010]', so you are stripping every character that appears in b from both ends of c. Everything except st gets removed:
'[Wed Dec 01 10:24:24 2010] ceeeeest'
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# all in [Wed Dec 01 10:24:24 2010]
Only st remains, because s and t are the only two characters not in b. strip keeps removing characters from both ends until it hits a character that is not in the set:
In [3]: s = "fooboaroof"
In [4]: s.strip("foo")
Out[4]: 'boar'
If the date is always at the start, which it must be if you are using match, then once you have a match the simplest approach is to split:
line2 = '[Wed Dec 01 10:24:24 2010] ceeeeest'
print(line2.split("] ", 1)[1])
Or:
print(line2[len(a.group()):].lstrip())
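Spelled out end to end, the slicing variant might look like the sketch below. The question's actual regex isn't shown here, so the pattern is an assumption on my part; a.end() is equivalent to len(a.group()) when the match starts at index 0, as it always does with match.

```python
import re

# Assumed timestamp pattern; substitute whatever regex you already have working.
pattern = re.compile(r'\[\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}\]')

line2 = '[Wed Dec 01 10:24:24 2010] ceeeeest'
a = pattern.match(line2)

if a:
    # Slice off everything the match consumed, then drop the separating space.
    rest = line2[a.end():].lstrip()
else:
    # No timestamp at the start: keep the line as it is.
    rest = line2

print(rest)  # ceeeeest
```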
If you really want to strip (that is, discard) the date and time information, and if the information is in the format you represent, try this:
#! python3
lines = [
    '[Wed Dec 01 10:24:24 2010] ceeeeest',
    '[Wed Dec 01 10:24:24 2010] testc',
    'just a plain old line',
    ' indented',
    ' with [brackets]',
    '[BOGUS! This should be disallowed!',
    '[][][] Three pairs',
]
for line in lines:
    if line.startswith('['):
        try:
            # Skip past the first ']' and the single space after it.
            line = line[line.index(']')+2:]
        except ValueError:
            print('Invalid formatting: open [ with no close!')
        else:
            print(line)
    else:
        print('Ho hum, nothing interesting about:', line)
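If you'd rather have this as a reusable function, one possible variation uses str.partition instead of index. This is a sketch, not a drop-in replacement: the name drop_bracket_prefix is made up, and unlike the +2 slice it does not assume exactly one space after the closing bracket, so a line like '[][][] Three pairs' comes out slightly differently.

```python
def drop_bracket_prefix(line):
    """Remove a leading [...] block and the whitespace after it (hypothetical helper)."""
    if not line.startswith('['):
        return line
    # partition splits on the first ']' and returns an empty separator if none is found.
    _head, sep, tail = line.partition(']')
    if not sep:
        raise ValueError('open [ with no close')
    return tail.lstrip()

print(drop_bracket_prefix('[Wed Dec 01 10:24:24 2010] testc'))  # testc
```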
If the same pattern occurs more than once in your string, you can use a regex to find all the matches and then replace each one with an empty string:
import re
pattern = r'\[\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}\] '
for p in re.findall(pattern, line):
    line = line.replace(p, '')
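The find-then-replace loop can also be collapsed into a single re.sub call, which replaces every match in one pass. A minimal sketch, with a sample line I made up that contains two timestamps:

```python
import re

pattern = r'\[\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}\] '

# Made-up sample containing the timestamp pattern twice.
line = '[Wed Dec 01 10:24:24 2010] one [Thu Dec 02 11:00:00 2010] two'

# re.sub replaces every non-overlapping match with the empty string.
cleaned = re.sub(pattern, '', line)

print(cleaned)  # one two
```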
As others have pointed out, you are using strip incorrectly. Instead, since you already have matching working, slice off the number of characters from the start of the string.
result = line[len(a.group()):]
print(result)
# prints ' testc'