Python str.strip() with regex filtering unexpected characters


I'm running into an issue that I hope is simple, however I've run into a wall trying to figure it out. I'm attempting to strip the DateTime timestamp from the beginning


6 answers

无人共我

2020-12-19 23:18
If I understand correctly what you're trying to do, you can just use a regex to extract the word/sentence that follows the timestamp:
```
import re
regex = re.compile(r'(?:\s*\[.*?\])(.*)')
sentence = regex.findall(line)[0].strip()
```
Note that I have omitted the verification that you had in your regex; you can still use it.
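One caveat with `findall(line)[0]` is that it raises `IndexError` on a line that has no bracketed prefix. A minimal sketch of a guarded variant follows; the function name `strip_timestamp` and the exact pattern are assumptions made for illustration, not part of the original answer.

```python
import re

# Assumed layout: an optional leading "[...]" block, e.g. "[Wed Dec 01 10:24:24 2010]".
# Adjust the pattern to your real log format.
TIMESTAMP = re.compile(r'\s*\[[^\]]*\]\s*(.*)')

def strip_timestamp(line):
    """Return the text after a leading [...] block, or the line unchanged."""
    match = TIMESTAMP.match(line)
    if match is None:
        # No bracketed prefix: leave the line alone instead of raising.
        return line
    return match.group(1).rstrip()

print(strip_timestamp('[Wed Dec 01 10:24:24 2010] testc'))  # testc
print(strip_timestamp('just a plain old line'))             # just a plain old line
```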
小鲜肉

2020-12-19 23:22
The method str.strip will remove all characters from the beginning and end of the string that are in the argument. You probably want to use str.replace instead.
```
>>> line = '[Wed Dec 01 10:24:24 2010] testc'
>>> line.replace('[Wed Dec 01 10:24:24 2010]', '')
' testc'
```
You can get rid of the leading white space by using str.lstrip, or use str.strip if you want to get rid of trailing white space too (the default arguments are white space).
旧时难觅i

2020-12-19 23:22
b is '[Wed Dec 01 10:24:24 2010]', so strip removes every character that appears in b from both ends of c, and everything except st gets removed:
```
'[Wed Dec 01 10:24:24 2010] ceeeeest'
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^   
 # all in [Wed Dec 01 10:24:24 2010]
```
So only st remains, as s and t are the only two characters not in b. strip keeps stripping from both ends until it hits a character that is not in the set:
```
In [3]: s = "fooboaroof"

In [4]: s.strip("foo")
Out[4]: 'boar'
```
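To make the set behaviour concrete, here is a hand-rolled sketch that mimics `str.strip(chars)`. The helper `manual_strip` is hypothetical and exists only for illustration; in real code you would call the built-in.

```python
def manual_strip(text, chars):
    """Illustrative equivalent of text.strip(chars)."""
    unwanted = set(chars)  # order and repetition in chars do not matter
    start, end = 0, len(text)
    # Advance from the left while the character is in the set.
    while start < end and text[start] in unwanted:
        start += 1
    # Retreat from the right while the character is in the set.
    while end > start and text[end - 1] in unwanted:
        end -= 1
    return text[start:end]

stamp = '[Wed Dec 01 10:24:24 2010]'
line = stamp + ' ceeeeest'
print(manual_strip(line, stamp))                        # st
print(manual_strip(line, stamp) == line.strip(stamp))   # True
```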
If the date is always at the start, which it must be if you are using match, then once you have a match the simplest approach is to split:
```
line2 = '[Wed Dec 01 10:24:24 2010] ceeeeest'

print(line2.split("] ", 1)[1])
```
Or:
```
print(line2[len(a.group()):].lstrip())  # a is the match object from your regex
```
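Note that `split("] ", 1)[1]` raises `IndexError` when the line has no `"] "` in it. If some lines may lack a timestamp, a sketch using `str.partition` avoids that; the function name `after_timestamp` is an assumption for illustration.

```python
def after_timestamp(line):
    """Return the text after a leading '[...] ' prefix, or the line unchanged."""
    head, sep, tail = line.partition('] ')
    # sep is '' when '] ' does not occur, so such lines pass through untouched.
    if line.startswith('[') and sep:
        return tail
    return line

print(after_timestamp('[Wed Dec 01 10:24:24 2010] ceeeeest'))  # ceeeeest
print(after_timestamp('just a plain old line'))                # just a plain old line
```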

闹比i

2020-12-19 23:26

If you really want to strip (that is, discard) the date and time information, and if the information is in the format you represent, try this:

```
#! python3

lines = [
    '[Wed Dec 01 10:24:24 2010] ceeeeest',
    '[Wed Dec 01 10:24:24 2010] testc',
    'just a plain old line',
    '       indented',
    '      with [brackets]',
    '[BOGUS! This should be disallowed!',
    '[][][] Three pairs',
]

for line in lines:
    if line.startswith('['):
        try:
            # Skip past the closing bracket and the single space after it.
            line = line[line.index(']')+2:]
        except ValueError:
            print('Invalid formatting: open [ with no close!')
        else:
            print(line)
    else:
        print('Ho hum, nothing interesting about:', line)
```


盖世英雄少女心

2020-12-19 23:28
If the same pattern repeats in your string, you can use a regex to find all the matches and then replace each one with an empty string:
```
import re
pattern = r'\[\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}\] '
for p in re.findall(pattern,line):
   line = line.replace(p,'')
```
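The find-then-replace loop can also be collapsed into a single `re.sub` call, which removes every match in one pass. This is a sketch of that alternative, not the answer's own code; the sample `line` is made up for illustration.

```python
import re

# Same pattern as above, compiled once.
pattern = re.compile(r'\[\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}\] ')

# Hypothetical input containing two timestamps.
line = '[Wed Dec 01 10:24:24 2010] first [Thu Dec 02 11:00:00 2010] second'

# sub() replaces every non-overlapping match with the empty string.
cleaned = pattern.sub('', line)
print(cleaned)  # first second
```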
伪装坚强ぢ

2020-12-19 23:32
As others have pointed out, you are using strip incorrectly. Instead, since you already have the matching working, slice the matched number of characters off the start of the string.
```
# Keep everything after the match; line[:n] would keep the timestamp instead.
result = line[len(a.group()):]
print(repr(result))
# prints ' testc'
```
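For completeness, here is a self-contained sketch of the same slicing idea using `Match.end()`, which gives the index just past the match. The pattern is an assumption, since the question's own regex is not shown here.

```python
import re

# Hypothetical pattern for a "[Wed Dec 01 10:24:24 2010]" style prefix.
pattern = re.compile(r'\[\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}\]')

line = '[Wed Dec 01 10:24:24 2010] testc'
a = pattern.match(line)  # match() only succeeds at the start of the string

if a is not None:
    # a.end() equals len(a.group()) here because the match starts at index 0.
    result = line[a.end():]
else:
    result = line  # no timestamp: keep the line as it is

print(repr(result))  # ' testc'
```

Call `.lstrip()` on the result if the leading space is unwanted.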