How to strip color codes used by mIRC users?

I'm writing an IRC bot in Python using irclib and I'm trying to log the messages on certain channels.
The issue is that some mIRC users and some bots write using color

6 Answers

粉色の甜心

2020-12-16 06:51
I even had to add '\x0f', whatever use it has
```
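import re
# Raw string: the re module itself interprets \x03 and \d, avoiding invalid-escape warnings.
regex = re.compile(r"\x0f|\x1f|\x02|\x03(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)
msg = regex.sub('', msg)  # sub() returns a new string, so keep the result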
```
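
For anyone wondering what these bytes are: `\x0f` is the "reset all formatting" code that mIRC-style clients send. Below is a rough sketch that names the common codes and wraps the regex above in a helper. The names `FORMAT_CODES` and `strip_formatting` are made up for this illustration and are not part of irclib.

```python
import re

# Formatting control characters commonly sent by mIRC-style clients.
# FORMAT_CODES is only a lookup table for illustration; it is not used by the regex.
FORMAT_CODES = {
    "\x02": "bold",
    "\x03": "color (optionally followed by fg[,bg] numbers)",
    "\x0f": "reset all formatting",
    "\x16": "reverse",
    "\x1d": "italic",
    "\x1f": "underline",
}

# Same pattern as in the answer above, written as a raw string.
regex = re.compile(r"\x0f|\x1f|\x02|\x03(?:\d{1,2}(?:,\d{1,2})?)?")


def strip_formatting(msg):
    """Return msg with the control codes matched by the regex removed (hypothetical helper)."""
    return regex.sub("", msg)


print(repr(strip_formatting("\x02\x0304warning:\x0f disk full")))
# -> 'warning: disk full'
```

Note that this pattern does not cover `\x16` (reverse) or `\x1d` (italic); those would need to be added as further alternatives.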

轻奢々

2020-12-16 06:55

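AutoDl-irssi had a very good one written in Perl; here it is in Python:

import re

def stripMircColorCodes(line):
    # Remove color codes with foreground and background first, then foreground only.
    line = re.sub(r"\x03\d\d?,\d\d?", "", line)
    line = re.sub(r"\x03\d\d?", "", line)
    # Remove every remaining control character in the range \x01-\x1F.
    line = re.sub(r"[\x01-\x1F]", "", line)
    return line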

北荒

2020-12-16 07:06
As I found this question useful, I figured I'd contribute.

I added a couple of things to the regex:
```
regex = re.compile("\x1f|\x02|\x03|\x16|\x0f(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)
```
\x16 removes the "reverse" character, and \x0f removes the "reset formatting" character.

逝去的感伤

2020-12-16 07:06

I know I asked for a regex solution because it could be cleaner, but I have also written a non-regex solution that works perfectly.

DIGITS = '0123456789'

def colourstrip(data):
    # Formatting codes that take no arguments: italic, underline, reverse, reset.
    simple_codes = '\x1d\x1f\x16\x0f'
    out = []
    i, n = 0, len(data)
    while i < n:
        ch = data[i]
        if ch == '\x03':
            i += 1
            # Skip up to two foreground digits (including '0', e.g. colour 10 or 01).
            fg_start = i
            while i < n and i - fg_start < 2 and data[i] in DIGITS:
                i += 1
            # Skip ",NN" only when a foreground was given and a digit follows the comma.
            if i > fg_start and i + 1 < n and data[i] == ',' and data[i + 1] in DIGITS:
                i += 1
                bg_start = i
                while i < n and i - bg_start < 2 and data[i] in DIGITS:
                    i += 1
        elif ch in simple_codes:
            i += 1
        else:
            out.append(ch)
            i += 1
    return ''.join(out)

datastring = '\x0312,4This is coolour \x032,4This is too\x03'    
print(colourstrip(datastring))

Thank you for all the help everyone.


温柔的废话

2020-12-16 07:11
Regular expressions are your cleanest bet in my opinion. If you haven't used them before, this is a good resource. For the full details on Python's regex library, go here.
```
import re
regex = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)
```
The regex searches for ^C (which is \x03 in ASCII, you can confirm by doing chr(3) on the command line), and then optionally looks for one or two [0-9] characters, then optionally followed by a comma and then another one or two [0-9] characters.

(?: ... ) says not to store what was matched inside the parentheses (as we don't need to backreference it), ? means to match 0 or 1 times, and {n,m} means to match n to m repetitions of the preceding item. Finally, \d means to match [0-9].
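
To make the pieces described above easier to see, here is a sketch of the same pattern written with `re.VERBOSE`, so each part carries its own comment. The name `COLOR_CODE` is just something I picked for this example.

```python
import re

# COLOR_CODE is an illustrative name; the pattern is the one explained above.
COLOR_CODE = re.compile(
    r"""
    \x03              # ^C, the mIRC color control character
    (?:               # non-capturing group: optional color numbers
        \d{1,2}       #   foreground color, one or two digits
        (?:           #   non-capturing group: optional background
            ,         #     comma separator
            \d{1,2}   #     background color, one or two digits
        )?
    )?
    """,
    re.VERBOSE,
)

print(repr(COLOR_CODE.sub("", "\x0304red\x03 plain \x033,12green on blue\x03")))
# -> 'red plain green on blue'

# At most two digits are consumed after ^C, so further digits stay in the text.
print(repr(COLOR_CODE.sub("", "\x031234 apples")))
# -> '34 apples'
```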

The rest can be decoded using the links I refer to above.
```
>>> regex.sub("", "blabla \x035,12to be colored text and background\x03 blabla")
'blabla to be colored text and background blabla'
```
chaos' solution is similar, but may end up consuming more than two digits, and it will not remove any loose ^C characters that may be hanging about (such as the one that closes the colour command).

别跟我提以往

2020-12-16 07:12
The second-rated and following suggestions are defective, as they look for the digits after some other character rather than after the color code character (\x03).

I have combined and improved on all the posts, with the following results:
- the reverse character is removed
- color codes are removed without leaving digits in the text
Solution:

regex = re.compile(r"\x1f|\x02|\x12|\x0f|\x16|\x03(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)
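
Here is a small sketch of why the position of the digit group matters. In an alternation, the optional `(?:\d{1,2}...)?` group only belongs to the alternative it is written next to. The variable names `misplaced`, `combined` and `sample` are chosen just for this comparison.

```python
import re

# Digit group written after \x0f (the last alternative), as in the earlier answer.
misplaced = re.compile(r"\x1f|\x02|\x03|\x16|\x0f(?:\d{1,2}(?:,\d{1,2})?)?")

# Digit group written after \x03, as in the combined pattern above.
combined = re.compile(r"\x1f|\x02|\x12|\x0f|\x16|\x03(?:\d{1,2}(?:,\d{1,2})?)?")

sample = "\x02bold\x02 \x0304,01red on black\x03 done"

# Only the bare ^C is removed, so the color numbers leak into the log.
print(repr(misplaced.sub("", sample)))
# -> 'bold 04,01red on black done'

# ^C is removed together with its foreground/background numbers.
print(repr(combined.sub("", sample)))
# -> 'bold red on black done'
```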