Getting correct string length in Python for strings with ANSI color codes

烂漫一生 提交于 2019-11-29 03:09:05

The pyparsing wiki includes this helpful expression for matching on ANSI escape sequences:

ESC = Literal('\x1b')
integer = Word(nums)
escapeSeq = Combine(ESC + '[' + Optional(delimitedList(integer,';')) + 
                oneOf(list(alphas)))

Here's how to make this into an escape-sequence-stripper:

from pyparsing import *

ESC = Literal('\x1b')
integer = Word(nums)
escapeSeq = Combine(ESC + '[' + Optional(delimitedList(integer,';')) + 
                oneOf(list(alphas)))

nonAnsiString = lambda s : Suppress(escapeSeq).transformString(s)

unColorString = nonAnsiString('\x1b[1m0.0\x1b[0m')
print unColorString, len(unColorString)

prints:

0.0 3
John Machin

I don't understand TWO things.

(1) It is your code, under your control. You want to add escape sequences to your data and then strip them out again so that you can calculate the length of your data?? It seems much simpler to calculate the padding before adding the escape sequences. What am I missing?

Let's presume that none of the escape sequences change the cursor position. If they do, the currently accepted answer won't work anyway.

Let's assume that you have the string data for each column (before adding escape sequences) in a list named string_data and the pre-determined column widths are in a list named width. Try something like this:

temp = []
for colx, text in enumerate(string_data):
    npad = width[colx] - len(text) # calculate padding size
    assert npad >= 0
    enhanced = fancy_text(text, colx, etc, whatever) # add escape sequences
    temp.append(enhanced + " " * npad)
sys.stdout.write("".join(temp))

Update-1

After OP's comment:

The reason I want to strip them out and calculate the length after the string contains the color codes is because all the data is built up programmatically. I have a bunch of colorize methods and I'm building up the data something like this: str = "%s/%s/%s" % (GREEN(data1), BLUE(data2), RED(data3)) It would be pretty difficult to color the text after the fact.

If the data is built up of pieces each with its own formatting, you can still compute the displayed length and pad as appropriate. Here's a function which does that for one cell's contents:

BLACK, RED, GREEN, YELLOW, BLUE, MAGENTA, CYAN, WHITE = range(40, 48)
BOLD = 1

def render_and_pad(reqd_width, components, sep="/"):
    temp = []
    actual_width = 0
    for fmt_code, text in components:
        actual_width += len(text)
        strg = "\x1b[%dm%s\x1b[m" % (fmt_code, text)
        temp.append(strg)
    if temp:
        actual_width += len(temp) - 1
    npad = reqd_width - actual_width
    assert npad >= 0
    return sep.join(temp) + " " * npad

print repr(
    render_and_pad(20, zip([BOLD, GREEN, YELLOW], ["foo", "bar", "zot"]))
    )

If you think that the call is overburdened by punctuation, you could do something like:

BOLD = lambda s: (1, s)
BLACK = lambda s: (40, s)
# etc
def render_and_pad(reqd_width, sep, *components):
    # etc

x = render_and_pad(20, '/', BOLD(data1), GREEN(data2), YELLOW(data3))

(2) I don't understand why you don't want to use the supplied-with-Python regular expression kit? No "hackery" (for any possible meaning of "hackery" that I'm aware of) is involved:

>>> import re
>>> test = "1\x1b[a2\x1b[42b3\x1b[98;99c4\x1b[77;66;55d5"
>>> expected = "12345"
>>> # regex = re.compile(r"\x1b\[[;\d]*[A-Za-z]")
... regex = re.compile(r"""
...     \x1b     # literal ESC
...     \[       # literal [
...     [;\d]*   # zero or more digits or semicolons
...     [A-Za-z] # a letter
...     """, re.VERBOSE)
>>> print regex.findall(test)
['\x1b[a', '\x1b[42b', '\x1b[98;99c', '\x1b[77;66;55d']
>>> actual = regex.sub("", test)
>>> print repr(actual)
'12345'
>>> assert actual == expected
>>>

Update-2

After OP's comment:

I still prefer Paul's answer since it's more concise

More concise than what? Isn't the following regex solution concise enough for you?

# === setup ===
import re
strip_ANSI_escape_sequences_sub = re.compile(r"""
    \x1b     # literal ESC
    \[       # literal [
    [;\d]*   # zero or more digits or semicolons
    [A-Za-z] # a letter
    """, re.VERBOSE).sub
def strip_ANSI_escape_sequences(s):
    return strip_ANSI_escape_sequences_sub("", s)

# === usage ===
raw_data = strip_ANSI_escape_sequences(formatted_data)

[Above code corrected after @Nick Perkins pointed out that it didn't work]

Looking in ANSI_escape_code, the sequence in your example is Select Graphic Rendition (probably bold).

Try to control column positioning with the CUrsor Position ( CSI n ; m H) sequence. This way, width of preceding text does not affect current column position and there is no need to worry about string widths.

A better option, if you target Unix, is using the curses module window-objects. For example, a string can be positioned on the screen with:

window.addnstr([y, x], str, n[, attr])

Paint at most n characters of the string str at (y, x) with attributes attr, overwriting anything previously on the display.

If you're just adding color to some cells, you can add 9 to the expected cell width (5 hidden characters to turn on the color, 4 to turn it off), e.g.

import colorama # handle ANSI codes on Windows
colorama.init()

RED   = '\033[91m' # 5 chars
GREEN = '\033[92m' # 5 chars
RESET = '\033[0m'  # 4 chars

def red(s):
    "color a string red"
    return RED + s + RESET
def green(s):
    "color a string green"
    return GREEN + s + RESET
def redgreen(v, fmt, sign=1):
    "color a value v red or green, depending on sign of value"
    s = fmt.format(v)
    return red(s) if (v*sign)<0 else green(s)

header_format = "{:9} {:5}  {:>8}  {:10}  {:10}  {:9}  {:>8}"
row_format =    "{:9} {:5}  {:8.2f}  {:>19}  {:>19}  {:>18}  {:>17}"
print(header_format.format("Type","Trial","Epsilon","Avg Reward","Violations", "Accidents","Status"))

# some dummy data
testing = True
ntrials = 3
nsteps = 1
reward = 0.95
actions = [0,1,0,0,1]
d = {'success': True}
epsilon = 0.1

for trial in range(ntrials):
    trial_type = "Testing " if testing else "Training"
    avg_reward = redgreen(float(reward)/nsteps, "{:.2f}")
    violations = redgreen(actions[1] + actions[2], "{:d}", -1)
    accidents = redgreen(actions[3] + actions[4], "{:d}", -1)
    status = green("On time") if d['success'] else red("Late")
    print(row_format.format(trial_type, trial, epsilon, avg_reward, violations, accidents, status))

Giving

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!