Python UTF-8: how to align printout

悲哀的现实 2021-01-02 15:20

I have an array containing Japanese characters as well as "normal" ones. How do I align the printout of these?

#!/usr/bin/python
# coding=utf-8

a1=['する', 'します


        
3 answers
  • 2021-01-02 15:44

    Use unicode objects instead of byte strings:

    #!/usr/bin/python
    # coding=utf-8
    
    a1=[u'する', u'します', u'trazan', u'した', u'しました']
    a2=[u'dipsy', u'laa-laa', u'banarne', u'po', u'tinky winky']
    
    for i,j in zip(a1,a2):
        print i.ljust(12),':',j
    
    print '-'*8
    
    for i,j in zip(a1,a2):
        print i,len(i)
        print j,len(j)
    

    Unicode objects deal with characters directly, so len() and ljust() count characters rather than UTF-8 bytes.
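    In Python 3 every str is already a Unicode string, so the u'' prefix is unnecessary. The following minimal Python 3 sketch (not part of the original answer) contrasts the character count with the UTF-8 byte count. Note that ljust() pads by character count, so each kana still occupies two terminal columns and the padded string is wider on screen than its len() suggests.

```python
# Python 3 sketch: str is Unicode, so len() counts code points, not bytes
word = 'します'

char_count = len(word)                   # 3 characters
byte_count = len(word.encode('utf-8'))   # 9 bytes: each kana is 3 bytes in UTF-8

# ljust() pads to 12 *characters*; on a terminal the three wide kana
# take 6 columns, so the result spans 6 + 9 = 15 columns, not 12
padded = word.ljust(12)

print(char_count, byte_count, repr(padded))
```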

  • 2021-01-02 15:44

    You need to build the string manually and also adjust the format width manually. There is no easy way to do this.

    The three functions below do this (they need the unicodedata module):

    shortenStringCJK: shortens a string so that it fits a given display width in the output (rather than cutting it to a fixed number of characters).

    import unicodedata
    
    def shortenStringCJK(string, width, placeholder='..'):
        # get the display length, counting double-width characters as 2
        string_len_cjk = stringLenCJK(str(string))
        # only shorten if the string is wider than the target width
        if string_len_cjk > width:
            # running display length and output string
            cur_len = 0
            out_string = ''
            # loop through each character
            for char in str(string):
                # display length if we add this character
                cur_len += 2 if unicodedata.east_asian_width(char) in "WF" else 1
                # stop once the character no longer fits next to the placeholder
                if cur_len > (width - len(placeholder)):
                    break
                out_string += char
            # return the shortened string followed by the placeholder
            return "{}{}".format(out_string, placeholder)
        else:
            return str(string)
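    As a self-contained Python 3 illustration of the same idea (display_width and shorten_to_width are hypothetical names chosen for this sketch), the loop below stops as soon as the next character would no longer fit next to the placeholder. Because a wide character cannot be split, the result can end up one column narrower than requested.

```python
import unicodedata

def display_width(s):
    # Wide (W) and fullwidth (F) characters occupy two terminal columns
    return sum(2 if unicodedata.east_asian_width(c) in "WF" else 1 for c in s)

def shorten_to_width(s, width, placeholder='..'):
    # Strings that already fit are returned unchanged
    if display_width(s) <= width:
        return s
    budget = width - display_width(placeholder)
    out, used = '', 0
    for c in s:
        w = display_width(c)
        if used + w > budget:
            break  # the next character would overflow the budget
        out += c
        used += w
    return out + placeholder

print(shorten_to_width('しましたしました', 10))  # 4 kana + '..' = 10 columns
print(shorten_to_width('しましたしました', 9))   # 3 kana + '..' = 8 columns
print(shorten_to_width('dipsy', 10))             # already fits, unchanged
```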
    

    stringLenCJK: gets the correct length, i.e. the number of columns the string takes up on a terminal.

    def stringLenCJK(string):
        # return string len including double count for double width characters
        return sum(1 + (unicodedata.east_asian_width(c) in "WF") for c in string)
    

    formatLen: adjusts the format width to compensate for double-width characters. Without it, the padding will be off.

    def formatLen(string, length):
        # returns the length updated for a string with double-width characters:
        # get the normal string length and the length counting double-width
        # characters as 2, then subtract the difference from the original length
        return length - (stringLenCJK(string) - len(string))
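    To see why formatLen is needed, here is the arithmetic for one string as a Python 3 sketch (string_len_cjk and format_len are local re-implementations of stringLenCJK and formatLen, renamed only for this example). str.format pads by character count, so the width given to the {:<N} spec must shrink by one for every double-width character.

```python
import unicodedata

def string_len_cjk(s):
    # Same idea as stringLenCJK: W/F characters count as two columns
    return sum(1 + (unicodedata.east_asian_width(c) in "WF") for c in s)

def format_len(s, length):
    # Reduce the format width by the extra columns the wide characters use
    return length - (string_len_cjk(s) - len(s))

word = 'します'
spec_width = format_len(word, 26)           # 26 - (6 - 3) = 23
cell = '|{:<{w}}|'.format(word, w=spec_width)

# 3 kana (6 columns) + 20 spaces fill exactly 26 columns between the bars
print(spec_width, cell)
```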
    

    To output a string, first predefine the format string:

    format_str = "|{{:<{len}}}|"
    format_len = 26
    string_len = 26
    

    and then print it as follows (where _string is the string to output):

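    # shorten once and reuse the result
    shortened = shortenStringCJK(_string, width=string_len)
    # e.g. "|{:<23}|" for a string with three double-width characters
    row_format = format_str.format(len=formatLen(shortened, format_len))
    print("Normal : {}".format(row_format.format(shortened)))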
    
  • 2021-01-02 15:50

    Using the unicodedata.east_asian_width function, keep track of which characters are narrow and wide when computing the length of the string.

    #!/usr/bin/python
    # coding=utf-8
    
    import sys
    import codecs
    import unicodedata
    
    out = codecs.getwriter('utf-8')(sys.stdout)
    
    def width(string):
        return sum(1+(unicodedata.east_asian_width(c) in "WF")
            for c in string)
    
    a1=[u'する', u'します', u'trazan', u'した', u'しました']
    a2=[u'dipsy', u'laa-laa', u'banarne', u'po', u'tinky winky']
    
    for i,j in zip(a1,a2):
        out.write('%s %s: %s\n' % (i, ' '*(12-width(i)), j))
    

    Outputs:

    する          : dipsy
    します        : laa-laa
    trazan        : banarne
    した          : po
    しました      : tinky winky
    

    It doesn’t look right in some web browser fonts, but in a terminal window they line up properly.
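    In Python 3 the codecs writer is no longer needed, since print() writes str objects directly (assuming the terminal uses a UTF-8 encoding). The sketch below ports this answer and, as a small variation, sizes the first column from the widest entry instead of the fixed 12; pad_to_width is a hypothetical helper name.

```python
import unicodedata

def width(string):
    # Wide (W) and fullwidth (F) characters occupy two terminal columns
    return sum(1 + (unicodedata.east_asian_width(c) in "WF") for c in string)

def pad_to_width(string, columns):
    # Hypothetical helper: append spaces until the string fills `columns`
    return string + ' ' * max(0, columns - width(string))

a1 = ['する', 'します', 'trazan', 'した', 'しました']
a2 = ['dipsy', 'laa-laa', 'banarne', 'po', 'tinky winky']

# Size the first column from the widest entry (here 8 columns)
col = max(width(s) for s in a1)
lines = ['{} : {}'.format(pad_to_width(i, col), j) for i, j in zip(a1, a2)]
print('\n'.join(lines))
```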
