Python: find a series of Chinese characters within a string and apply a function

遥遥无期 2021-01-14 06:55

I've got a series of texts that are mostly English but contain some phrases with Chinese characters. Here are two examples:

s1 = "You say: 你好. I say: 再見"
s         


        
4 answers
  •  醉酒成梦
    2021-01-14 07:50

    Regular expression match objects give you the start and end indexes of each match. So, instead of calling findall, run the search yourself and record the indexes as you go. Then you can translate each extent and replace it in the string using the recorded indexes of the phrases.

    import re
    
    _scan_chinese_re = re.compile(r'[\u4e00-\u9fff]+')
    
    s1 = "You say: 你好. I say: 再見"
    s2 = "答案, my friend, 在風在吹"
    
    def translator(chinese_text):
        """My no good translator"""
        return ' '.join('??' for _ in chinese_text)
    
    def scanner(text):
        """Scan text string, translate chinese and return copy"""
        print('----> text:', text)
    
        # list of extents where chinese text is found
        chinese_inserts = [] # [start, end]
    
        # keep scanning text to end
        index = 0
        while index < len(text):
            m = _scan_chinese_re.search(text[index:])
            if not m:
                break
            # get extent from match object and add to list
            start = index + m.start()
            end = index + m.end()
            print('chinese at index', start, text[start:end])
            chinese_inserts.append([start, end])
            index = end  # end is already an absolute index into text
    
        # copy string and replace backwards so we don't mess up indexes
        copy = list(text)
        while chinese_inserts:
            start, end = chinese_inserts.pop()
            copy[start:end] = translator(text[start:end])
        text = ''.join(copy)
        print('final', text)
        return text
    
    scanner(s1)
    scanner(s2)
    

    With my questionable translator, the result is:

    ----> text: You say: 你好. I say: 再見
    chinese at index 9 你好
    chinese at index 20 再見
    final You say: ?? ??. I say: ?? ??
    ----> text: 答案, my friend, 在風在吹
    chinese at index 0 答案
    chinese at index 15 在風在吹
    final ?? ??, my friend, ?? ?? ?? ??
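
The same find-and-replace can probably be written more compactly by passing a function to the compiled pattern's `sub` method, which calls it once per match and splices in the return value. This is a minimal sketch and not part of the original answer: `CHINESE_RUN`, `placeholder_translator`, and `translate_runs` are hypothetical names, and the placeholder translator only mimics the "no good translator" above.

```python
import re

# Same character range as the answer's pattern (CJK Unified Ideographs).
CHINESE_RUN = re.compile(r'[\u4e00-\u9fff]+')

def placeholder_translator(chinese_text):
    """Hypothetical stand-in for a real translator: one '??' per character."""
    return ' '.join('??' for _ in chinese_text)

def translate_runs(text, translate=placeholder_translator):
    """Return a copy of text with every Chinese run passed through translate."""
    # sub() calls the function once per match, left to right, and inserts
    # whatever it returns, so no manual index bookkeeping is needed.
    return CHINESE_RUN.sub(lambda m: translate(m.group(0)), text)

print(translate_runs("You say: 你好. I say: 再見"))
print(translate_runs("答案, my friend, 在風在吹"))
```

If the positions are still needed, for logging say, the match object passed to the callback exposes them through `m.start()` and `m.end()`.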
    
