Python difflib: highlighting differences inline?

陌路散爱 提交于 2019-11-28 16:52:50

For your simple example:

import difflib
def show_diff(seqm):
    """Unify operations between two compared strings
seqm is a difflib.SequenceMatcher instance whose a & b are strings"""
    output= []
    for opcode, a0, a1, b0, b1 in seqm.get_opcodes():
        if opcode == 'equal':
            output.append(seqm.a[a0:a1])
        elif opcode == 'insert':
            output.append("<ins>" + seqm.b[b0:b1] + "</ins>")
        elif opcode == 'delete':
            output.append("<del>" + seqm.a[a0:a1] + "</del>")
        elif opcode == 'replace':
            raise NotImplementedError, "what to do with 'replace' opcode?"
        else:
            raise RuntimeError, "unexpected opcode"
    return ''.join(output)

>>> sm= difflib.SequenceMatcher(None, "lorem ipsum dolor sit amet", "lorem foo ipsum dolor amet")
>>> show_diff(sm)
'lorem<ins> foo</ins> ipsum dolor <del>sit </del>amet'

This works with strings. You should decide what to do with "replace" opcodes.

difflib.SequenceMatcher will operate on single lines. You can use the "opcodes" to determine how to change the first line to make it the second line.

Here's an inline differ inspired by @tzot's answer above (also Python 3 compatible)

def inline_diff(a, b):
    import difflib
    matcher = difflib.SequenceMatcher(None, a, b)
    def process_tag(tag, i1, i2, j1, j2):
        if tag == 'replace':
            return '{' + matcher.a[i1:i2] + ' -> ' + matcher.b[j1:j2] + '}'
        if tag == 'delete':
            return '{- ' + matcher.a[i1:i2] + '}'
        if tag == 'equal':
            return matcher.a[i1:i2]
        if tag == 'insert':
            return '{+ ' + matcher.b[j1:j2] + '}'
        assert false, "Unknown tag %r"%tag
    return ''.join(process_tag(*t) for t in matcher.get_opcodes())

It's not perfect - for example, it would be nice to expand 'replace' opcodes to recognize the full word replaced instead of just the few different letters, but it's a good place to start.

Sample output:

>>> a='Lorem ipsum dolor sit amet consectetur adipiscing'
>>> b='Lorem bananas ipsum cabbage sit amet adipiscing'
>>> print(inline_diff(a, b))
Lorem{+  bananas} ipsum {dolor -> cabbage} sit amet{-  consectetur} adipiscing
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!