Generating and applying diffs in Python

野性不改 2020-12-25 11:35

Is there an 'out-of-the-box' way in Python to generate a list of differences between two texts, and then later apply this diff to one file to obtain the other?

6 answers
  •  南笙
    南笙 (original poster)
    2020-12-25 12:07

    I've implemented a pure Python function that applies diff patches to recover either of the input strings; I hope someone finds it useful. It parses the unified diff format.

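    import re
    
    # Hunk header, e.g. "@@ -3,2 +3,4 @@"; the counts are optional.
    _hdr_pat = re.compile(r"^@@ -(\d+),?(\d+)? \+(\d+),?(\d+)? @@$")
    
    def apply_patch(s, patch, revert=False):
      """
      Apply unified diff patch to string s to recover newer string.
      If revert is True, treat s as the newer string, recover older string.
      """
      s = s.splitlines(True)
      p = patch.splitlines(True)
      t = ''
      i = sl = 0  # i: index into patch lines, sl: index into source lines
      # Regex group holding the start line within s, and the sign of lines to emit
      (midx, sign) = (1, '+') if not revert else (3, '-')
      # Skip header lines
      while i < len(p) and p[i].startswith(("---", "+++")):
        i += 1
      while i < len(p):
        m = _hdr_pat.match(p[i])
        if not m:
          raise Exception("Cannot process diff")
        i += 1
        # 0-based index where the hunk starts; a zero-length range names the line before it
        l = int(m.group(midx)) - 1 + (m.group(midx + 1) == '0')
        t += ''.join(s[sl:l])  # copy the unchanged lines before the hunk
        sl = l
        while i < len(p) and p[i][0] != '@':
          if i + 1 < len(p) and p[i + 1][0] == '\\':
            # Next line is "\ No newline at end of file": drop this line's newline
            line = p[i][:-1]
            i += 2
          else:
            line = p[i]
            i += 1
          if len(line) > 0:
            if line[0] == sign or line[0] == ' ':
              t += line[1:]
            sl += (line[0] != sign)
      t += ''.join(s[sl:])  # copy whatever follows the last hunk
      return t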
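
    The hunk header is what tells apply_patch where each change begins. The following is only a small sketch of how the pattern above reads two headers; parse_hunk_header is a hypothetical helper written for illustration, not part of the function above. It shows the zero-length case that the `+ (m.group(midx+1) == '0')` adjustment handles: when a range has count 0, its start number names the line after which the change goes, rather than the first affected line.

```python
import re

# Same pattern as in the answer, written as a raw string.
hdr_pat = re.compile(r"^@@ -(\d+),?(\d+)? \+(\d+),?(\d+)? @@$")

def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count).

    Hypothetical helper for illustration; a missing count means 1.
    """
    m = hdr_pat.match(line)
    if m is None:
        raise ValueError("not a hunk header: %r" % line)
    old_start, old_count, new_start, new_count = m.groups()
    # A count of "0" is a non-empty string, so `or 1` only fills in missing counts.
    return (int(old_start), int(old_count or 1),
            int(new_start), int(new_count or 1))

print(parse_hunk_header("@@ -2 +2 @@\n"))      # (2, 1, 2, 1)
print(parse_hunk_header("@@ -1,0 +2,3 @@\n"))  # (1, 0, 2, 3)
```

    In the second header, three lines are inserted after line 1 of the old text, so the 0-based index of the insertion point in the old text is 1, not 0.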
    

    If the patch starts with header lines ("--- ...\n", "+++ ...\n"), apply_patch skips over them. Given a unified diff string diffstr representing the diff between oldstr and newstr:

    # recreate `newstr` from `oldstr`+patch
    newstr = apply_patch(oldstr, diffstr)
    # recreate `oldstr` from `newstr`+patch
    oldstr = apply_patch(newstr, diffstr, True)
    

    In Python you can generate a unified diff of two strings using difflib (part of the standard library):

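    import difflib
    
    # Marker that diff emits after a line with no trailing newline
    _no_eol = "\\ No newline at end of file"
    
    def make_patch(a, b):
      """
      Get unified string diff between two strings. Trims top two lines.
      Returns empty string if strings are identical.
      """
      diffs = difflib.unified_diff(a.splitlines(True), b.splitlines(True), n=0)
      # Drop the "---" and "+++" header lines; there are none if a == b
      try:
        _, _ = next(diffs), next(diffs)
      except StopIteration:
        pass
      # Terminate any line lacking a newline and follow it with the marker
      return ''.join([d if d[-1] == '\n' else d + '\n' + _no_eol + '\n' for d in diffs])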
    

    On Unix, a comparable zero-context diff can be produced with: diff -U0 a.txt b.txt
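
    To see what a zero-context unified diff looks like, here is a small sketch with made-up input strings. It prints the raw lines that difflib.unified_diff yields after the two header lines (the ones make_patch drops):

```python
import difflib

# Hypothetical sample inputs; any two strings work.
old = "alpha\nbeta\ngamma\n"
new = "alpha\nBETA\ngamma\n"

# n=0 asks for no context lines around each change.
diff_lines = list(difflib.unified_diff(old.splitlines(True),
                                       new.splitlines(True), n=0))

# The first two lines are the "---" and "+++" headers.
header, hunks = diff_lines[:2], diff_lines[2:]
for line in hunks:
    print(repr(line))
# '@@ -2 +2 @@\n'
# '-beta\n'
# '+BETA\n'
```

    Only the changed line appears in the hunk, so the header's line numbers are the sole record of where the change belongs.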

    The code is on GitHub, along with tests using ASCII and random Unicode characters: https://gist.github.com/noporpoise/16e731849eb1231e86d78f9dfeca3abc
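
    For comparison, the standard library also offers a round trip of its own, though with a different delta format: difflib.ndiff produces a line-by-line delta and difflib.restore recovers either input from it. This is a sketch of that alternative, not of the functions above, and the sample strings are made up. Note that an ndiff delta contains the full text of both inputs, so it is not a compact patch like a unified diff.

```python
import difflib

# Hypothetical sample inputs.
old = "one\ntwo\nthree\n"
new = "one\n2\nthree\nfour\n"

# Each delta line is tagged "  " (common), "- " (only in old),
# "+ " (only in new) or "? " (hint line, ignored by restore).
delta = list(difflib.ndiff(old.splitlines(True), new.splitlines(True)))

# which=1 rebuilds the first sequence, which=2 the second.
recovered_old = ''.join(difflib.restore(delta, 1))
recovered_new = ''.join(difflib.restore(delta, 2))

print(recovered_old == old, recovered_new == new)  # True True
```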
