Generating and applying diffs in Python

野性不改 2020-12-25 11:35

Is there an 'out-of-the-box' way in Python to generate a list of differences between two texts, and then later apply this diff to one file to obtain the other?

6 answers
  • 2020-12-25 11:45

    AFAIK most diff algorithms use a simple Longest Common Subsequence match to find the common part between two texts; whatever is left is considered the difference. It shouldn't be too difficult to code up your own dynamic programming algorithm to accomplish that in Python; the Wikipedia page on the longest common subsequence problem provides the algorithm too.
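
    As a rough illustration of that idea, here is a minimal sketch of an LCS-based diff over lists of lines. The function names (lcs_table, lcs_diff, apply_ops, revert_ops) and the (op, item) edit-script format are made up for this example; real diff tools use more refined algorithms and output formats.

```python
def lcs_table(a, b):
    """Build the dynamic-programming table of LCS lengths.

    table[i][j] holds the LCS length of a[i:] and b[j:].
    """
    rows, cols = len(a), len(b)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def lcs_diff(a, b):
    """Return an edit script: (op, item) pairs with op in ' ', '-', '+'."""
    table = lcs_table(a, b)
    i = j = 0
    ops = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append((' ', a[i]))  # common item, part of the LCS
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(('-', a[i]))  # item only in the old sequence
            i += 1
        else:
            ops.append(('+', b[j]))  # item only in the new sequence
            j += 1
    ops.extend(('-', item) for item in a[i:])
    ops.extend(('+', item) for item in b[j:])
    return ops


def apply_ops(ops):
    """Rebuild the new sequence from the edit script."""
    return [item for op, item in ops if op != '-']


def revert_ops(ops):
    """Rebuild the old sequence from the edit script."""
    return [item for op, item in ops if op != '+']


old = ["a", "b", "c", "d"]
new = ["a", "c", "d", "e"]
ops = lcs_diff(old, new)
```

    Note that this edit script carries the common items as well, so it can rebuild either side on its own; a compact patch format would store only the changes plus their positions.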

  • 2020-12-25 11:48

    You can probably use difflib.unified_diff to generate the list of differences between two files. Only the changed lines are then written to a new text file, which you can keep for future reference. The code below writes just the differences to the new file. I hope this is what you are asking for!

    import difflib

    # old_file and new_file are sequences of lines (e.g. from readlines())
    diff = difflib.unified_diff(old_file, new_file, lineterm='')
    lines = list(diff)[2:]  # drop the '---' / '+++' header lines
    added = [line for line in lines if line.startswith('+')]
    removed = [line for line in lines if line.startswith('-')]

    # write the added lines first, then the removed lines
    with open("output.txt", "w") as fh:
        for line in added + removed:
            fh.write(line)
    print('+', added)
    print('-', removed)
    

    Use this in your code to save only the difference output!

  • 2020-12-25 11:49

    Does it have to be a Python solution?
    My first thought would be to use either a version control system (Subversion, Git, etc.) or the diff / patch utilities that are standard on a Unix system and are available through Cygwin on a Windows-based system.

  • 2020-12-25 12:07

    I've implemented a pure Python function that applies diff patches to recover either of the input strings; I hope someone finds it useful. It parses the unified diff format.

    import re
    
    _hdr_pat = re.compile(r"^@@ -(\d+),?(\d+)? \+(\d+),?(\d+)? @@$")
    
    def apply_patch(s,patch,revert=False):
      """
      Apply unified diff patch to string s to recover newer string.
      If revert is True, treat s as the newer string, recover older string.
      """
      s = s.splitlines(True)
      p = patch.splitlines(True)
      t = ''
      i = sl = 0
      (midx,sign) = (1,'+') if not revert else (3,'-')
      while i < len(p) and p[i].startswith(("---","+++")): i += 1 # skip header lines
      while i < len(p):
        m = _hdr_pat.match(p[i])
        if not m: raise Exception("Cannot process diff")
        i += 1
        l = int(m.group(midx))-1 + (m.group(midx+1) == '0')
        t += ''.join(s[sl:l])
        sl = l
        while i < len(p) and p[i][0] != '@':
          if i+1 < len(p) and p[i+1][0] == '\\': line = p[i][:-1]; i += 2
          else: line = p[i]; i += 1
          if len(line) > 0:
            if line[0] == sign or line[0] == ' ': t += line[1:]
            sl += (line[0] != sign)
      t += ''.join(s[sl:])
      return t
    

    If there are header lines ("--- ...\n","+++ ...\n") it skips over them. If we have a unified diff string diffstr representing the diff between oldstr and newstr:

    # recreate `newstr` from `oldstr`+patch
    newstr = apply_patch(oldstr, diffstr)
    # recreate `oldstr` from `newstr`+patch
    oldstr = apply_patch(newstr, diffstr, True)
    

    In Python you can generate a unified diff of two strings using difflib (part of the standard library):

    import difflib
    _no_eol = r"\ No newline at end of file"
    
    def make_patch(a,b):
      """
      Get unified string diff between two strings. Trims top two lines.
      Returns empty string if strings are identical.
      """
      diffs = difflib.unified_diff(a.splitlines(True),b.splitlines(True),n=0)
      try: _,_ = next(diffs),next(diffs)
      except StopIteration: pass
      return ''.join([d if d[-1] == '\n' else d+'\n'+_no_eol+'\n' for d in diffs])
    

    On unix: diff -U0 a.txt b.txt

    Code is on GitHub here along with tests using ASCII and random unicode characters: https://gist.github.com/noporpoise/16e731849eb1231e86d78f9dfeca3abc

  • 2020-12-25 12:09

    Did you have a look at diff-match-patch from Google? Apparently Google Docs uses this set of algorithms. It includes not only a diff module but also a patch module, so you can generate the newest file from older files and diffs.

    A Python version is included.

    http://code.google.com/p/google-diff-match-patch/

  • 2020-12-25 12:10

    Does difflib.unified_diff do what you want? The difflib documentation includes an example.
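
    For illustration, a minimal sketch of difflib.unified_diff on two small line lists (the sample data and the file names old.txt / new.txt are made up). difflib cannot apply a unified diff itself; the closest out-of-the-box round trip in the standard library is difflib.ndiff together with difflib.restore, which uses a different, more verbose delta format that contains both texts in full.

```python
import difflib

# Made-up sample data: each text is a list of lines with trailing newlines.
old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "BETA\n", "gamma\n", "delta\n"]

# Generate a unified diff; fromfile/tofile only label the header lines.
diff = list(difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt"))
print("".join(diff), end="")

# ndiff produces a full delta that restore() can turn back into either input:
# which=1 recovers the first sequence, which=2 recovers the second.
delta = list(difflib.ndiff(old, new))
restored_old = list(difflib.restore(delta, 1))
restored_new = list(difflib.restore(delta, 2))
```

    The unified diff is the compact form you would hand to a patch tool, while the ndiff delta trades size for the ability to recover both texts without any extra code.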
