What's the best way to turn a Subversion diff into JSON?

Question

I have a bunch of sed/Unix fu that I'm beginning to suspect isn't going to be the best way to complete the task, given the variance of lines coming out of 'svn diff' ...

svn diff -r 1:9 | 
expand | 
sed -e 's/^Index: \(.*\)/]}, { "index":"\1", /g' | 
sed -e 's/^--- \(.*\)/"from":"\1", /g' | 
sed -e 's/^+++ \(.*\)/"to":"\1", "chunks":[/g' | 
sed -e 's/^@@ \(.*\) @@/]},{"locn":"\1", "lines": [/g' | 
sed -e 's/^-\(.*\)/"-\1",/g' | 
sed -e 's/^+\(.*\)/"+\1",/g' | 
sed -e 's/^ \(.*\)/" \1",/g' | 
sed -e 's/^==============.*//g' | 
tr -d '\n' | 
sed -e 's/"chunks":\[\]},{/"chunks":\[{/g' | 
sed -e 's/^]}, \(.*\)/{"changes":[ \1]}]}]}/g' | 
sed -e 's/,\]}/]}/g' |
jshon

It reliably turns ...

Index: file1.txt
===================================================================
--- file1.txt   (revision 8)
+++ file1.txt   (revision 9)
@@ -1,3 +1,5 @@
+zzz
+
 aaa

 Efficiently Blah blah
@@ -7,3 +9,5 @@
 functional solutions.

 bbb
+
+www

Into ...

{
 "changes": [
  {
   "index": "file1.txt",
   "to": "file1.txt   (revision 9)",
   "from": "file1.txt   (revision 8)",
   "chunks": [
    {
     "locn": "-1,3 +1,5",
     "lines": [
      "+zzz",
      "+",
      " aaa",
      " ",
      " Efficiently Blah blah"
     ]
    },
    {
     "locn": "-7,3 +9,5",
     "lines": [
      " functional solutions.",
      " ",
      " bbb",
      "+",
      "+www"
     ]
    }
   ]
  }
 ]
}

But there's way more that could come out of 'svn diff' than I'm handling, and I wonder if it's foolish to carry on in this direction.

Answer 1:

I'd probably use the diff parser in libsvn_diff. I'm not sure if it's been wrapped by the bindings but it's likely that it works from the Python bindings.

Start with svn_diff_open_patch_file() and then iterate over the patches in the file by calling svn_diff_parse_next_patch() until it gives you NULL for the svn_patch_t.

Once you have the struct for each file it should be trivial to generate your JSON.
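As a minimal sketch of that last step, assuming each parsed patch has already been copied into plain Python values: the Hunk and Patch tuples below are hypothetical stand-ins, not the real svn_patch_t wrapper. The point is that the json module does the quoting and escaping, which a sed pipeline that just wraps each line in double quotes does not do for lines containing quotes or backslashes.

```python
import json
from collections import namedtuple

# Hypothetical stand-ins for parsed data; NOT the real svn_patch_t bindings.
Hunk = namedtuple("Hunk", "locn lines")
Patch = namedtuple("Patch", "old_filename new_filename hunks")


def patches_to_json(patches):
    """Serialize parsed patches into the JSON layout used in the question."""
    changes = []
    for patch in patches:
        changes.append({
            "from": patch.old_filename,
            "to": patch.new_filename,
            "chunks": [{"locn": hunk.locn, "lines": list(hunk.lines)}
                       for hunk in patch.hunks],
        })
    # json.dumps escapes quotes and backslashes inside the diff lines.
    return json.dumps({"changes": changes}, indent=1)


example = Patch("file1.txt", "file1.txt",
                [Hunk("-1,3 +1,5", ['+say "hi"', " a\\b"])])
print(patches_to_json([example]))
```

Round-tripping the result through json.loads gives back the original lines unchanged, which is a quick way to check that nothing was mangled.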

Fair warning: there may be bugs in that diff parser. It was written for svn patch, which I find buggy (though I think most of the bugs are in the patch application, not the parsing). On the other hand, doing it this way should mean that even if we adjust the patch output format you should always have a good parser. And of course your bug reports (if you end up having any) could improve our parser.

The only other thing that occurs to me is that the API is not streamy (it works on files), which may not be what you want. Also, if you really want to go down the rabbit hole, you could drive the WC/RA layer directly and act as a receiver of the editor drive, generating your JSON output instead of a unified diff. But that's probably way more than you want, because there's a ton of code just to handle all the different variations of diff target types (local to local, repo to repo, local to repo, repo to local).
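If the diff arrives on a pipe, one way around a file-only API is to spool the stream to a temporary file first. This is only a sketch: spool_to_tempfile and with_spooled_patch are hypothetical helper names, and parse_file stands in for whatever file-based parser you end up calling.

```python
import io
import os
import tempfile


def spool_to_tempfile(stream):
    """Copy a text stream to a temporary file and return the file's path."""
    with tempfile.NamedTemporaryFile("w", suffix=".patch",
                                     delete=False) as tmp:
        for line in stream:
            tmp.write(line)
        return tmp.name


def with_spooled_patch(stream, parse_file):
    """Run parse_file(path) on a spooled copy of stream, then delete the copy.

    parse_file is a hypothetical callable standing in for a file-based parser.
    """
    path = spool_to_tempfile(stream)
    try:
        return parse_file(path)
    finally:
        os.remove(path)


def count_lines(path):
    """Toy 'parser' used for the demo: count the lines in the file."""
    with open(path) as handle:
        return sum(1 for _ in handle)


demo = io.StringIO("Index: file1.txt\n--- file1.txt\n+++ file1.txt\n")
print(with_spooled_patch(demo, count_lines))
```

In a real pipeline you would pass sys.stdin instead of the StringIO object, so that 'svn diff -r 1:9 | python scriptname.py' works without an intermediate file on your side.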

EXAMPLE

So I decided to play with the diff parser. I ended up writing the following python script to use it and produce almost the same JSON output as your example. Note that the parser throws away the Index line so I don't have that in my output.
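If you need the Index value back, one option is to scan the patch text for it separately and pair the results with the parsed patches by position. The sketch below assumes exactly one "Index: " line per parsed patch, which may not hold for every diff (property-only changes, for instance); index_lines and attach_index are hypothetical helpers, not part of the bindings.

```python
import re

INDEX_RE = re.compile(r"^Index: (.*)$")


def index_lines(patch_text):
    """Return the paths named on 'Index: ' lines, in order of appearance."""
    matches = (INDEX_RE.match(line) for line in patch_text.splitlines())
    return [match.group(1) for match in matches if match]


def attach_index(changes, patch_text):
    """Add an 'index' key to each change dict, pairing entries by position.

    Assumption: the n-th Index line belongs to the n-th parsed patch.
    """
    names = index_lines(patch_text)
    if len(names) != len(changes):
        raise ValueError("found %d Index lines for %d patches"
                         % (len(names), len(changes)))
    for change, name in zip(changes, names):
        change["index"] = name
    return changes


text = ("Index: a.txt\n"
        "=====\n"
        "--- a.txt\t(revision 8)\n"
        "+++ a.txt\t(revision 9)\n"
        "@@ -1 +1 @@\n"
        "-x\n"
        "+y\n")
print(attach_index([{"from": "a.txt", "to": "a.txt"}], text))
```

Hunk content lines always start with a space, '+' or '-', so a line beginning with "Index: " in column 0 should not be confused with file content.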

I ran into one small change I had to make to the Python SWIG bindings to make this work (the hunks field of svn_patch_t wasn't being converted to a Python list properly), which I fixed in r1548379 on Subversion trunk (I suspect that patch will apply cleanly to 1.8).

Note that svn_diff_hunk_readline_diff_text()'s documentation says the first line will be the hunk header, but that doesn't seem to be true. You can, however, reconstruct the hunk header data you wanted with the svn_diff_hunk_get_{original,modified}_{start,length} functions.
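For illustration, rebuilding the header from those four numbers is just string formatting. The helpers below are hypothetical and take plain integers, so they do not depend on the bindings. They always write both start and length; some diff tools omit the length when it is 1, and this sketch does not try to reproduce that.

```python
def hunk_locn(orig_start, orig_len, mod_start, mod_len):
    """Format the range part of a hunk header, e.g. '-1,3 +1,5'."""
    return "-%d,%d +%d,%d" % (orig_start, orig_len, mod_start, mod_len)


def hunk_header(orig_start, orig_len, mod_start, mod_len):
    """Format a full unified-diff hunk header line."""
    return "@@ %s @@" % hunk_locn(orig_start, orig_len, mod_start, mod_len)


print(hunk_header(1, 3, 1, 5))  # prints: @@ -1,3 +1,5 @@
```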

I didn't bother to mess with the property change parsing or the operation parsing (I don't think the support for this is really complete, but if you want it I leave it as an exercise for you).

My apologies if this isn't the most Pythonic code. Part of that is driven by the fact that the wrapped C APIs aren't conducive to it, and part is that I'm simply not super comfortable with Python. I did it in Python since those bindings are closer to being complete in this respect.

You can run the following script with just: python scriptname.py patchfile

import sys
from svn import diff, core
import json

class UDiff:
  def convert_svn_patch_t(self, patch, pool):
    data = {}
    data['from'] = patch.old_filename
    data['to'] = patch.new_filename
    iter_pool = core.Pool(pool)
    chunks = []
    for hunk in patch.hunks:
      iter_pool.clear()
      chunk = {}
      orig_start = diff.svn_diff_hunk_get_original_start(hunk)
      orig_len = diff.svn_diff_hunk_get_original_length(hunk)
      mod_start = diff.svn_diff_hunk_get_modified_start(hunk)
      mod_len = diff.svn_diff_hunk_get_modified_length(hunk)
      chunk['locn'] = "-%d,%d +%d,%d" % \
                      (orig_start, orig_len, mod_start, mod_len)
      lines = []
      while True:
        text, eol, eof = diff.svn_diff_hunk_readline_diff_text(hunk,
                                                               iter_pool,
                                                               iter_pool)
        if eof:
          break
        lines.append("%s%s" % (text, eol))
      chunk['lines'] = lines
      chunks.append(chunk)
    data['chunks'] = chunks
    self.data = data

  def as_dict(self):
    return self.data

  def __init__(self, patch, pool):
    self.convert_svn_patch_t(patch, pool)

class UDiffAsJson:
  def __init__(self):
    self.pool = core.Pool()

  def convert(self, fname):
    patch_file = diff.svn_diff_open_patch_file(fname, self.pool)
    iter_pool = core.Pool(self.pool)
    changes = []
    while True:
      iter_pool.clear()
      patch = diff.svn_diff_parse_next_patch(patch_file,
                                             False, # reverse
                                             False, # ignore_whitespace
                                             iter_pool, iter_pool)
      if not patch:
        break
      udiff = UDiff(patch, iter_pool)
      changes.append(udiff.as_dict())
    data = {}
    data['changes'] = changes
    diff.svn_diff_close_patch_file(patch_file, iter_pool)
    return json.dumps(data, indent=1)  # one space per nesting level

if __name__ == "__main__":
  udiffasjson = UDiffAsJson()
  sys.stdout.write(udiffasjson.convert(sys.argv[1]))
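
One difference from the JSON in the question: the script appends whatever end-of-line marker the parser reports to each entry in "lines", while the question's output has none. If you want them gone, a small post-processing step can strip them. This is a sketch that assumes the layout produced above; strip_eols is a hypothetical helper.

```python
import json


def strip_eols(document):
    """Return a copy of the document with trailing CR/LF removed per line."""
    return {"changes": [
        dict(change, chunks=[
            dict(chunk, lines=[line.rstrip("\r\n")
                               for line in chunk["lines"]])
            for chunk in change["chunks"]])
        for change in document["changes"]]}


raw = ('{"changes": [{"from": "a", "to": "a", "chunks": '
       '[{"locn": "-1,1 +1,2", "lines": ["+zzz\\n", " aaa\\r\\n"]}]}]}')
print(json.dumps(strip_eols(json.loads(raw)), indent=1))
```

Only trailing carriage returns and newlines are removed, so the leading ' ', '+' or '-' marker and any other whitespace in the line are left alone.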

Source: https://stackoverflow.com/questions/20381507/whats-the-best-way-to-turn-a-subversion-diff-into-json

Tags

json

svn

sed