How are version control histories stored and calculated?

后端 未结 3 1495
孤城傲影
孤城傲影 2021-02-05 17:41

Consider this simple python code, which demonstrates a very simple version control design for a dictonary:

def build_current(history):
    current = {}
    for a         


        
3条回答
  •  没有蜡笔的小新
    2021-02-05 18:36

    You mentioned these 3 methods of storing (file)-history:

    1. patch : a patch is the (usually textual, but binary patches are also possible) representation of the difference between two files. It is the output of unix command diff and can be applied by unix command patch. A lot of versioning systems are using patches to store the history of files (eg. SVN, CVS, GIT..). Sometimes these patches are technically called "delta" as the greek letter "Δ" describing the difference of two things.
    2. changeset: a changeset is a term to combine changes "which belonging together" to different files in a single entity. Not all versioning systems support changesets (most notable CVS and SourceSafe). Developer are using changesets to avoid broken builds(example: change the signature of a method in one file, change the call in a second file. You need to have both changes in place to run the program, otherwise you get an error). See also here for the difference between changeset and patch.
    3. snapshots: are full copies of the state of this file/filesystem to this point of time. They are usually quite large and their usage depends on performance characteristics. The snapshot is always redundant to a list of patches, however to retrieve information faster, sometimes Versioning Systems mix or combine patches and snapshots

    Subversion uses forward deltas(aka patches) in FSFS repositories and backward deltas in BDB Repositories. Note that these implementations have different performance characteristics:

    • forward deltas are fast in committing, but slow on checkouts(as the "current" version must be reconstructed)

    • backward deltas are fast in checking out but slow on commit as new deltas must be constructed to construct the new current and rewrite the previous "current" as a bunch of deltas

    Also note that FSFS uses a "skipping delta" algorithm which minimizes the jumps to reconstruct a specific version. This skipping delta however is not size optimized as mercurials snapshots; it just minimizes the number of "revisions" you need to build a full version regardless of the overall size.

    Here a small ascii art (copied from the specification) of a file with 9 revisions:

    0 <- 1    2 <- 3    4 <- 5    6 <- 7
    0 <------ 2         4 <------ 6
    0 <---------------- 4
    0 <------------------------------------ 8 <- 9
    

    where "0 <- 1" means that the delta base for revision 1 is revision 0.

    The number of jumps is at most log(N) for N revisions.

    Also a very pretty effect on FSFS is that older revision will be written only once and after this they will be only read by further actions. That's why subversion repositories are very stable: as long as you do not have a HW failure on your harddisk, you should be able to get a working repository even if some corruption occurred in the last commit: You still have all older revisions.

    In BDB Backend you have constantly rewrite the current revision on checkins/commits, which makes this process prone to data corruption. Also as you store the full text only in current revision, corrupting the data on commit will likely destroy great parts of your history.

提交回复
热议问题