How are version control histories stored and calculated?

孤城傲影 2021-02-05 17:41

Consider this simple Python code, which demonstrates a very simple version control design for a dictionary:

def build_current(history):
    current = {}
    for a         


        
3 answers
  •  不要未来只要你来
    2021-02-05 18:18

    As a more generic answer, you need to differentiate CVCS (Centralized VCS, like CVS, SVN, Perforce, ClearCase, ...) from DVCS (Distributed VCS, like Git or Mercurial).
    They involve different workflows and usage.

    In particular, the exchange of data between a CVCS client and its server matters more than with a DVCS (which really needs deltas when pushing or pulling the whole repo).

    That is why deltas are very important for most operations in a CVCS, and only important for certain operations, and for different reasons, in a DVCS.

    Deltas are described in Eric Sink's two books:

    • Source Control HOWTO, Chapter 4: Repositories:

    Repository = File System * Time

    A tree is a hierarchy of folders and files. A delta is the difference between two trees. In theory, those two trees do not need to be related. However, in practice, the only reason we calculate the difference between them is because one of them is derived from the other. Some developer started with tree N and made one or more changes, resulting in tree N+1.

    We can think of the delta as a set of changes. In fact, many SCM tools use the term "changeset" for exactly this purpose. A changeset is merely a list of the changes which express the difference between two trees.
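As a minimal sketch of the changeset idea, assume a tree is flattened into a Python dict mapping file paths to contents. The function names `diff_trees` and `apply_changeset` and the `(op, path, content)` tuple layout are illustrative assumptions, not taken from any particular SCM tool.

```python
def diff_trees(old, new):
    """Return a changeset: a list of (op, path, content) tuples
    describing how to turn tree `old` into tree `new`."""
    changes = []
    # Paths that exist only in the old tree were deleted.
    for path in sorted(old.keys() - new.keys()):
        changes.append(("delete", path, None))
    # Paths in the new tree were either added or (possibly) modified.
    for path in sorted(new.keys()):
        if path not in old:
            changes.append(("add", path, new[path]))
        elif old[path] != new[path]:
            changes.append(("modify", path, new[path]))
    return changes


def apply_changeset(tree, changes):
    """Apply a changeset to a tree and return the resulting new tree."""
    result = dict(tree)
    for op, path, content in changes:
        if op == "delete":
            del result[path]
        else:  # "add" and "modify" both store the new content
            result[path] = content
    return result


# Tree N and tree N+1 (hypothetical example data).
tree_n = {"README": "hello", "src/main.py": "print(1)", "old.txt": "x"}
tree_n1 = {"README": "hello", "src/main.py": "print(2)", "new.txt": "y"}

changeset = diff_trees(tree_n, tree_n1)
rebuilt = apply_changeset(tree_n, changeset)
```

Here `rebuilt` equals `tree_n1`: storing tree N plus the changeset is enough to recover tree N+1, and the unchanged `README` never appears in the changeset.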

    The direction of the delta is important (see this thread): forward delta or reverse delta.

    Some SCM tools use some sort of a compromise design. In one approach, instead of storing just one full tree and representing every other tree as a delta, we sprinkle a few more full trees along the way.
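The compromise design can be sketched roughly as follows, assuming forward deltas and a full tree stored every few revisions. The `Store` class, the `snapshot_every` parameter, and the delta layout are hypothetical and chosen only for illustration; real tools pick the interval and delta direction in their own ways.

```python
def make_delta(old, new):
    """Forward delta from `old` to `new`: keys to set and keys to delete."""
    return {
        "set": {k: v for k, v in new.items() if k not in old or old[k] != v},
        "delete": [k for k in old if k not in new],
    }


def apply_delta(tree, delta):
    """Apply a forward delta and return the resulting tree."""
    result = {k: v for k, v in tree.items() if k not in delta["delete"]}
    result.update(delta["set"])
    return result


class Store:
    """Keeps a full tree every `snapshot_every` revisions, deltas otherwise."""

    def __init__(self, snapshot_every=3):
        self.snapshot_every = snapshot_every
        self.entries = []   # each entry is ("full", tree) or ("delta", delta)
        self._last = {}     # most recently committed tree

    def commit(self, tree):
        if len(self.entries) % self.snapshot_every == 0:
            self.entries.append(("full", dict(tree)))
        else:
            self.entries.append(("delta", make_delta(self._last, tree)))
        self._last = dict(tree)
        return len(self.entries) - 1  # revision number

    def checkout(self, rev):
        # Walk back to the nearest full tree, then replay deltas forward.
        base = rev
        while self.entries[base][0] != "full":
            base -= 1
        tree = dict(self.entries[base][1])
        for i in range(base + 1, rev + 1):
            tree = apply_delta(tree, self.entries[i][1])
        return tree


versions = [
    {"a": 1},
    {"a": 1, "b": 2},
    {"a": 5, "b": 2},
    {"b": 2},
    {"b": 2, "c": 3},
]
store = Store(snapshot_every=3)
for v in versions:
    store.commit(v)
```

With this interval, revisions 0 and 3 are stored as full trees, so `store.checkout(4)` replays one delta instead of four. The trade-off is more storage in exchange for a bounded reconstruction cost.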

    You can see the evolution of the "old" VCS in Eric Raymond's Understanding Version-Control Systems.

    • Version Control by Example, Chapter 12. DVCS Internals:

    Many modern version control tools use binary file deltas for repository storage.
    One popular file delta algorithm is called vcdiff.
    It outputs a list of byte ranges which have been changed. This means it can handle any kind of file, binary or text. As an ancillary benefit, the vcdiff algorithm compresses the data at the same time.
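To illustrate the general idea of a byte-range delta, here is a small sketch built on Python's standard `difflib.SequenceMatcher`. It is not the vcdiff algorithm or its encoding, and it does no compression; the `copy`/`insert` instruction tuples and the function names are assumptions made for this example.

```python
from difflib import SequenceMatcher


def make_byte_delta(old: bytes, new: bytes):
    """Describe `new` as byte ranges copied from `old` plus literal inserts."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged range: only the offsets into `old` are stored.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Changed range: the new bytes are stored literally.
            ops.append(("insert", new[j1:j2]))
        # "delete" needs no instruction: those old bytes are never copied.
    return ops


def apply_byte_delta(old: bytes, ops):
    """Rebuild the new content from the old content and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


old_blob = bytes(range(256)) * 4
new_blob = old_blob[:100] + b"\x00PATCH\x00" + old_blob[120:]

delta = make_byte_delta(old_blob, new_blob)
restored = apply_byte_delta(old_blob, delta)
```

Because the delta works on raw bytes rather than lines, it applies equally to text and binary files, which is the property the quoted passage highlights.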

    Don't forget that delta management also has an influence on the Directed Acyclic Graphs (DAGs) created for representing the history (see "Arrows direction in ProGit book" and the inconveniences of DAGs).

    You can find specifics about delta management for:

    • Git: "Is the git binary diff algorithm (delta storage) standardized?"
    • Mercurial: "Mercurial: Repository Structure"
    • Veracity (which combines both approaches): "Veracity: DAGs and Data"

    Veracity supports two kinds of DAGs:

    • A tree DAG keeps the version history of a directory structure from a filesystem. Each node of the DAG represents one version of the whole tree.

    • A database (or “db”) DAG keeps the version history of a database, or a list of records. Each node of the DAG represents one state of the complete database.

    That last point illustrates that the third (fourth?) generation of VCS must deal with the distribution of not only files (trees) but also databases (for various purposes).
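The history DAG mentioned above, where each node represents one complete state (a whole tree or a whole database), might be sketched as below. The `HistoryDag` class and its methods are hypothetical names for illustration and do not reflect Veracity's actual data model or API.

```python
class HistoryDag:
    """A history graph: each node holds one full state and points to its parents."""

    def __init__(self):
        self.parents = {}  # node id -> tuple of parent node ids
        self.states = {}   # node id -> complete state at that node

    def add(self, node_id, state, parents=()):
        if node_id in self.parents:
            raise ValueError(f"node {node_id!r} already exists")
        for p in parents:
            if p not in self.parents:
                raise KeyError(f"unknown parent {p!r}")
        # Parents must already exist, so no cycle can ever be created.
        self.parents[node_id] = tuple(parents)
        self.states[node_id] = state

    def ancestors(self, node_id):
        """Return the ids of all nodes reachable through parent links."""
        seen = set()
        stack = list(self.parents[node_id])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents[current])
        return seen


dag = HistoryDag()
dag.add("a", {"file.txt": "v1"})
dag.add("b", {"file.txt": "v2"}, parents=["a"])                      # one branch
dag.add("c", {"file.txt": "v1", "doc.txt": "x"}, parents=["a"])      # another branch
dag.add("d", {"file.txt": "v2", "doc.txt": "x"}, parents=["b", "c"])  # merge node
```

Node `d` has two parents, which is what makes the history a graph rather than a line; `dag.ancestors("d")` returns the set containing `a`, `b`, and `c`.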
