Comparison between Centralized and Distributed Version Control Systems [closed]

╄→гoц情女王★ 提交于 2019-11-27 02:41:50
Craig Trader

From my answer to a different question:

Distributed version control systems (DVCSs) solve different problems than Centralized VCSs. Comparing them is like comparing hammers and screwdrivers.

Centralized VCS systems are designed with the intent that there is One True Source that is Blessed, and therefore Good. All developers work (checkout) from that source, and then add (commit) their changes, which then become similarly Blessed. The only real difference between CVS, Subversion, ClearCase, Perforce, VisualSourceSafe and all the other CVCSes is in the workflow, performance, and integration that each product offers.

Distributed VCS systems are designed with the intent that one repository is as good as any other, and that merges from one repository to another are just another form of communication. Any semantic value as to which repository should be trusted is imposed from the outside by process, not by the software itself.

The real choice between using one type or the other is organizational -- if your project or organization wants centralized control, then a DVCS is a non-starter. If your developers are expected to work all over the country/world, without secure broadband connections to a central repository, then DVCS is probably your salvation. If you need both, you're fsck'd.

manicmethod

To those who think distributed systems don't allow authoritative copies please note that there are plenty of places where distributed systems have authoritative copies, the perfect example is probably Linus' kernel tree. Sure lots of people have their own trees but almost all of them flow toward Linus' tree.

That said I use to think that distributed SCM's were only useful for lots of developers doing different things but recently have decided that anything a centralized repository can do a distributed one can do better.

For example, say you are a solo developer working on your own personal project. A centralized repository might be an obvious choice but consider this scenario. You are away from network access (on a plane, at a park, etc) and want to work on your project. You have your local copy so you can do work fine but you really want to commit because you have finished one feature and want to move on to another, or you found a bug to fix or whatever. The point is that with a centralized repo you end up either mashing all the changes together and commiting them in a non-logical changeset or you manually split them out later.

With a distributed repo you go on business as usual, commit, move on, when you have net access again you push to your "one true repo" and nothing changed.

Not to mention the other nice thing about distributed repos: full history available always. You need to look at the revision logs when away from the net? You need to annotate the source to see how a bug was introduced? All possible with distributed repos.

Please please don't believe that distributed vs centralized is about ownership or authoritative copies or anything like that. The reality is distributed is the next step in evolution of SCM's.

sastanin

Not really a comparison, but here are what big projects are using:

Centralized VCSes

  • Subversion

    Apache, GCC, Ruby, MPlayer, Zope, Plone, Xiph, FreeBSD, WebKit, ...

  • CVS

    CVS

Distributed VCSes

  • git

    Linux kernel, KDE, Perl, Ruby on Rails, Android, Wine, Fedora, X.org, Mediawiki, Django, VLC, Mono, Gnome, Samba, CUPS, GnuPG, Emacs ELPA...

  • mercurial (hg)

    Mozilla and Mozdev, OpenJDK (Java), OpenSolaris, ALSA, NTFS-3G, Dovecot, MoinMoin, mutt, PETSc, Octave, FEniCS, Aptitude, Python, XEmacs, Xen, Vim, Xine...

  • bzr

    Emacs, Apt, Mailman, MySQL, Squid, ... also promoted within Ubuntu.

  • darcs

    ghc, ion, xmonad, ... popular within Haskell community.

  • fossil

    SQLite

Spoike

W. Craig Trader said this about DVCS and CVCS:

If you need both, you're fsck'd.

I wouldn't say you're fsck'd when using both. Practically developers who use DVCS tools usually try to merge their changes (or send pull requests) against a central location (usually to a release branch in a release repository). There is some irony with developers who use DVCS but in the end stick with a centralized workflow, you can start to wonder if the Distributed approach really is better than Centralized.

There are some advantages with DVCS over a CVCS:

  • The notion of uniquely recognizable commits makes sending patches between peers painless. I.e. you make the patch as a commit, and share it with others developers who need it. Later when everyone wants to merge together, that particular commit is recognized and can be compared between branches, having less chance of merge conflict. Developers tend to send patches to each other by USB stick or e-mail regardless of versioning tool you use. Unfortunately in the CVCS case, version control will register the commits as seperate, failing to recognize that the changes are the same, leading to a higher chance of merge conflict.

  • You can have local experimental branches (cloned repositories can also be considered a branch) that you don't need to show to others. That means, breaking changes don't need to affect developers if you haven't pushed anything upstream. In a CVCS, when you still have a breaking change, you may have to work offline until you've fixed it and commit the changes by then. This approach effectively defeats the purpose of using versioning as a safety net but it is a necessary evil in CVCS.

  • In today's world, companies usually work with off-shore developers (or if even better they want to work from home). Having a DVCS helps these kind of projects out because it eliminates the need of a reliable network connection since everyone has their own repo.

…and some disadvantages that usually have workarounds:

  • Who has the latest revision? In a CVCS, the trunk usually has the latest revision, but in a DVCS it may not be plainly obvious. The workaround is using rules of conduct, that the developers in a project have to come to an agreement in which repo to merge their work against.

  • Pessimistic locks, i.e. a file is locked when making a check-out, are usually not possible because of concurrency that may happen between repositories in DVCS. The reason file locking exists in version control is because developers want to avoid merge conflicts. However, locking has the disadvantage of slowing development down as two developers can't work on same piece of code simultaneously as with a long transaction model and it isn't full proof warranty against merge conflicts. The only sane ways regardless of version control is to combat big merge conflicts is to have good code architecture (like low coupling high cohesion) and divide up your work tasks so that they have low impact on the code (which is easier said than done).

  • In proprietary projects it would be disastrous if the whole repository becomes publically available. Even more so if a disgruntled or malicious programmer gets hold of a cloned repository. Source code leakage is a severe pain for proprietary businesses. DVCS's makes this plain simple as you only need to clone the repository, while some CM systems (such as ClearCase) tries to restrict that access. However in my opinion, if you have an enough amount of dysfunctionality in your company culture then no version control in the world will help you against source code leakage.

During my search for the right SCM, I found the following links to be of great help:

  1. Better SCM Initiative : Comparison. Comparison of about 26 version control systems.
  2. Comparison of revision control software. Wikipedia article comparing about 38 version control systems covering topics like technical differences, features, user interfaces, and more.
  3. Distributed version control systems. Another comparison, but focussed mainly on distributed systems.

To some extent, the two schemes are equivalent:

  • A distributed VCS can trivially emulate a centralised one if you just always push your changes to some designated upstream repository after every local commit.
  • A centralised VCS won't usually be able to emulate a distributed one quite as naturally, but you can get something very similar if you use something like quilt on top of it. Quilt, if you're not familiar with it, is a tool for managing large sets of patches on top of some upstream project. The idea here is that the DVCS commit command is implemented by creating a new patch, and the push command is implemented by committing every outstanding patch to the centralised VCS and then discarding the patch files. This sounds a bit awkward, but in practice it actually works rather nicely.

Having said that, there are a couple of things which DVCSes traditionally do very well and which most centralised VCSes make a bit of a hash of. The most important of these is probably branching: a DVCS will make it very easy to branch the repository or to merge branches which are no longer needed, and will keep track of history while you do so. There's no particular reason why a centralised scheme would have trouble with this, but historically nobody seems to have quite gotten it right yet. Whether that's actually a problem for you depends on how you're going to organise development, but for many people it's a significant consideration.

The other posited advantage of DVCSes is that they work offline. I've never really had much use for that; I mostly do development either at the office (so the repository's on the local network) or at home (so there's ADSL). If you do a lot of development on laptops while traveling then this might be more of a consideration for you.

There aren't actually very many gotchas which are specific to DVCSes. There's a slightly greater tendency for people to go quiet, because you can commit without pushing and it's easy to end up polishing things in private, but apart from that we haven't had very many problems. This may be because we have a significant number of open source developers, who are usually familiar with the patch-trading model of development, but incoming closed source developers also seem to pick things up reasonably quickly.

Distributed VCS are appealing in many ways, but one disadvantage that will be important to my company is the issue of managing non-mergable files (typically binary, e.g. Excel documents). Subversion deals with this by supporting the "svn:needs-lock" property, which means you must get a lock for the non-mergable file before you edit it. It works well. But that work-flow requires a centralised repository model, which is contrary to the DVCS concept.

So if you want to use a DVCS, it is not really appropriate for managing files that are non-mergable.

Trausti Thor

I have been using subversion for many years now and I was really happy with it.

Then the GIT buzz started and I just had to test it. And for me, the main selling point was branching. Oh boy. Now I no longer need to clean my repository, go back a few version or any of the silly things I did when using subversion. Everything is cheap in dvcs. I have only tried fossil and git though, but I have used perforce, cvs and subversion and it looks like dvcs all have really cheap branching and tagging. No longer need to copy all code to one side and therefore merging is just a breeze.

Any dvcs can be setup with a central server, but what you get is among other things

You can checkin any small change you like, as Linus says if you need to use more than one sentence to describe what you just did, you are doing too much. You can have your way with the code, branch, merge, clone and test all locally without causing anyone to download huge amount of data. And you only need to push the final changes into the central server.

And you can work with no network.

So in short, using a version control is always a good thing. Using dvcs is cheaper (in KB and bandwidth), and I think it is more fun to use.

To checkout Git : http://git-scm.com/
To checkout Fossil : http://www.fossil-scm.org
To checkout Mercurial : https://www.mercurial-scm.org

Now, I can only recommend dvcs systems, and you easily can use a central server

The main problem (aside from the obvious bandwidth issue) is ownership.

That is to be sure to different (geographic) site are not working on the same element than the other.

Ideally, the tool is able to assign ownership to a file, a branch or even a repository.

To answer the comments of this answer, you really want the tool to tell you who owns what, and then communicate (through phone, IM or mail) with the distant site.
If you have not ownership mechanism... you will "communicate", but often too late ;) (i.e.: after having done concurrent development on an identical set of files in the same branch. The commit can get messy)

unexist

For me this is another discussion about a personal taste and it's rather difficult to be really objective. I personally prefer Mercurial over the other DVCS. I like to write hooks in the same language as Mercurial is written in and the smaller network overhead - just to say some of my own reasons.

Everybody these days is on the bandwagon about how DVCSs are superior, but Craig's comment is important. In a DVCS, each person has the entire history of the branch. If you are working with a lot of binary files, (for example, image files or FLAs) this requires a huge amount of space and you can't do diffs.

I have a feeling that Mercurial (and other DVCS) are more sophisticated than the centralised ones. For instance, merging a branch in Mercurial keeps the complete history of the branch whereas in SVN you have to go to the branch directory to see the history.

Spoike

Another plus for distributed SCM even in solo developer scenario is if you, like many of us out there, have more than one machine you work on.

Lets say you have a set of common scripts. If each machine you work on has a clone you can on demand update and change your scripts. It gives you:

  1. a time saver, especially with ssh keys
  2. a way to branch differences between different systems (e.g. Red Hat vs Debian, BSD vs Linux, etc)

W. Craig Trader's answer sums up most of it, however, I find that personal work style makes a huge difference as well. Where I currently work we use subversion as our One True Source, however, many developers use git-svn on their personal machines to compensate for workflow issue we have (failure of management, but that's another story). In any case. its really about balancing what feature sets make you most productive with what the organization needs (centralized authentication, for example).

A centralised system doesn't necessarily prevent you from using separate branches to do development on. There doesn't need to be a single true copy of the code base, rather different developers or teams can have different branches, legacy branches could exist etc.

What it does generally mean is that the repository is centrally managed - but that's generally an advantage in a company with a competent IT department because it means there's only one place to backup and only one place to manage storage in.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!