Saturday, September 27, 2008

Git in the office

I've been using git recently. Not just me, in fact, but lots of open-source projects have started using it. Right now I use it on github, to keep track of my emacs customizations.

Git is a very interesting source control system to use, mainly because it makes it extremely easy to do branches and patching. Linus has a very interesting video on the rationale behind creating git, but to sum up his point in a few sentences: distributed source control, by allowing local modifications and local patches that can be easiliy merged, avoids the problems of latency and namespace pollution you get using centralized source control systems. Also, he says git is the best distributed one out there, because it is very fast. I really can't comment on whether it is the best, I haven't used darcs or any of the other distributed source control systems that have become popular.

Git can be useful for the corporate developer. When you have the ability to do local commits and local branches, development becomes more efficient. You can commit changes at notable points, so if you've spent the last few hours fruitlessly exploring an interesting idea, you can revert back to your last good commit and proceed without having to manually unwind your changes. Or, if you need to make a quick bugfix, you can just branch off of the current head branch, make a fix, then check it in and return to your previous branch. The ability to always stay in one directory tree can be an efficiency gain as well. And you can work offline, doing commits and everything you can do with a centralized source control system. When you get online and you are ready, you just push all those commits to some other git repository.

So, how can software companies use git? Companies have traditionally used centralized source control systems. I've used CVS (ugh), Perforce (not bad), StarTeam (somewhat yucky, but tolerable), and Visual SourceSafe (flaky) at various companies. It's easy to comprehend a centralized system like this; you have your code, it's in one place, and that's all. Git can be like that too, in a way. There could be one central git repository, and each developer would have their own local repository. Developers could do their own commits on their machine, and when they have something worth checking into the main repository, they do a "git push" to push it there. Or, instead someone in charge of the next release specifically pulls relevant changelists from developers machines to create a coherent release. This is basically what open-source projects do. This would work for small companies, who don't need anything more complicated than that.

If small companies could use git like a centralized repository system, large companies can use it in more interesting ways.

Large companies often have distributed development offices where the pain of using a centralized source repository system is fairly large. Also, large companies also have lots of source which they tend to put in one repository. After a certain point, these repositories get slow. Repository slowness plus network lag for remote offices create a real issue for distributed teams. Git can help by allowing each team to have it's own local repository which they use. Changes are aggregated from each developer's git repository to a local team repository. When it's time to do a release, or at regular intervals, each team's repository is merged with the release repository. Similarly, each team has to make sure their repository is in sync with the release repository, because that one has all relevant changes.

There is a disadvantage to this: with distributed repositories, merges don't happen on every checkin. Merges get put off, and when they happen they the conflicts are typically worse. In such situations, it should be the responsibility of the individual teams to fix any merge problems, that happen when either pushing or pulling their changes.

There would be some great advantages. If you run a continuous build, you only see your own build breakages, not others. The release branch can merge only repositories that are passing their tests, and therefore they always get changes that are good. Of course, it should really run its own continuous build. So you need more build machines, but you get more developer productivity, since any developer's local build is not affected by other developer's build breakages.

From what I've been told, Microsoft has a similar tiered source control system for Windows. For any sufficiently complicated project, such a system is in fact a necessity.

Could this work? I've never seen it happen, but git and distributed source control in general is relatively new. The problem is that the benefits are greatest when the company is large, but the decision on what system to use is made when the company is small, where the benefits are less obvious. It may be that some companies will adopt it for certain projects where it makes sense, but not universally.

No comments: