All we need for a self-hosting distributed open source programming toolchain

September 30, 2017

This post was written during my trip through Iceland and published much latern than it was written.

From my last article on whether to move away from github with imag , you saw that I'm heavily thinking about this subject. In this post I want to summarize what I think we need for a completely self-hosted open source programming toolchain.

Issue tracking

Do we actually need this? For example, does the kernel do these things? Bug tracking – yes. But Issue tracking as in planning what to implement next? Maybe the companies that contribute to the kernel, internally, but not the kernel community as a whole (AFAIK).

Of course the kernel is a bad example in this case, because of its size, the size of the community around it and all these things. And other smaller projects use issue tracking for planning, for example the nixos community (which is still fairly big) or the Archlinux community (though I'm not sure whether they are doing these things only over mailinglists or via a dedicated forum at archlinux.org.

Bug tracking

Bug tracking should be done on a per-bug basis. I think this is a very particular problem that can be easily solved with a mailing list. As soon as a bug is found, it is posted to the mailing list and discussion and patches are added to the list thread until the issue is solved.

Pull request tracking

With github, a contributor automatically has an web-accessible repository. But for the most part it is sufficient if the patches are send via an email-patch workflow, which is how many great projects work. Having web-accessible repositories available is just a convenience github introduced and now everybody expects.

I think pull requests (or rather patchsets) are tracked no matter how they are submitted. If you open a PR on github, patches are equally good tracked as with mailing lists. Indeed I even think that mailing lists are way better for tracking and discussion, as one can start a discussion on each individual patch. That's not really possible with github. Also, the tree-shape one can get into when discussing a patch is a major point where mailing lists are way better than github.

CI

Continuous Integration is a thing where solutions like gitlab or github shine. They easily integrate with repositories, are for free and result in better and tested code (normally). I do not know of any bigger open source project that does not use some form of CI. Even the kernel is tested (though not by the kernel community directly but rather companies like Intel or Redhat, as far as I know).

A CI solution, though, is rather simple to implement (but I'm sure it is not easy to get it right). Read my expectations below.

How I would like it

Issue and bug tracking should be based on plain text, which means that one should be able to integrate a mailing list into the bug tracking system. Fortunately, there is such an effort named git-dit but it is not usable yet. Well, it is useable, but has neither email integration nor a web interface (for viewing). This is, of course, unfortunate. Also, there is no way to import existing issues from (for example) github. And that's important, of course.

For pull request/patch management, there's patchworks. I've never worked with it, but as far as I can see it works nicely and could be used. But I would prefer to use git-dit for this, too.

I would love to have an CI tool that works on a git-push-based model. For example you install a post-receive hook in your repository on your server, and as soon as there is a new push, the hook verifies some things and then starts to build the project from a script, which preferably lives in the repository itself. One step further, the tool would create a RAM-disk, clone the just pushed branch into it (so we have a fresh clone) and builds things there. Even one step further, the tool would create a new container (think of systemd-nspawn) and trigger the build there. That would ensure that the build does not depend on some global system state.

This, of course, has also some security implications. That's why I would only build branches where the newest (the latest) commit is signed with a certain GPG key. It's an really easy thing to do it and because of GPG and git itself, one can be sure that only certain people can trigger a build (which is only execution of a shell script, so you see that this has some implications). Another idea would be to rely on gitolite, which has ssh authentication. This would be even easier, as no validation would be necessary on our side.

The results of the build should be mailed to the author/commiter of the build commit.

And yes, now that I wrote these things down I see that we have such an tool already: drone.

That's it!

That's actually it. We don't need more than that for a toolchain for developing open source software with self hosted solutions. Decentralized issue/PR tracking, a decent CI toolchain, git and here we go.

tags: #open-source #programming #software #tools #git #github