This post was written during my trip through Iceland and published much
later than it was written.
Version control is an important aspect of developing software in general,
and especially of developing open source software.
Here are some thoughts about it.
Technology
First of all, technology-wise it doesn't matter which version control
system one uses. For the sake of this article I'm using git as the
example VCS, though others would do just as well.
One important thing, at least in my opinion, is that the VCS offers some
basic functionality: mainly that it can be used in a distributed way and
that it supports branching (two things I like to believe go hand in
hand).
So I do not care whether one uses git, mercurial, or anything else. Most
important is that a (D)VCS is actually used.
Branching model
Branching is a technique that predates git; bitkeeper had such
functionality (as far as I can tell) before Linus Torvalds wrote git.
It is just that git has revolutionised the way we do version control and
brought branching to wider knowledge and use.
In my opinion it really matters how branching is done. There is no
single “the branching”; there are many ways to do branching, and one
might suit a certain use case better than another. There are well-known
models such as feature branching, the gitflow branching model and the
rebase-merge workflow. I don't want to explain each of them because
others have done so way better than I ever could.
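Just to give a rough idea of what such a model looks like in practice,
here is a minimal feature-branch sketch (the branch name is made up):

    # start a feature branch off the main development branch
    git checkout -b feature/fancy-thing master

    # hack, stage and commit as often as needed
    git add -p && git commit

    # once the feature is done, integrate it back
    git checkout master
    git merge --no-ff feature/fancy-thing
    git branch -d feature/fancy-thing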
What I want to point out is that branching is not only important, but
also more flexible than you might guess. That is not necessarily a good
thing – I'll show you in a minute. In my opinion, branching and
developing a branching model for a project is like developing an API.
Once it is set up properly, it may serve as a communication protocol for
the project, putting developers on the same page about how certain
things have to be handled. Having a protocol for how to work on things
is a good thing. If implemented properly, branching can improve
everyone's work, as it is one thing less to think about.
The bad thing about the flexibility of branching is that it can be done
wrong. It's as simple as that: merging one branch into another when one
is not supposed to do so creates overhead that might not be reversible.
This has happened to the best communities (for example the kernel
community) but also happens in small ones, often due to too little
knowledge of the tools at hand.
To summarize: if an open source project gets to a certain size (both
code-wise and contributor/community-wise), a branching model should be
put in place. If there are rules that contributors agree upon, it can
improve working speed and therefore overall happiness in the community.
Because developers like to bikeshed, it could of course also worsen
happiness. Still, it is better than having no plan and chaos instead.
Hosting
I will not go into thoughts about specific hosting platforms in this
article, but rather into the how and why of hosting.
First of all, hosting the code somewhere with a way to show it in a web
browser is a good way to improve the “open” part of open source code. Of
course, tarball downloads and such suffice, but we are in the 21st
century, so having a nice web interface is something one can expect.
Making the code browsable is often done via a VCS-specific web frontend,
for example cgit for code version-controlled with git. These web
interfaces often also offer functionality to go back in time and view
the history of a single file. Maybe this is not needed often, but it is
nevertheless helpful when it is.
I personally do not care about comments on code in my web interfaces, or
even ways to register users on the site, but of course some people like
that. There are web interfaces that feature such things; for the git
VCS, for example, there are gitea, gogs, gitlab, ... and many more. And
of course there are the closed providers github, bitbucket and others...
Making code public and contributions easy
Hosting helps a lot with enabling contributions from strangers. No
doubt, github makes contributions ridiculously easy.
I don't want to reiterate what others have said better and most people
already know. What I want to point out here is that open source does not
mean “open contributions”. One is completely free to reject all
contributions to one's code base.
I really want to stress this. Open source does indeed mean that everyone
is able to view the code, which also enables them to copy it (though
redistribution might be limited or forbidden, as only free software
allows you – by definition – to redistribute and alter code). It does
not necessarily mean that one is allowed or welcome to send in changes,
feature requests or the like.
So if you want people to contribute to your code and suggest changes,
features or bug reports, you should somehow give them the opportunity to
do so. Depending on how “open” you want to be with your development, you
either use a hosting platform (like github or bitbucket) or a slightly
more “closed” variant, for example hosting your code on your own gitea
instance. One step further, you'd host your code on a site where people
are able to get it, maybe even with a “git clone”, but cannot send in
pull requests, feature requests or open issues (for example a hosted git
repository with a cgit interface). Issues and bug reports could still be
handled via a mailing list, if desired.
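For contributions in such a setup, git itself already ships the tooling
to send patches over a mailing list. A minimal sketch (the list address
is made up, and git send-email needs to be configured first):

    # turn the last two commits into patch files
    git format-patch -2 HEAD

    # send them to the project's mailing list
    git send-email --to=project-devel@example.org *.patch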
In fact, that last bit is what I consider for my own project
imag.
SemVer, Change Management, Release Management
As soon as your code is out there, you have to think about change and
release management. In my opinion, these are topics closely related to
source code version control, as VCSes often offer functionality to do
releases in one form or another and are clearly involved in the process
of change management.
First of all, I'd like to suggest you read the SemVer specification. It
is not that long and will help you understand the next few paragraphs.
So if you haven't read it already, go ahead and do so. Even if you don't
apply SemVer to your projects, it might open your eyes in one aspect or
another.
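As a quick refresher – this is my own shorthand, not a quote from the
spec – the version bumps for a hypothetical 1.2.3 release would look
like this:

    1.2.3 -> 1.2.4   backwards compatible bug fixes only        (patch)
    1.2.3 -> 1.3.0   new, backwards compatible functionality    (minor)
    1.2.3 -> 2.0.0   incompatible changes to the public API     (major)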
But before we get into releases, we should first talk about change
management, or, better named for my purposes: pull request management.
What I personally do with my PRs is merge them when they're ready. This
approach is easy and has worked pretty well so far. From time to time I
have changes in my working branches (as stated before, I use feature
branches) which might conflict with other people's work. For the sake of
contributor experience, I put my own branches on hold and wait until the
contributors are done with theirs. We will talk a lot about this in the
next episode of this series, so I won't go into much detail. For now:
this is a simple approach that has worked perfectly well so far for me
and my (rather small) open source projects.
But as soon as one's project grows bigger, that approach might not do
the job anymore. If there are too many changes in a short amount of time
that have to be agreed on and merged, it might be time to think about an
alternative approach.
There are two ways I would tackle this problem. I have never experienced
it in the “real world”/in my projects, so the following is just a
write-up of my thoughts. Take it with a grain of salt from here on.
The first approach I can think of is to assign certain subsystems to
certain people. If the amount of changes has become too big, one can
assume that the codebase has also grown tremendously. If that is the
case, sub-maintainers can handle certain subsystems and the project
leader can periodically merge all changes together. This requires, of
course, at least two people who are interested in the subject and
willing to contribute maintaining effort to the project.
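In git terms this boils down to the project leader pulling from the
sub-maintainers' trees once in a while, roughly like this (remote names,
URLs and branch names are made up):

    # add the sub-maintainers' trees as remotes
    git remote add networking https://example.org/alice/project.git
    git remote add storage https://example.org/bob/project.git

    # periodically fetch and merge their work into the mainline
    git fetch networking
    git fetch storage
    git merge networking/for-mainline
    git merge storage/for-mainline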
If there are no such people, or too few of them around for this, one
could consider a merge-window-style approach, as known from Linus
himself: changes are pulled in every other week, for example, and during
the rest of the time only bug fixes are merged into the project.
These two approaches might come in handy some day if one is about to
maintain a large code base alone (as in “as the only project owner”).
Now on to release management. In my opinion, releases should be done as
soon as something works, and from there on periodically. I myself have
made one mistake too often: pulling more things into one release than
would have been good. For example, the imag 0.2.0 release was over one
year ago. 0.3.0 is almost ready, but not quite. I should have done more
releases in between.
In my opinion, more releases with clear-cut edges are better than long
release cycles. As soon as there is a new feature for users – release. A
user-facing fix – release. This might result in high version numbers,
but who cares?
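With git, cutting such a release is cheap anyway; in the simplest case
it is just a tag that gets pushed (the version number here is, of
course, only an example):

    # cut a release by tagging the current state
    git tag -a v0.3.0 -m "release 0.3.0"
    git push origin v0.3.0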
This is where I want to throw SemVer in, to adjust my
“release early, release often” statement from above with a “but”.
SemVer can be used to signal breaking changes in user-facing interfaces.
This is a really good thing and therefore I think SemVer should be
applied everywhere. SemVer also states that in the “0.y.z phase”
everything is allowed to happen, including API breakage. This is where I
want to adjust my statement from above: a lot of releases should be done
in the 0.y.z phase, and that rapid pace should mostly stay within that
scope. As soon as a library or program hits 1.0.0, changes should be
applied carefully. One really does not want to end up with a program or
library at version 127.0.0, right? That would also decrease a user's
trust in the application, as one could then expect breakage with every
new release.
So what I'd do, and actually plan on doing with my projects, is release
a number of zero-releases until I am confident that everything is all
right, and go from there. For imag specifically I am not thinking about
1.0.0 yet, because imag is far from ready, but for my other projects,
especially toml-query, I am already thinking about 1.0.0.
Another point which popped into my head weeks after the initial draft of
this article: do not plan the features of the next release around a
release number! This might sound a bit odd, so let me explain. Say
you're planning three major features for the next release, which will
then be 0.15.0. You're slowly getting to the point where the release is
almost ready; you might need three more weeks to finish it. Now a
contributor steps up and opens a pull request with another feature,
which is already completely implemented, tested and documented in the
pull request. The contributor needs this feature in your code as soon as
possible, and you also think it would be a great idea to release it as
soon as possible. After you merge the request, you release the source –
as 0.15.0, even though your three planned features are not yet
completed.
Two things come to mind in this scenario. First, if two of your three
features are already completed, they might show up in 0.15.0, but one
feature has to be moved to the next release. If those two features are
ready but not tested, you might end up with a buggy release and have to
publish 0.15.1 soonish – more effort for you. Second, if you do not
merge your features into the master branch of your project, but keep
them in a 0.15.0-prepare branch or something like that, you end up with
a rather ugly merge mess later on, as 0.15.0 is already released and you
cannot just rename a public branch.
So how to handle this properly? I came to the conclusion that release
branches are the way to go here. In the scenario described above, you'd
branch off of the previous release, most certainly 0.14.x, and create a
new branch for 0.15.0, into which the contributor's pull request would
then be merged. As soon as the release is out, 0.15.0 gets tagged and
merged back to the master branch.
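In git commands, this could look roughly like the following (branch, tag
and contributor branch names are made up; the contributor's branch is
assumed to be fetched already):

    # branch off of the previous release
    git checkout -b release-0.15.0 v0.14.2

    # merge the contributor's work into the release branch
    git merge contributor/new-feature

    # when the release is ready, tag it and merge it back
    git tag -a v0.15.0 -m "release 0.15.0"
    git checkout master
    git merge release-0.15.0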
My point here is that you'd still need to rename your next milestone or
rewrite your issues for the next release. That's why I would not plan
“0.15.0”, but simply “the next release” – because you'll never know
whether your planned things will actually land in the next release or in
the one after. So lessen the effort for yourself here!
Next
In the next article in this series I want to elaborate on how to make a
contribution as pleasing as possible for the contributor. I guess I can
talk a lot about that because I've contributed to a lot of projects
already, including but not limited to linux,
nixpkgs and
nanoc.
tags: #open-source #programming #software #tools #rust