My 'Problem' with the NixOS community
In the last few months, I was invited to join the nixos organization on github multiple times. I always declined. Here's why.
Please note that I am really trying to write this as constructive criticism. If any of this offends you in any way, please let me know and I'll rephrase the specific part of this article. I really do care about the nixos community: I've been a user of NixOS (on all my devices except my phone) since mid 2014, I've been a contributor since January 2015, and I continue to be a user and an author of contributions.
I do think that Nix, or even NixOS, is the one true way to deploy systems that need to be reproducible, even if that requires sacrificing some comfort.
I also need to provide some context about where I'm coming from, so the dear reader can understand my point of view in this article.
First of all, I did not start my journey with NixOS, of course. I was a late bloomer in regards to linux, in fact. I was introduced to Ubuntu by a friend of mine in 11th grade. I started to use Kubuntu, but only a few weeks later my friend noticed that I was getting better and better with the terminal, so maybe not even half a year later I switched to Archlinux, which I used on my desktops until I was introduced to NixOS. In that time, I learned how to write Java (which I do not do anymore btw), Ruby and C, started hacking a lot of funny things and managed to contribute patches to the linux kernel about two years later.
I'm not trying to show off here! That last bit is important for this article, especially if you know how the kernel community works and how the development process of the kernel works. I guess you know where this is going.
I heard of NixOS in late 2014 at a conference in the Black Forest, where Joachim Schiele talked about it. A few months later, my LaTeX setup broke from an update and I was frustrated enough by Archlinux to try something new.
I never looked back.
The “early days”
When I started using NixOS, Nix, the package manager, had already existed for about ten years. Still, the community was small. When I went on the IRC channel or on the mailinglist, I could easily remember the nicknames, and I was able to skim through the subjects of the mails on the list to see what was going on, even though I did not understand all of it.
That soon changed. I remember the 15.09 release when everyone was super excited and we were all “yeah, now we're beginning to fly” and so on. Fun times!
Problem 1: Commit access and development process
Now, let's get into the problems I have with the community and why I reject the invitations to join the github organization.
In fact, I started asking and telling people about this pretty early on: five(!) years ago, I replied to an email thread with this message:
Generally, I think it would be best to prevent commit access as far as possible. Having more people to be able to commit to master results in many different opinions committing to master, which therefor results in discussions, eventually flamewars and everything.
Keeping commit access for only a few people does not mean that things get slower, no way!
What you maybe want, at least from my point of view, is staging branches. Some kind of a hierarchy of maintainers, as you have in the linux kernel. I fully understand that the linux kernel is a way more complex system as nixos/nixpkgs, no discussion here. But if you'd split up responsibilities, you may end up with
* A fast and secure development model, as people don't revert back and forth.
* Fewer “wars” because people disagree on things
* Less maintaining efforts, because the effort is basically split up in several small “problems”, which are faster to solve.
What I want to say is, basically, you want a well-defined and structured way of how to do things.
Also please note that there's another mail from Michael Raskin in that thread where we talked about 25 PRs for new packages. Right now we're at about 1.8k open pull requests, with over 580 of them for new packages.
I take that as proof that we did not manage to sharpen and improve the process.
Let's get to the point. I started telling people that the development process we had back then was not optimal. In fact, I saw it coming: the community started to grow at a great pace back then, and soon I talked to people on IRC and the mailinglist where I was like “Who the hell is this, I've never seen this name before and they seem not to be new, because they already know how things work and teach me...”.
The community grew and grew: over 4500 stars on github (if that measures anything) and over 4500 forks on github.
When we reached 1k open pull requests, some people started noticing that we might not be able to scale anymore at some point. “How could we possibly manage that amount of pull requests ever?”.
Now we're at about 1.8k open pull requests.
I proposed changes several times, including moving away from github, which IMO does not scale to that amount of issues and PRs, especially because of its centralized structure and because of its linear discussions.
I proposed switching to a kernel-style mailinglist. I was rejected with “We do not have enough people for that kind of development model”. I suspect that people did not understand what I meant by “kernel-style” back then (nor do I think they understand now). But I'm sure, now more than ever, that a switch to a mailinglist-based development model, with enough automation in place for CI and static analysis of patches, would have had the best possible impact for the community. Even if that'd mean that newcomers would be a bit thrown off at first!
The current state of affairs is even worse. Right now (as of this commit), we have
- 1541 merges on master since 2020-01-01
- 1601 patches pushed directly to master since 2020-01-01
Feel free to reproduce these numbers with
$ git log --oneline --first-parent --since 2020-01-01 --[no-]merges | wc -l
That means that we had 1601 possibly breaking patches pushed by someone who thinks they are good enough and that their code never breaks. I'll leave it to the dear reader to google why pushing to master is a bad idea in a more-than-one-person project.
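Spelled out, the `--[no-]merges` shorthand in the command above stands for two separate runs, one with `--merges` and one with `--no-merges`. As a rough sketch, both counts can also be taken in a single pass by asking git for each commit's parent hashes and treating every line with two or more parents as a merge. The function name `count_merges` is made up for this illustration; only `git log` and `awk` are real tools here.

```shell
# count_merges: hypothetical helper, not part of git.
# Reads one line per commit from stdin, each line holding that commit's
# parent hashes (as printed by `git log --format=%P`).
# Two or more parents means a merge; zero or one means a plain commit.
count_merges() {
  awk '
    NF >= 2 { merges++ }
    NF <= 1 { plain++ }
    END { printf "merges %d\nnon-merges %d\n", merges, plain }
  '
}

# Usage inside a nixpkgs checkout:
#   git log --first-parent --since 2020-01-01 --format=%P | count_merges
```

The numbers this prints depend on the state of the checkout, so they will drift from the ones quoted above as master moves on.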
Another thing that sticks out to me is this:
$ git log --first-parent --since 2020-01-01 --merges | \
    grep "^Author" | \
    sort -u | \
    wc -l
74
74! 74 people have access to the master branch and can break it. I am not accusing any of these people of incompetence, but we all know that things do not always work as smoothly as expected, especially in software development. People are tired sometimes, people do make mistakes, people do miss things when reviewing. That's why we invented continuous integration in the first place: so that something can check whether the human part of the process did the right thing and report back if it didn't.
My dream scenario would be that nobody has access to master except for a bot like bors (or something equivalent for the Nix community). The rust community, which uses bors heavily, does software development the right way. If all checks pass, merging is done automatically. If not, the bot finds the breaking change by using a clever bisecting algorithm and merges all other (non-breaking) changes.
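To make the bisecting idea a bit more concrete, here is a minimal sketch in POSIX shell. It is not how bors is implemented; it only illustrates the principle of testing a whole batch, and splitting it in half on failure until the breaking patch is isolated. `check_batch` is a hypothetical stand-in for a CI run (it simply fails if a patch name starts with `bad`), and the sketch assumes patches are independent of each other and that their names contain no whitespace or glob characters.

```shell
# check_batch: hypothetical stand-in for a CI run.
# Succeeds unless any patch name in the batch starts with "bad".
check_batch() {
  for patch in "$@"; do
    case "$patch" in
      bad*) return 1 ;;
    esac
  done
  return 0
}

# bisect_batch PATCH...
# Prints "merge NAME" for every patch that can land and "reject NAME"
# for every patch that breaks the checks.
bisect_batch() {
  if [ "$#" -eq 0 ]; then
    return 0
  elif check_batch "$@"; then
    # The whole batch is green: everything in it can be merged.
    printf 'merge %s\n' "$@"
  elif [ "$#" -eq 1 ]; then
    # A failing batch of size one is the breaking patch itself.
    printf 'reject %s\n' "$1"
  else
    # Move the first half of the batch into $first; the second half
    # stays in the positional parameters.
    half=$(( $# / 2 ))
    first=""
    while [ "$half" -gt 0 ]; do
      first="$first $1"
      shift
      half=$(( half - 1 ))
    done
    # Word splitting of $first is intended. The recursive call may
    # overwrite $half and $first, but "$@" is private to this call.
    bisect_batch $first
    bisect_batch "$@"
  fi
}
```

For example, `bisect_batch a b bad1 c` tests all four patches together, sees the failure, and then narrows it down by halves, so the three good patches still land while `bad1` is reported back to its author.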
In fact, I would go further and introduce teams. Each team would be responsible for one task in the community. For example, there are different packaging ecosystems within the nixpkgs repository, one for every language. Each language could get a team of 3 to 5 members who coordinate the patches that come in (from normal contributors) and apply them to a branch. That branch would be merged into master on a regular basis (like... every week) if all tests/builds succeed (just like the kernel community does it)!
A team could also be introduced for some subsets of packages... Qt packages, server software, but also nixpkgs-libs or even documentation (which is another subject in itself).
Problem 2: “Kill the Wiki”
In 2015, at the nixcon in Berlin, we had this moment with “Kill the Wiki”. As far as I remember it was Rok who said that (not sure though). I was not a fan back then, and I'm actually even less of a fan of that decision now.
Killing the wiki was the worst thing we could do documentation-wise. Every time I tell people about nixos, I have to tell them that there is no decent documentation around. There is, of course, the documentation that is generated from the repository. That one is okay for the initial setup, but it is far from being a good resource if you just want to look up how some things are done.
The nixos.wiki efforts fill the gap here a bit, sure. But we could really do better.
The solution would be rather simple: bring back wiki software. Whether we start from scratch, “just” merge the efforts from nixos.wiki, or make that one the official one, it would be an improvement all the way!
Problem 3: “Kill the mailinglist”
Seriously, what is it with this community and killing its own infrastructure? They killed the wiki, they killed the mailinglist... both things that are really valuable... but github is the one thing that actually slows us down... and does not get killed... I am stunned, really.
The solution here is also really simple: bring it back. And not googlegroups or some other shitty provider, just host a mailman instance and create a few mailinglists... like the kernel.
I hope I do not have to write down the benefits here because the reader should be aware of them already. But for short:
- Threaded discussions (I can reply multiple times to one message, quote parts and reply to each part individually, creating a tree-style discussion where each branch focuses on one point)
- Asynchronous discussions (I can reply to a message in the middle of a thread rather than appending)
- Possibility to work offline (yeah, even in our age this is important)
- User can choose their interface (I like to use mutt, even on my mobile if possible. Web UIs suck)
I am aware that the “replacement” (which it really isn't), discourse, is capable of going into mailinglist-mode. Ask me how great that is compared to a real mailinglist!
It is not.
The silver lining...
This article is a rather negative one, I know that. I do not like to close on such a negative note.
In fact, we got the RFC process, which we did not have when I started using nixos. We have the Borg bot, which helps a bit and is a great effort. So, we're in the process of improving things.
I'm still positive that, at some point, we improve the rate of improvements as well and get to a point where we can scale up to the numbers of contributors we currently have, or even more.
Because right now, we can't.
I did make some mistakes here and I want to thank everyone for telling me.
Some nice folks on the nixos IRC/matrix channel suggested that my numbers for PRs vs. pushes to master were wrong, as github's squash-and-merge feature is enabled on the github repository for nixpkgs.
It seems that about 4700 PRs were merged since 2020-01-01. This does prove my numbers wrong. Fact is: on my master branch of the nixpkgs github repository, there are 3142 commits. It seems that not all pull requests were made against master, which is of course true, because PRs can be, and are, filed against the staging branch of nixpkgs and also the stable branches.
Github does not offer a way to query PRs that are filed against a certain branch (at least not in the web UI), as far as I see.
So let's do some more fine-granular analysis on the commits I see on master:
$ git log --oneline --first-parent --since 2020-01-01 | \
    grep -v "Merge pull request" | \
    wc -l
1650
As github does create a commit message for the merge, we can grep that away and see what the result is. I am assuming here that nobody ever changes the default merge commit message, which might not be entirely true. I assume, though, that this does not happen often.
So we have 3142 commits, of which 1650 are not github branch merges.
From time to time, master gets merged into staging and the other way round:
- 20 merges from master to staging
- 5 merges from staging to master
That leaves us at 1625 commits where the patch landed directly on master. How many of these patches were submitted via a pull request is not that easy to evaluate. One could write a crawler that finds the patches on github and checks whether they appear in a PR... but honestly, my point still holds true: If only one breaking patch lands on master per week, that results in enough slow-down and pain for the development process.
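The breakdown above can be sketched as one small shell function that sorts commit subjects into three buckets. This is only an approximation under stated assumptions: it relies on the default github merge message (“Merge pull request ...”) being left unchanged, it treats every other subject starting with “Merge ” as a branch merge such as master/staging syncs, and it anchors both patterns at the start of the subject, which is slightly stricter than the plain `grep -v` used earlier. The name `classify_subjects` is made up for this example.

```shell
# classify_subjects: hypothetical helper, not part of git.
# Reads one commit subject per line from stdin (as printed by
# `git log --format=%s`) and counts three kinds of commits.
classify_subjects() {
  awk '
    /^Merge pull request/ { pr++; next }      # default github PR merge message
    /^Merge /             { branch++; next }  # other merges, e.g. staging <-> master
                          { direct++ }        # everything else landed as a plain commit
    END {
      printf "pr-merges %d\nbranch-merges %d\ndirect %d\n", pr, branch, direct
    }
  '
}

# Usage inside a nixpkgs checkout:
#   git log --first-parent --since 2020-01-01 --format=%s | classify_subjects
```

Squash-merged PRs still end up in the `direct` bucket, so this does not answer how many of those patches went through review; it only reproduces the rough split described above.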
The inconsistency in the process is the real problem. A mechanism that handles and schedules CI jobs and merges, together with a clear merge window for per-topic changesets from team-maintained branches, would give the community some structure. New contributors could be guided more easily, as they would have a counterpart to contact for topic-specific questions. Negotiations would no longer be between individuals but between teams, which would also clarify responsibilities.