musicmatzes blog

This post was written during my trip through Iceland and published much later than it was written.

When writing my last entry, I argued that we need decentralized issue tracking.

Here's why.

Why these things must be decentralized

Issue tracking, bug tracking and of course pull request tracking (which could all be called “issue tracking”, btw) must, in my opinion, be decentralized. That's not only because I think offline-first should be the way to go, even today in our always-connected and always-on(line) world. It's also a matter of redundancy and failure safety. Linus Torvalds himself once said:

I don't do backups, I publish and let the world mirror it

(paraphrased)

And that's true. We should not need to do backups; we should share all data, in my opinion. Even absolutely personal data. Yes, you read that right: me, the Facebook/Google/Twitter/Whatsapp/whatever-free living guy, tells you that all data should be shared. But I also tell you: never ever unencrypted! And I do not mean transport encryption, I mean real encryption. Unbreakable is not possible, but theoretically unbreakable for at least 250 years should do. If the data is shared in a decentralized way, like IPFS or Filecoin try to do, we can (almost) be sure that if our hard drive fails, we don't lose the data. And of course, you can still do backups.

Now let's get back to the topic. If we decentralize issue tracking, we can make sure that issues are around somewhere. If github closed down tomorrow, thousands, if not millions, of open source projects would lose their issues. And that's not only the current issues, but the whole history of issue tracking, which means also data on how a bug was closed, how a bug should be closed, why or how features were implemented, and these things. If your self-hosted solution loses data, as gitlab did not long ago on their gitlab.com hosted instance, that data is gone forever. If we decentralize these things, more instances have to fail to bring the whole project down.

There's actually simple probability behind these things (and no, it is not Amdahl's law, which is about parallel speedup): if each instance of a distributed system is down with some probability at any given moment, the chance that all of them are down at the same time shrinks exponentially with the number of instances. The more instances a distributed system has, the more likely it is that some instance is dead right now, but at the same time, the less likely it is that the whole system is dead. With 10, 15 or 20 instances you can be fairly sure that your stuff is alive somewhere even if your own instance fails.
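To make this concrete, here is a tiny sketch of that probability (with one simplifying assumption baked in: instances fail independently of each other):

```rust
// If each mirror is down with probability p at any given moment,
// the chance that all n mirrors are down at the same time is p^n.
fn all_down(p: f64, n: u32) -> f64 {
    p.powi(n as i32)
}

fn main() {
    // Even with quite unreliable mirrors (down 10% of the time),
    // the chance that all 10 are down at once is about 1e-10.
    println!("{:e}", all_down(0.1, 10));
}
```

So already at 10 mirrors, total loss requires a coincidence that practically never happens – which is the whole point of publishing instead of backing up.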

Now think of projects with many contributors. Maybe not as big as the kernel, which has an exceptionally big community. Think of communities like the neovim one. The tmux project. The GNU Hurd (have you Hurd about that one?) or such projects. If 30 or 40 developers are actively maintaining one of these projects, its repositories will never die. And if the repository holds the issue history as well, we get backup safety for free. How awesome is that?

I guess I made my point.

tags: #open-source #programming #software #tools #git #github

This post was written during my trip through Iceland. It is part of a series on how to improve one's open-source code. Topics (will) include programming style, habits, project planning, management and everything that relates to these topics. Suggestions welcome.

Please note that this is totally biased and might not represent the ideas of the broad community.

When it comes to hobby projects in the open source world, one often works alone on the codebase. There is no community around your tool or library that is actively pushing towards a nice codebase. Still, you can have one. And what matters in this regard (not only, but also) is a nice modularization of your codebase.

Why modularization

It is not always beneficial to modularize your code. But if your program or library hits a certain size, it can be. Modularization helps not only with readability, but also with separation of concerns, which is an important topic. If you do not separate concerns, you end up with spaghetti code and code duplication, which leads to doubled effort – essentially more work for the same result. And we all know that nobody wants that, right? In the end we're all lazy hackers. Also, if you write code twice, thrice or maybe even more often, you end up with more bugs. And of course you don't want that either.

SoC

Now that the “why” is cleared, the “how” is the next step to think about.

We talked about the separation of concerns in the last section already. And that's what it boils down to: you should separate concerns. Now, what is a concern? An example would be logging. Of course, you should use a decent library for the logging in your code, but that library might need some initialization routines called and some configuration passed to it. Maybe you even need to wrap some things to be able to have nice logging calls in your domain logic – you should clearly modularize the logging-related code and move the vast majority of it out of your main code base.

Another concern would be file IO. Most of the time when you're doing file IO, you're duplicating things. You don't need to catch IO errors in every other function of your code – wrap these things and move them to another helper function. Maybe your file IO is more complex and you need to read and write multiple files with similar structure all the time – a good idea would be to wrap these things in a module that handles only that. The concern of file IO is now encapsulated in a module, errors are handled there, and users of that module know what they're working with: an abstraction which either works or fails in a defined way, so that they don't have to re-write error handling in every other function but can simply forward errors.
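To illustrate the file IO case, here is a minimal sketch of such a module (all names are made up for illustration; the point is that errors are wrapped in one place and callers simply forward them):

```rust
// Hypothetical module encapsulating the file IO concern.
mod fileio {
    use std::fs;
    use std::io;

    // Callers get one defined error path (io::Result) and forward it
    // instead of re-implementing error handling at every call site.
    pub fn read_config(path: &str) -> io::Result<String> {
        fs::read_to_string(path)
    }

    pub fn write_config(path: &str, content: &str) -> io::Result<()> {
        fs::write(path, content)
    }
}

fn main() -> std::io::Result<()> {
    fileio::write_config("/tmp/example.conf", "key = value\n")?;
    let content = fileio::read_config("/tmp/example.conf")?;
    assert_eq!(content, "key = value\n");
    Ok(())
}
```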

If you think about that for a moment, you'll notice that modularization is a multilevel thing. Once you have nicely defined modules for several concerns, you might build new, more high-level modules out of them. And that's also exactly what frameworks do most of the time: they combine libraries (which might already be at a certain level of abstraction) into a new, even more high-level library for another concern. For example, a web framework combines a database library, a templating library, a middleware library, a logging library and authentication and authorization libraries into one big library, which is then – inconsistently – called a framework.

CCC

No, I'm not talking about the Chaos Computer Club here, but about Cross Cutting Concerns. A cross cutting concern is one that is used throughout your entire codebase. For example logging. You want logging calls in your data access layer, your business logic and your user interface logic.

So logging is a functionality all modules of your software want and need. But because it is cross-cutting, it is not possible to insert it as a layer in your stack of modules. So it is really important that your cross-cutting concerns are coupled as loosely as possible to all other modules. In the case of logging, this is particularly easy, because after the general setup of the logging functionality, which is done exactly once, you have a really small interface to that module. Most of the time, no more than four or five functions: one call for each level of verbosity (for example: debug, info, warn, error).
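A sketch of what such a small interface could look like (module and function names are illustrative; a real project would delegate to a proper logging library after its one-time setup):

```rust
mod log {
    // Shared formatting lives in one place.
    pub fn line(level: &str, msg: &str) -> String {
        format!("[{}] {}", level, msg)
    }

    fn emit(level: &str, msg: &str) {
        eprintln!("{}", line(level, msg));
    }

    // The whole interface the rest of the codebase ever sees:
    pub fn debug(msg: &str) { emit("DEBUG", msg); }
    pub fn info(msg: &str)  { emit("INFO", msg); }
    pub fn warn(msg: &str)  { emit("WARN", msg); }
    pub fn error(msg: &str) { emit("ERROR", msg); }
}

fn main() {
    log::info("starting up");
    log::warn("config file missing, using defaults");
}
```

Because the coupling is just these few calls, swapping the logging backend later touches only this one module.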

Naming things...

I know, I know... this bikeshedding topic again. I don't want to elaborate too extensively on this subject, but a few things need to be said.

First of all, module names should be like function names (which I explained in the last episode) – short, to the point, and describing what a function, or in our case a module, does. A module should be named after its domain. If a module, for example, contains only some datatypes for your library, or only algorithms to calculate things, it should be named exactly that: “types” or “algo”.

A second thing I would clearly state is that module names should hardly ever contain more than one word – why would they? A domain is one word, as shown above.
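A quick sketch of what that looks like in practice (a hypothetical crate with two one-word modules, each named after its domain):

```rust
// Data types live in “types”...
mod types {
    pub struct Point {
        pub x: f64,
        pub y: f64,
    }
}

// ...and calculations live in “algo”.
mod algo {
    use super::types::Point;

    pub fn distance(a: &Point, b: &Point) -> f64 {
        ((a.x - b.x).powi(2) + (a.y - b.y).powi(2)).sqrt()
    }
}

fn main() {
    let a = types::Point { x: 0.0, y: 0.0 };
    let b = types::Point { x: 3.0, y: 4.0 };
    println!("{}", algo::distance(&a, &b));
}
```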

Multiple libraries

What I really like to do is have several libraries for one application once the codebase hits a certain complexity. I like to take imag as an example. Imag is a collection of libraries which build on a core library and offer abstractions over it. This is a great separation of concerns and functionality: a user of the imag source can use some libraries to get certain functionality and leave out others whose functionality is not needed. That's a great separation of concerns with loose coupling. But with this approach, another topic becomes important as well, and we'll talk about that in the next episode.

The naming restrictions I stated above also apply to library names which exist solely for separation inside an application, though I would relax the one-word-name rule a bit here.

Next

In the next Episode of this blog series we will talk about another important subject which is related to modularization: API design.

tags: #open-source #programming #software

This post was written during my trip through Iceland and published much later than it was written.

Almost all toolchains out there take a CI-first approach. There are clear benefits to that, but maybe that's not what one wants for a self-hosted solution for their OSS projects?

Here I try to summarize some thoughts.

CI first

CI first is convenient, of course. Take github and Travis as an example. If a PR fails to build, one does not even have to review it. If a change makes some tests fail, either the test suite has to be adapted or the code has to be fixed to get the tests working again. But as long as things fail, a maintainer does not necessarily have to take a look.

The disadvantage is, of course, that resources need to be there to compile and test the code all the time. But that's not much of an issue, because hardware and energy are cheap and developer time is not.

Review first

Review first keeps the number of compile and test runs low, as only code gets tested that is basically agreed upon. However, it increases the effort a maintainer has to invest into the project.

Review first is basically cheap. If you think of an OSS hobby project, that might be a good idea, especially if limited funding keeps you from renting or buying good hardware on which hundreds or even thousands of compile jobs per month could run at decent speed.

What I'd do

I'm thinking about this subject in the context of moving one of my projects (imag, of course) away from github. Because of my limited resources (the website and repository are hosted on Uberspace, which is perfect for that), I cannot run a lot of CI jobs. I don't even know whether running CI jobs on this host is allowed at all. If it is not, I'd probably rent a server somewhere, do CI-first there and integrate the results into the Uberspace-hosted site. That way I'd even be able to run more things I would like to have on a server for my personal needs. But if CI is allowed on Uberspace (I really have to ask them), I'd probably go for review-first and invest the money I save into my Uberspace account.

tags: #open-source #programming #software #tools #git #github

This post was written during my trip through Iceland and published much later than it was written.

From my last article, on whether to move away from github with imag, you can tell that I'm thinking hard about this subject. In this post I want to summarize what I think we need for a completely self-hosted open source programming toolchain.

Issue tracking

Do we actually need this? For example, does the kernel do these things? Bug tracking – yes. But issue tracking, as in planning what to implement next? Maybe the companies that contribute to the kernel do, internally, but not the kernel community as a whole (AFAIK).

Of course the kernel is a bad example here, because of its size, the size of the community around it and all these things. Other, smaller projects do use issue tracking for planning – for example the nixos community (which is still fairly big) or the Archlinux community (though I'm not sure whether they do these things only over mailinglists or via a dedicated forum at archlinux.org).

Bug tracking

Bug tracking should be done on a per-bug basis. I think this is a very particular problem that can be easily solved with a mailing list. As soon as a bug is found, it is posted to the mailing list and discussion and patches are added to the list thread until the issue is solved.

Pull request tracking

With github, a contributor automatically has a web-accessible repository. But for the most part it is sufficient if the patches are sent via an email-patch workflow, which is how many great projects work. Having web-accessible repositories available is just a convenience github introduced and now everybody expects.

I think pull requests (or rather patchsets) are tracked no matter how they are submitted. If you open a PR on github, patches are tracked just as well as with mailing lists. Indeed, I even think that mailing lists are way better for tracking and discussion, as one can start a discussion on each individual patch. That's not really possible with github. Also, the tree-shaped discussions one can have around a patch are a major point where mailing lists beat github.

CI

Continuous Integration is a thing where solutions like gitlab or github shine. They integrate easily with repositories, are free and result in better and tested code (normally). I do not know of any bigger open source project that does not use some form of CI. Even the kernel is tested (though not by the kernel community directly, but rather by companies like Intel or Red Hat, as far as I know).

A CI solution, though, is rather simple to implement (but I'm sure it is not easy to get it right). Read my expectations below.

How I would like it

Issue and bug tracking should be based on plain text, which means that one should be able to integrate a mailing list into the bug tracking system. Fortunately, there is such an effort, named git-dit, but it is not really usable yet. Well, it is usable, but it has neither email integration nor a web interface (for viewing). This is, of course, unfortunate. Also, there is no way to import existing issues from (for example) github. And that's important, of course.

For pull request/patch management, there's Patchwork. I've never worked with it, but as far as I can see it works nicely and could be used. But I would prefer to use git-dit for this, too.

I would love to have a CI tool that works on a git-push-based model. For example, you install a post-receive hook in the repository on your server, and as soon as there is a new push, the hook verifies some things and then builds the project from a script, which preferably lives in the repository itself. One step further, the tool would create a RAM disk, clone the just-pushed branch into it (so we have a fresh clone) and build things there. Yet another step further, the tool would create a new container (think of systemd-nspawn) and trigger the build there. That would ensure that the build does not depend on some global system state.
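The decision logic of such a post-receive hook could be sketched like this. Git feeds the hook one “<old-sha> <new-sha> <refname>” line per updated ref on stdin, and a hook can be any executable, so here is a Rust sketch that only picks out branch updates (the actual clone-and-build step is just hinted at in a comment):

```rust
use std::io::{self, BufRead};

// Returns the branch name if this hook input line describes a branch
// update, and None for tags and other refs.
fn branch_to_build(line: &str) -> Option<String> {
    let fields: Vec<&str> = line.split_whitespace().collect();
    match fields.as_slice() {
        [_old, _new, refname] if refname.starts_with("refs/heads/") => {
            Some(refname.trim_start_matches("refs/heads/").to_string())
        }
        _ => None,
    }
}

fn main() {
    let stdin = io::stdin();
    for line in stdin.lock().lines().flatten() {
        if let Some(branch) = branch_to_build(&line) {
            // A real hook would now clone this branch into a fresh
            // directory (or container) and run the in-repo build script.
            eprintln!("would build branch: {}", branch);
        }
    }
}
```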

This, of course, also has some security implications. That's why I would only build branches where the newest (the latest) commit is signed with a certain GPG key. That's really easy to do, and because of GPG and git itself, one can be sure that only certain people can trigger a build (which is nothing more than the execution of a shell script, so you see why this matters). Another idea would be to rely on gitolite, which has ssh authentication. This would be even easier, as no validation would be necessary on our side.

The results of the build should be mailed to the author/committer of the commit that was built.

And yes, now that I wrote these things down I see that we have such a tool already: drone.

That's it!

That's actually it. We don't need more than that for a toolchain for developing open source software with self-hosted solutions. Decentralized issue/PR tracking, a decent CI toolchain, git – and here we go.

tags: #open-source #programming #software #tools #git #github

This post was written during my trip through Iceland. It is part of a series on how to improve one's open-source code. Topics (will) include programming style, habits, project planning, management and everything that relates to these topics. Suggestions welcome.

Please note that this is totally biased and might not represent the ideas of the broad community.

We're slowly working ourselves up from if statements in the last episode of this article series to functions in this article. I'm using the term “function” in this article, but you're welcome to interpret this word as you like – whether you're from the C world, a C++ hacker, you're lining up your types in the wonderful lands of Haskelldonia or you like snakes and play with Python all day long doesn't really matter. Functions can be Functions, Procedures, Methods or whatever you name these things in your language of choice.

So when thinking about functions, what do we have to think of first? Well, yes...

Naming things

Computer scientists and Nerds like to bikeshed this to death. We can talk about function naming all day long, can't we? I do not like this topic at all because most people cannot keep their heads calm when talking about this. So I simply list what I think should be in a general guide on function naming, and it'll help you exactly nothing:

  • Short, but not too short
  • Expressive what the function does, but not too expressive
  • To the point
  • Not interpretable
  • Should only contain the good-case in the name
  • Should not contain “not” or “or” and “and” – except when it should

Shrtpls

Function names should be short. If you have a look at the C “string” header, you'll find a function named “strlen”. This name is truly wonderful: it is short and to the point.

Your function names shouldn't be too short, though. Single-character names are a no-no! Even two characters are most certainly too short. One-word names are a good way to go for simple functions. So function names like “sum”, “ceil” or “colour_in” are fine.

Expressiveness

A name should always express what the function does. The examples from the last section are good examples of this. Bad examples are “enhance”, “transform” or “turn_upside_down”.

If a function name has to be a bit longer to express what it – the function – actually does, that's okay, I guess.

To the point / Not interpretable

A reader of your code should understand what your function does from reading the name alone, maybe including the types involved, if your language has types. But not only that: she should also be able to tell you, the writer of the code, what the function does, without you correcting her.

I think it is always a good idea to think “from a third person's perspective”. If you think someone else could tell you what your code does, you can consider it good enough. Not perfect – every code can be improved. But good enough.

The rest

I want to summarize the rest of the points from above in this section. A good heading for this section might be “good practices for function naming”, but that also might not fit as nicely as it should.

The thing is, if your function actually implements business logic (you might not have a “business” in your open source codebase, but at least a domain), it really should not contain terms that are boolean operators. For example, a function should never be named “does_not_contain_errors”. Not only is this name way too long; including boolean logic in function names also makes them harder to use in boolean expressions, because you have to wrap your head around the negation all the time. A better name would be “contains_errors” – you can apply a negation operator after calling it!
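To see why the positive name composes better, consider this sketch (the predicate is made up for illustration):

```rust
// A positively named predicate...
fn contains_errors(report: &str) -> bool {
    report.contains("error")
}

fn main() {
    let report = "all checks passed";
    // ...negation happens at the call site, not in the function name:
    if !contains_errors(report) {
        println!("ship it");
    }
}
```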

In the end it's all about size

Out there in the primal world, bigger is better. Bigger muscles, bigger knives, guns or tanks, even bigger cities, cars, houses. But in the world of programming, things are reversed. Small things matter!

So your functions should be small. As small as possible, actually. Today's compilers can inline and optimize the hell out of your code – and if you're one of the scripting language enthusiasts out there: does speed actually matter in your domain? Steve Klabnik once said that, in the Rails world, people tell each other that yes, Ruby is a slow language, but the network and the database are your bottleneck, so nobody cares.

Also, think of the advantages of short functions: people can more easily understand what you're doing, testability improves because you can apply tests at a more fine-grained level, and documentation might also improve because you have to document more functions (that's another topic we'll discuss in a different article as well).

I really don't want to write down any line numbers or approximations of how long a function should actually be. That's not only because I don't want to burden myself with that, but also because I cannot give any advice without knowing what the function is supposed to do – if you have a complex function that has to do a lot of things in preparation for the domain logic which cannot be outsourced (for example acquiring some locks), it might be 100 lines or more (also depending on your language). If you're doing really simple things, for example building a command-line interface or doing setup work, it might even be 1000 lines or more. I really cannot tell you how much is too much.

But in general, shorter is better when it comes to functions.

Scoping

Some programming languages offer scopes. For example C, C++ or Rust, but also Java and even Ruby.

Scopes are awesome. Variables get cleaned up at the end of the scope. You can, of course, use scopes in if statements or loops. But scopes are also good for structuring your code!

In the above section I wrote that some functions might need to do some setup work or preparation before actually implementing their logic. For example, if a function needs to acquire some locks or allocate some resources before doing the actual work.

In some cases it is possible to separate the domain logic of the functions by simply applying some scopes. Let me write down an example in pseudo-rustic code:

fn calculate_some_value() {
    // do
    // some
    // setup

    {
        // domain
        // logic
    }

    // cleanup
}

I guess that makes my point pretty clear.

Next

Up next we will talk about the modularization of code and how you can structure your programs by separating things into classes, modules, namespaces and so on.

tags: #open-source #programming #software

This post was published on both my personal website and imag-pim.org.

I've been thinking about closing contributions to imag for about two months now. Here I want to explain why I'm considering this step and why I tend towards a “yes, let's do that”.

github is awesome

First of all: github is awesome. It gives you absolutely great tools to build a repository and also an open source community around your codebase. It works flawlessly: I never experienced any issues with pull request tracking, issue tracking, milestone management, merging, external tool integration (in my case, and in the case of imag, only Travis CI) or any other tool github offers. It just works, which is awesome.

But. There's always a but. Github has issues as well. From time to time there are outages; I wrote about them before. Yet, I came to the conclusion that github is doing really well for the time being. So the outages at github are not the reason why I am thinking of moving imag away from it.

Current state of imag

It is the state of imag. Don't get me wrong, imag is awesome and gets better every day. Either way, it is still not in a state where I would use it in production. And I've been developing it for almost two years now. That's a really long time frame for an open source project that is, in the majority, developed by only one person. Sure, there are a few hundred commits from others, but right now (the last time I checked the numbers) more than 95% of the commits and the code were written by me.

In my opinion, imag really should get into a state where I would use it myself before making it accessible (contribution-wise) to the public. Therefore, developing it in a more “closed” fashion seems like a good way to get it into shape.

Closing down

What do I mean by “closing development”, though? I do not intend to make imag closed source or to hide the code from the public, that's for sure. What I mean is that I would move development off of github and do it only on my own site, imag-pim.org. The code will still be openly accessible via the cgit web interface. Even contributions will be possible, via patch mails or, if a contributor wants to, via a git repository on the site. Just the barrier to entry gets a bit higher, which – I like to believe – keeps away casual contributors and only attracts long-term contributors.

The disadvantages

Of course I'm losing the power of the open source community at github. Is this a good thing or a bad thing? I honestly don't know. On the one hand it would lessen the burden on my shoulders regarding community management (which is fairly little right now), issue management and pull request management. On the other hand I would lose tools like travis-ci and others, which work flawlessly and are a real improvement for the development process.

The conclusion

I don't really have one. If there were a way to integrate Travis with a self-hosted repository, as well as some possibility for issue management (git-dit isn't ready in this regard yet, because one cannot extract issues from github just yet), I would switch immediately. But there isn't. And that's keeping me from moving off of github (vendor lock-in at its finest, right?).

I guess I will experiment with a dedicated issue repository with git-dit and check how the cgit web interface works with it, and if it seems to be good enough I will test how it can be integrated (manually) with emails and a mailing list. If things work out smoothly enough, I will go down this road.

What I don't want to do is integrate the issue repository into the code repository. I will have a dedicated repository for issues only, I guess. On the other hand, that makes things complicated with pull request handling, because one cannot comment on PRs or close issues with PRs. That's really unfortunate, IMO. Maybe imag will become the first project which heavily uses git-dit. Importing the existing issues from github would be really nice for that, indeed. Maybe I'll find a way to script the import functionality. As I want a complete move, I do not have to keep the two issue tracking mechanisms (git-dit and github issues) in sync, so at least I do not have that problem (which is a hard one on its own).

tags: #open-source #programming #software #tools #git #github

This article is a cry for a feature which is long overdue in KDE, in my humble opinion: Syncable, user readable (as in plain-text) configuration files.

But let me start explaining where I come from.

I started with Linux when I was 17 years old. At the time I ran a Kubuntu 9.04 (if I remember correctly) with KDE 3. I disliked Gnome because it looked silly to me (no offense, Gnome or Gnome people). So it was simply aesthetics that made me use KDE. Before switching to Linux I had only experienced the cruel world of Microsoft Windows, XP at the time. When I got a new notebook for my birthday, it came with Vista, and two days later I had this Linux thing installed (which friends of mine kept talking about). Naturally, I was blown away by it.

After some time I got rather comfortable using this black box with the green text on it – the terminal finally had me! Soon, I launched a full-blown KDE 3 (themed hacker-like in dark colors) just to start a few terminals, open vim and hack away. Then the same friend who had suggested Linux told me about “tiling window managers”. wmii it was, shortly after.

A long journey began. After some troubles with Ubuntu 12.04 I switched to Archlinux, later from wmii to dwm and after that to i3 which I kept for a few years.

In 2015 I learned about NixOS, switched to it at the beginning of 2016 and in late 2016 I reevaluated my desktop and settled with XFCE.

And here's the thing: I wanted KDE, but it lacked one critical feature: being able to sync its configuration between my machines.

I own a desktop machine with three 1080p Monitors and two Notebooks – a Thinkpad X220 and an old Toshiba Satellite (barely used), if that matters at all.

There are things in the KDE configuration files that shouldn't be there, mainly state information and temporary values, making the whole thing a pain in the ass if you want to git add your configuration files, git push them to your repository somewhere in the web and git pull them on another machine.

Apparently the story is not much better with XFCE, but at least some configuration files (like keyboard shortcuts, window decoration configuration and menu-bar configuration) can be git added and synced between machines. And it works for me, even with two very different machines. Some things have to be re-done on each machine, but the effort is pretty low.

But with KDE (I tried Plasma 5), it was a pure PITA. Not a single configuration file was left untouched: values get reordered, state information is put in there, and so on and so forth.

And I am rather fascinated by KDE and really want to play with it, because I think (and that's not just an attempt to kiss up to you KDE guys) that KDE is the future of Linux on the desktop! Maybe not for the hacker kid from next door, but for the normal Linux user, like my mom, your sister or the old non-techy guy from next door: KDE is simple enough to understand and powerful enough to get stuff done efficiently without getting in your way too much.

So here's my request:

Please, KDE Project, make your configuration files human-readable, editor-friendly and syncable (via tools like git). That means no temporary values and no state information in the configuration files. That means configuration files do not get shuffled when a user alters a setting via a GUI tool. That means the format of the configuration files does not get changed automatically.

The benefits are clear, and there are no drawbacks for KDE as a project (as far as I can see), because parsing and processing of the configuration files has to be done either way. Maybe it even reduces the complexity of some things in your codebase?


A word on “why don't you submit this to the KDE project as a request”: I am not involved with KDE at all. I don't even know where the documentation, bug tracker, mailinglist(s), ... are located. I don't know where to file these things, or whether I have to register at some forum or whatever.

If someone could point me to a place where nice people will discuss these things with me, feel free to send me an email and guide me to this place!

tags: #open-source #software #tools #git #desktop #linux

This post was written during my trip through Iceland and published much later than it was written.

Right now I'm sitting in front of our mobile home in Iceland, at the most western point of Europe. We've had a great day here and the camping site (nothing more than a few square meters of grass, but enough to host 20-30 parties with tents or mobile homes) is nice.

But what I do not understand is the stupidity of people. There are marked ways around here; everything else is nature and therefore protected area. Still, people are running through the grass and around in places where they are not supposed to be. A few days ago we even saw a group of people pass a bird-protection area multiple times without even considering taking the official (and nearby) roads.

I mean, are these people not thinking?

Also, just moments ago, a fellow camper entered the area and placed his car right next to ours... ignoring more than 50 square meters of available space. But no, he placed his car less than two meters from ours.

Next, a car with three Americans arrived. The little girl, maybe 14 years old, ran around in short pants and without shoes. In Iceland. We were wearing two pairs of pants and three shirts to keep warm. Now they're sitting in the (still running) car to warm themselves. What the fuck?

So we moved to another spot within the campsite. Hope that gets these idiots thinking, but I doubt it.

To end this rant: Please think before running around in protected areas and nature, especially if you're on vacation in another country. And also leave fellow hikers alone. In some places in Iceland there is little space for hikers with tents and such, but most of the time there is enough space to have 10 or more meters between you and your neighbors. And if you are too stupid to prepare yourselves for a vacation with proper clothing, please stay at home.

tags: #traveling #iceland #life

This post was written during my trip through Iceland. It is part of a series on how to improve one's open-source code. Topics (will) include programming style, habits, project planning, management and everything that relates to these topics. Suggestions welcome.

Please note that this is totally biased and might not represent the ideas of the broad community.

During my trip through Iceland I had a really great time seeing the awesome landscapes of Iceland, the Geysir, the fjords and the Highlands. But in the evening, when the wind got harsher and the temperatures went down, I had some time for reading and thinking.

This is when I started this blog article series. I thought about general rules that would help me improve my open source code.

I started thinking about this subject after reading a great article on Fluent C++ about how to make if statements more understandable. This article really got me thinking, because it makes some really good points, and if you haven't read it yet, you clearly should invest a few minutes reading it (and this blog in general, of course). I'll wait, and we'll continue when you're done reading it.

Why thinking about this in the first place?

Well, everyone knows you're already writing the best code possible, and everyone who doesn't understand it is not worthy of your almighty code! Some people sadly think like this. And of course it is not true at all.

I once read this great statement, I guess it was also on the Fluent C++ blog (again: read this blog, it is awesome), which goes approximately like this:

If you look at code you've written six months ago and you cannot think of a way to improve it, you haven't learned anything in six months, and this is as bad as it can get

If you think about this for one minute, it is absolutely right and you really don't want to be at this point. So, you have to start thinking about your code. But where to start? Well, at the beginning, you might think. And that's absolutely right. You have to think about the small things first, so let's start with if statements.

Making if statements more understandable

Basically, this is what Jonathan Boccara said on Fluent C++. I won't repeat what he has written, just briefly summarize: give long if expressions names by defining functions for the conditions, represent the domain in your if statements, and don't be more general than your domain specification requires.

The last of these is the point I want to focus on in this article. Full quote:

Don't compress an if statement more than in the spec

But in open source software development we often do not have any spec. If you're working on a hobby project in your free time, improving someone's code or contributing some functionality to an open source project you're interested in, you only have the idea in your (and maybe also in someone else's) head. Sometimes you or some other people already had a discussion about the feature you're about to implement and a rough idea how it should be done. Maybe even a concrete idea. But you'll almost never have a specification where edge cases, preconditions and invariants of your functionality are defined. And most of the time you don't need one. In open source, people come together who have a similar interest and goal - no specifications required, because all contributors involved know what a functionality should do and what not.

Show me code

In the following, I'm using Rust as the language for my examples. I'm not doing anything Rust-specific here, so people without knowledge of the Rust programming language shouldn't have to learn new things to understand my point; it is just the language I'm most comfortable with right now.

So, as Jonathan already said, one should not make if statements arbitrarily long and complex. Helper functions should be used for conditions, even if these functions are only used once. It can heavily improve the readability of your code.

```rust
if (car.has_wheels(context.required_num_of_wheels())
    && car.max_speed() > SpeedUnit::KMH(25))
    || car.building_year() > Time::Year(2000)
{
    // ...
}
```

The condition from above can be greatly improved in readability by moving it to a helper function:

```rust
fn car_is_new_or_fast(car: &Car, context: &Context) -> bool {
    (car.has_wheels(context.required_num_of_wheels())
        && car.max_speed() > SpeedUnit::KMH(25))
        || car.building_year() > Time::Year(2000)
}

// ...

if car_is_new_or_fast(&car, &context) {
    // ...
}
```

You might think that this does not improve the code at all, just moves the complexity somewhere else. That's not entirely true. If you only have five-line functions, yes. But if your functions are ten, fifteen, fifty or even a hundred lines long and you have several if statements of similar complexity, moving such things away can improve readability a lot.

Also, you can make complex conditions testable by moving them to functions, which is also a nice-to-have.
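As a minimal sketch of that idea (with `Car` as a hypothetical stand-in type, not the article's actual domain), the extracted condition becomes an ordinary function that a unit test can call directly:

```rust
// Hypothetical stand-in type, for illustration only.
struct Car {
    wheels: u32,
    max_speed_kmh: u32,
}

// The condition that used to live inside a long if statement is now a
// plain function: trivial to exercise in a test without setting up the
// rest of the program state.
fn car_is_fast(car: &Car) -> bool {
    car.wheels >= 4 && car.max_speed_kmh > 25
}
```

In a test you then simply construct a `Car` and assert on the boolean, instead of driving the whole code path that contains the if statement.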

But, but, but... speed?

One might now come up with the obvious question: does my code get slower because of this? I would say it depends. Fluent C++ has answered this question for C++, and I would guess the answer also holds for Rust, maybe even without the 2%/7% speed decrease Jonathan is seeing, especially if the code is inlined by the Rust compiler. Even though you might get slightly slower code, you have to ask the one question that I greatly value, not only when it comes to execution speed but also in other cases: does it matter?

Does it matter whether your code gets a bit slower? Is this particular piece of code crucial for your domain? If not – expressiveness first, speed second! If it does, write the expressive version first and then: measure. If the expressive version has a performance impact you cannot tolerate, you can still optimize it later.

Next...

What's up next? Well, I don't know. I will get myself inspired by other blog posts and articles and maybe I'll publish the next article in this series soonish. But maybe it takes a month or two, maybe even more, until I have some content. I don't want to make this a weekly thing or something like that, so I'll leave it undefined when the next article of this series will be published.

Thanks for reading.

tags: #open-source #programming #software

In this blog post, which might turn into a short series, I want to plan a rust library crate and write notes down how to implement it.

This article was yet another one that I wrote while being on my trip through Iceland. As you can see, my head never stops thinking about problems.

Usecase

So first of all, I want to write down the use case of this library. I had the idea when thinking about how to design a user frontend for imag (of course) and came to the conclusion that Rust lacks a library for such a thing. So why not write one?

I want to design the user interface of this library crate approximately like Rails did with its implementation of the same functionality for Ruby (bear with me, I'm not that involved in the Ruby world anymore, so I don't know whether this is actually Rails or just another gem that comes with it).

So what I want to be able to do is something like this:

```rust
let event_date = today() - days(2) + weeks(10);
```

for example. I'm not yet entirely sure whether it is possible to actually do this without returning Result<_, _> instead of real types (and because I'm in Iceland without an internet connection, I cannot check). If Results need to be returned, I would design the API so that these functions and calls only create an AST-like object tree, which can then be evaluated with a function that calculates the final result:

```rust
let event_date = today() - days(2) + weeks(10);
let event_date = try!(event_date.calc());
```
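How such an operation tree could be built without doing any arithmetic at the call site can be sketched with operator overloading. Everything here (the `Expr` name, seconds as the only unit) is an illustrative assumption, not the final API:

```rust
use std::ops::Add;

// Minimal sketch: `a + b` records the operation instead of computing
// it, producing an AST-like tree that a later calc() pass can walk.
#[derive(Debug, PartialEq)]
enum Expr {
    Seconds(i64),
    Addition(Box<Expr>, Box<Expr>),
}

impl Add for Expr {
    type Output = Expr;

    fn add(self, other: Expr) -> Expr {
        Expr::Addition(Box::new(self), Box::new(other))
    }
}
```

Because `+` only builds the tree, it can never fail; all error handling is deferred to the single evaluation function.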

But even more ideas come to mind when thinking about functionality this library may provide:

```rust
// Creating iterators
today().repeat_every(days(4)) // -> endless iterator

// Convenience functions
(today() + weeks(8)).end_of_month() // the end of the month of the day in 8 weeks

today().end_of_year().day_name() // name of the day at the end of the current year

today().until(weeks(4)) // range of time from now until 4 weeks from now

// more ...
```

Later on, a convenient parser could be put in front of this, so a user can actually provide strings which are then parsed and calculated:

```rust
calculate("now - 4h + 1day")
```

which could then, of course, be exposed to end users as well.

Core Data types

The foundation of this library would be the awesome “chrono” crate, so we do not have to reimplement all the time-related things. This eases everything quite a lot and also ensures that I do not duplicate work which others have done far better than I could have.

So at the core of the library, we need to encapsulate chrono types. But there are many user-facing types in chrono and we cannot assume we know which of them our users need. So we have to be generic over these types, too. This is where the fun starts.

At the very base level we have three kinds of types: amounts (like seconds, minutes, etc.), fixed points in time, and time ranges:

```rust
// A, B and C being chrono types which are wrapped
pub enum TimeType<A, B, C> {
    Seconds(usize),
    Minutes(usize),
    // ...
    Years(usize),
    Point(C),
    Range(A, B),
}
```

As I assume right now, we cannot simply subtract and add our types (and thus chrono's types) without possible errors, so we have to handle those errors and return them to the user. Hence, we will create intermediate types which represent what is about to be calculated, so we can add and subtract (etc.) them without error:

```rust
enum OpArg {
    TT(TimeType),
    Add(AddOp),
    Sub(SubOp),
}

pub struct AddOp(OpArg, OpArg);
pub struct SubOp(OpArg, OpArg);

trait CalculateableTime {
    // Error being some yet-to-be-defined error type of the crate
    fn calc(self) -> Result<TimeType, Error>;
}
```

with the trait implemented on the former types, and maybe also on the enum, as I will explain in a moment.

To explain why the CalculateableTime::calc() function returns a TimeType rather than a chrono::NaiveDateTime for example, consider this:

```rust
(minutes(15) - seconds(12)).calc()
```

and now you can see that this actually needs the function to return our own type instead of some chrono type here.

The OpArg type needs to be introduced to be able to build a tree of operations. In the calc implementation for these types, we can then recursively call the function itself to calculate what has to be calculated. As the trait is also implemented on TimeType itself, whose implementation then just returns Self, we automatically have the abort condition for the recursive call. To note: this is not tail-recursive.

Optimize the types

After handing this article over to two friends for review, I was told that the data structures can be collapsed into a single one. So no traits required, no private data structures, just one enum with all functions implemented directly on it:

```rust
enum TimeType<A, B, C> {
    Seconds(usize),
    Minutes(usize),
    // ...
    Years(usize),
    Point(C),
    Range(A, B),

    // boxed so the enum can contain itself
    Subtraction(Box<TimeType<A, B, C>>, Box<TimeType<A, B, C>>),
    Addition(Box<TimeType<A, B, C>>, Box<TimeType<A, B, C>>),
}
```

and as you can see, also almost no generics.

After thinking a bit more about this enum, I concluded that even things like EndOfWeek, EndOfMonth and such have to go into it. Overall, we do not want any calculation to happen when writing down the code, only a lining-up of types, where the calculate function takes care of actually doing the work.
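A self-contained sketch of that recursive calculation, reduced to seconds as the only unit (the variant set and the `calculate` signature are simplified assumptions, not the real crate):

```rust
// Simplified version of the collapsed enum from above.
#[derive(Debug, PartialEq)]
enum TimeType {
    Seconds(i64),
    Addition(Box<TimeType>, Box<TimeType>),
    Subtraction(Box<TimeType>, Box<TimeType>),
}

impl TimeType {
    // Recursively collapse the operation tree; the Seconds variant is
    // the abort condition for the recursion.
    fn calculate(self) -> TimeType {
        match self {
            TimeType::Seconds(s) => TimeType::Seconds(s),
            TimeType::Addition(a, b) => match (a.calculate(), b.calculate()) {
                (TimeType::Seconds(x), TimeType::Seconds(y)) => TimeType::Seconds(x + y),
                _ => unreachable!("calculate always reduces to Seconds"),
            },
            TimeType::Subtraction(a, b) => match (a.calculate(), b.calculate()) {
                (TimeType::Seconds(x), TimeType::Seconds(y)) => TimeType::Seconds(x - y),
                _ => unreachable!("calculate always reduces to Seconds"),
            },
        }
    }
}
```

Note that `calculate` returns a `TimeType` again rather than a raw number, in line with the argument above that the result must stay in our own type.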

Helper functions

Above, I used some functions like seconds() or minutes(). These are just helper functions hiding more complex type signatures and can hopefully be inlined by the compiler:

```rust
pub fn seconds(s: usize) -> TimeType {
    TimeType::Seconds(s)
}
```

So there is not really much to say for these.

Special Functions, Ranges, Iterators

To get the end of the year of a date, we must already hold the current date, so these functions need to be added to the TimeType type itself. Ranges can also be done this way:

```rust
now().until(tomorrow()) // -> TimeType::Range(_, _)
```

Well, now the real fun begins. When you have a TimeType object, you should be able to construct an Iterator from it.

The iterator needs to hold the value by which it should increase on every step, as well as a copy of the base value. With this, one could think of an iterator that holds a TimeType object and, every time next() is called, adds something to it and returns a copy of it.

Another way of implementing this would be to count how many times the iterator has been called, multiply this count by the increment and add the result to the base.

I like the latter version more, as it does not increase the calculations needed for getting the real value out of the TimeType instance every time the iterator is called.
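The latter variant can be sketched like this, again with plain seconds standing in for the TimeType described above (the `Every` name and field layout are made up for the example):

```rust
// Illustrative iterator: it remembers how often next() was called and
// computes base + calls * step each time, instead of mutating an
// ever-growing operation tree.
struct Every {
    base: i64,  // base value in seconds
    step: i64,  // increment in seconds
    calls: i64, // how many times next() has been called so far
}

impl Iterator for Every {
    type Item = i64;

    fn next(&mut self) -> Option<i64> {
        let value = self.base + self.calls * self.step;
        self.calls += 1;
        Some(value) // endless iterator: never returns None
    }
}
```

Each call costs one multiplication and one addition, independent of how many values were produced before.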

This way, one can write the following code:

```rust
let v: Vec<_> = now()
    .every(days(7))
    .map(TimeType::calculate)
    .take(5)
    .collect();
```

to retrieve five objects, starting from today, each separated by one week.

Next

What I think I'll do in the next iteration of this series is summarize how I want to develop this little crate. I guess test-driven is the way to go here, after defining the types described above.


Please note: This article was written a long time ago. In the meantime, I learned from a nice redditor that there is chrono::Duration, which is partly what I need here. So I will base my work (despite having already started in the direction I outlined in this article) on the chrono::Duration types and develop the API I have in mind on top of the functionality provided by chrono.

For the sake of authenticity, I did not alter this article after learning of chrono::Duration, so my thoughts are lined up as I originally had them.

tags: #open-source #programming #software #tools #rust