musicmatzes blog

programming

Holy crap, I haven't written on my blog for a long time. And I almost missed that the Rust community asked for blog posts about Rust in 2021 - but I am in time, I guess, so here goes.

Most Rustaceans won't agree with this blog post, I guess. But I also think that's fine, because that's the whole point of the Blog-Post-For-The-Roadmap thing, right? Asking people for different opinions and starting a constructive discussion about the topic. I also must say that I haven't read a single one of the other Rust-2021 Blog posts just yet.

I also think this will be rather short, but I hope I express my feelings in the best way possible for you all to understand.

Don't Change!

I got into Rust at about Rust 1.5.0. After the first half of 2020, I felt like January was years ago, so I feel like Rust 1.5.0 was in another lifetime. So much happened this year, and still, so little was accomplished by me and my friends. The world turned upside down, essentially.

Rust changed a lot between 1.5.0 and the current compiler I have installed on my system:

$ rustc --version
rustc 1.46.0 (04488afe3 2020-08-24)

The RELEASES.md file is a whopping 9167 lines long. We got cargo workspaces, we got awesome things like the ? operator (which I definitely was not a friend of in the beginning), we got associated constants, incremental compilation, impl Trait, we got const functions and most importantly we got async/await.

To be fair, that's where I started to struggle to keep up. I definitely see the value in async/await and what it actually enables us to do with Rust, and how to do it. But I just couldn't keep up with the change anymore. It was too much. I couldn't cope with learning all these new things just as they arrived. To this day, I struggle to write a simple program with async/await if there are too many iterators involved. I don't know where my actual problems are, because I cannot see through the whole concept enough to understand what I am doing wrong.

The last five years were full of change. Good change, of course. But this last (almost) year just drowned me. Too much to handle.

My hope is that Rust does not change any more when it comes to features. I see that there is a lot of demand for const generics, especially by the embedded community. I understand why. I hope it doesn't have any impact on me as a commandline-program-writing Rustacean.

But...

But. There's always a “but”, isn't there?

I still have high hopes for some things concerning Rust. But they do not at all have to do with the language Rust, but the environment around it. As stated before, I'm a commandline-program writing person. I do not write web services (yet?), I do not write embedded stuff, I do not write high-performance/performance-critical stuff.

Essentially: I write programs in Rust that others would write in Python, Ruby or Node. I write them in Rust because I am a lazy programmer, because I do not care enough. I write Rust because the compiler YELLS at me to get it right. If I did the same thing in Ruby, my go-to language for everything below 100 LOC, I would put myself in a wheelchair because I would use every footgun available.

Deep inside, I'm a bad programmer and rustc forces me to be a good one.

That was a bit of a rant, I hope you're still with me. The paragraph above was for you to understand where I come from. I don't care if my program runs in 1 second or 10, because the domain I write for does not care most of the time. But what is important to me is that I can actually write my programs. Often, I cannot. And that's simply because of one thing:

Libraries are missing.

Those who know me knew that this was coming. Libraries for domains that I care about are missing. That is calendar (icalendar) reading/writing, vcard reading/writing, email reading/writing (the format, not the networking stuff), ... There are already libraries out there for these things, although they are far from being complete, usable or even correct. Writing a simple TUI MUA for notmuch is a pain right now, because parsing email is really hard in Rust, and there are no high-level libraries available. The “mail” crate ecosystem is closest, but they do not yet have a parser.

There are, I am sure, more things in this part of the ecosystem (that is libraries for basic formats) where Rust could shine, but does not yet.

My request for Rust in 2021 is: Make things shiny. Make them available, make them work, make them correct, make them nice to use (e.g. parsing mails into tokens and handing them to me is okay, but having a high-level interface is much nicer).


To sum it up in one sentence:

Don't change Rust itself, but improve the library ecosystem.

That's my hope for Rust 2021. Thank you for having me in this awesome community and thank you for reading.

tags: #rust #programming

Finally, I managed to implement a proof of concept of serde-select. But let's start at the beginning.

The Problem

The problem I tried to solve with this crate is rather simple: You need to be able to get values from a serde-compatible document (e.g. toml, json, yaml, ...) but you don't know the full schema of the document at compile time of your crate.

The origin of the idea of serde-select was when I first started working on my imag project, where a lot of separate crates coexist in one ecosystem, but all of them should be configured in one big configuration file. Of course I did not want to have one central crate just for defining the schema, especially since a user might not want to use all functionality from the ecosystem, thus not having a “full” configuration file, but only the parts they needed.

So I started writing “toml-query”, a crate which lets the programmer query a toml::Value with a “path”. For example:

[calendar]
list_format = "{{lpad 5 i}} | {{abbrev 5 uid}} | {{summary}} | {{location}}"
show_format = """
{{i}} - {{uid}}
"""

[ref]
[ref.basepathes]
music = "/home/user/music"
contacts = "/home/user/contacts"
calendars = "/home/user/calendars"

The document looks like this, but in the program code we only need calendar and its sub-values. So we can do

let r = document.read("calendar.list_format");

in the code and get a Result<Option<&'document Value>> value back.
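To make that concrete, here is a minimal sketch of handling such a result. The extension-trait import and the error type are written from memory and may differ between toml-query versions:

use toml::Value;
use toml_query::read::TomlValueReadExt;

fn main() {
    let document: Value = toml::from_str(r#"
        [calendar]
        list_format = "{{summary}}"
    "#).unwrap();

    // read() gives back Result<Option<&Value>>, so there are three cases to handle.
    match document.read("calendar.list_format") {
        Ok(Some(value)) => println!("list_format = {}", value),
        Ok(None)        => println!("calendar.list_format is not set"),
        Err(e)          => eprintln!("query failed: {:?}", e),
    }
}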

toml-query evolved over time, now featuring more flexibility by implementing what I call “Partials”. These are structs that are Serialize + Deserialize and have a path attached to them, so deserializing the partial document is possible right away:

let r: Result<Option<CalendarConfig>, _> = document.read_partial::<CalendarConfig>();

where CalendarConfig: Serialize + Deserialize + Debug + toml_query::Partial (see here).

The evolution

toml-query works perfectly fine and I use it in my other projects a lot. It is fast and easy to use. Its error reporting is nice.

But an idea formed in the back of my head and I could not stop thinking about it.

Can toml-query be generalized to work with all formats serde can handle?

So I started to experiment with a more general implementation: serde-select was born.

And today I managed to get the first bits working.

Meet serde-select

serde-select implements a “read” functionality for both JSON and TOML, depending on what features you enable. The inner implementation of the resolve-algorithm is agnostic of the actual format.

For a quick overview of how to use the crate right now, have a look at the tests for toml, for example.

I strongly advise against using this crate, though. It is only an experiment for now and shouldn't be used in production code. Nevertheless, I published the first preview on crates.io.

tags: #rust #programming

This is a reply to the article published by Drew DeVault called “Rust is not a good C replacement”.


First of all, let me say that Drew is one of the people out there on the internet whose opinions I highly value. Indeed he is one of the people I try to read and listen to regardless of topic, because I think he is one of the people that deserve unconditional attention.

Needless to say that I've also read his latest piece “Rust is not a good C replacement”. I have to admit I was shocked at first, but after a bit of cooling down (and doing the dishes), I can see where this comes from.

And I have to disagree.

But let me start with my background – because that might be important for you, dear reader, to classify this article.

My background is mostly hobbyist programming. I did a few years of C, probably a few 100 kloc, not more. I have also been writing Rust since about Rust 1.5.0 (2015-12). About a month ago I started a job where I expect to write C and C++ professionally.

So, I do not have a background like Drew with probably millions of lines of C, but I guess that I have a bit more experience with Rust – I wouldn't say that I'm a Rust professional, but I would consider myself an “Advanced Rust Hobbyist”.

I'm also not as skilled in writing blog articles or even with the English language, so keep that in mind when reading this.


I am not a big fan of statement-by-statement replying to an article, but I guess for this type of article it is good enough.

First of all, Drew's initial statement that Rust was designed by C++ programmers: Yes, I absolutely see that this is true. Nevertheless I have to say that these C++ programmers started developing Rust because C++ was too complex and too error-prone in what it did and how it did it. Rust is far away from the complexity C++ gives us in terms of language features! Off the top of my head, we have

  • A full-blown OOP programming paradigm, including
    • Overloading
    • “friends”
    • (multi-)inheritance
    • abstract classes
    • partly and fully virtual functions
    • pointers and references
    • implicit conversions
    • Copy/Move constructors
    • Dynamic and static polymorphism
  • Manual memory management
  • Template Metaprogramming / Generic programming
  • operator overloading
  • Lambda expressions
  • Exceptions

in C++, whereas in Rust we only get

  • Dynamic and static polymorphism
  • operator overloading
  • Lambda expressions
  • Generic programming[^1]

([^2])

You might consider this list cheated as Rust is not an object oriented language like C++, but an imperative one like C. That is very true. Nevertheless it is one reason why the cognitive load of a C++ program is much higher than that of an equivalent (as in features of the program) Rust program!

Drew claims that the values of C and C++ programmers are incompatible and I would agree with that. But that does not (have to) mean that a C programmer and a Rust programmer do not have the same values. It is true, though, that Rust can excel at a lot of topics that C++ covers, but it also empowers programmers who do not feel comfortable writing good C code to write their software. And it does so in a language that is safe and performant at the same time, while not being overly blown up.

Further, Drew compares C, C++, Go and Rust by their complexity, measured by features introduced in the language over the years. I am really sorry to say this here, Drew, but we are used to much better from you! You say that this approach (bullet points/features listed on wikipedia vs. bullet points in articles and release notes) is not very scientific, yes. But you do not even mention the years these languages were released! For the record: C first appeared in 1972, C++ in 1985, Go in 2009 and Rust in 2010 (with 1.0 released in 2015).

I am not saying that this disproves your statement – it even supports it! But I do say that comparing based on features per year/release/whatever must include a statement about how old these languages are, even if it is just for showing the reader about what timeframe we are talking.

So, Rust is a relatively young language and yes, it has added a lot of features and therefore can be compared to C++ much better than it can be to C. But saying that a 10-year-old C program might even compile today and everything might be okay, but not so much with Rust, is just ignorant of the fact that Rust is not even that old. A Rust program that is one year old still compiles fine today (assuming it didn't use a compiler bug) and does not “look old” at all! Rust has a big infrastructure for doing regression tests and for being compatible with older programs.

As you say, out of the way with the philosophical stuff and let's get down to the facts.

C is more portable. But as mentioned before, C is almost six times as old as Rust. We'll get there!

C has a spec. Yes, and I completely hear you on this one. Rust does not (yet?) have a spec and it really is a pain-point. I want one, too! Maybe we'll get there at some point. By the way: Does Go have a spec? It seems like it, but that rather looks like a language definition and I doubt that this is what Drew meant when talking about “a spec”, is it?

C has many implementations. Yes, and how much trouble has it been because different compilers do different things on undefined behaviour? Too much. This is where Rust tries to solve a problem: Get a language where undefined behaviour is not allowed or at least as minimal as possible, then we can have a spec for that language and then we can have different implementations. Time will tell whether we can get there.

C has a consistent & stable ABI. Point taken. I do not argue about that.

Cargo is mandatory. Yes, another point taken. I again do not argue.

Concurrency is generally a bad thing. This statement gives me the impression that you did not yet try Rust, actually. Like in a big (and possibly multithreaded/concurrent/parallel/whateveryoucallit) environment. You say that most software does not have to be parallel and I fully agree on that – but if you need to be parallel, I'd rather choose Rust over Go, C or C++. Having the safety guarantees Rust gives me allows normal people (and not rockstar programmers) to write software that can be massively parallel without having to fear deadlocks and other ugly things you get with other languages.
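To illustrate what I mean by those guarantees, here is a minimal toy example of my own (not from Drew's article): the compiler only accepts this because the shared counter is wrapped in Arc<Mutex<_>>; remove the synchronisation and the program simply does not compile.

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared state must be explicitly synchronised, otherwise rustc rejects the program.
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    println!("total = {}", *counter.lock().unwrap());
}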

It is still true that bad design decisions are possible and might result in bad software – but that is true for every language, isn't it? And I'd rather have a bad program that gets the job done, because it can be statically verified that it does, than a program that crashes because I ran into a bug that was introduced by bad design decisions.

The next paragraph Drew writes makes me really, really sad. Fullquote:

Safety. Yes, Rust is more safe. I don’t really care. In light of all of these problems, I’ll take my segfaults and buffer overflows. I especially refuse to “rewrite it in Rust” – because no matter what, rewriting an entire program from scratch is always going to introduce more bugs than maintaining the C program ever would. I don’t care what language you rewrite it in.

This gives me the impression that Drew was hit with “Just rewrite it” too many times. And I completely agree with you, Drew, that you should indeed not rewrite it in Rust just for the sake of it. Nobody should ever rewrite anything in any other language than what “it” currently is written in, just for the sake of it. I hate it when people actually say things like that (unless it is trolling, but I have the uneasy feeling that Drew was hit with real “Just rewrite it”ers and not just trolling).

I do not say that the points Drew shows are false.

What I do say is that the initial assumption that Rust is there to replace C or C++ is, in my opinion, false. It is certainly meant to get things right that C++ got wrong – and it is certainly there to replace the C++-Monster that we call Gecko, because Mozilla is exactly trying to do that! But it is not there to replace all C or C++ code ever written because of some stupid “Hey we can do X better than your language” bullshit!

Also, the statement that Rust might end up as Kitchen-Sink like C++ and die with feature-bloat is one that concerns me because I do not want Rust to end up like C++. It certainly is not as complex as C++ and we (as in “the Rust community”) have a lot of work to do to not end up with feature-creep – but we are also certainly not there yet. But I definitively see where this statement is coming from.

The title of this article is “Rust is one of the best C replacements we currently have” – and I stand by this. But I also think that it is false to say that anyone has to replace C or that Rust is necessarily there to do so.

There are domains where you might want to rewrite C code, if you have the time and resources. But I'd rather advise against it[^3]. Improving existing code is always easier than a rewrite of a program, and rewriting software does not improve the value of the software or even make customers more happy. Rewriting software is IMHO only legit in two cases:

  • It makes you happy because you're doing it for fun
  • It makes your boss happy because he ordered you to do so (for whatever reasons, may it be speed, resource usage, customer request or whatever)

But just for the sake of it? Nah.

I see where Drew's article comes from and I see why he thinks the way he does. I greatly value his opinion and thoughts, and that's why I took the time to write this article.

I see that we (as in “the Rust community”) have a lot to do to make more people happy. Not as in making them Rust programmers, because that's not our goal, but as in showing them that we do not want everything to be written in Rust and that it is just trolls that request a “rewrite in Rust”.

We do value friendliness and kindness – let me state explicitly that this also includes other programming-language communities (and all other communities as well)!

Trolling does not help with that.

[^1]: Yes we have generic programming in Rust. I'm not a professional regarding C++, so I cannot say whether they are comparable in this regard.

[^2]: Some might say that we have manual memory management in Rust as well. That might be true by definition, but not the way I meant it: In C++ we can allocate something on the heap and then forget it. We have to try really hard to do that in Rust, though!

[^3]: In fact I might get into the situation where I have to rewrite an application in my job, but I'd rather rewrite it in the same language than switching languages just for the sake of it!

tags: #open-source #programming #rust #c #c++ #cpp

This post was written during my trip through Iceland and published much later than it was written.

This is a really important topic in programming and I really hope to get this article right. Not only for technical correctness, but also for ease of understanding, as explaining types is not that simple if one has never heard of them.

Let's give it a try...

What are types

Well, that's a question which is, in my opinion, not easy to answer. In fact, I thought several days about this question before writing this down, in the hope it will become a sufficient answer. Hence, you might find other answers which are easier to understand and maybe more correct than mine, but I'll give it a try nonetheless.

From what I think

Types are a combination of abilities and properties that are combined to express and limit a certain scope of a thing.

For example, a type Car may have four wheels, two doors and a horn (its properties) and can drive slow, drive fast and park (its abilities). That is certainly not a real representation of a car (also because only a car is a real representation of a car), but because of the domain this is used in, it is sufficient in the scenario at hand. The type Car cannot be multiplied, but another type Number may have this ability. Thus, the scope and abilities are also limited in a certain way.

I hope this description is a good one for you to understand.

Now that we know what types are, we should also learn some other terms around the subject of types. The first thing I want to talk about here is “strong typing” and “weak typing”. The reason for this is: These things do not exist. Yes, you've read that correctly: There is no such thing as “strong typing”. There is only stronger and weaker typing. The Java programming language is not strongly typed, neither is it weakly typed (but it is, of course, badly typed... forgive me that joke, pity the Java programmer).

But what is stronger typing? That is rather simple to explain, actually. We discussed that types are limitations of things, so that only some specific operations and such are possible. These limitations are enforced by the compiler or interpreter of the programming language, of course. And stronger typing only says that the compiler has more information (implicitly via the definition of the programming language) to enforce these rules, the rules of “the function A is defined for type T, so you cannot call it on U”. Of course there is more to that because of generic typing and so on, but that's basically it.

The next term is “type inference”. Type inference is nothing a programmer experiences explicitly, because it happens implicitly. Type inference is a feature of the compiler or interpreter of the language to guess the type of a variable without the programmer stating the actual type. There's nothing more to it, actually.

I mentioned the term “generic types” in one of the former paragraphs already, so we should have a look there, too. Generic types, or Generics for short, are types which are partial, in a way. So for example, one can define a Bag of things, whatever those things are. This is often (at least in typed languages – languages where types actually matter for the compiler or interpreter) specified in the code via “type parameters” (though this term differs from language to language).
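As a small illustration (a toy example of my own, in Rust), here is such a Bag with one type parameter; note how type inference also kicks in and deduces the concrete type from the first insertion:

// A generic container: T is the "type parameter".
struct Bag<T> {
    items: Vec<T>,
}

impl<T> Bag<T> {
    fn new() -> Self {
        Bag { items: Vec::new() }
    }

    fn put(&mut self, item: T) {
        self.items.push(item);
    }

    fn count(&self) -> usize {
        self.items.len()
    }
}

fn main() {
    // No type annotation needed: the compiler infers Bag<i32> from the call below.
    let mut bag = Bag::new();
    bag.put(42);
    println!("{} item(s) in the bag", bag.count());
}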

Why more types are better than few

The more types you introduce in your programs (internally or even in the public API), the more safety you get (speaking in the context of a stronger typed programming language, but also if you do a lot of runtime type checking in a weaker typed language). That does not mean that you should introduce a BlueCar, a BlackCar and a GreenCar as types in your program, but rather a type Color and a type Car where each Car has a Color – even if your domain is cars and not colors.

Maybe that example lacks a certain expressiveness, so consider this: Your Car has wheels. You can set the number of wheels when constructing the Car object. Do not pass a plain integer here, which would yield an API where one can pass 17 as a valid number of wheels – or 1337 or possibly even -1. If you instead introduce a type which represents the number of wheels, you get some safety into the construction of the Car object – safety checks in your code are not necessary anymore and thus your code will be shorter, better focused on what the actual problem is instead of fighting for valid values, and of course the compiler or interpreter can do the work for you.

Sounds nice, doesn't it? You can get all this with (almost) no cost attached, you just have to write down some more types. If your programming language contains features like enumerations, you do not even have to make validity checks anymore, as the compiler can execute them.
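A minimal sketch of that idea in Rust (all names are made up for this example): the number of wheels is its own type, and invalid values are rejected exactly once, at construction time:

// A dedicated type for the number of wheels; validity is checked once, here.
#[derive(Debug, Clone, Copy)]
struct WheelCount(u8);

impl WheelCount {
    fn new(n: u8) -> Result<Self, String> {
        match n {
            2..=8 => Ok(WheelCount(n)),
            _ => Err(format!("{} is not a sensible number of wheels", n)),
        }
    }
}

#[derive(Debug)]
struct Car {
    wheels: WheelCount,
}

fn main() {
    let car = Car { wheels: WheelCount::new(4).unwrap() };
    println!("{:?}", car);

    // This fails with a clear error instead of silently accepting a nonsense value:
    assert!(WheelCount::new(255).is_err());
}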

Next

In the next post we will focus on the coding environment.

tags: #open-source #programming #software #tools #rust

This post was written during my trip through Iceland and published much later than it was written.

While we heavily focused on the code-surrounding things in the last parts, we will return to focus on code-related things from here on.

This article discusses code verbosity and how it can improve your open source code and also your contributors experience a lot.

What is code verbosity

Code verbosity is mainly explicitness of code. For example, in Java you have to be more explicit when declaring a variable than in (recent) C++ or even Ruby:

String s = someFunctionCall(param); // Java

auto s = someFunctionCall(param); // C++

s = someFunctionCall param # Ruby

So code verbosity is how explicit you have to state certain things so that the compiler or interpreter understands your intention.

Because we do not always tell the compiler or interpreter what we want to do exactly and because we want to re-use functionality, we introduced abstractions. So abstractions are a way to make code less verbose, in some ways.

How to make code less verbose

Abstraction. It is as simple as this. You introduce abstraction to minimize repetition, which leads to less verbose code. Of course, you cannot always make the code less verbose if the language does not allow it: in the above example we used the auto keyword for specifying the type in C++, which is nice, but not possible in Java. So within the borders of your language's abilities, you can make code less verbose.

If you do that right and the abstraction results in nice code, you know that you've done fine.
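As a small Rust sketch of what I mean (a toy example of my own, not from any real code base): instead of repeating the same read-count-report dance for every file, the repetition is pulled into small helpers, and the call sites stay short:

use std::fs;

// The repeated pattern (read a file, count its lines) lives in one place...
fn line_count(path: &str) -> std::io::Result<usize> {
    Ok(fs::read_to_string(path)?.lines().count())
}

fn report(path: &str) {
    match line_count(path) {
        Ok(n) => println!("{}: {} lines", path, n),
        Err(e) => eprintln!("{}: {}", path, e),
    }
}

fn main() {
    // ...so the call sites stay short instead of repeating the whole match everywhere.
    for path in ["Cargo.toml", "src/main.rs"] {
        report(path);
    }
}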

How much is too much

But there can also be too much abstraction, which then yields unreadable code. Not unreadable as in cluttered with stuff, but just too abstract to grasp at first sight.

Abstraction can become too much. So make sure you introduce sensible abstractions, abstractions that can be combined nicely, and of course make sure one can step around the abstractions and use the core functionality, the not-abstracted things beneath.

As a sidenote: sometimes it makes sense to hide certain things completely or even to introduce several layers of abstraction.

Next

This was a rather short one, I guess. The next article will be longer I hope, as it will be about typing.

tags: #open-source #programming #software #tools #rust

This post was written during my trip through Iceland and published much later than it was written.

This is the last post which does not deal with code directly, I promise.

When it comes to open source hobby projects, contributions from others are often happily taken. But making the contribution process smooth for everyone does involve some precautions.

In this article I want to summarize how to make a contribution to a project as smooth as possible for all persons involved.

Public code and contributions

I wrote about this before and I want to shortly reiterate on it. “Open” as in open source (or even better: open contributions) is not black and white at all; there are several levels of grey in between, in my opinion.

The more open your code is, the better a contributor is able to contribute. Whether it be discussion, requests, bug reports, bug fixes or even feature implementation or general enhancement. On the other side, though, the more open your code is, the less your contributors are “bound” (in a mental way) to your project. It can happen (and it happened to me) that a contributor stops by for one pull request or issue and then you'll never hear of them again. The better the contribution process is for them, the more likely they are to come back – and it also depends on how relevant the project is to them, of course.

The contribution process in the Rust community (for the Rust compiler itself) is awesome, from what I've heard. This, of course, enhances the “I will come back and give another issue a try” feeling a lot. The contribution process of the nixpkgs project is slightly worse (but still rather good). Sometimes, nobody answers questions you might have for your pull request for several days or even weeks. That does not really make one eager to file another request.

Platforms

From what I think, github is the “most open” in the sense of “open contributions”. That's not only because of how github works, as other platforms work equally well (gitlab, gitea, gogs, bitbucket) but also because everyone is on github.

If you want to close down contributions a bit, you could host your own instances of gitea or gitlab – contributors can easily open an account for their contribution, though that slight hurdle will make the “casual code dumper” likely go away.

Even “more closed” would be an email-patch workflow, which git supports (and which the kernel community has used successfully for years now). In this case, the code is often made available via a web interface like cgit or klaus.

Readme

A project should always contain a readme file in its root folder. The readme file is often the first thing a contributor will look at, not only but also because github renders and displays them.

Therefore, keeping your readme file up to date and filled with current information can be a good way to show your contributors (and users) what is going on in the project's code base. It should contain a short description of what the project/code does and how it works (only from a user's view – implementation details or why you implemented this in Haskell instead of JavaScript do not necessarily belong here). It should contain a few examples of how to use the code or, if it is a library, how to call it. It also should contain build instructions (if necessary) or a pointer to a “BUILDING” file if the build process is long or complicated. At the end of the readme file, a license statement (how the project is licensed) should be included. Not the entire license, but only a short note and a copyright note as well. It is simply kind to do so.

Contributing File

Often, projects contain a Contributing file where guidelines (or even rules) are written down on how to contribute. It does not only contain statements on how code is submitted but also how issues are filed or requests should be made.

I think it is extremely important to have such a file available, especially if the code is not hosted on a site like github, where it is obvious that code is submitted through pull requests and issues are submitted via the issue tracker.

The length of such a file should respect the size of the project itself. If the project contains 10 KLOC, one should be able to read the contribution file in less than two minutes, preferably in less than one minute. It should state not only how code should be submitted, but also whether it should conform to some style guide (which itself can be outsourced to yet another file), how to behave in the community, how to write bug reports and also how to file issues (what information must be included).

Issue handling

Handling issues is clearly a way to improve the contributors experience. As soon as a contributor files an issue, she or he should be greeted and thanked for the issue. Take it this way: Someone just invested time to look at your project and cared enough to have a question, try it out or even found a bug. This is truly a cool thing and therefore they should be thanked for this, as soon as you have the time to do so. The Rust community even automated this, but I don't think this would be necessary for a small or medium sized project/community.

So be nice to everyone. Nothing is worse than a maintainer who babbles about bad things or insults the contributor because of his or her ideas or the way an issue was proposed to be resolved. Don't ever do this. I've seen issues where the maintainer of the project started rambling about how bad things were (not the project itself but rather its dependencies or even things that had nothing to do with the project itself). I cannot believe that such projects will last long, let alone survive at all. These projects will die.

Also, your ramblings have nothing to do with the issue at hand, and even if they do: being kind and humble will most likely work out better in every way, right?

Next

In the next part we will finally go back and actually talk code.

What I want to discuss in the next article of this series is code verbosity. I want to work out how DRY a codebase actually needs to be and how much abstraction is enough for the sake of understandability and cleanness of code.

tags: #open-source #programming #software #tools #rust

This post was written during my trip through Iceland and published much later than it was written.

Version Control is one important aspect when developing software as a whole, and especially when developing open source software.

Here are some thoughts about it.

Technology

First of all, technology-wise it doesn't matter which version control system one uses. For the sake of simplicity I'm using git here as an example VCS, though others might do just as well.

One important thing, at least in my opinion, is that the VCS has some basic functionality. This is mainly that it can be used distributed and has a branching functionality (which are two things I like to believe go hand in hand).

So I do not care whether one uses git, mercurial, or anything else. Most important is that a (D)VCS is actually used.

Branching model

Branching is a method that came up before git was created, as bitkeeper had such functionality (as far as I can tell) before Linus Torvalds wrote git. It is only that git has revolutionised the way we do version control and brought branching to wider knowledge and use.

In my opinion it is really important how branching is done. There is not simply “the branching” but there are many ways to do branching and one might be better for certain use case than another. There are known models such as feature branching, the gitflow branching model and a rebase-merge workflow. I don't want to explain each of them because others have done so way better than I ever could.

What I want to tell is that branching is not only important, but as flexible as you might not even guess. This is not necessarily a good thing – I'll show you in a minute. In my opinion, branching and developing a branching model for a project is like developing an API. Once it is set up properly, it may serve as a communication rule for a project, putting developers on the same page about how certain things have to be handled. Having a protocol on how to work on things is a good thing. If implemented properly, branching can improve the work of everyone as it is one point less to think about.

The bad thing about the flexibility of branching functionality is that it can be done wrong. It's as simple as that: merging one branch into another when one is not supposed to do that creates overhead which might not be reversible. This has happened to the best communities (for example the kernel community) but also happens in small communities, often due to too little knowledge of the tools at hand.

To summarize: If an open source project gets to a certain size (both code-wise and contributor/community-wise) a branching model should be implemented. If there are rules that contributors agree upon, it can improve working speed and therefore overall happiness in the community. Because developers like to bikeshed, it could also worsen happiness, of course. Though, it is better than no plan and chaos instead.

Hosting

I will not go into thoughts about hosting platforms in this article but rather on the how and why.

First of all, hosting the code somewhere with a way to show it in a web browser is a good way to improve the “open” part of open source code. Of course, tarball downloads and such suffice, but we are in the 21st century, so having a nice web interface is something one can expect.

Making the code browsable is often done via a VCS-specific web frontend, for example cgit for code version controlled with git. These web interfaces often also feature functionality to go back in time and view the history of a file. Maybe this is not needed often, but it is nevertheless helpful if needed.

I personally do not care about comments on code in my web interfaces or even ways to register users on the site, but of course some people like that. There are web interfaces that feature such things, for example for the git VCS there is gitea, gogs, gitlab, ... and many more. And of course there are the closed providers github, bitbucket and others...

Making code public and contributions easy

Hosting helps a lot with enabling contributions from strangers. No doubt, github makes contributions ridiculously easy.

I don't want to reiterate what others have said better and most people already know. What I want to point out here is that open source does not mean “open contributions”. One is completely free to reject all contributions to one's code base.

I really want to stress this. Open source does indeed mean that everyone is able to view the code, which also enables them to copy it (though redistribution might be limited or forbidden, as only free software allows you – by definition – to redistribute and alter code) but not necessarily that one is allowed or welcome to send in changes, feature requests or the like.

So if you want people to contribute to your code and suggest changes, features or report bugs, you should somehow give them the opportunity to do so. Depending on how “open” you want to be with your development you either should use a hosting platform (like github or bitbucket) or a slightly more “closed” variant, for example hosting your code on your own gitea instance. One step further you'd host your code on a site where people might be able to get it, maybe even with a “git clone”, though not send in pull requests, feature requests or open issues (for example a hosted git repository with cgit interface). Issues and bug reports could still be done via a mailinglist, if desired.

In fact, that last bit is what I consider for my own project imag.

SemVer, Change Management, Release Management

As soon as your code is out there, you have to think about change and release management. In my opinion, these are topics closely related to source code version control as VCS often offer functionality to do releases in one form or another and are clearly involved in the process of change management.

First of all, I'd like to suggest you read the SemVer specification. It is not that long but will help you understanding the next few paragraphs. So if you haven't read it already, go ahead and do so. Even if you don't apply SemVer to your projects it might open your eyes in one aspect or another.

But before we get into releases, we should first talk about change management, or better named for my points: Pull request management.

What I personally do with my PRs is, merge them when they're ready. This approach is easy and works, so far, pretty well. From time to time I have changes in my working branches (as stated before, I use feature branches) which might conflict with other peoples work. For the sake of contributor experience, I pause my PRs and wait until they are done with theirs. We will talk a lot about this in the next episode of this series, so I won't go into much detail. For now: This is a simple approach that works perfectly well so far for me and my (considerably small) open source projects.

But as soon as one's project grows bigger, that approach might not do the job anymore. If there are too many changes in a short amount of time which have to be agreed on and that have to be merged, it might be time to think about an alternative approach.

There are two ways I would tackle this problem. I never experienced it in the “real world”/in my projects, so the following is just a write-down of my thoughts. Take everything from here on with a grain of salt.

The first approach I can think of is to assign certain subsystems to certain people. If the amount of changes has become too big, one could assume that the codebase has also become tremendous. If that is the case, sub-maintainers can handle certain subsystems and the project leader can then periodically merge all changes together. This requires, of course, at least two people who are interested in the subject and willing to contribute maintenance effort to the project.

If the latter is not the case or there are too few people around for this, one could consider a merge-window style approach, as known from Linus himself. Changes are pulled in every other week, for example, and the rest of the time only bug fixes are merged into the project.

These two approaches might become handy some day if one is about to maintain a large code base alone (as in “as the only project owner”).

Now on to release management. In my opinion, releases should be done as soon as something works and from there on periodically. I myself made one mistake too often: pulling more things into one release than would have been good. For example, the imag 0.2.0 release was over one year ago. 0.3.0 is almost ready, but not yet released. I should've done more releases in between.

In my opinion, more releases with clear-cut edges are better than long release cycles. As soon as there is a new feature for users – release. A user-facing fix – release. This might result in high numbers for versioning, but who cares?

This is where I want to throw SemVer in, to adjust my statement from the last paragraph with a “but”.

SemVer can be used to signal breaking changes in user-facing interfaces. This is a really good thing and therefore I think SemVer should be applied everywhere. SemVer also states that in the “0.y.z phase” everything is allowed to happen, including API breakage. This is where I want to adjust my statement from above. A lot of releases should be done in the 0.y.z phase, but breakage should also be kept within that scope. As soon as a library or program hits 1.0.0, changes should be applied carefully. One really does not want to end up with a program or library at version 127.0.0, right? That'd also decrease a user's trust in the application, as one could expect breakage with every new release.

So what I'd do and actually plan doing with my projects is releasing a number of zero-releases until I am confident that everything is all right and then go from there. For imag specifically I am not thinking about 1.0.0 because imag is far from ready, but for my other projects, especially toml-query, I think of 1.0.0 already.

Another point which popped into my head weeks after the initial draft of this article: Do not plan the features of the next release with a release number! This might sound a bit odd, so let me explain. For example, you're planning three major features for the next release, which will be 0.15.0 then. You're slowly getting to a point where the release becomes ready; you might need three more weeks to get it done. Now, a contributor steps up and opens a pull request with another feature, which is already completely implemented, tested and also documented in the pull request. The contributor needs this feature as soon as possible in your code and you also think that it might be a great idea to release this as soon as possible. After you merged the request, you release the source – as 0.15.0, even though your three features are not yet completed.

Two things come to mind in this scenario: First, if two of your three features are already completed, they might show up in 0.15.0, but one feature has to be moved to the next release. If these two features are ready, but not tested, you might end up with a buggy release and have to release 0.15.1 soonish – more effort for you. Second, if you do not merge your features into the master branch of your project, but have a 0.15.0-prepare branch or something like that, you end up with a rather ugly merge mess later on, as 0.15.0 is already released and you cannot just rename a public branch.

So how to handle this properly? I came to the conclusion that release branches are the way to go here. In the scenario described above, you'd branch off of the previous release, most certainly 0.14.x, and create a new branch 0.15.0, where the pull request of the contributor would be merged then. As soon as the release is out, 0.15.0 will be tagged and merged back to the master branch.

My point here is: you'd still need to rename your next milestone or rewrite your issues for the next release. That's why I would not plan “0.15.0”, but simply “the next release” – because you'll never know whether your planned things will actually be the next release or the release after. So lessen the effort for yourself here!

Next

In the next article in this series I want to elaborate on how to make a contribution as pleasing as possible for the contributor. I guess I can talk a lot about that because I've contributed to a lot of projects already, including but not limited to linux, nixpkgs and nanoc.

tags: #open-source #programming #software #tools #rust

Inspired by the Call for Community Blogposts I want to summarize my experiences and thoughts on Rust in 2017 and what I am excited about for 2018.

Reflecting 2017

2017 was an amazing year for Rust. We got 8 releases of Rust itself! We got basic procedural macros allowing custom derive (also known as “macros 1.1”) in the first release last year (1.15.0). This made serde 1.0 possible, if I'm not mistaken? We got 103 stabilized APIs in 2017. This is incredible! Compile times improved and the tooling got so much better. I mean, it was awesome before. But now it is even better!

On a personal side I got a lot better at programming Rust. I wrote about 37800 lines of Rust code in my main project imag and 17380 lines in other crates (authored and contributed, according to a bit of git-fooing around). Is that a lot? I don't know.

Hopes for 2018

Now lets talk about 2018. This year will be amazing, I am sure.

Language features

I am really excited about the “impl Trait” thing. Being able to return a trait from a function will reduce the imag codebase so much, for example. We no longer need to define our own iterator helper types but can simply return impl Iterator<Item = Whatever>!
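A minimal sketch of what that looks like (a made-up example, not actual imag code): the function just returns “some iterator”, without a hand-written helper type:

// Returns "some iterator over u32" without naming (or writing) the concrete type.
fn evens_up_to(limit: u32) -> impl Iterator<Item = u32> {
    (0..=limit).filter(|n| n % 2 == 0)
}

fn main() {
    for n in evens_up_to(10) {
        println!("{}", n);
    }
}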

I have no other hopes for the language itself, because what we have right now is really amazing and I honestly cannot think of ways it could be improved.

Ecosystem needs / Tooling enhancements

I'm still a bit concerned about cargo functionality for building workspace projects. From what I see, building two different crates in one workspace which share dependencies rebuilds the dependencies. This is not as intended, I guess, but that's what I see. I did not dive deep into this, so I might be wrong, though.

What I am thinking about for several weeks now is a cargo/rust tool for calculating code metrics. I think of things like documentation/code ratio, average function length, simple things... but also about cohesion and coupling metrics and other inter-module/inter-crate metrics.

Also, I tried to set up the rust language server for vim on my workstation and failed hard. I guess this is a packaging problem with my distro (NixOS), though. Either way, installing the rls with a stable toolchain would be nice!

Crates I am still missing / should be improved

There are some crates I would love to have which do not exist yet.

  • A (high level) email crate. There is the email crate, but it is mainly unstable and does not even have a 0.1.0 yet. There's also lettre_email, which is in 0.7.0, but it doesn't support parsing of emails.
  • I really hope rust-vobject (which is one of the crates I contributed to in 2017) will improve even more and be the defacto-standard crate for handling vcard and icalendar data.
  • I follow the development of Cursive and from what I see it is awesome. I really hope people start writing high-level objects for cursive (like a file explorer, a form builder, a text editor like thing, a tab helper and so on) so I have to do less work when implementing a TUI for imag. (To be fair, there are already some crates available).
  • I hope there will be some awesome crates for handling multi-media files and reading/writing their metadata. Especially audio formats and video formats are important to me with imag.
  • Rust bindings for pass would be awesome.
  • Markdown (and other formats, like asciidoc, restructured text, textile and maybe even bbcode) parsers and renderers should be written/improved
  • An API for IPFS or maybe even a protocol implementation
  • Qt bindings (yeah, I have high hopes for 2018)

There are possibly thousands more... But I won't list them all.

tags: #open-source #programming #software #rust

This post was written during my trip through Iceland and published much later than it was written.

In this and also maybe in the next few articles we will focus on rather code-related things than on direct code properties. I hope that's okay.

Planning of an application or library is not easy, not at all. But how much planning do we actually do before writing code? And should we do more?

My thoughts on the subject.

What we've learned

Someone who has studied computer science should know at least some UML diagram types like class diagrams, flow charts, module plans and use case diagrams. They are used in (let's call it) “normal” software development and in the professional world out there.

But when we are developing open source software for our own needs and maybe for our friends, we do that often in our chambers at home. Class diagrams are often not being developed and I can say that I never saw a hobby programmer draw a use case diagram before writing the code of the application.

Why we don't use it

Why is that? Well, because open source software is often done as a hobby type of thing, there is often no need for planning ahead. A hobbyist is able to hold use cases, simple class diagrams and flow charts “in his mind” because he has great knowledge of the domain.

In fact, as he defines the entire domain, he is stakeholder, project leader, software architect, programmer, tester and marketing guy at the same time. He knows what problems are about to be solved and therefore can adjust every aspect of the application to the needs required.

This holds true for small and medium sized applications or code bases, where the problem is of a certain complexity, but not too big. Basically one could say that every aspect of the domain has to fit into one head without much effort, in the open-source-programming-at-home world. With a bit of training, I believe, one can even get to a point where only a few aspects of the domain have to be in a person's mind to be able to work on a solution.

But there is certainly a point where the effort needed to solve a specific problem explodes. One can still write software to solve the problem at hand, but not in reasonable time.

So why do we hobby programmers not use planning tools like the ones we've learned about in university? Why don't we use diagrams to make things clearer and better documented, even before the real programming starts? The answer is quite simple: because it annoys the hell out of us. We don't like to plan ahead. We don't like to adjust plans as soon as we find out that a small aspect of our library could be changed to gain more flexibility and overall goodness. We don't like to check our plans before writing down the next module until it works.

Coding is fun, planning is not.

But should we use these things

In my opinion, this is foolish. We really should use the things we learned in university to plan out software and of course also to document it. It would be such a huge improvement of everything to simply think a bit more about it before actually implementing it!

How we do it

What we do and why we do not use tools to plan ahead is explained with one sentence: We program from the user interface to the implementation, because the other way round is too complicated. Or, in other words: We program top-down because bottom-up needs planning and is therefore not that easy.

Of course, I'm speaking about the average case. I've programmed bottom-up before but, for me, it seems much more error prone than top-down does, especially without a plan.

Also, I do not say that top-down is not error prone. Not at all. When writing an API without an actual implementation in mind, one easily ends up sacrificing cleanliness and speed at some points to keep the API nice, which is not always a good idea. So top-down is only good as long as we get it right.

Tooling

Tooling is one big problem in this context. We do not have a toolchain for planning just yet. At least I do not have one that I would like to use. Because we are really good at controlling (versioning, moving around, managing) our source code (for example with git, and to some extent github), we also want to be able to do this with charts and diagrams. But we also want the niceness of SVG-rendered graphics. We don't want to play around with layout all day long, but use tools to simply get the job done.

And there are no such tools available.

Sure, one can use graphviz to design such things, but then again we do not have a nice overview of what's going on while editing our work. One could use ascii-art to draw all those things, but hey... ascii-art. We are better than that, aren't we? We could render the ascii-art into SVG... though the tooling there is not yet as good as it should be. And even if it were, version controlling these things with git is (I fail to believe otherwise) painful.

Conclusion

Well, I can only conclude the obvious here. We need better tooling for the open source programming community to do their planning, if they need to. Clearly, one does not always have to (or want to) plan things before trying out. But when one does, the tooling should be there and be useful and help with the process.

Next

In the next episode we will talk about version control of open source software projects. I'm not going into details about git or other systems used, but rather on the style how they should be used so everyone is pleased with it. This might be strongly biased, but hey, isn't this whole article series biased?

tags: #open-source #programming #software #tools #rust

This post was written during my trip through Iceland and published much later than it was written.

What is a nice and good API? How is “nice” defined when it comes to library interfaces? That's a question I want to discuss in this post, and also how you can create a nice API in your open source library without studying a topic like software architecture or similar.

Definition of a “nice”/“easy to use” API

But first, we have to define what makes an API good. And that's not that easy because this topic is very biased.

For me, a good API is one where I can get the job done without thinking much about it. That means that there shouldn't be much setup code involved just to use the library. So no factory hell if the only thing I want to have is the current time, for example. This also means that the API has to be decently high level, but without losing the ability to do fine-grained work if necessary. So for the most part, low level things (for example implementation details) are not interesting for me. But when I want to bit-fiddle around with the library, it should let me.

If a builder, factory or some other mechanism is necessary to produce objects in some way, the library should make clear (documentation-wise but also code-wise) why it is needed. There's no point in making the user call the tenth factory instantiation if it is not necessary, and it also makes the user's codebase blow up in size and complexity.

The naming of things in the library should be good, appropriate and, for the most part, consistent. If a function on an object which returns the string representation of that object is named “to_string”, it should be named that way for all types from that library, not only some parts.

Statelessness

Calling functions of your API should always result in the same values for the same arguments. That does not mean that your API should be pure in a functional programming meaning, but rather that the actions executed when calling a function should not result in some library-internal variables to be set, changed or unset. This is easily achievable by letting the user of the API have an object that holds the state, and functions of your API work based on that value. In short: your library should not have global variables.

This simple design pattern already results in easy to use APIs and a nice user experience.
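As a small sketch of that pattern (a made-up example in Rust): the state lives in a value the caller owns, not in a global inside the library:

// All state lives in this struct, which the user of the library owns.
pub struct Counter {
    value: u64,
}

impl Counter {
    pub fn new() -> Self {
        Counter { value: 0 }
    }

    // Functions operate on the state they are given; there are no globals involved.
    pub fn increment(&mut self) -> u64 {
        self.value += 1;
        self.value
    }
}

fn main() {
    let mut counter = Counter::new();
    counter.increment();
    println!("count is {}", counter.increment()); // prints "count is 2"
}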

Error exposure

Good libraries don't hide errors. Indeed, it is even better if errors are exposed to the user as much as possible. Because the user of the library knows best when and how to handle errors, even from your library.

I'm also a big fan of lots of error cases. The more error cases there are (the better a user of a library can distinguish between different errors), the better. This way, you let the user decide where she doesn't distinguish between two almost-equal error cases and where it is better to handle them independently. If your library does not give that opportunity, the user has to write ugly spaghetti-code handling to be able to tell what is going on. Of course, these things have to be documented properly.

Another thing that can come in handy is when your error types or your library expose functionality to translate error types into text which can be shown to a user of your library. Nothing is worse (from a user's point of view) than a “CallOnInconsistenStateObjectBuilderFactory on line 2832” error message shown in a user-facing interface (and trust me, I've seen such things already).
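A small Rust sketch of both points (a toy example of my own): one variant per distinguishable error case, plus a Display implementation that turns each case into text suitable for end users:

use std::fmt;

// One variant per distinguishable error case, so the caller can decide
// which ones to handle separately and which ones to lump together.
#[derive(Debug)]
pub enum CalendarError {
    FileNotFound(String),
    ParseError { line: usize },
    MissingField(&'static str),
}

// Human-readable messages, so the caller never has to show internal names to users.
impl fmt::Display for CalendarError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CalendarError::FileNotFound(path) => write!(f, "calendar file '{}' not found", path),
            CalendarError::ParseError { line } => write!(f, "could not parse calendar (line {})", line),
            CalendarError::MissingField(name) => write!(f, "calendar entry is missing '{}'", name),
        }
    }
}

fn main() {
    let err = CalendarError::ParseError { line: 42 };
    println!("{}", err); // "could not parse calendar (line 42)"
}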

Completeness

Nothing is worse than an API that is not complete. I mean, don't get me wrong – sometimes one does not think of all cases a library could be used for – and that's completely okay. But some things are too obvious for being left out. For example, if you provide functions to transform your time object from local time into GMT, why wouldn't you provide functions for converting it into UTC or EST? These also matter!

Also cleanup routines. In some languages it is necessary to include cleanup routines for your objects. If your library exposes alloc_vacation_location_obj(), it should also provide free_vacation_location_obj()! Sure, a user could use free(), but it is not nice API-wise. Even if your function does nothing more than call free(), it is better to provide a function (and if you want to include some more cleanup in your function later on, in a new version of your library, a user does not have to think about it that much when upgrading their dependencies).
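Sticking with Rust, here is what such a pair could look like when exposing a C-style API (the names come from the example above; the implementation is just a sketch):

// A C-style API exposed from Rust: allocation and cleanup come as a pair.
pub struct VacationLocation {
    pub name: String,
}

#[no_mangle]
pub extern "C" fn alloc_vacation_location_obj() -> *mut VacationLocation {
    Box::into_raw(Box::new(VacationLocation { name: String::new() }))
}

// Even though this currently only frees the Box, exposing it as a dedicated
// function lets the library add more cleanup later without breaking callers.
#[no_mangle]
pub extern "C" fn free_vacation_location_obj(ptr: *mut VacationLocation) {
    if ptr.is_null() {
        return;
    }
    unsafe { drop(Box::from_raw(ptr)) };
}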

Consistency

We had the naming game already, but it always comes back to us, right? Consistent naming is one of the most important things in an API. If allocating worked with functions prefixed with new_ all the time, it shouldn't be done with alloc_ this time. Also not in later versions of your library. Not even in a major version bump.

Even more important than naming is behaviour. A function that is named with some alloc prefix should only allocate, never initialize or do other fancy stuff (debugging output excluded here, if necessary).

Next

In the next episode we will talk about how one can plan an application.

tags: #open-source #programming #software #tools #rust