Changelog Management

October 2, 2022

I've been writing cargo-changelog lately and already published the first version (0.1.0) on crates.io.

Here I want to write down some thoughts on why I wrote this tool and what assumptions it makes. This should of course not serve as documentation of the tool, but simply as a collection of thoughts that I can refer to.

Where

Changelog management is hard. Not because it is particularly difficult to do, but because nobody really wants to do it in the first place. Especially because there's no established “place” where it should be done.

Some tools want the programmer to write commits which can serve as changelogs. I wrote about that before. It puts a burden on the programmer, who does not want to concern themselves with whether a change is user-facing or whether it impacts the user at all. That's not their job after all! Not in an open source setting and especially not in a commercial environment. They're hired to work on the software and that's all they should do!

In an open source world, the programmer of a feature may even contribute changelog entries, because they know that the change will have a certain impact on users when released. But the keyword in the prior sentence is “may”. They are not required to do so and should never be. Open source projects suffer from having too few contributors. Of course, there are big open source projects out there, like kubernetes, tokio, django, Rust, TensorFlow or, of course, the Linux Kernel. These projects do not have that issue, but I feel comfortable in assuming that these are the Top-1%. Most open source projects have one or two contributors or, if lucky, maybe ten to fifteen regular contributors. If such a project loses only one contributor, that has a significant impact on the overall project. Thus, making contributors happy is somewhat of a key concern. Making them responsible for adding changelog entries to their changes may not be the best way of making them happy.

Thus, I think, changelogs should be managed by the maintainer or someone in the project who wants to dedicate themselves to that task. The contributors should only do what they do best: produce code, deliver features, fix bugs, etc.

Under that presumption, putting changelogs within a commit is not a particularly good idea. It does not matter whether we're talking about commit formats like conventional commits here or about git-trailers for categorizing commits. After all, if a contributor categorizes the commit in the wrong way, they would need to rewrite the commit, even though the code they changed may be optimal. That's a serious hassle.

That leaves us only with producing the changelog entry outside of the actual commits that introduce the change.

The idea may then be to add the changelog entry in a dedicated commit, but still within the pull request that introduces the relevant change. That sounds good at first, but quickly falls apart because of a simple issue: Merging this may not be possible. The changelog entry that lands in a CHANGELOG.md file normally gets appended in some form or another. Whether that is a simple append to the section for the upcoming version of the software, or to a sub-section “Bugfixes”/“Features”/... does not matter: it is still an append. If someone else produced a change to that same section, we quickly run into merge conflicts. Needing a pull request to be rebased just because the changelog entry does not merge seriously slows down progress for the whole project. That should never happen!

With that established, we see that producing the changelog outside of the commits that introduce a change, as well as outside of the pull request that introduces the change, benefits the overall pace of the project. Also, having someone dedicated to producing the changelog, instead of burdening the programmers with it, helps the whole project not only in pace but also in developer happiness.

The above points do not mean that a programmer who feels dedicated shouldn't be able to produce a changelog for their contribution! Of course they should be enabled to produce that changelog! But they should not have to concern themselves with mergeability!

Also, producing changelogs should not slow down the project pace. After all, adding changelogs to a project is still a contribution. It should be as easy as producing code. It should not suffer from merge conflicts if two or more contributors add a changelog for different changes.

How

With all that in mind, I came up with a simple scheme. It turns out that other projects exist that follow a similar scheme – so I cannot take any credit for that. I still opted to start cargo-changelog because these already existing tools do of course not integrate with cargo, as they were written in other ecosystems.

So the general idea here is that we do not produce one large CHANGELOG.md file, but we record changes in individual files, called “fragments”. These fragments get put into a special place in the repository: .changelogs/unreleased/. The filename for each fragment is produced simply from a timestamp. That ensures that adding two fragments from two different pull requests will almost certainly not result in a merge conflict.
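A minimal sketch of why timestamp-based names avoid conflicts: every fragment becomes its own new file, so two pull requests never touch the same path unless they hit the exact same timestamp. The function names and the `<seconds>-<nanoseconds>.md` naming pattern are assumptions made for illustration; this post does not describe the exact format cargo-changelog uses.

```rust
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Build the path of a new fragment inside the "unreleased" directory.
/// The file name is derived only from the given timestamp
/// (hypothetical naming pattern, not necessarily what cargo-changelog uses).
fn fragment_path(unreleased_dir: &Path, secs: u64, nanos: u32) -> PathBuf {
    unreleased_dir.join(format!("{secs}-{nanos:09}.md"))
}

/// Convenience wrapper that takes the timestamp from the system clock.
fn new_fragment_path(unreleased_dir: &Path) -> PathBuf {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before the Unix epoch");
    fragment_path(unreleased_dir, now.as_secs(), now.subsec_nanos())
}
```

Because each contributor only ever adds a new file, git has nothing to merge line by line.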

A fragment contains two sections: a header with structured data and a free-form text. The structured data is encoded in YAML or TOML (although normally these tools opt for YAML and cargo-changelog does so as well).

I thought long and hard about what structured data may be recorded here. It turned out: I don't know and of course I shouldn't decide this. So what I did was implement a scheme where the user can define what structured data they want to record! Each project can, in the .changelog.toml file, which serves as configuration file for cargo-changelog, define what structured data they want to record, whether a data entry is optional and whether it has a default value. When generating a new fragment, cargo-changelog can either present the user with an interactive questionnaire to fill in that data, or(/and) open the user's $EDITOR where they can edit that structured-data header themselves.

Structured data may be the pull-request number that introduced the particular change, a classification of that change (“Bugfix”/“Feature”/“Misc”/... whatever the project defines in the .changelog.toml configuration file) or, if desired, a “Short description” of the change.
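The idea of optional fields and default values can be sketched as follows. `FieldSpec` and `resolve` are hypothetical names invented for this illustration; they are not cargo-changelog's actual types, and the real configuration format in .changelog.toml is not shown here.

```rust
/// Hypothetical description of one header field
/// (an assumption for illustration, not cargo-changelog's real type).
struct FieldSpec {
    name: &'static str,
    required: bool,
    default: Option<&'static str>,
}

/// Decide the final value of a field, given what the user entered (if anything).
/// User input wins over the default; a missing required field without a
/// default is an error; a missing optional field is simply absent.
fn resolve(spec: &FieldSpec, input: Option<&str>) -> Result<Option<String>, String> {
    match (input, spec.default, spec.required) {
        (Some(value), _, _) => Ok(Some(value.to_string())),
        (None, Some(default), _) => Ok(Some(default.to_string())),
        (None, None, false) => Ok(None),
        (None, None, true) => Err(format!("missing required field `{}`", spec.name)),
    }
}
```

A project-defined classification such as “Bugfix”/“Feature”/“Misc” would then be one such field, possibly with “Misc” as its default.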

The free-form text of the fragment can be used to document that change in a human-readable way. Currently, no format is enforced here, so whether the user uses Markdown or reStructuredText or something totally different is entirely up to the user (although cargo-changelog generates .md files for the fragments).
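To illustrate the two-part layout of a fragment, here is a rough sketch that splits a fragment into its header and its free-form text. It assumes the header is enclosed between two lines consisting of `---`, which is a common front-matter convention but is an assumption here, not something this post specifies. The `header_fields` helper only handles flat `key: value` lines and is not a YAML parser.

```rust
/// Split a fragment into (header, body).
/// Assumes the header sits between two `---` lines at the top of the file
/// (an assumed delimiter, chosen for illustration).
fn split_fragment(text: &str) -> Option<(&str, &str)> {
    let rest = text.strip_prefix("---\n")?;
    let (header, body) = rest.split_once("\n---\n")?;
    Some((header, body.trim_start_matches('\n')))
}

/// Read flat `key: value` lines from the header.
/// This is a simplification; a real tool would use a proper YAML parser.
fn header_fields(header: &str) -> Vec<(&str, &str)> {
    header
        .lines()
        .filter_map(|line| line.split_once(':'))
        .map(|(key, value)| (key.trim(), value.trim()))
        .collect()
}
```

The body is returned untouched apart from leading blank lines, which matches the idea that no format is enforced on the free-form text.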

When a release comes up

As soon as the software is about to be released, the “unreleased” fragments should be consolidated. cargo-changelog helps with that by providing a command that moves all fragments from .changelogs/unreleased/* to .changelogs/x.y.z/ (where x.y.z is of course the next release version, which is determined either by asking cargo or by letting the user specify it).
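The consolidation step is essentially a directory move. The following is a minimal sketch using only the standard library; the function name `consolidate` is made up for this example and it says nothing about how cargo-changelog implements the command internally.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Move every file from `<root>/unreleased/` into `<root>/<version>/`
/// and return how many fragments were moved.
fn consolidate(root: &Path, version: &str) -> io::Result<usize> {
    let unreleased = root.join("unreleased");
    let target = root.join(version);
    fs::create_dir_all(&target)?;

    // Collect the entries first so we do not rename files while iterating.
    let entries: Vec<_> = fs::read_dir(&unreleased)?.collect::<io::Result<_>>()?;

    let mut moved = 0;
    for entry in entries {
        if entry.file_type()?.is_file() {
            fs::rename(entry.path(), target.join(entry.file_name()))?;
            moved += 1;
        }
    }
    Ok(moved)
}
```

Since the files are only renamed, git records the step as a set of moves, which the person making the release can then commit.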

One crucial idea here was that the release will be done on a dedicated release-branch. Of course the tool does not enforce or demand this in any way, but it gives the option of doing that without running into issues later down the road.

So if the release branch gets branched off of the master branch, the person responsible for making the release would issue the cargo-changelog command for consolidating the unreleased fragments and then commit the moved files. After that, they would issue the cargo-changelog command for generating the CHANGELOG.md file. That file would always be generated and never touched manually. There's no need to do so: changing changelog entries after the fact (for example if a typo was found) would happen in the fragment files.

Of course, the CHANGELOG.md file should also appear on the master branch of the project! Cherry-picking the commits that consolidated the unreleased fragments as well as the one that generated the changelog file simply works, even if master progressed with new changelog fragments, because those new fragments are separate files that the cherry-picked commits never touch!

Changelog generation

In the previous section I wrote that the CHANGELOG.md file would be generated and would never be edited manually. Still, the user may want to add some custom text at the end of the changelog file. Maybe they would like to use a custom ordering of their changes, for example listing bugfixes first and features second. Or they want only the short description of each changelog fragment to be displayed, with the long-form text residing in a <details>-enclosed part, so that a reader of the rendered file gets a quick overview!

That's why CHANGELOG.md files are generated with a template file. That template resides in .changelogs/template.md (that path, as everything else with cargo-changelog, can be configured). That template file uses Handlebars templating and can be tweaked as required. In the current version of cargo-changelog, there are some minimal helpers installed with the templating engine to sort the released versions, group changes by “type” and some minimal text handling. More will follow, of course.
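What a “group changes by type” helper conceptually does can be sketched in plain Rust. This is not Handlebars code and not the helper that ships with cargo-changelog; `group_by_type` and `render` are hypothetical names, and the output layout is just one possible choice.

```rust
use std::collections::BTreeMap;

/// Group (type, short description) pairs by their type.
/// A BTreeMap keeps the groups in a stable, alphabetical order.
fn group_by_type<'a>(fragments: &[(&'a str, &'a str)]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut groups: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &(kind, text) in fragments {
        groups.entry(kind).or_default().push(text);
    }
    groups
}

/// Render the groups as one heading per type followed by a bullet list.
fn render(groups: &BTreeMap<&str, Vec<&str>>) -> String {
    let mut out = String::new();
    for (kind, entries) in groups {
        out.push_str(&format!("### {kind}\n"));
        for entry in entries {
            out.push_str(&format!("- {entry}\n"));
        }
    }
    out
}
```

In the real tool this kind of ordering and layout decision lives in the template, so each project can change it without touching any code.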

Metadata crawling

Another feature that cargo-changelog has is metadata crawling. One may want to fill header fields by issuing some command and using that command output as a value for a header field. cargo-changelog can call arbitrary commands for doing exactly that. Each header field can have a “crawler” configured, for issuing commands. These commands may even be other interactive programs like a script that uses skim (or its more popular counterpart fzf) for interacting with the user.
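A crawler in this sense boils down to running a command and using its standard output as the field value. The sketch below uses `std::process::Command`; the function name `crawl` is an assumption, and how cargo-changelog actually invokes crawlers may differ. Stdin and stderr are inherited so that an interactive program can still talk to the user while its stdout is captured.

```rust
use std::io;
use std::process::{Command, Stdio};

/// Run `program` with `args` and return its trimmed stdout,
/// which would then be used as the value of a header field.
fn crawl(program: &str, args: &[&str]) -> io::Result<String> {
    let output = Command::new(program)
        .args(args)
        // Let interactive programs read from and draw to the terminal.
        .stdin(Stdio::inherit())
        .stderr(Stdio::inherit())
        // Only stdout is captured (the default for `output()`).
        .output()?;

    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{program} exited with {}", output.status),
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).trim().to_string())
}
```

For example, on a Unix-like system `crawl("git", &["rev-parse", "--abbrev-ref", "HEAD"])` would yield the current branch name, assuming git is installed and the working directory is a repository.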

To sum up

To sum up, these are my thoughts and notes on changelog management with cargo-changelog. Of course, most of this is tailored towards open source projects (and, if someone noticed, also towards an always-green-master strategy. I may write a blog article about that as well).

cargo-changelog is in 0.1.0 and certainly not feature complete yet. It is a first rough implementation of my ideas and it seems to work great so far, although it is not battle tested at all! I am eager to try it out in the near future and extend it and improve it as need be. One can see the tool in action in the history of the repository of the tool itself!

And as always: contributions are welcome!