musicmatzes blog

programming

This post was written during my trip through Iceland and published much later than it was written.

When writing my last entry, I argued that we need decentralized issue tracking.

Here's why.

Why these things must be decentralized

Issue tracking, bug tracking and of course pull request tracking (which could all be named “issue tracking”, btw) must, in my opinion, be decentralized. That's not only because I think offline-first should be the way to go, even today in our always-connected and always-on(line) world. It's also because of redundancy and failure safety. Linus Torvalds himself once said:

I don't do backups, I publish and let the world mirror it

(paraphrased)

And that's true. We should not need to do backups; we should share all data, in my opinion. Even absolutely personal data. Yes, you read that right: me, the guy who lives free of Facebook/Google/Twitter/Whatsapp/whatever, tells you that all data should be shared. But I also tell you: never ever unencrypted! And I do not mean transport encryption, I mean real encryption. Truly unbreakable is not possible, but theoretically unbreakable for at least 250 years should do. If the data is shared in a decentralized way, like IPFS or Filecoin try to do, we can be (almost) sure that if our hard drive fails, we don't lose the data. And of course, you can still do backups.

Now let's get back to the topic. If we decentralize issue tracking, we can make sure that issues are around somewhere. If github closes down tomorrow, thousands, if not millions, of open source projects lose their issues. And that's not only the current issues, but also the history of issue tracking, which includes data on how a bug was closed, how a bug should be closed, what features are implemented why or how, and so on. If your self-hosted solution loses data, like gitlab did not long ago on their gitlab.com hosted instance, the data is gone forever. If we decentralize these things, more instances have to fail to bring the whole project down.

There's actually a simple statistical effect behind these things: the more instances a distributed system has, the more likely it is that some instance is down right now, but at the same time, the less likely it is that the whole system is down. And this likelihood does not shrink linearly, it shrinks exponentially: if each instance is unavailable with probability p, independently of the others, the probability that all N instances are unavailable at once is p^N. That means that with 10, 15 or 20 instances you can be pretty sure that your stuff is alive somewhere even if your own instance fails.

Now think of projects with many contributors. Maybe not as big as the kernel, which has an exceptionally big community. Think of communities like the neovim one. The tmux project. The GNU Hurd (have you Hurd about that one?) or similar projects. If 30 or 40 developers are actively maintaining things in one of these projects, their repositories will never die. And if the repository holds the issue history as well, we get backup safety there for free. How awesome is that?

I guess I made my point.

tags: #open-source #programming #software #tools #git #github

This post was written during my trip through Iceland. It is part of a series on how to improve one's open-source code. Topics (will) include programming style, habits, project planning, management and everything that relates to these topics. Suggestions welcome.

Please note that this is totally biased and might not represent the ideas of the broad community.

When it comes to hobby projects in the open source world, one often works alone on the codebase. There is no community around your tool or library that is actively pushing towards a nice codebase. Anyway, you can still have one. And what matters in this regard (not only, but also) is a nice modularization of your codebase.

Why modularization

It is not always beneficial to modularize your code. But if your program or library hits a certain size, it can be. Modularization helps not only with readability, but also with separation of concerns, which is an important topic. If you do not separate concerns, you end up with spaghetti code and code duplication, which leads to doubled effort, which is essentially more work for the same result. And we all know that nobody wants that, right? In the end we're all lazy hackers. Also, if you write code twice, thrice or maybe even more often, you end up with more bugs. And of course you don't want that either.

SoC

Now that the “why” is cleared, the “how” is the next step to think about.

We talked about separation of concerns in the last section already. And that's what it boils down to: you should separate concerns. Now, what is a concern? An example would be logging. To be fair, you should use a decent library for the logging in your code, but that library might need some initialization routines to be called and some configuration to be passed on. Maybe you even need to wrap some things to be able to have nice logging calls in your domain logic. You should clearly modularize the logging-related code and move the vast majority of it out of your main code base.

Another concern would be file IO. Most of the time when you are doing file IO, you're duplicating things. You don't need to catch IO errors in every other function of your code: wrap these things and move them to another helper function. Maybe your file IO is more complex and you need to read and write multiple files with a similar structure all the time; then a good idea would be to wrap these things in a module that handles exactly that. The concern of file IO is now encapsulated in a module, errors are handled there, and users of that module know what they're working with: an abstraction which either works or fails in a defined way, so that they don't have to re-write error handling in every other function but can simply forward errors.
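To make that concrete, here is a minimal sketch of such a file-IO module in Rust. The module name, the error type and the read_config function are made up for illustration; the point is only that the IO error handling lives in exactly one place:

// A small module that encapsulates the file-IO concern.
// Callers never touch std::io::Error directly; they only see ConfigError.
mod config_io {
    use std::fs;
    use std::io;
    use std::path::Path;

    #[derive(Debug)]
    pub enum ConfigError {
        Io(io::Error),
        Empty,
    }

    impl From<io::Error> for ConfigError {
        fn from(e: io::Error) -> Self {
            ConfigError::Io(e)
        }
    }

    // Read a configuration file; all IO error handling happens right here.
    pub fn read_config(path: &Path) -> Result<String, ConfigError> {
        let content = fs::read_to_string(path)?; // io::Error converted via From
        if content.trim().is_empty() {
            return Err(ConfigError::Empty);
        }
        Ok(content)
    }
}

fn main() {
    // Callers just forward or match on ConfigError, no io::Error handling here.
    match config_io::read_config(std::path::Path::new("Config.toml")) {
        Ok(content) => println!("read {} bytes", content.len()),
        Err(e) => eprintln!("could not read config: {:?}", e),
    }
}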

If you think about that for a moment, you'll notice that modularization is a multilevel thing. Once you have nicely defined modules for several concerns, you might build new and more high-level modules out of them. And that's also exactly what frameworks do most of the time: they combine libraries (which might already be at a certain level of abstraction) into a new, even more high-level library for another concern. For example, a web framework combines a database library, a templating library, a middleware library, a logging library and authentication and authorization libraries into one big library, which is then – somewhat inconsistently – called a framework.

CCC

No, I'm not talking about the Chaos Computer Club here, but about Cross Cutting Concerns. A cross cutting concern is one that is used throughout your entire codebase. For example logging. You want logging calls in your data access layer, your business logic and your user interface logic.

So logging is a functionality all modules of your software want and need. But because it is cross cutting, it is not possible to insert it as a layer in your stack of modules. So it is really important that your cross cutting concerns are as loosely coupled to all other modules as possible. In the case of logging, this is particularly easy, because after the general setup of the logging functionality, which is done exactly once, you have a really small interface to that module. Most of the time it is not more than four or five functions: one call for each level of verbosity (for example: debug, info, warn, error).
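As a rough illustration, such a small logging interface could look like the following hand-rolled sketch (in a real project you would rather use the log crate and its debug!/info!/warn!/error! macros, which follow the same idea):

// A minimal logging facade: setup happens once, the rest of the code
// only ever sees these four functions.
mod logging {
    pub fn init(verbose: bool) {
        // one-time setup (verbosity, destination, format) would go here
        let _ = verbose;
    }

    pub fn debug(msg: &str) { eprintln!("[debug] {}", msg); }
    pub fn info(msg: &str)  { eprintln!("[info ] {}", msg); }
    pub fn warn(msg: &str)  { eprintln!("[warn ] {}", msg); }
    pub fn error(msg: &str) { eprintln!("[error] {}", msg); }
}

fn main() {
    logging::init(true);
    logging::info("application started");
}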

Naming things...

I know, I know... this bikeshedding topic again. I don't want to elaborate too extensively on this subject, though a few thoughts must be said.

First of all, module names should be like function names (which I explained in the last episode) – short, to the point and describing what a function, or in our case a module, does. And that's it. A module should be named after its domain. If the module, for example, contains only some datatypes for your library, or only algorithms to calculate things, it should be named exactly that: “types” or “algo”.

A second thing I would clearly state is that module names should be a one-word affair. Module names should hardly ever contain more than one word – why would they? A domain is one word, as shown above.
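For example, the top level of a small, hypothetical library following this rule could look as simple as this:

// lib.rs: one word per module, each named after its domain
pub mod types {
    // the datatypes of the library
}

pub mod algo {
    // the algorithms working on those types
}

pub mod io {
    // reading and writing those types
}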

Multiple libraries

What I really like to do is to have several libraries for one application once the codebase hits a certain complexity. I like to take imag as an example. Imag is a collection of libraries which build on a core library and offer abstractions over it. This is a great separation of concerns and functionality: a user of the imag source can use some libraries to get a certain functionality and leave out others if their functionality is not needed, and the coupling stays loose. But with this approach, another topic becomes important as well, and we'll talk about that in the next episode.

The naming restrictions I stated above also apply to library names which exist solely for separation inside of an application, though I would relax the one-word-name rule a bit here.

Next

In the next Episode of this blog series we will talk about another important subject which is related to modularization: API design.

tags: #open-source #programming #software

This post was written during my trip through Iceland and published much later than it was written.

Almost all toolchains out there follow a CI-first approach. There are clear benefits to that, but maybe that's not what one wants with a self-hosted solution for one's OSS projects?

Here I try to summarize some thoughts.

CI first

CI first is convenient, of course. Take github and Travis as an example. If a PR fails to build, one does not even have to review it. If a change makes some tests fail, either the test suite has to be adapted or the code has to be fixed to get the tests passing again. But as long as things fail, a maintainer does not necessarily have to have a look.

The disadvantage is, of course, that resources need to be there to compile and test the code all the time. But that's not that much of an issue, because hardware and energy are cheap and developer time is not.

Review first

Review first keeps the number of compile and test runs low, as only code that is basically agreed upon gets tested. On the other hand, it increases the effort a maintainer has to invest into the project.

Review first is basically cheap. If you think of an OSS hobby project, that might be a good idea, especially if your limited funding keeps you from renting or buying good hardware on which hundreds or even thousands of compile jobs per month can run at decent speed.

What I'd do

I'm thinking about this subject in the context of moving away from github with one of my projects (imag, of course). Because of my limited resources (the website and repository are hosted on Uberspace, which is perfect for that), I cannot run a lot of CI jobs. I don't even know whether running CI jobs on this host is allowed at all. If it is not, I'd probably rent a server somewhere, do CI-first there and integrate that into the Uberspace-hosted site. That way I'd even be able to run more things I would like to run on a server for my personal needs. But if CI is allowed on Uberspace (I really have to ask them), I'd probably go for review-first and invest the money I save into my Uberspace account.

tags: #open-source #programming #software #tools #git #github

This post was written during my trip through Iceland and published much later than it was written.

From my last article on whether to move away from github with imag, you can see that I'm thinking heavily about this subject. In this post I want to summarize what I think we need for a completely self-hosted open source programming toolchain.

Issue tracking

Do we actually need this? For example, does the kernel do these things? Bug tracking – yes. But issue tracking as in planning what to implement next? Maybe the companies that contribute to the kernel do this internally, but not the kernel community as a whole (AFAIK).

Of course the kernel is a bad example in this case, because of its size, the size of the community around it and all these things. But other, smaller projects use issue tracking for planning, for example the nixos community (which is still fairly big) or the Archlinux community (though I'm not sure whether they do these things only over mailing lists or via a dedicated forum at archlinux.org).

Bug tracking

Bug tracking should be done on a per-bug basis. I think this is a very particular problem that can easily be solved with a mailing list. As soon as a bug is found, it is posted to the mailing list, and discussion and patches are added to the list thread until the issue is solved.

Pull request tracking

With github, a contributor automatically has a web-accessible repository. But for the most part it is sufficient if the patches are sent via an email-patch workflow, which is how many great projects work. Having web-accessible repositories available is just a convenience github introduced and now everybody expects.

I think pull requests (or rather patchsets) are tracked no matter how they are submitted. If you open a PR on github, patches are tracked just as well as with mailing lists. Indeed, I even think that mailing lists are better for tracking and discussion, as one can start a discussion on each individual patch. That's not really possible with github. Also, the tree shape a discussion can take is a major point where mailing lists are way better than github.

CI

Continuous Integration is a thing where solutions like gitlab or github shine. They integrate easily with repositories, are free of charge and (normally) result in better and tested code. I do not know of any bigger open source project that does not use some form of CI. Even the kernel is tested (though not by the kernel community directly, but rather by companies like Intel or Red Hat, as far as I know).

A CI solution, though, is rather simple to implement (but I'm sure it is not easy to get it right). Read my expectations below.

How I would like it

Issue and bug tracking should be based on plain text, which means that one should be able to integrate a mailing list into the bug tracking system. Fortunately, there is such an effort, named git-dit, but it is not really usable yet. Well, it is usable, but it has neither email integration nor a web interface (for viewing). This is, of course, unfortunate. Also, there is no way to import existing issues from (for example) github. And that's important, of course.

For pull request/patch management, there's patchwork. I've never worked with it, but as far as I can see it works nicely and could be used. But I would prefer to use git-dit for this, too.

I would love to have a CI tool that works on a git-push-based model. For example, you install a post-receive hook in your repository on your server, and as soon as there is a new push, the hook verifies some things and then starts to build the project from a script, which preferably lives in the repository itself. One step further, the tool would create a RAM disk, clone the just-pushed branch into it (so we have a fresh clone) and build things there. Even one step further, the tool would create a new container (think of systemd-nspawn) and trigger the build there. That would ensure that the build does not depend on some global system state.

This, of course, also has some security implications. That's why I would only build branches where the newest (the latest) commit is signed with a certain GPG key. It's a really easy thing to do, and because of GPG and git itself, one can be sure that only certain people can trigger a build (which is only the execution of a shell script, so you see that this has some implications). Another idea would be to rely on gitolite, which has ssh authentication. This would be even easier, as no validation would be necessary on our side.
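A very rough sketch of that idea, just to illustrate the flow (the paths and the ci.sh build script are made up, and checking for one specific key would additionally require inspecting gpg's status output, which is left out here):

use std::process::Command;

// Build the given ref only if its newest commit carries a valid GPG signature.
fn build_if_signed(repo: &str, gitref: &str) -> std::io::Result<()> {
    // `git verify-commit` exits non-zero if the commit is not properly signed.
    let signed = Command::new("git")
        .args(&["-C", repo, "verify-commit", gitref])
        .status()?
        .success();

    if !signed {
        eprintln!("refusing to build {}: newest commit is not signed", gitref);
        return Ok(());
    }

    // Fresh clone into a temporary directory so the build cannot depend on
    // the global state of the original working copy (a RAM disk or a
    // container would go here instead).
    let workdir = "/tmp/ci-build";
    Command::new("git")
        .args(&["clone", "--branch", gitref, repo, workdir])
        .status()?;

    // Run the build script that lives in the repository itself.
    Command::new("sh")
        .arg("ci.sh") // hypothetical script name
        .current_dir(workdir)
        .status()?;

    Ok(())
}

fn main() {
    let _ = build_if_signed("/srv/git/project.git", "master");
}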

The results of the build should be mailed to the author/committer of the commit that was built.

And yes, now that I have written these things down, I see that we already have such a tool: drone.

That's it!

That's actually it. We don't need more than that for a toolchain for developing open source software with self-hosted solutions. Decentralized issue/PR tracking, a decent CI toolchain, git, and here we go.

tags: #open-source #programming #software #tools #git #github

This post was written during my trip through Iceland. It is part of a series on how to improve one's open-source code. Topics (will) include programming style, habits, project planning, management and everything that relates to these topics. Suggestions welcome.

Please note that this is totally biased and might not represent the ideas of the broad community.

We're slowly working ourselves up from if statements in the last episode of this article series to functions in this article. I'm using the term “function” in this article, but you're welcome to interpret this word as you like – whether you're from the C world, a C++ hacker, you're lining up your types in the wonderful lands of Haskelldonia or you like snakes and play with Python all day long doesn't really matter. Functions can be Functions, Procedures, Methods or whatever you name these things in your language of choice.

So when thinking about functions, what do we have to think of first? Well, yes...

Naming things

Computer scientists and nerds like to bikeshed this to death. We can talk about function naming all day long, can't we? I do not like this topic at all, because most people cannot keep their heads calm when talking about it. So I will simply list what I think should be in a general guide on function naming, and it'll help you exactly nothing:

  • Short, but not too short
  • Expressive what the function does, but not too expressive
  • To the point
  • Not interpretable
  • Should only contain the good-case in the name
  • Should not contain “not” or “or” and “and” – except when it should

Shrtpls

Function names should be short. If you have a look at the C “string” header, you'll find a function named “strlen”. This name is truly wonderful: it is short and to the point.

Your function names shouldn't be too short, though. Single-character names are a no-no! Even two characters are most certainly too short. One-word names are a good way to go for simple functions. So function names like “sum”, “ceil” or “colour_in” are fine.

Expressiveness

A name should always express what the function does. The examples from the last section do this well. Bad examples are “enhance”, “transform” or “turn_upside_down”.

If a function name has to be a bit longer to express what it, the function, actually does, that's okay I guess.

To the point / Not interpretable

A reader of your code should understand what your function does when reading the name alone, maybe including the types involved, if your language offers types. But not only that, she should also be able to tell you, the writer of the code, what the function does without you correcting her.

I think it is always a good idea to think “from a third person's perspective”. If you think someone else can tell you what your code does, you can consider it good enough. Not perfect – every code can be improved. But good enough.

The rest

I want to summarize the rest of the points from above in this section. A good heading for this section might be “good practices for function naming”, but that also might not fit as nicely as it should.

The thing is, if your function actually implements business logic (you might not have a “business” in your open source codebase, but at least a domain), it really should not contain terms that are boolean operators. For example, a function should never be named “does_not_contain_errors”. Not only is this name way too long, including boolean logic in function names also makes it harder to actually use them in boolean expressions, because you have to wrap your head around the negations all the time. A better name would be “contains_errors” – you can apply a negation operator after calling it!
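A tiny, made-up example of why the positive name reads better at the call site:

// Hypothetical helper with the positive name:
fn contains_errors(report: &str) -> bool {
    report.contains("ERROR")
}

fn main() {
    let report = "all systems nominal";

    // The negation happens at the call site, where it is easy to read:
    if !contains_errors(report) {
        println!("ship it");
    }
}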

In the end it's all about size

Out there in the primal world, bigger is better. Bigger muscles, bigger knives, guns or tanks, even bigger cities, cars, houses. But in the world of programming, things are reversed. Small things matter!

So your functions should be small. As small as possible, actually. Today's compilers can inline and optimize the hell out of your code – and if you're one of the scripting language enthusiasts out there: does speed actually matter in your domain? Steve Klabnik once said that, in the Rails world, people tell each other that yes, Ruby is a slow language, but the network and the database are your bottleneck, so nobody cares.

Also, think of the advantages of short functions: people can more easily understand what you're doing, testability gets improved because you can apply tests on a more fine-grained level and documentation might also get improved because you have to document more functions (that's another topic we'll discuss in a different article as well).

I really don't want to write down any line numbers or approximations of how long a function should actually be. That's not only because I don't want to burden myself with this, but also because I cannot give any advice without knowing what the function should do – if you have a complex function that has to do a lot of things in preparation for the domain logic which cannot be outsourced (for example acquiring some locks), it might be 100 lines or more (also depending on your language). If you're doing really simple things, for example building a commandline interface or doing setup work, it might even be 1000 lines or more. I really cannot tell you how much is too much.

But in general, shorter is better when it comes to functions.

Scoping

Some programming languages offer scopes. For example C, C++ or Rust, but also Java and even Ruby.

Scopes are awesome. Variables get cleaned up at the end of the scope. You can, of course, use scopes for if statements or loops. But scopes are also good for structuring your code!

In the above section I wrote that some functions might need to do some setup work or preparation before actually implementing their logic. For example, if a function needs to acquire some locks or allocate some resources before doing the actual work.

In some cases it is possible to separate the domain logic of the functions by simply applying some scopes. Let me write down an example in pseudo-rustic code:

fn calculate_some_value() {
    // do
    // some
    // setup

    {
        // domain
        // logic
    }

    // cleanup
}

I guess that makes my point pretty clear.

Next

Up next we will talk about modularization of code and how you can structure your programs by separating things into classes, modules, namespaces and so on.

tags: #open-source #programming #software

This post was published on both my personal website and imag-pim.org.

I have been thinking of closing contributions to imag for about two months now. Here I want to explain why I am thinking about this step and why I am tending towards a “yes, let's do that”.

github is awesome

First of all: github is awesome. It gives you absolutely great tools to build a repository and finally also an open source community around your codebase. It works flawlessly; I never experienced any issues with pull request tracking, issue tracking, milestone management, merging, external tool integration (in my case, and in the case of imag, only Travis CI) or any other tool github offers. It just works, which is awesome.

But. There's always a but. Github has issues as well. From time to time there are outages; I wrote about them before. Yet, I came to the conclusion that github is doing really, really well for the time being. So the outages at github are not the reason why I am thinking of moving imag away from github.

Current state of imag

It is the state of imag. Don't get me wrong, imag is awesome and gets better every day. Either way, it is still not in a state where I would use it in production. And I have been developing it for almost two years now. That's a really long time frame for an open source project that is, for the most part, developed by only one person. Sure, there are a few hundred commits from others, but right now (the last time I checked the numbers) more than 95% of the commits and the code were written by me.

Imag really should get into a state where I would use it myself before making it accessible (contribution-wise) to the public, in my opinion. Therefore, developing it in a more “closed” way seems like a good idea to me to get it into shape.

Closing down

What do I mean by “closing development”, though? I do not intend to make imag closed source or to hide the code from the public, that's for sure. What I mean by closing development is that I would move development off of github and do it only on my own site, imag-pim.org. The code will still be openly accessible via the cgit web interface. Even contributions will be possible, via patch mails or, if a contributor wants to, via a git repository on the site. Only the barrier to entry gets a bit higher, which – I like to believe – keeps away casual contributors and only attracts long-term contributors.

The disadvantages

Of course I'm losing the power of the open source community at github. Is this a good thing or a bad thing? I honestly don't know. On the one hand it would lessen the burden on my shoulders regarding community management (which is fairly little right now), issue management and pull request management. On the other hand I would lose tools like travis-ci and others, which work flawlessly and are a real improvement to the development process.

The conclusion

I don't really have one. If there were a way to include Travis in a self-hosted repository, as well as some possibility for issue management (git-dit isn't ready in this regard yet, because one cannot extract issues from github just yet), I would switch immediately. But there isn't. And that's keeping me from moving off of github (vendor lock-in at its finest, right?).

I guess I will experiment with a dedicated issue repository with git-dit and check how the cgit web interface works with it, and if it seems to be good enough I will test how it can be integrated (manually) with emails and a mailing list. If things work out smoothly enough, I will go down this road.

What I don't want to do is integrate the issue repository into the code repository. I will have a dedicated repository for issues only, I guess. On the other hand, that makes things complicated with pull request handling, because one cannot comment on PRs or close issues with PRs. That's really unfortunate, IMO. Maybe imag will become the first project which heavily uses git-dit. Importing the existing issues from github would be really nice for that, indeed. Maybe I'll find a way to script the import functionality. As I want a complete move, I do not have to keep the issue tracking mechanisms (git-dit and github issues) in sync, so at least I do not have this problem (which is a hard one on its own).

tags: #open-source #programming #software #tools #git #github

This post was written during my trip through Iceland. It is part of a series on how to improve one's open-source code. Topics (will) include programming style, habits, project planning, management and everything that relates to these topics. Suggestions welcome.

Please note that this is totally biased and might not represent the ideas of the broad community.

During my trip through Iceland I had a really great time seeing the awesome landscapes of Iceland, the Geysir, the fjords and the Highlands. But in the evening, when the wind got harsher and the temperatures went down, I had some time for reading and thinking.

This is when I started this blog article series. I thought about general rules that would help me improve my open source code.

I started thinking about this subject after reading a great article on Fluent C++ about how to make if statements more understandable. This article really got me thinking about this subject, because it makes some really good points, and if you haven't read it yet, you clearly should invest a few minutes reading it (and that blog in general, of course). I'll wait, and we'll continue once you're done reading it.

Why thinking about this in the first place?

Well, everyone knows you're already writing the best code possible and everyone who doesn't understand it is not worthy of your almighty code! Some people sadly think like this. And of course this is not true at all.

I once read this great statement, I guess it was also on the Fluent C++ blog (again: read that blog, it is awesome), which goes approximately like this:

If you look at code you've written six months ago and you cannot think of a way to improve it, you haven't learned anything in six months, and this is as bad as it can get

If you think about this for one minute, it is absolutely right, and you really don't want to be at this point. So you have to start thinking about your code. But where to start? Well, at the beginning, you might think. And that's absolutely right. You have to think about the small things first, so let's start with if statements.

Making if statements more understandable

Basically, what Jonathan Boccara said on Fluent C++. I won't repeat what he has written, just briefly summarize: give long if expressions names by defining functions for the conditions, represent the domain in your if statements, and don't be more general than your domain specifications.

The last of these is the point I want to focus on in this article. Full quote:

Don't compress an if statement more than in the spec

But in open source software development we often do not have any spec. If you're working on a hobby project in your free time, improving someone's code or contributing some functionality to an open source project you're interested in, you only have the idea in your (and maybe also in someone else's) head. Sometimes you or some other people have already had a discussion about the feature you're about to implement and have a rough idea how it should be done. Maybe even a concrete idea. But you'll almost never have a specification where edge cases, preconditions and invariants of your functionality are defined. And most of the time you don't need one. In open source, people come together who have a similar interest and goal: no specification required, because all contributors involved know what a functionality should do and what it should not.

Show me code

In the following, I'm using Rust as language for my examples. I'm not doing anything Rust specific here, so people without knowledge of the Rust programming language shouldn't have to learn new things to understand my point, it is just that I'm most comfortable with this language right now.

So, as Jonathan already said, one should not make if statements arbitrarily long and complex. Helper functions should be used for complex conditions, even if these functions are only used once. This can heavily improve the readability of your code.

if (car.has_wheels(context.required_num_of_wheels())
    && car.max_speed() > SpeedUnit::KMH(25))
    || car.building_year() > Time::Year(2000)
{
    // ...
}

The condition from above can be greatly improved in readability by moving it to a helper function.

fn car_is_new_or_fast(car: &Car, context: &Context) -> bool {
    (car.has_wheels(context.required_num_of_wheels())
        && car.max_speed() > SpeedUnit::KMH(25))
        || car.building_year() > Time::Year(2000)
}

// ...

if car_is_new_or_fast(&car, &context) {
    // ...
}

You might think that this does not improve the code at all, that it just moves the complexity somewhere else – and that's not entirely true. If you only have five-line functions, yes. But if your functions are ten, fifteen, fifty or even a hundred lines long and you have several if statements of similar complexity, moving such things out can improve it a lot.

Also, you can make complex conditions testable by moving them to functions, which is also a nice-to-have.
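To illustrate the testability point with a self-contained, made-up example: a condition helper that only takes plain values can be unit-tested directly, without building up a whole Car or Context first:

// A condition extracted into a helper function...
fn is_new_or_fast(building_year: u32, max_speed_kmh: u32) -> bool {
    building_year > 2000 || max_speed_kmh > 25
}

// ...can be tested in isolation, without any surrounding state.
#[cfg(test)]
mod tests {
    use super::is_new_or_fast;

    #[test]
    fn new_car_counts() {
        assert!(is_new_or_fast(2015, 20));
    }

    #[test]
    fn old_slow_car_does_not_count() {
        assert!(!is_new_or_fast(1995, 20));
    }
}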

But, but, but... speed?

One might now come up with the obvious question: does my code get slower because of this? I would say it depends. Fluent C++ has answered this question for C++, and I would guess that the answer also holds for Rust, maybe even without the 2%/7% speed decrease Jonathan measured, especially if the code is inlined by the Rust compiler. Even though you might get slightly slower code, you have to think of the one question that I greatly value, not only when it comes to execution speed, but also in other cases: does it matter?

Does it matter whether your code gets a bit slower? Is this particular piece of code crucial for your domain? If not – expressiveness first, speed second! If it does, write the expressive version first and then: measure. If the expressive version has a performance impact you cannot tolerate, you can still optimize it later.

Next...

What's up next? Well, I don't know. I will get myself inspired by other blog posts and articles and maybe I'll publish the next article of this series soonish. But maybe it takes a month or two, maybe even more, until I have some content. I don't want to make this a weekly thing or something like that, so I'll leave it undefined when the next article of this series will be published.

Thanks for reading.

tags: #open-source #programming #software

In this blog post, which might turn into a short series, I want to plan a Rust library crate and write down notes on how to implement it.

This article was yet another one that I wrote while being on my trip through Iceland. As you can see – my head never stops thinking about problems.

Usecase

So first of all, I want to write down the use case of this library. I had this idea when thinking about how to design a user frontend for imag (of course) and came to the conclusion that Rust lacks a library for such a thing. So why not write one?

I want to design the user interface of this library crate approximately like Rails did with its implementation of the same functionality for Ruby (bear with me, I'm not that involved in the Ruby world anymore, so I don't know whether this is actually Rails itself or just another gem that comes with it).

So what I want to be able to do is something like this:

let event_date = today() - days(2) + weeks(10);

for example. I'm not yet entirely sure whether it is possible to actually do this without returning Result<_, _> instead of real types (and because I'm in Iceland without an internet connection, I cannot check). If results need to be returned, I would design the API in a way so that these functions and calls only create an AST-like object tree, which can then be evaluated with a function to calculate the final result:

let event_date = today() - days(2) + weeks(10);
let event_date = try!(event_date.calc());

But even more ideas come to mind when thinking about functionality this library may provide:

// Creating iterators
today().repeat_every(days(4)) // -> endless iterator

// Convenience functions
(today() + weeks(8)).end_of_month() // the end of the month of the day in 8 weeks

today().end_of_year().day_name() // name of the day at the end of the current year

today().until(weeks(4)) // range of time from now until 4 weeks from now

// more ...

Later on, a convenient parser could be put in front of this, so a user can actually provide strings which are then parsed and calculated.

calculate("now - 4h + 1day")

This could then, of course, be exposed to a user directly as well.

Core Data types

As the foundation of this library will be the awesome “chrono” crate, we do not have to reimplement all the time-related things. This eases everything quite a lot and also ensures that I do not duplicate work which others have done way better than I could have.

So at the core of the library, we need to encapsulate chrono types. But there are many user-facing types in chrono and we cannot assume we know which of them our users need. So we have to be generic over these types, too. This is where the fun starts.

At the very base level we have three kinds of types: amounts (like seconds, minutes, and so on), fixed points in time, as well as time ranges:

pub enum TimeType {
    Seconds(usize),
    Minutes(usize),
    // ...
    Years(usize),
    Point(C),
    Range(A, B),
} // A, B and C being chrono types which are wrapped

As I assume right now, we cannot simply subtract and add our types (and thus chrono's types) without possible errors, so we have to handle them and return them to the user. Hence, we will create intermediate types which represent what is about to be calculated, so we can add and subtract them (and so on) without error:

enum OpArg {
    TT(TimeType),
    Add(AddOp),
    Sub(SubOp),
}

pub struct AddOp(OpArg, OpArg);
pub struct SubOp(OpArg, OpArg);

trait CalculateableTime {
    fn calc(self) -> Result<TimeType, Error>; // error type left open in this sketch
}

with the trait implemented on the former types – and maybe also on the enum itself, as I explain in a few words.

To explain why the CalculateableTime::calc() function returns a TimeType rather than a chrono::NaiveDateTime for example, consider this:

(minutes(15) - seconds(12)).calc()

and now you can see why this function actually needs to return our own type instead of some chrono type here.

The OpArg type needs to be introduced to be able to build a tree of operations. In the calc() implementation for these types, we can then recursively call the function itself to calculate what has to be calculated. As the trait is implemented on TimeType itself, which then just returns Self, we automatically have the abort condition for the recursive calls. To note: this is not tail-recursive.

Optimize the types

After handing this article over to two friends for some review, I was told that the data structures can be collapsed into one. So no traits required, no private data structures, just one enum with all functions implemented directly on it:

enum TimeType {
    Seconds(usize),
    Minutes(usize),
    // ...
    Years(usize),
    Point(C),    // C being a wrapped chrono type
    Range(A, B), // A and B being wrapped chrono types

    Subtraction(TimeType, TimeType),
    Addition(TimeType, TimeType),
}

and as you can see, also almost no generics.

After thinking a bit more about this enum, I concluded that even things like EndOfWeek, EndOfMonth and such have to go into it. Overall, we do not want to do a single calculation while writing down the code, only line up types; the calculate function then takes care of actually doing the work.
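To make the “line up types first, calculate later” idea concrete, a recursive calculate function could look roughly like the following. This is a deliberately simplified sketch (only seconds, addition and subtraction, boxed for the recursion), not the real design:

// Simplified: every variant reduces to seconds in the end.
enum TimeType {
    Seconds(i64),
    Addition(Box<TimeType>, Box<TimeType>),
    Subtraction(Box<TimeType>, Box<TimeType>),
}

impl TimeType {
    // Recursively evaluate the tree of operations that the user lined up.
    fn calculate(self) -> Result<i64, String> {
        match self {
            TimeType::Seconds(s)        => Ok(s),
            TimeType::Addition(a, b)    => Ok(a.calculate()? + b.calculate()?),
            TimeType::Subtraction(a, b) => Ok(a.calculate()? - b.calculate()?),
        }
    }
}

fn main() {
    // "15 minutes minus 12 seconds", lined up first, calculated afterwards
    let expr = TimeType::Subtraction(
        Box::new(TimeType::Seconds(15 * 60)),
        Box::new(TimeType::Seconds(12)),
    );
    println!("{:?}", expr.calculate()); // Ok(888)
}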

Helper functions

In the examples above I used some functions like seconds() or minutes() – these are just helper functions for hiding more complex type signatures and can hopefully be inlined by the compiler:

pub fn seconds(s: usize) -> TimeType {
    TimeType::Seconds(s)
}

So there is not really much to say for these.

Special Functions, Ranges, Iterators

To get the end of the year of a date, we must hold the current date already, so these functions need to be added to the TimeType type. Ranges can also be done this way:

now().until(tomorrow()) // -> TimeType::Range(_, _)

Well, now the real fun begins. When having a TimeType object, one should be able to construct an Iterator from it.

The iterator needs to hold the value by which it advances on every step, as well as a copy of the base value. With this, one could think of an iterator that holds a TimeType object and, every time the next() function is called, adds something to it and returns a copy of it.

Another way of implementing this would be to keep track of how many times the iterator has been called, multiply this count by the step value and add the result to the base.

I like the latter version more, as it does not grow the amount of calculation needed to get the real value out of the TimeType instance each time the iterator is advanced.

This way, one can write the following code:

let v: Vec<_> = now()
    .every(days(7))
    .map(TimeType::calculate)
    .take(5)
    .collect();

to retrieve five objects, starting from today, each separated by one week.
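A sketch of what such a counting-based iterator could look like internally; the names every and Every are made up here, and plain integers stand in for the TimeType values:

// Iterator that yields base + n * step, counting n up on every call to next().
struct Every {
    base: i64,
    step: i64,
    count: i64,
}

impl Iterator for Every {
    type Item = i64;

    fn next(&mut self) -> Option<i64> {
        let value = self.base + self.count * self.step;
        self.count += 1;
        Some(value) // endless iterator, like repeat_every() above
    }
}

fn every(base: i64, step: i64) -> Every {
    Every { base, step, count: 0 }
}

fn main() {
    // five values, starting at the base, each one step apart
    let v: Vec<_> = every(0, 7).take(5).collect();
    println!("{:?}", v); // [0, 7, 14, 21, 28]
}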

Next

What I think I'll do in the next iteration on this series is summarize how I want to develop this little crate. I guess test driven is the way to go here, after defining the type described above.


Please note: This article was written a long time ago. In the meantime, I learned from a nice redditor that there is chrono::Duration, which is partly what I need here. So I will base my work (despite having already started in the direction I outlined in this article) on the chrono::Duration types and develop the API I have in mind with the functionality provided by chrono.

For the sake of completeness, I did not alter this article after learning of chrono::Duration, so my thoughts are lined up as I originally had them.

tags: #open-source #programming #software #tools #rust

Here I want to describe how I plan to refactor the logging back end implementation for imag.

This post was published on imag-pim.org as well as on my personal blog.

What we have

Right now, the logging implementation is ridiculously simple. What we do is: on every call to one of the logging macros, the log crate gives us an object with a few pieces of information (line number, file, log message, ...) – we apply our format, some color, and write it to stderr.

This is of course rather simple and not really flexible.

What we want to have

I want to rewrite the logging backend to give the user more power over the logging. As we only have to rewrite the backend, and the log crate handles everything else, the actual logging calls look no different and “client” code does not change.

+----------------------------+
| imag code, libs, bins, ... |
+----------------------------+
              |
              | calls
              |
              v
+----------------------------+
| crate: "log"               |
+----------------------------+
              |
              | calls
              |
              v
+----------------------------+
| imag logging backend impl. |
+----------------------------+

So what features do we want? First of all, the imag user must be able to configure the logging. Not only via the configuration file, but also via environment variables and of course command line parameters, where the former are overridden by the latter, respectively. This gives the user nice control: she can configure imag to log to stderr with only warnings being logged, but when calling a script of imag commands or calling imag directly from the command line, these settings can temporarily (for the script or one command) be overridden.
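The override order itself is cheap to implement; here is a minimal sketch (with a made-up Level type) of the “command line beats environment beats configuration file” resolution:

#[derive(Clone, Copy, Debug)]
enum Level { Trace, Debug, Info, Warn, Error }

// Command line parameters override environment variables,
// which override the configuration file, which falls back to a default.
fn effective_level(cli: Option<Level>, env: Option<Level>, cfg: Option<Level>) -> Level {
    cli.or(env).or(cfg).unwrap_or(Level::Info)
}

fn main() {
    let level = effective_level(None, Some(Level::Warn), Some(Level::Debug));
    println!("{:?}", level); // Warn: the environment wins over the config file
}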

The configuration options I have in mind are best described by an example:

# The logging section of the configuration
[logging]

# the default logging level
# Valid values are "trace", "debug", "info", "warn", "error"
level = "debug"

# the destinations of the logging output.
# "-" is for stderr, multiple destinations are possible
default-destinations = [ "-", "/tmp/imag.log" ]

# The format for the logging output
#
# The format supports variables which are inserted for each logging call:
#
#  "%no%"       - The of the logging call
#  "%thread%"   - The thread id from the thread calling the log
#  "%level%"    - The logging level
#  "%module%"   - The module name
#  "%file%"     - The file path where the logging call appeared
#  "%line%"     - The line No of the logging call
#  "%message%"" - The logging message
#
# Functions can be applied to the variables to change the color of
# the substitutions.
#
# A format _must_ contain "%message%", else imag fails, because a format that
# silently drops the log message should be forbidden
#
[logging.formats]
trace = "cyan([imag][%no%][%thread%][%level%][%module%][%file%][%line%]): %message%"
debug = "cyan([imag][%no%][%thread%][%level%][%module%][%file%][%line%]): %message%"
info  = "[imag]: %message%"
warn  = "red([imag]:) %message%"
error = "red(blinking([imag][uppercase(%level%)]): %message%)"

# Example entry for one imag module
# If a module is not configured or keys are missing
# the default values from above are applied
[logging.modules.libimagstore]
enabled = true
level = "trace"
destinations = [ "-" ]
# A format is only globally configurable, not per-module

One of the most complex things here would be the format parsing, as the variable expansion and the functions to apply are some kind of DSL I have to implement. I hope I can do this – maybe there's even a crate that can help me with it? Maybe the shellexpand library will do?
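For the plain variable substitution part (ignoring the color functions like red(...) for now, which would need a real parser), a naive sketch could look like this:

// Replace the %...% variables of a format string with the values of one log call.
// The color functions like red(...) or cyan(...) would need a real parser on top.
fn expand(format: &str, level: &str, module: &str, file: &str, line: u32, msg: &str) -> String {
    format
        .replace("%level%", level)
        .replace("%module%", module)
        .replace("%file%", file)
        .replace("%line%", &line.to_string())
        .replace("%message%", msg)
}

fn main() {
    let fmt = "[imag][%level%][%file%][%line%]: %message%";
    println!("{}", expand(fmt, "info", "libimagstore", "store.rs", 42, "store opened"));
}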

These things and configuration options give the user great power over the logging.

The approach

Because imag already logs a lot, I am thinking about an approach where one thread is used for the actual logging. Because each logging call involves a lot of complexity, I want to move that to a dedicated thread, where other threads speak to the logging thread via an MPSC queue.

Of course, this should be opt-in.

The idea is that the logging starts a thread upon construction (which happens really early in the imag process, nearly one of the first operations done). This happens when the Runtime object is built, and hence no “client code” has to be changed; all changes remain in libimagrt.

This thread is bound to the Runtime object; logging calls (via the logging backend which is implemented for the log crate) talk to it via a channel. The thread then does the heavy lifting. Of course, the configuration can be aggregated on construction of the logging thread.

The logging thread is killed when the Runtime object is dropped (one of the last operations in each imag process). Of course, the queue has to be emptied before the logging is closed.
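A stripped-down sketch of that threading model with std::sync::mpsc (just the shape of it, not the libimagrt implementation):

use std::sync::mpsc;
use std::thread;

enum LogMsg {
    Line(String),
    Shutdown,
}

fn main() {
    let (tx, rx) = mpsc::channel();

    // The dedicated logging thread does the (potentially expensive) formatting
    // and writing, while the senders only pay for a cheap channel send.
    let handle = thread::spawn(move || {
        for msg in rx {
            match msg {
                LogMsg::Line(line) => eprintln!("[imag] {}", line),
                LogMsg::Shutdown => break, // earlier messages are already drained
            }
        }
    });

    tx.send(LogMsg::Line("store opened".into())).unwrap();
    tx.send(LogMsg::Line("entry retrieved".into())).unwrap();

    // On Runtime drop: tell the thread to stop and wait until it has finished.
    tx.send(LogMsg::Shutdown).unwrap();
    handle.join().unwrap();
}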

I am also thinking about converting the code base to use the slog crate, which offers structured logging. But I'm not yet sure whether we would benefit from that, because I don't know whether we would need to pass a state object around. If that is the case, I cannot do it, as this would introduce a lot of complexity which I don't want to have. If no such object needs to be passed around, I still have to evaluate whether the slog crate is a good idea, and of course this would also increase the number of (complex) dependencies by one... and I'm not sure whether the benefits outweigh the inconveniences.

tags: #linux #open source #programming #rust #software #tools #imag

This is the 25th iteration on what happened in the last four weeks in the imag project, the text based personal information management suite for the commandline.

imag is a personal information management suite for the commandline. Its target audience are commandline- and power-users. It does not reimplement personal information management (PIM) aspects, but re-uses existing tools and standards to be an addition to an existing workflow, so one does not have to learn a new tool before being productive again. Some simple PIM aspects are implemented as imag modules, though. It gives the user the power to connect data from different existing tools and add meta-information to these connections, so one can do data-mining on PIM data.

What happenend?

Luckily, I can write this iteration on imag. After we had no blog post about the progress of imag in April this year, due to a lack of time on my side, I'm now very happy to be able to report: we had progress in the last 4 (8) weeks!

Let's have a look at the merged PRs (I'm now starting to link to git.imag-pim.org here):

  • #915 merged a libruby dependency for travis.
  • #918 removed some compiler warnings.
  • #917 merged some travis enhancements/fixes.
  • #916 superseded PR #898, which simplified the implementation of the FoldResult extension.
  • #895 started a re-do of the ruby build setup.
  • #911 changed the interface of the StoreId::exists() function to return a Result now.
  • #904 added initial support for annotations in the libimagentrylink library, which gives us the possibility to add annotations to links. There are no tests yet and also no remove functionality.
  • #921 was a cleanup PR for #911 which broke master unexpectedly.
  • #914 fixed a compiler warning.
  • #929 removed libimagruby entirely, because we couldn't merge to master since a dependency on master started to fail. The whole ruby thing is a complete mess right now; dependencies are not found, tests fail because of this... it is a mess.
  • #927 removed unused imports.
  • #924 updated links in the readme file.
  • #926 added tests for the StoreId type.
  • #919 merged preparations for the 0.3.0 release, which is overdue by one month right now, because the ruby scripting interface does not work.
  • #930 updated the toml-rs dependency to 0.4, which gives us even more superpowers.
  • #932 added some tests for the configuration parsing functionality.
  • #933 Adds a new dependency: is-match, a library I extracted from the imag source code into a new crate.

The libimagruby mess

Well, this is unfortunate.

libimagruby should have been ready and usable for one month by now – and it is (at least the basic things, and a few things are even tested)! But as the CI does not work (fuck you travis!), I cannot merge it. I also don't know how to properly package a Ruby gem, so there's that.

I really hope @malept can help me.

I'm already thinking about adding another scripting interface so that I can continue and start implementing frontends for imag. For example, I'm still thinking about a lua or ketos interface. Lua might be the better idea, as there are libraries around for certain things, while there are no libraries for ketos (I assume).

What will happen

I honestly don't know. I will continue working on imag, of course, but right now libimagruby is stalled. I'm not sure where to start working besides libimagruby – a Ruby scripting interface is what I need right now, but it won't work ... so there's that.

As soon as the Ruby interface is ready, we can have nice things. But right now, it is really hard to continue.

tags: #linux #open source #programming #rust #software #tools #imag