How to improve your open source code (3) - Modularization

This post was written during my trip through Iceland. It is part of a series on how to improve ones open-source code. Topics (will) contain programming style, habits, project planning, management and everything that relates to these topics. Suggestions welcome.

Please note that this is totally biased and might not represent the ideas of the broad community.

When it comes to hobby projects in the open source world, one often works alone on the codebase. There is no community around your tool or library that is actively pushing towards a nice codebase. Anyways, you can have one. And what matters in this regard (not only but also) is a nice modularization of your codebase.

Why modularization

It is not always beneficial to modularize your code. But if your program or library hits a certain size, it can be. Modularization helps not only with readability, but also with separation of concerns, which is an important topic. If you do not separate concerns, you end up with spaghetti code, code duplication and this leads to doubled effort, which is essentially more work for the same result. And we all know that nobody wants that, right? In the end we’re all lazy hackers. Also, if you write code twice, trice or maybe even more often, you end up with more bugs. And of course you also don’t want that.

SoC

Now that the “why” is cleared, the “how” is the next step to think about.

We talked about the separation of concerns in the last section already. And that’s what it boils down to: You should separate concerns. Now, what is a concern? An example would be logging. Fairly, you should use a decent library for the logging in your code, but that library might need some initialization routines called and some configuration be passed on. Maybe you even need to wrap some things to be able to have nice logging calls in your domain logic - you should clearly modularize the logging related code and move the vast majority of it out of your main code base. Another concern would be file IO. Most of the time when you doing File IO, you’re duplicating things. You don’t need to catch IO errors in every other function of your code - wrap these things and move them to ankther helper function. Maybe your File IO isnmore complex and you need to be able to read and write multiple files with similar structure all the time - a good idea would be to wrap these things in a module that handles these things only. The concern of File IO is now encapsulated in a module, errors are handled there and users of that module know what they’re working with - An abstraction which either works or fails in a way so that they don’t have to re-write error handling in every other function but can simply forward errors or fail in a defined way.

If you think about that for a moment, you’ll notice that modularization is a multilevel thing. Once you have nicely defined modules for several concerns, you might build new and more high-level modules out of them. And that’s also exactly what frameworks do most of the time: They combine libraries (which might be already at a certain level of abstraction) into a new, even more highlevel, library for another concern. For example, a web framework combines a database library, templating library, middleeare library, logging library, authentication and authorization libraries into one big library, which is then - inconsistently - called a framework.

CCC

No, I’m not talking about the Chaos Computer Club here, but about Cross Cutting Concerns. A cross cutting concern is one that is used throughout your entire codebase. For example logging. You want logging calls in your data access layer, your business logic and your user interface logic.

So logging is a functionality all modules of your software want to have and need. But because it is cross cutting, it is not possible to insert it as a layer in your stack of modules. So it is really important that your cross cutting concerns are as loosely coupled to all other modules as possible. In case of logging, this is particularly easy, because after the general setup of the logging functionality, which is done exactly once, you have a really small interface to that module. Most of the time, not more than four or five functions: one call for each level of verbosity (for example: debug, info, warn, error).

Naming things…

In know, I know… this bikeshedding topic again. I don’t want to elaborate to extensively on this subject, though a few thoughts must be said.

First of all, module names should be like function names (which I explained in the last episode) - short, to the point and describing what a function, in our case module, does. And that’s still it. A module should be named after its domain. If the module does, for example, contain only some datatypes for your library, or only algorithms to calculate things, it should be named exactly that: “types” or “algo”.

A second thing I would clearly say is that module names should be a one-word-game. Module names should hardly contain more than one word - why would they? A domain is one word, as showed above.

Multiple libraries

What I really like to do is to have several libraries for one application, if the codebase hits a certain complexity. I like to take imag as an example. Imag is a collection of libraries which build on a core library and offer abstraction over it. This is a great separation of concerns and functionality. So a user of the imag source can use some libraries to get a certain functionality and leave out others if their functionality is not needed. That’s a great separation of concerns with loose coupling. But when it comes to this approach, another topic gets important as well, and we’ll talk about that in the next episode.

The naming restrictions I stated above apply to library names which are solely for separation inside of an applications, too, though I would relax the one-word-name rule a bit here.

Next

In the next Episode of this blog series we will talk about another important subject which is related to modularization: API design.