Planning a log-functionality refactoring for imag

Here I want to describe how I plan to refactor the logging back end implementation for imag.

This post was published on imag-pim.org as well as on my personal blog.

What we have

Right now, the logging implementation is ridiculously simple. What we do is: On every call to one of the logging macros, the log crate gives us an object with a few informations (line number, file, log message,...) – we apply our format, some color and write it to stderr.

This is of course rather simple and not really flexible.

What we want to have

I want to rewrite the logging backend to give the user more power about the logging. As we only have to rewrite the backend, and the log crate handles everything else, the actual logging looks non different and “client” code does not change.

+----------------------------+
| imag code, libs, bins, ... |
+----------------------------+
              |
              | calls
              |
              v
+----------------------------+
| crate: "log"               |
+----------------------------+
              |
              | calls
              |
              v
+----------------------------+
| imag logging backend impl. |
+----------------------------+

So what features do we want? First of all, the imag user must be able to configure the logging. Not only with the configuration file but also via environment variables and of course command line parameters. The former will be overridden by the latter, respectively. This gives the user nice control, as she can configure imag to log to stderr with only warnings being logged but when calling a script of imag commands or calling imag directly from the command line, these settings can be temporarily (for the script or one command) be overridden.

The configuration options I have in mind are best described by an example:

# The logging section of the configuration
[logging]

# the default logging level
# Valid values are "trace", "debug", "info", "warn", "error"
level = "debug"

# the destinations of the logging output.
# "-" is for stderr, multiple destinations are possible
default-destinations = [ "-", "/tmp/imag.log" ]

# The format for the logging output
#
# The format supports variables which are inserted for each logging call:
#
#  "%no%"       - The of the logging call
#  "%thread%"   - The thread id from the thread calling the log
#  "%level%"    - The logging level
#  "%module%"   - The module name
#  "%file%"     - The file path where the logging call appeared
#  "%line%"     - The line No of the logging call
#  "%message%"" - The logging message
#
# Functions can be applied to the variables to change the color of
# the substitutions.
#
# A format _must_ contain "%message%, else imag fails because no logging should
# be forbidden
#
[logging.formats]
trace = "cyan([imag][%no%][%thread%][%level%][%module%][%file%][%line%]): %message%"
debug = "cyan([imag][%no%][%thread%][%level%][%module%][%file%][%line%]): %message%"
info  = "[imag]: %message%"
warn  = "red([imag]:) %message%"
error = "red(blinking([imag][uppercase(%level%)]): %message%)"

# Example entry for one imag module
# If a module is not configured or keys are missing
# the default values from above are applied
[logging.modules.libimagstore]
enabled = true
level = "trace"
destinations = [ "-" ]
# A format is only globally configurable, not per-module

One of the most complex things in here would be the format parsing, as variable expansion and functions to apply are some kind of DSL I have to implement. I hope I can do this – maybe there's even a crate for helping me with this? Maybe the shellexpand library will do?

These things and configuration options give the user great power over the logging.

The approach

Because imag already logs a lot, I think about an approach where one thread is used for the actual logging. Because each logging call involves a lot of complexity, I want to move that to a dedicated thread where other threads speak to the logging thread via a MPSC queue.

Of course, this should be opt-in.

The idea is that the logging starts a thread upon construction (which is really early in the imag process, nearly one of the first operations done). This happens when the Runtime object is build and hence no “client code” has to be changed, all changes remain in libimagrt.

This thread is bound to the Runtime object, logging calls (via the logging backend which is implemented for the log crate) talk to it via a channel. The thread then does the heavy lifting. Of course, configuration can be aggregated on construction of the logging thread.

The logging thread is killed when the Runtime object is dropped (one of the last operations in each imag process). Of course, the queue has to be emptied before the logging is closed.

I am also thinking about converting the code base to use the slog crate, which offers structured logging. But I'm not yet sure whether we will benefit from that, because I don't know whether we would need to pass a state-object around. If that is the case, I cannot do this as this would introduce a lot of complexity which I don't want to have. If no such object needs to be passed around, I still have to evaluate whether the slog crate is a nice idea and of course this would also increase the number of (complex) dependencies by one... and I'm not sure whether the benefits outrule the inconveniences.

tags: #linux #open source #programming #rust #software #tools #imag