Writing a mastodon bot

January 17, 2021

Today, I wrote a mastodon bot.

Shut up and show me the code!

The idea

My idea was, to write a bot that fetches the lastest master from a git repository and counts some commits and then posts a message to mastodon about what it counted.

Because I always complain about people pushing to the master branch of a big community git repository directly, I decided that this would be a perfect fit.

(Whether pushing to master directly is okay and when it is not okay to do this is another topic and I won't discuss this here)

The dependencies

Well, because I didn't want to implement everything myself, I started pulling in some dependencies:

log and env_logger for logging
structopt, toml, serde and config for argument parsing and config reading
anyhow because I don't care too much about error handling. It just has to work
getset for a bit cleaner code (not strictly necessary, tbh)
handlebars for templating the status message that will be posted
elefren as mastodon API crate
git2 for working with the git repository which the bot posts about

The Plan

How the bot should work was rather clear from the outset. First of all, it shouldn't be a always-running-process. I wanted it to be as simple as possible, thus, triggering it via a systemd-timer should suffice. Next, it should only fetch the latest commits, so it should be able to work on a working clone of a repository. This way, we don't need another clone of a potentially huge repository on our disk. The path of the repository should of course not be hardcoded, as shouldn't the “upstream” remote name or the “master” branch name (because you might want to track a “release-xyz” branch or because “master” was renamed to something else).

Also, it should be configurable how many hours of commits should be checked. Maybe the user wants to run this bot once a day, maybe once a week. Both is possible, of course. But if the user runs it once a day, they want to check only the commits of the last 24 hours. If they run it once a week, the last 168 hours would be more appropriate.

The message that gets posted should also not be hardcoded, but a template where the variables the bot counted are available.

All the above goes into the configuration file the bot ready (and which can be set via the --config option on the bots CLI).

The configuration struct for the setup described above is rather trivial, as is the CLI setup.

The setup

The first things the bot has to do is reading the commandline and the configuration after initializing the logger, which is a no-brainer, too:

fn main() -> Result<()> {
    env_logger::init();
    log::debug!("Logger initialized");

    let opts = Opts::from_args_safe()?;
    let config: Conf = {
        let mut config = ::config::Config::default();

        config
            .merge(::config::File::from(opts.config().to_path_buf()))?
            .merge(::config::Environment::with_prefix("COMBOT"))?;
        config.try_into()?
    };
    let mastodon_data: elefren::Data = toml::de::from_str(&std::fs::read_to_string(config.mastodon_data())?)?;

The mastodon data is read from a configuration file that is different from the main configuration file, because it may contain sensitive data and if a user wants to put their configuration of the bot into a (public?) git repository, they might not want to include this data. That's why I opted for another file here, its format is described in the configuration example file (next to the setting where the file actually is).

Next, the mastodon client has to be setup and the repository has to be opened:

    let client = elefren::Mastodon::from(mastodon_data);
    let status_language = elefren::Language::from_639_1(config.status_language())
        .ok_or_else(|| anyhow!("Could not parse status language code: {}", config.status_language()))?;
    log::debug!("config parsed");

    let repo = git2::Repository::open(config.repository_path())?;
    log::debug!("Repo opened successfully");

which is rather trivial, too.

The Calculations

Then, we fetch the appropriate remote branch and count the commits:

    let _ = fetch_main_remote(&repo, &config)?;
    log::debug!("Main branch fetched successfully");

    let (commits, merges, nonmerges) = count_commits_on_main_branch(&repo, &config)?;
    log::debug!("Counted commits successfully");

    log::info!("Commits    = {}", commits);
    log::info!("Merges     = {}", merges);
    log::info!("Non-Merges = {}", nonmerges);

The functions called in this snippet will be described later on. Just consider them working for now, and let's move on to the status posting part of the bot now.

First of all, we use the variables to compute the status message using the template from the configuration file.

    {
        let status_text = {
            let mut hb = handlebars::Handlebars::new();
            hb.register_template_string("status", config.status_template())?;
            let mut data = std::collections::BTreeMap::new();
            data.insert("commits", commits);
            data.insert("merges", merges);
            data.insert("nonmerges", nonmerges);
            hb.render("status", &data)?
        };

Handlebars is a perfect fit for that job, as it is rather trivial to use, albeit a very powerful templating language is used. The user could, for example, even add some conditions to their template, like if there are no commits at all, the status message could just say “I'm a lonely bot, because nobody commits to master these days...” or something like that.

Next, we build the status object we pass to mastodon, and post it.

        let status = elefren::StatusBuilder::new()
            .status(status_text)
            .language(status_language)
            .build()
            .expect("Failed to build status");

        let status = client.new_status(status)
            .expect("Failed to post status");
        if let Some(url) = status.url.as_ref() {
            log::info!("Status posted: {}", url);
        } else {
            log::info!("Status posted, no url");
        }
        log::debug!("New status = {:?}", status);
    }

    Ok(())
} // main()

Some logging is added as well, of course.

And that's the whole main function!

Fetching the repository.

But we are not done yet. First of all, we need the function that fetches the remote repository.

Because of the infamous git2 library, this part is rather trivial to implement as well:

fn fetch_main_remote(repo: &git2::Repository, config: &Conf) -> Result<()> {
    log::debug!("Fetch: {} / {}", config.origin_remote_name(), config.master_branch_name());
    repo.find_remote(config.origin_remote_name())?
        .fetch(&[config.master_branch_name()], None, None)
        .map_err(Error::from)
}

Here we have a function that takes a reference to the repository as well as a reference to our Conf object. We then, after some logging, find the appropriate remote in our repository and simply call fetch for it. In case of Err(_), we map that to our anyhow::Error type and return it, because the callee should handle that.

Counting the commits

Counting the commits is the last part we need to implement.

fn count_commits_on_main_branch(repo: &git2::Repository, config: &Conf) -> Result<(usize, usize, usize)> {

The function, like the fetch_main_remote function, takes a reference to the repository as well as a reference to the Conf object of our program. It returns, in case of success, a tuple with three elements. I did not add strong typing here, because the codebase is rather small (less than 160 lines overall), so there's not need to be very explicit about the types here.

Just keep in mind that the first of the three values is the number of all commits, the second is the number of merges and the last is the number of non-merges.

That also means:

tuple.0 = tuple.1 + tuple.2

Next, let's have a variable that holds the branch name with the remote, like we're used from git itself (this is later required for git2). Also, we need to calculate the timestamp that is the lowest timestamp we consider. Because our configuration file specifies this in hours rather than seconds, we simply * 60 * 60 here.

    let branchname = format!("{}/{}", config.origin_remote_name(), config.master_branch_name());
    let minimum_time_epoch = chrono::offset::Local::now().timestamp() - (config.hours_to_check() * 60 * 60);

    log::debug!("Branch to count     : {}", branchname);
    log::debug!("Earliest commit time: {:?}", minimum_time_epoch);

Next, we need to instruct git2 to create a Revwalk object for us:

    let revwalk_start = repo
        .find_branch(&branchname, git2::BranchType::Remote)?
        .get()
        .peel_to_commit()?
        .id();

    log::debug!("Starting at: {}", revwalk_start);

    let mut rw = repo.revwalk()?;
    rw.simplify_first_parent()?;
    rw.push(revwalk_start)?;

That can be used to iterate over the history of a branch, starting at a certain commit. But before we can do that, we need to actually find that commit, which is the first part of the above snippet. Then, we create a Revwalk object, configure it to consider only the first parent (because that's what we care about) and push the rev to start walking from it.

The last bit of the function implements the actual counting.

    let mut commits = 0;
    let mut merges = 0;
    let mut nonmerges = 0;

    for rev in rw {
        let rev = rev?;
        let commit = repo.find_commit(rev)?;
        log::trace!("Found commit: {:?}", commit);

        if commit.time().seconds() < minimum_time_epoch {
            log::trace!("Commit too old, stopping iteration");
            break;
        }
        commits += 1;

        let is_merge = commit.parent_ids().count() > 1;
        log::trace!("Merge: {:?}", is_merge);

        if is_merge {
            merges += 1;
        } else {
            nonmerges += 1;
        }
    }

    log::trace!("Ready iterating");
    Ok((commits, merges, nonmerges))
}

This is done the simple way, without making use of the excelent iterator API. First, we create our variables for counting and then, we use the Revwalk object and iterate over it. For each rev, we unwrap it using the ? operator and then ask the repo to give us the corresponding commit. We then check whether the time of the commit is before our minimum time and if it is, we abort the iteration. If it is not, we continnue and count the commit. We then check whether the commit has more than one parent, because that is what makes a commit a merge-commit, and increase the appropriate variable.

Last but not least, we return our findings to the caller.

Conclusion

And this is it! It was a rather nice journey to implement this bot. There isn't too much that can fail here, some calculations might wrap and result in false calculations. Possibly a clippy run would find some things that could be improved, of course (feel free to submit patches).

If you want to run this bot on your own instance and for your own repositories, make sure to check the README file first. Also, feel free to ask questions about this bot and of course, you're welcome to send patches (make sure to --signoff your commits).

And now, enjoy the first post of the bot.

tags: #mastodon #bot #rust