Developing a Query-API for toml-query

February 4, 2018

For one of my other projects (yes, imag), I developed a query library for TOML. I am currently planning a new feature for it: a query API which can be used to execute prepared queries on a given TOML document.

But let me start with the basics!

What is TOML?

“TOML” stands for “Tom's Obvious, Minimal Language”. It is somewhat similar to JSON, but highly readable and easy to understand. It is mainly used for configuration files in the Rust community. I use it, for example, as the header format for imag store entries and as the configuration file format for imag.

A mini TOML file looks like this:

[table]
key = "value"
[table.subtable]
key = ["another value in an array"]

What does 'toml-query' do?

toml-query, the library I developed, is actually an extension: it extends the toml-rs library, which is a serde-based library for the TOML data format. Serde is the serialization/deserialization framework in the Rust ecosystem, so toml-rs is a frontend to that framework for working with the TOML file format.

Because serde is such an amazing tool, one can write code like this:

#[derive(Serialize, Deserialize)]
struct Foo {
  integer: i32,
  string: String
}

to get a struct which can be serialized to and deserialized from TOML with minimal effort in Rust:

extern crate serde;
#[macro_use] extern crate serde_derive;
extern crate toml;

#[derive(Serialize, Deserialize)]
struct Foo {
    integer: i32,
    string: String,
}

fn main() {
    let foo = Foo {
        integer: 15,
        string: String::from("Hello world!"),
    };

    let serialized = toml::to_string(&foo).unwrap(); // here is magic!
    let text = r#"integer = 15
string = "Hello world!"
"#;
    assert_eq!(text, serialized);
}

(This piece of code can be executed in the Rust playground.)

The resulting TOML can, of course, be deserialized back to an instance of Foo. That's really neat if you want to read your configuration file, because you simply have to write a struct which describes the variables your configuration file should have and let toml-rs and serde do the magic of deserialization. If an error happens, for example because a key is missing, the deserialization fails and you can forward the error to your user.

But what happens if you have a really complex configuration file? What if you don't know, at build time of your program, what your configuration file looks like? What if you have things that are allowed to go wrong and you have to very precisely catch errors and handle them individually? Then, this awesomeness becomes complicated.

That's why I wrote toml-query. It helps you maintain a real CRUD (Create-Read-Update-Delete) workflow on TOML documents. For example, after reading your TOML document into memory and into toml-rs structures, you can read and write specific values by their path:

extern crate serde;
extern crate toml;
extern crate toml_query;

use toml::Value;
use toml_query::read::TomlValueReadExt; // provides read() on toml::Value

fn main() {
    let text = r#"integer = 15
string = "Hello world!"
"#;
    let toml: Value = toml::de::from_str(text).unwrap();
    let int = match toml.read("integer") {
        Ok(Some(&Value::Integer(i))) => i,
        Ok(Some(_)) => panic!("Type error: Not an integer!"),
        Ok(None)    => panic!("Key 'integer' missing"),
        Err(e)      => panic!("Error reading TOML document: {:?}", e),
    };
    assert_eq!(int, 15);
}

The code example above reads the TOML document into a Value (which is a datatype provided by toml-rs) and then read()s the value at "integer". This read operation is done via the “path” to the value, and of course this path does not have to be a single key: things like "table.subtable.value" are possible, and so are array indexes. This works with several CRUD operations: reading values, writing values (creating intermediate “tables” or “arrays” if they do not exist yet), updating values and of course also deleting values.
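To make the idea of a “path” a bit more concrete, here is a minimal sketch of how a dotted path could be resolved against a nested document. It only uses the standard library: the Value enum and the read_path and example_document functions are hypothetical stand-ins for illustration, not the types or the implementation of toml-rs or toml-query. Array indexes are written as plain numeric path segments here, which is a simplifying assumption.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for a TOML value; NOT the toml-rs `Value` type.
#[derive(Debug, PartialEq)]
enum Value {
    String(String),
    Array(Vec<Value>),
    Table(BTreeMap<String, Value>),
}

// Walk the document along a dotted path such as "table.subtable.key.0".
// Returns Ok(None) if a key or index is missing, and Err if the path
// tries to descend into a value that is neither a table nor an array.
fn read_path<'a>(doc: &'a Value, path: &str) -> Result<Option<&'a Value>, String> {
    let mut current = doc;
    for segment in path.split('.') {
        let next = match current {
            Value::Table(map) => map.get(segment),
            Value::Array(items) => {
                let idx: usize = segment
                    .parse()
                    .map_err(|_| format!("'{}' is not an array index", segment))?;
                items.get(idx)
            }
            _ => return Err(format!("cannot descend into a scalar at '{}'", segment)),
        };
        current = match next {
            Some(value) => value,
            None => return Ok(None),
        };
    }
    Ok(Some(current))
}

// Builds the mini TOML file from the beginning of this article by hand.
fn example_document() -> Value {
    let mut subtable = BTreeMap::new();
    subtable.insert(
        "key".to_string(),
        Value::Array(vec![Value::String("another value in an array".to_string())]),
    );
    let mut table = BTreeMap::new();
    table.insert("key".to_string(), Value::String("value".to_string()));
    table.insert("subtable".to_string(), Value::Table(subtable));
    let mut root = BTreeMap::new();
    root.insert("table".to_string(), Value::Table(table));
    Value::Table(root)
}
```

With this sketch, read_path(&doc, "table.subtable.key.0") yields the string inside the array, a missing key yields Ok(None), and descending into a string yields an error. This mirrors the three cases handled in the match of the read() example above.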

Why a Query-API?

Everything I explained above is pure CRUD functionality. There is no “query” involved yet.

The next step I am currently thinking about is an API which can be used to build complex queries, chain them and (not in the first version of the API, but maybe later) also roll them back.

The idea would be an API like this:

let query = Read::new("foo")
  .and_then(SetTo::new(|x| x + 1))
  .and_then(DeleteAt::new("bar"));

let query_result = document.execute(query);

Here, we build a query which reads a value at “foo”, then increments that value and after that deletes the value at “bar”. If one of these steps fails, the others are never executed.

The equivalent in CRUD calls would look like this:

let value = document.read("foo").unwrap().unwrap();
let value = value + 1;
document.set("foo", value).unwrap();
document.delete("bar").unwrap();

The calls to unwrap() are here to show where errors can happen. All this would be hidden in the query functionality and the query_result would then hold an error which can be used to tell the user what exactly went wrong and where.

The exact shape and semantics of the API may differ from the example above. The example is solely meant to visualize what the API could look like.

How does the Query-API work?

The basic idea here is to encapsulate CRUD calls into objects which then can be chained in some way. That's the whole thing, actually.

The important thing is that users are able to define their own “query types”: types which can be put into the chain of queries and which are composed of other query types. This way, a user can basically define structures and procedures to write code like this:

let result = document.execute(TransformConfigFileFormat::new());

and the TransformConfigFileFormat type then transforms an old config file format to a new one (for example).

This requirement makes the whole thing complicated. To list the full requirements:

A query may return data which may be used by the next query in the chain
A user of the library should be able to compose new query objects from existing ones
The CRUD functionalities shall be provided as “query types”
The API should be easy to use with both “query types” and closures (think: query = other_query.and_then(|val| val + 1);)
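The requirements above could be sketched roughly as follows. This is only one possible design under heavy assumptions, not the API of toml-query: the Query trait, the Chain and Map combinators, the Read, SetTo and DeleteAt types and the flat Document type (a map from keys to integers instead of a real TOML document) are all hypothetical. Closures are attached with a separate map() method here rather than with and_then(), which keeps the trait implementations simple.

```rust
use std::collections::BTreeMap;

// Hypothetical, flat stand-in for a TOML document.
type Document = BTreeMap<String, i64>;

// A query receives the output of the previous step and may modify the document.
trait Query {
    type Input;
    type Output;
    fn execute(&self, doc: &mut Document, input: Self::Input) -> Result<Self::Output, String>;

    // Chain another query which consumes this query's output.
    fn and_then<Q>(self, next: Q) -> Chain<Self, Q>
    where
        Self: Sized,
        Q: Query<Input = Self::Output>,
    {
        Chain { first: self, second: next }
    }

    // Transform this query's output with a plain closure.
    fn map<F, O>(self, f: F) -> Map<Self, F>
    where
        Self: Sized,
        F: Fn(Self::Output) -> O,
    {
        Map { query: self, f }
    }
}

struct Chain<A, B> { first: A, second: B }

impl<A, B> Query for Chain<A, B>
where
    A: Query,
    B: Query<Input = A::Output>,
{
    type Input = A::Input;
    type Output = B::Output;
    fn execute(&self, doc: &mut Document, input: A::Input) -> Result<B::Output, String> {
        // If the first step fails, the second one is never executed.
        let intermediate = self.first.execute(doc, input)?;
        self.second.execute(doc, intermediate)
    }
}

struct Map<A, F> { query: A, f: F }

impl<A, F, O> Query for Map<A, F>
where
    A: Query,
    F: Fn(A::Output) -> O,
{
    type Input = A::Input;
    type Output = O;
    fn execute(&self, doc: &mut Document, input: A::Input) -> Result<O, String> {
        self.query.execute(doc, input).map(|value| (self.f)(value))
    }
}

// The CRUD operations as query types.
struct Read { key: &'static str }
struct SetTo { key: &'static str }
struct DeleteAt { key: &'static str }

impl Query for Read {
    type Input = ();
    type Output = i64;
    fn execute(&self, doc: &mut Document, _: ()) -> Result<i64, String> {
        doc.get(self.key).copied().ok_or_else(|| format!("no value at '{}'", self.key))
    }
}

impl Query for SetTo {
    type Input = i64;
    type Output = ();
    fn execute(&self, doc: &mut Document, value: i64) -> Result<(), String> {
        doc.insert(self.key.to_string(), value);
        Ok(())
    }
}

impl Query for DeleteAt {
    type Input = ();
    type Output = ();
    fn execute(&self, doc: &mut Document, _: ()) -> Result<(), String> {
        doc.remove(self.key)
            .map(|_| ())
            .ok_or_else(|| format!("nothing to delete at '{}'", self.key))
    }
}

// A query composed from existing ones: read "foo", add one, write it back, delete "bar".
fn increment_foo_and_delete_bar() -> impl Query<Input = (), Output = ()> {
    Read { key: "foo" }
        .map(|x| x + 1)
        .and_then(SetTo { key: "foo" })
        .and_then(DeleteAt { key: "bar" })
}
```

In this sketch the type system checks that each step accepts what the previous step produces, so a chain that feeds a deleted value into SetTo would not compile. Note that a failing step only stops the chain: changes made by earlier steps stay in the document.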

Reversible queries / Transactions

A really nice thing would be reversible queries.

In this scenario, one would be able to call a chain of queries and if one of the queries fails, the document is left untouched. This could be done either by copying the whole document before executing the query chain and restoring the unmodified copy if something fails, or by making the queries themselves reversible (thus, an insert would reverse to a delete and the other way round, for example).

The first idea is more memory-intensive and the latter more runtime/CPU-intensive. Maybe both could be offered, so that the user is able to decide.
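Both strategies can be sketched in a few lines. As before, this is an illustration under assumptions and not part of toml-query: the flat Document type, execute_atomically, the Undo enum and the *_logged helpers are hypothetical names. The first function copies the document up front, the remaining ones record the inverse of every applied operation in a log.

```rust
use std::collections::BTreeMap;

// Hypothetical, flat stand-in for a TOML document.
type Document = BTreeMap<String, i64>;

// Strategy 1: copy the document before running the steps and
// restore the copy if any step fails.
fn execute_atomically<F>(doc: &mut Document, steps: F) -> Result<(), String>
where
    F: FnOnce(&mut Document) -> Result<(), String>,
{
    let snapshot = doc.clone();
    let result = steps(doc);
    if result.is_err() {
        *doc = snapshot;
    }
    result
}

// Strategy 2: every applied operation records its inverse.
enum Undo {
    Remove(String),       // inverse of inserting a new key
    Restore(String, i64), // inverse of deleting or overwriting a key
}

fn set_logged(doc: &mut Document, log: &mut Vec<Undo>, key: &str, value: i64) {
    match doc.insert(key.to_string(), value) {
        Some(old) => log.push(Undo::Restore(key.to_string(), old)),
        None => log.push(Undo::Remove(key.to_string())),
    }
}

fn delete_logged(doc: &mut Document, log: &mut Vec<Undo>, key: &str) -> Result<(), String> {
    match doc.remove(key) {
        Some(old) => {
            log.push(Undo::Restore(key.to_string(), old));
            Ok(())
        }
        None => Err(format!("nothing to delete at '{}'", key)),
    }
}

// Undo the logged operations in reverse order of application.
fn roll_back(doc: &mut Document, log: Vec<Undo>) {
    for entry in log.into_iter().rev() {
        match entry {
            Undo::Remove(key) => {
                doc.remove(&key);
            }
            Undo::Restore(key, old) => {
                doc.insert(key, old);
            }
        }
    }
}
```

The snapshot variant pays for a full copy even when nothing goes wrong, while the log variant only stores the values that were actually touched but has to replay them on failure.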

Other things

One other thing which would be really great is to generalize the functionality of toml-query over all data formats serde provides serialization and deserialization functionality for.

This would be the ultimate end-game and I'm sure I'm not able to do this without help (because toml-query is already really complex right now and such a thing would increase complexity even more).

If someone wants to step up and do this, I'd love to help!

tags: #software #rust #open-source