Developing a Query-API for toml-query
For one of my other projects (yes, imag), I developed a query library for TOML. I am currently planning a new feature for it: a query API which can be used to execute prepared queries on a given TOML document.
But let me start with the basics!
What is TOML?
“TOML” stands for “Tom's Obvious, Minimal Language”. It is somewhat similar to JSON, but highly readable and easy to understand. It is mainly used for configuration files in the Rust community. I use it, for example, as the header format for imag store entries and as the configuration file format for imag.
A mini TOML file looks like this:
[table]
key = "value"
[table.subtable]
key = ["another value in an array"]
What does 'toml-query' do?
toml-query, the library I developed, is actually an extension. It extends the toml-rs library, which is a serde-based library for the TOML data format. Serde is the serialization/deserialization framework in the Rust ecosystem. Thus, toml-rs is a frontend to that framework to work with the file format.
Because serde is such an amazing tool, one can write code like this:
#[derive(Serialize, Deserialize)]
struct Foo {
    integer: i32,
    string: String,
}
to get a struct which can be serialized to and deserialized from TOML with minimal effort in Rust:
extern crate serde;
#[macro_use] extern crate serde_derive;
extern crate toml;

#[derive(Serialize, Deserialize)]
struct Foo {
    integer: i32,
    string: String,
}

fn main() {
    let foo = Foo {
        integer: 15,
        string: String::from("Hello world!"),
    };
    let serialized = toml::to_string(&foo).unwrap(); // here is magic!
    let text = r#"integer = 15
string = "Hello world!"
"#;
    assert_eq!(text, serialized);
}
(this piece of code can be executed with the playground).
The resulting TOML can, of course, be deserialized back to an instance of Foo.
That's really neat if you want to read your configuration file, because you simply have to write a struct which describes the variables your configuration file should have and let toml-rs and serde do the magic of deserialization.
If an error happens, for example because a key is missing, the deserialization fails and you can forward the error to your user.
But what happens if you have a really complex configuration file? What if you don't know, at build time of your program, what your configuration file looks like? What if you have things that are allowed to go wrong and you have to very precisely catch errors and handle them individually? Then, this awesomeness becomes complicated.
That's why I wrote toml-query. It helps you maintain a real CRUD (Create-Read-Update-Delete) workflow on TOML documents.
For example, after reading your TOML document into memory as toml-rs structures, you can read and write specific values by their path:
extern crate serde;
extern crate toml;
extern crate toml_query;

use toml::Value;
use toml_query::read::TomlValueReadExt; // trait which provides `read()` on `toml::Value`

fn main() {
    let text = r#"integer = 15
string = "Hello world!"
"#;
    let toml: Value = toml::de::from_str(text).unwrap();
    // `read()` yields Ok(Some(_)) on a hit, Ok(None) if the key is absent
    let int = match toml.read("integer") {
        Ok(Some(&Value::Integer(i))) => i,
        Ok(Some(_)) => panic!("Type error: Not an integer!"),
        Ok(None) => panic!("Key 'integer' missing"),
        Err(e) => panic!("Error reading TOML document: {:?}", e),
    };
    assert_eq!(int, 15);
}
The code example above reads the TOML document into a Value (which is a datatype provided by toml-rs) and then read()s the value at "integer".
The read operation addresses the value via its “path”, and a path is not limited to a single key: things like "table.subtable.value" are possible, as are array indexes. This works with the CRUD operations: reading values, writing values (creating intermediate “tables” or “arrays” if they do not exist yet), updating values and, of course, deleting values.
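To make the idea of path-based access a bit more concrete, here is a minimal sketch using only the standard library. It is not how toml-query is implemented: the Node type, the read function and the rule "a numeric segment is an array index" are assumptions made up for this illustration, and the real path syntax of toml-query may differ.

```rust
use std::collections::BTreeMap;

/// A tiny stand-in for a TOML value tree (hypothetical, not `toml::Value`).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Text(String),
    Array(Vec<Node>),
    Table(BTreeMap<String, Node>),
}

/// Walks `path` segment by segment, splitting on '.'.
/// Assumption of this sketch: inside an array, a segment must be a number
/// and is used as the index.
/// Returns Ok(None) if a key or index does not exist and Err if the path
/// tries to descend into a scalar value.
fn read<'a>(root: &'a Node, path: &str) -> Result<Option<&'a Node>, String> {
    let mut current = root;
    for segment in path.split('.') {
        current = match current {
            Node::Table(map) => match map.get(segment) {
                Some(next) => next,
                None => return Ok(None),
            },
            Node::Array(items) => {
                let index = segment
                    .parse::<usize>()
                    .map_err(|_| format!("'{}' is not an array index", segment))?;
                match items.get(index) {
                    Some(next) => next,
                    None => return Ok(None),
                }
            }
            Node::Text(_) => {
                return Err(format!("cannot descend into a scalar at '{}'", segment))
            }
        };
    }
    Ok(Some(current))
}

/// Builds the mini TOML document from the beginning of this article by hand.
fn sample_document() -> Node {
    let mut subtable = BTreeMap::new();
    subtable.insert(
        "key".to_string(),
        Node::Array(vec![Node::Text("another value in an array".to_string())]),
    );
    let mut table = BTreeMap::new();
    table.insert("key".to_string(), Node::Text("value".to_string()));
    table.insert("subtable".to_string(), Node::Table(subtable));
    let mut root = BTreeMap::new();
    root.insert("table".to_string(), Node::Table(table));
    Node::Table(root)
}
```

With this sketch, read(&sample_document(), "table.subtable.key.0") walks two tables and one array. The three possible outcomes (value found, value missing, error) mirror the Ok(Some(_)), Ok(None) and Err(_) cases of the read() example above.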
Why a Query-API?
Everything I explained above is plain CRUD functionality. There is no “query” in there yet.
The next step I am currently thinking about is an API which can be used to build complex queries, chain them and (not in the first version of the API, but maybe later) also roll them back.
The idea would be an API like this:
let query = Read::new("foo")
    .and_then(SetTo::new(|x| x + 1))
    .and_then(DeleteAt::new("bar"));
let query_result = document.execute(query);
Here, we build a query which reads a value at “foo”, then increments that value and after that deletes the value at “bar”. If one of these steps fails, the others are never executed.
The equivalent in CRUD calls would look like this:
let value = document.read("foo").unwrap().unwrap();
let value = value + 1;
document.set("foo", value).unwrap();
document.delete("bar").unwrap();
The calls to unwrap() are here to show where errors can happen. All this would be hidden in the query functionality, and the query_result would then hold an error which can be used to tell the user what exactly went wrong and where.
The exact shape and semantics of the final API may differ from the example above. The example only serves to visualize what the API could look like.
How does the Query-API work?
The basic idea here is to encapsulate CRUD calls into objects which then can be chained in some way. That's the whole thing, actually.
The important thing is that users are able to define their own “query types”: types which can be put into the chain of queries and which are composed of other query types. This way, a user can basically define structures and procedures to write code like this:
let result = document.execute(TransformConfigFileFormat::new());
and the TransformConfigFileFormat type then transforms an old config file format to a new one (for example).
This requirement makes the whole thing complicated. To list the full requirements:
- A query may return data which may be used by the next query in the chain
- A user of the library should be able to compose new query objects from existing ones
- The CRUD functionalities shall be provided as “query types”
- The API should be easy to use with both “query types” and closures (think: query = other_query.and_then(|val| val + 1);)
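The requirements above can be sketched with a single trait. This is only an illustration under simplifying assumptions, not the planned toml-query API: the document is a flat map from keys to integers, every step passes an Option<i64> to the next one, and all names (Query, Chain, Read, Write, Delete, BumpFooDropBar) are made up for this example.

```rust
use std::collections::BTreeMap;

/// Stand-in document: a flat key -> integer map (a real implementation
/// would operate on a `toml::Value` instead).
type Document = BTreeMap<String, i64>;
/// What one step hands over to the next step, or the error that stops the chain.
type Step = Result<Option<i64>, String>;

trait Query {
    fn execute(&self, doc: &mut Document, input: Option<i64>) -> Step;

    /// Chains `next` after `self`; `next` receives the output of `self`.
    fn and_then<Q: Query>(self, next: Q) -> Chain<Self, Q>
    where
        Self: Sized,
    {
        Chain { first: self, second: next }
    }
}

struct Chain<A, B> {
    first: A,
    second: B,
}

impl<A: Query, B: Query> Query for Chain<A, B> {
    fn execute(&self, doc: &mut Document, input: Option<i64>) -> Step {
        // `?` skips the second step if the first one failed.
        let intermediate = self.first.execute(doc, input)?;
        self.second.execute(doc, intermediate)
    }
}

/// CRUD operations as query types.
struct Read(&'static str);
struct Write(&'static str);
struct Delete(&'static str);

impl Query for Read {
    fn execute(&self, doc: &mut Document, _input: Option<i64>) -> Step {
        doc.get(self.0)
            .copied()
            .map(Some)
            .ok_or_else(|| format!("key '{}' missing", self.0))
    }
}

impl Query for Write {
    fn execute(&self, doc: &mut Document, input: Option<i64>) -> Step {
        let value = input.ok_or_else(|| format!("nothing to write to '{}'", self.0))?;
        doc.insert(self.0.to_string(), value);
        Ok(Some(value))
    }
}

impl Query for Delete {
    fn execute(&self, doc: &mut Document, _input: Option<i64>) -> Step {
        doc.remove(self.0)
            .map(|_| None)
            .ok_or_else(|| format!("key '{}' missing", self.0))
    }
}

/// Closures are queries too: they transform the previous output
/// and do not touch the document.
impl<F: Fn(i64) -> i64> Query for F {
    fn execute(&self, _doc: &mut Document, input: Option<i64>) -> Step {
        let value = input.ok_or_else(|| "closure step needs an input value".to_string())?;
        Ok(Some((self)(value)))
    }
}

/// A user-defined query type composed of the building blocks above.
struct BumpFooDropBar;

impl Query for BumpFooDropBar {
    fn execute(&self, doc: &mut Document, input: Option<i64>) -> Step {
        Read("foo")
            .and_then(|x: i64| x + 1)
            .and_then(Write("foo"))
            .and_then(Delete("bar"))
            .execute(doc, input)
    }
}
```

Calling BumpFooDropBar.execute(&mut doc, None) runs the whole chain and stops at the first failing step. Note that in this sketch a failure in a late step leaves the earlier modifications in place.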
Reversible queries / Transactions
A really nice thing would be reversible queries.
In this scenario, one would be able to call a chain of queries and, if one of the queries fails, the document is left untouched. This could be done either by copying the whole document before executing the query chain and replacing the modified version with the unmodified one if something failed, or by making the queries themselves reversible (thus, an insert would reverse to a delete and the other way round, for example).
The first idea is more memory-intensive and the latter more runtime/CPU-intensive. Maybe both could be offered, so that the user is able to decide.
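Both ideas can be sketched in a few lines. Again, this is an illustration under assumptions and not part of toml-query: the document is a flat map from keys to integers, and the names execute_atomically, Undo, set, delete and roll_back are hypothetical.

```rust
use std::collections::BTreeMap;

/// Stand-in document: a flat key -> integer map.
type Document = BTreeMap<String, i64>;

/// Variant 1: copy the whole document up front and put the copy back
/// if any step fails.
fn execute_atomically<T, F>(doc: &mut Document, steps: F) -> Result<T, String>
where
    F: FnOnce(&mut Document) -> Result<T, String>,
{
    let snapshot = doc.clone();
    let result = steps(doc);
    if result.is_err() {
        *doc = snapshot;
    }
    result
}

/// Variant 2: every modification records its own inverse.
enum Undo {
    /// Undoes the insertion of a key that did not exist before.
    Remove(String),
    /// Undoes a delete or an overwrite by restoring the old value.
    Restore(String, i64),
}

fn set(doc: &mut Document, log: &mut Vec<Undo>, key: &str, value: i64) {
    match doc.insert(key.to_string(), value) {
        Some(old) => log.push(Undo::Restore(key.to_string(), old)),
        None => log.push(Undo::Remove(key.to_string())),
    }
}

fn delete(doc: &mut Document, log: &mut Vec<Undo>, key: &str) -> Result<(), String> {
    let old = doc
        .remove(key)
        .ok_or_else(|| format!("key '{}' missing", key))?;
    log.push(Undo::Restore(key.to_string(), old));
    Ok(())
}

/// Applies the recorded inverses in reverse order.
fn roll_back(doc: &mut Document, log: Vec<Undo>) {
    for entry in log.into_iter().rev() {
        match entry {
            Undo::Remove(key) => {
                doc.remove(&key);
            }
            Undo::Restore(key, value) => {
                doc.insert(key, value);
            }
        }
    }
}
```

The snapshot variant pays for a full clone even when nothing goes wrong, while the undo log only grows with the number of modifications but has to be replayed on failure.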
Other things
One other thing which would be really great is to generalize the functionality of toml-query over all data formats serde provides serialization and deserialization functionality for.
This would be the ultimate end-game and I'm sure I'm not able to do this without help (because toml-query is already really complex right now and such a thing would increase complexity even more).
If someone wants to step up and do this, I'd love to help!