Building Builder
59 Days Until I Can Walk
Before I say anything, can we just appreciate for a moment that the little counter until I can walk again is now in the 50s. Cause for celebration, to be sure!
I spent today working through the first problem in David Tolnay’s Procedural Macros Workshop and have made it a little more than halfway through the task.
The first task is to define a Builder macro which can be used to generate a builder struct for any struct that derives Builder. See the code sample below for an example:
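In sketch form, it looks something like this, using the Command struct that the workshop’s tests revolve around (the import path is, if I recall correctly, the workshop’s derive_builder test crate; treat it as illustrative):

```rust
use derive_builder::Builder;

#[derive(Builder)]
pub struct Command {
    executable: String,
    args: Vec<String>,
    env: Vec<String>,
    current_dir: String,
}

// The goal is for the derive to generate a CommandBuilder, so that we
// can construct a Command like this:
//
//     let command = Command::builder()
//         .executable("cargo".to_owned())
//         .args(vec!["build".to_owned(), "--release".to_owned()])
//         .build()
//         .unwrap();
```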
Tolnay provides a series of unit tests of increasing complexity (nine in total) which add extra requirements and features to the Builder macro as the project develops. The initial unit test simply checks that the macro is defined. The fifth checks that it correctly generates setter functions that can be chained. The last checks whether your macro breaks when prelude types are redefined.
I want to start this post by talking about some of the things I learned over the course of the day, ranging from small discoveries to large. I will reproduce my solution in full afterwards and will discuss some of the implementation details, design decisions, and flaws as I understand them.
All Learnings Great And Small
Let’s start with the most obvious. You can define a macro by declaring a function which takes a TokenStream as input and returns a TokenStream as output, and annotating that function with the proc_macro_derive attribute, as shown below:
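A minimal skeleton of such a function might look like this (the derive name Builder and the function name derive are choices, not requirements):

```rust
// The attribute argument names the derive target: #[derive(Builder)].
#[proc_macro_derive(Builder)]
pub fn derive(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
    // For now, emit nothing; the generated items will go here.
    proc_macro::TokenStream::new()
}
```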
Now, a question: why do I fully qualify the TokenStream? This is because there are actually two crates which are commonly used when writing macros, proc_macro and proc_macro2. My understanding is that proc_macro2 is actually a wrapper around several structs and functions defined in proc_macro. However, proc_macro can only be used in procedural macros. This means that it cannot be used, for example, in build.rs files or in unit tests. proc_macro2 can be used everywhere. By definition, functions annotated with proc_macro_derive must accept and return a proc_macro::TokenStream. But everywhere else in our code can use proc_macro2. So we fully qualify the one struct we need from proc_macro, and import everything else from proc_macro2.
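In practice the split looks something like this (a sketch; the conversions at the boundary are the important part):

```rust
use proc_macro2::TokenStream; // usable everywhere: tests, build.rs, libraries

#[proc_macro_derive(Builder)]
pub fn derive(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
    // Convert into the proc_macro2 type for all internal work...
    let _input: TokenStream = input.into();
    // ...do the real work with proc_macro2 types...
    let generated = TokenStream::new();
    // ...and convert back at the boundary.
    generated.into()
}
```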
Useful Crates
Two very popular crates for writing macros (aside from proc_macro2) seem to be syn and quote.

syn provides support for parsing the input TokenStream into a meaningful data structure which can be traversed and examined. A TokenStream is literally a raw stream of tokens extracted from your source code. syn can parse this into a tree structure, allowing you to perform tasks such as checking the type of a variable, identifying the fields of a struct, and more. Rather than trying to work with the raw TokenStream, adding syn to your crate gives you a much more powerful interface for inspecting the inputs to your macro.
quote is the inverse of syn and is used to generate an output TokenStream. The syntax here is extremely simple. You can basically write normal Rust code, but inject variables into your code by marking them with the # symbol. For example:
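A fragment along these lines (where ident and builder_ident are syn::Ident values computed beforehand, and the builder is assumed to derive Default):

```rust
use quote::quote;

// `ident` is the input struct's name; `builder_ident` is the
// generated builder's name. Both are interpolated with `#`.
let expanded = quote! {
    impl #ident {
        pub fn builder() -> #builder_ident {
            #builder_ident::default()
        }
    }
};
```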
Given the Command struct used as an example earlier, the code above within the quote! macro will expand to:
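Here is a cut-down, hand-written version of that expansion (a one-field Command, with the builder deriving Default so the sketch stands alone):

```rust
// Stand-ins for the macro's input struct and generated builder.
pub struct Command {
    pub executable: String,
}

#[derive(Default)]
pub struct CommandBuilder {
    pub executable: Option<String>,
}

// Roughly what the quote! fragment expands to for `Command`:
impl Command {
    pub fn builder() -> CommandBuilder {
        CommandBuilder::default()
    }
}
```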
So quote makes it very easy to generate TokenStreams using Rust-like syntax as input.
Confusing Syntax
One feature of generating TokenStreams using quote! which has absolutely broken my brain is the syntax for iterating over collections. Expanding a little more on the previous example, let’s assume that we also want to iterate over all the fields of the input struct and add them as Option fields in the builder. We could do that using the code below:
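Something like the following, where field_name and field_type are vectors gathered from the parsed struct (a sketch; names are illustrative):

```rust
use quote::quote;

// field_name: Vec<syn::Ident>, field_type: Vec<syn::Type>,
// both collected from the parsed DeriveInput earlier.
let expanded = quote! {
    pub struct #builder_ident {
        #(#field_name: std::option::Option<#field_type>,)*
    }

    impl #ident {
        pub fn builder() -> #builder_ident {
            #builder_ident {
                #(#field_name: std::option::Option::None,)*
            }
        }
    }
};
```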
Admittedly this looks like a large expansion, but the line I want to draw attention to is:
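With collections named field_name and field_type, that line would be:

```rust
#(#field_name: std::option::Option<#field_type>,)*
```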
This is what iteration looks like in quote!. We have defined two collections, field_name and field_type. Iterating over these collections in the macro uses the #(...)* syntax. In Rust macro terminology, these are called “repetitions”. Quoting from the README for quote:
Repetition is done using #(…)* or #(…),* similar to macro_rules!. This iterates through the elements of any variable interpolated within the repetition and inserts a copy of the repetition body for each one. The variables in an interpolation may be anything that implements IntoIterator, including Vec or a pre-existing iterator.
- #(#var)* – no separators
- #(#var),* – the character before the asterisk is used as a separator
- #( struct #var; )* – the repetition can contain other things
- #( #k => println!("{}", #v), )* – even multiple interpolations

Note that there is a difference between #(#var ,)* and #(#var),* – the latter does not produce a trailing comma. This matches the behavior of delimiters in macro_rules!.
This took some getting used to, and I would particularly like to dive into how multiple interpolations work. What happens, for example, if one collection is longer than the other?
stringify! Exists
This might seem like a small thing, but let’s say I want to generate a string in a macro which contains the name of a field, e.g. rather than have an error message say "All fields must be initialized", I can specifically say "args must be set". stringify! allows me to do this: given a set of tokens (including interpolated variables), it returns them as a string literal.
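A tiny demonstration of stringify! outside a macro context:

```rust
// stringify! converts the tokens it receives into a string literal
// at compile time: stringify!(args) becomes "args".
fn demo_message() -> String {
    format!("{} must be set", stringify!(args))
}
```

Inside quote!, the same trick applies to interpolated identifiers, e.g. stringify!(#field_name).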
Solving The Builder Problem
Ok, let’s have a look at what I actually did to solve the first of David’s problems. My full solution for the first five unit tests is printed below.
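A representative sketch of such a solution follows (assuming syn with its derive feature and quote as dependencies; details such as the panicking error handling and the cloning in build are my choices, and the real listing may differ):

```rust
use quote::{format_ident, quote};
use syn::{parse_macro_input, Data, DeriveInput, Fields};

#[proc_macro_derive(Builder)]
pub fn derive(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
    let input = parse_macro_input!(input as DeriveInput);
    let ident = input.ident;
    let builder_ident = format_ident!("{}Builder", ident);

    // Error handling: only structs with named fields are supported.
    let fields = match input.data {
        Data::Struct(data) => match data.fields {
            Fields::Named(named) => named.named,
            _ => panic!("Builder requires named fields"),
        },
        _ => panic!("Builder can only be derived for structs"),
    };

    let field_name: Vec<_> = fields.iter().map(|f| f.ident.clone().unwrap()).collect();
    let field_type: Vec<_> = fields.iter().map(|f| f.ty.clone()).collect();

    let expanded = quote! {
        pub struct #builder_ident {
            #(#field_name: std::option::Option<#field_type>,)*
        }

        impl #ident {
            pub fn builder() -> #builder_ident {
                #builder_ident {
                    #(#field_name: std::option::Option::None,)*
                }
            }
        }

        impl #builder_ident {
            // Setters return &mut Self so calls can be chained.
            #(
                pub fn #field_name(&mut self, value: #field_type) -> &mut Self {
                    self.#field_name = std::option::Option::Some(value);
                    self
                }
            )*

            // Assumes the field types are Clone, which holds for the
            // workshop's test structs.
            pub fn build(&mut self) -> std::result::Result<#ident, std::boxed::Box<dyn std::error::Error>> {
                std::result::Result::Ok(#ident {
                    #(#field_name: self.#field_name.clone().ok_or_else(
                        || format!("{} must be set", stringify!(#field_name))
                    )?,)*
                })
            }
        }
    };

    proc_macro::TokenStream::from(expanded)
}
```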
Given what I have explained above, I think it should be reasonably clear what has been done here. The challenges completed in order are:
- Define the Builder macro
- Define the builder() function on the input struct that allows us to initialize a builder for that struct
- Generate setters for the builder which match the fields of the input struct
- Call a build function and return a populated instance of the struct. There should be some error handling to ensure all fields have been initialized
- Demonstrate that chaining works in the setters
Solving the first problem is extremely easy. We simply create a function derive which takes a TokenStream as input, returns a TokenStream as output, and is annotated with proc_macro_derive.
The second challenge requires us to generate an identifier for the builder based on the name of the input struct. We parse the input TokenStream into a DeriveInput struct and extract the identifier of the struct. We then generate a new identifier, appending the word “Builder” to the end of the input struct’s identifier. We need to define the Builder struct itself, and all of its fields and corresponding types. We therefore extract all the fields and their types from the input struct and will use these to generate code that will define and initialize the builder. We do a little bit of error handling first to ensure that we have been passed an object for which we can create a builder. This is predicated on whether or not the input is a struct.
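In sketch form (inside the derive function; names are illustrative), that parsing and generation step looks something like:

```rust
// Inside the derive function:
let input = parse_macro_input!(input as DeriveInput);
let ident = input.ident;
let builder_ident = format_ident!("{}Builder", ident);

// Error handling: we can only create a builder for a struct
// with named fields.
let fields = match input.data {
    Data::Struct(data) => match data.fields {
        Fields::Named(named) => named.named,
        _ => panic!("Builder requires named fields"),
    },
    _ => panic!("Builder can only be derived for structs"),
};

let field_name: Vec<_> = fields.iter().map(|f| f.ident.clone().unwrap()).collect();
let field_type: Vec<_> = fields.iter().map(|f| f.ty.clone()).collect();

let expanded = quote! {
    pub struct #builder_ident {
        #(#field_name: std::option::Option<#field_type>,)*
    }

    impl #ident {
        pub fn builder() -> #builder_ident {
            #builder_ident {
                #(#field_name: std::option::Option::None,)*
            }
        }
    }
};
```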
Note in the above call to quote!, #ident will expand to the name of the input struct (e.g. Command) and #builder_ident will expand to the name of the builder (e.g. CommandBuilder). Here we can also see how iteration works, looping through the field_name and field_type collections both to define the builder struct and to populate it in the builder function.
Generating setters for the builder is a relatively straightforward iteration on defining the builder itself. Simply iterate over all fields and their types and generate a setter that returns a reference to the builder so that we can do chaining:
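Sketched with quote!’s repetition syntax, that looks roughly like:

```rust
// Each setter stores the value and hands back a mutable reference
// to the builder so calls can be chained.
let setters = quote! {
    impl #builder_ident {
        #(
            pub fn #field_name(&mut self, value: #field_type) -> &mut Self {
                self.#field_name = std::option::Option::Some(value);
                self
            }
        )*
    }
};
```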
Note that the fields on the builder struct are all of type Option, hence the call to Some.
Because of the requirement for error handling, implementing build is a little more challenging, but not by much. We simply unwrap the Option fields on the builder and raise an appropriate error if None is returned. Note the use of stringify! so that we can specifically indicate which field caused the error in our error messages.
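Hand-expanded for a one-field struct, the generated build would look roughly like this (a sketch assuming Option-typed builder fields, Clone field types, and String errors boxed as dyn Error):

```rust
pub struct Command {
    pub executable: String,
}

pub struct CommandBuilder {
    pub executable: Option<String>,
}

impl CommandBuilder {
    pub fn build(&mut self) -> Result<Command, Box<dyn std::error::Error>> {
        Ok(Command {
            // Unwrap the Option, turning None into a field-specific error.
            executable: self
                .executable
                .clone()
                .ok_or_else(|| format!("{} must be set", stringify!(executable)))?,
        })
    }
}
```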
Because of how I have written my code, I get the final unit test for free: all my setters return a reference to self, and so the chaining test already passes.
Conclusion
Work on the engine has paused for a little bit while I try to get to grips with macros. I’ll be continuing this exercise tomorrow as it has been incredibly useful. I am still hoping to have finished The Great Refactor by the end of the week, but the more I dig into this, the more getting properly to grips with macros seems like a priority. So there will be more updates on this tomorrow.