If you want to read more about our blockchain hands-on event, there is a German article about it.
The task at hand: mining a block
Mining a new block was supposed to be done by means of a proof-of-work algorithm. Let’s first look at the JSON representation of what a block looks like:
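The field names and values in the following sketch are illustrative; what matters is the overall structure.

```json
{
  "index": 2,
  "timestamp": 1523693123,
  "transactions": [
    { "sender": "alice", "recipient": "bob", "amount": 42 }
  ],
  "previous_block_hash": "000000b1a4...",
  "proof": 0
}
```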
As you can see, a block candidate contains an index, which denotes the position in the chain, a timestamp, a list of transactions, and a reference to the hash of the previous block. The list of transactions is basically the payload of our block, but what it exactly looks like is not relevant to our problem.
In addition to all those other fields, a block candidate contains a numerical field called proof. Typically, the task of mining a new block is to find a hash of the block that starts with a specific number of leading zeros. Specifically, in our case, a block is valid if the SHA256 hash of its JSON representation (using the order of fields shown above) starts with six zeros, i.e. with 000000.
How do we find such a block? All the fields of the block are fixed, apart from proof. So the goal is to find a value for this field that leads to a hash starting with 000000 when the JSON representation of the whole block is hashed.
Representing a block in Rust
In order to solve this problem in Rust, we first need to come up with appropriate representations of a block and a transaction. Since we do not want to do the JSON serialization ourselves, we are going to use serde, serde_json, and serde_derive.
Serde is a popular serialization library for Rust, supporting multiple formats, including JSON, and providing macros to automatically derive the necessary implementations of its Serialize and Deserialize traits using Rust’s built-in #[derive] attribute.
That being said, here are the two Rust structs representing a Transaction and a Block, respectively:
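The following is a sketch; the exact transaction fields are not important for the mining problem, and the Block fields are declared in the same order as in the JSON example above, since serde serializes struct fields in declaration order.

```rust
extern crate serde;
#[macro_use]
extern crate serde_derive;
extern crate serde_json;

// A single transaction; the exact shape of the payload does not matter here.
#[derive(Serialize, Deserialize, Clone, Debug)]
struct Transaction {
    sender: String,
    recipient: String,
    amount: u64,
}

// A block candidate; the field order determines the order of the fields
// in the JSON representation that gets hashed.
#[derive(Serialize, Deserialize, Clone, Debug)]
struct Block {
    index: u64,
    timestamp: u64,
    transactions: Vec<Transaction>,
    previous_block_hash: String,
    proof: u64,
}
```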
The genesis block
A blockchain always needs a specific genesis block, which is the very first block in the chain. This genesis block is predefined and needs to be hardcoded into our blockchain implementation. We define a function returning the genesis block like so:
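A sketch, with made-up values for the hard-coded fields:

```rust
// The predefined first block of the chain.
fn genesis_block() -> Block {
    Block {
        index: 1,
        timestamp: 0,
        transactions: vec![],
        previous_block_hash: String::from("0"),
        proof: 0,
    }
}
```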
Validating a block candidate
In order to find a valid block, we need to be able to create a JSON representation of a block candidate. We’ll add a method to Block that does just that and delegates to the functionality provided by serde_json:
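It might look like this:

```rust
impl Block {
    // Serialize this block to its JSON representation.
    fn to_json(&self) -> String {
        serde_json::to_string(self).unwrap()
    }
}
```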
The JSON serialization may fail, but we want to fail miserably if that happens in our simple application, so we call unwrap on the Result returned by serde_json::to_string.
We also need to be able to compute a SHA256 hash of a block candidate’s JSON representation. To do that, we make use of the crypto-hash crate. We define a simple function that takes an immutable reference to a Block and returns a new String:
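A sketch of this function:

```rust
use crypto_hash::{hex_digest, Algorithm};

// Compute the SHA256 hash of the block's JSON representation as a hex string.
fn hash(block: &Block) -> String {
    hex_digest(Algorithm::SHA256, block.to_json().as_bytes())
}
```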
Note that we are able to call to_json on our Block reference, since we defined to_json as a method above, taking &self as its first and only parameter. The hex_digest function and the Algorithm enum are imported from the crypto_hash crate.
Let’s also create a function that checks whether a given hash is valid:
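A minimal version might look like this:

```rust
// A hash is valid if it starts with the required prefix, e.g. "000000".
fn valid(hash: &str, prefix: &str) -> bool {
    hash.starts_with(prefix)
}
```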
The valid function expects a hash and a prefix to check for, both not of type String, which owns its content, but &str, whose content is borrowed. This is fine, as we don’t want to take ownership ourselves or mutate the hash or prefix in any way.
Before we can use any of those functions, though, we first need to be able to create a candidate from a timestamp, a sequence of transactions, and the previous block. We define a constructor function that returns a new Block:
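A possible sketch, reusing the hash function from above:

```rust
impl Block {
    // Create a new block candidate on top of the given previous block.
    fn new(timestamp: u64, transactions: Vec<Transaction>, previous_block: &Block) -> Block {
        Block {
            index: previous_block.index + 1,
            timestamp,
            transactions,
            previous_block_hash: hash(previous_block),
            proof: 0,
        }
    }
}
```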
Unlike the transactions, which need to be owned by the new Block, the previous block is expected as an immutable reference, because we only want to read that block’s index and compute its hash.
Two ways of mining
Now that we have the basic machinery for mining a new block in place, let’s look at two different approaches for implementing the actual mining in Rust.
The mutable mining mystery
I am a big fan of functional programming. That being said, one of the things I find so intriguing about Rust is how it helps you deal with mutable state without going insane – the compiler goes out of its way to prevent you from making certain mistakes that would lead to disastrous errors at runtime in most other languages.
That’s why the very first attempt at mining a new block that we chose to implement at our hands-on event was based on mutable state. Here is what it looks like:
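A sketch of this approach; the function name mine_mutable is my own:

```rust
// Keep incrementing the proof until the block's hash starts with "000000".
fn mine_mutable(block: &mut Block) {
    while !valid(&hash(block), "000000") {
        block.proof += 1;
    }
}
```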
By writing block: &mut Block in our parameter list, we say that we want to borrow the Block mutably, which means that we are allowed to mutate it in our function. The implementation of our mining function is then based on a cornerstone of imperative programming that fell into oblivion among functional programmers a long time ago: the while loop. As long as the hash of the block is not valid, we keep incrementing the block’s proof. This is the simplest possible way of finding a proof leading to a valid hash. Other approaches are conceivable as well. For instance, it would be perfectly possible to choose a random value for the proof instead of incrementing it. However, randomly chosen proofs are an approach we didn’t pursue at our hands-on event.
When we call this function, we need to pass in a mutable reference to a Block candidate. The function will not return anything when it is done. Instead, we can inspect the mutated Block that we passed in, for example to see the proof that led to a valid hash:
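Here, the timestamp and the empty list of transactions are made-up placeholders; the proof that is found depends on the exact contents of the block.

```rust
let previous = genesis_block();
let mut candidate = Block::new(1_523_693_123, vec![], &previous);
mine_mutable(&mut candidate);
println!("{}", candidate.proof);
```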
This will print the value 5518788 to stdout after a while.
The iterative identification inscrutability
It was pretty straightforward to implement the imperative solution to the mining problem using mutable state. At heart, though, I am still a functional programmer, so I always prefer a more functional and declarative solution that is more about the what, and less about the how. The great thing about Rust is that very often, you have a choice. Using Rust’s powerful iterator library, we can write a purely functional solution like this:
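A sketch of such a solution, again with a function name of my own choosing:

```rust
// Map each candidate proof to a new block and find the first one whose hash is valid.
fn mine_immutable(block: &Block) -> Block {
    (0..)
        .map(|proof| Block {
            index: block.index,
            timestamp: block.timestamp,
            transactions: block.transactions.clone(),
            previous_block_hash: block.previous_block_hash.clone(),
            proof,
        })
        .find(|candidate| valid(&hash(candidate), "000000"))
        .unwrap()
}
```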
The signature of this function is a bit different from the previous one. This time, we want to borrow a Block reference immutably, and we return a completely new Block.
The implementation is quite interesting: using the 0.. notation, we create a half-open range starting at 0. The resulting RangeFrom implements the Iterator trait, so we can use all the available iterator adapters, including the map adapter. Specifically, we map from a number, which is our proof, to a new Block. Note that we clone the transactions and the previous_block_hash from the passed-in block, because our new Block needs to own them.
So far, we have an infinite iterator, because it’s based on a half-open range. We consume this iterator by means of the find method, which expects a predicate, a function from &Block to bool. Our predicate checks whether the block is valid, using the functions we defined above. The find method returns an Option<Block>, because in general the predicate may never be fulfilled for any item. However, our assumption in this example application is that at some point, there will always be a proof value leading to a valid hash, so we simply unwrap the option. If the option is actually a None, our application will crash.
Beautiful benchmarking business
The functional and immutable solution looks pretty good. However, we actually had to write a lot more lines of code than in the imperative approach, mainly because in order to stay immutable, we had to create a new Block instead of mutating the existing one. This means we need to create a new Block for every proof we try, and because a Block owns the transactions and the hash of the previous block, we need to clone those from the passed-in &Block, which means additional heap allocations for every candidate.
It’s certainly interesting to see how the two approaches compare when it comes to performance – will the mutable version be a lot faster than the functional one?
I wrote a small function that helps to do some crude benchmarking. Rust nightly has built-in support for benchmarking, but we want to run on stable (1.26.2), and there are some limitations to the built-in benchmarking that make it unsuitable for our use case.
For our own benchmarking functionality, we build on the histogram crate. Here is our benchmarking function:
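A rough sketch of such a function, built on the histogram crate; the time unit (milliseconds) and the output format are my own choices:

```rust
use histogram::Histogram;
use std::time::Instant;

// Run the given action a fixed number of times, record the duration of each
// run in a histogram, and print some statistics at the end.
fn benchmark<F>(label: &str, mut action: F)
where
    F: FnMut(),
{
    let mut histogram = Histogram::new();
    for _ in 0..10 {
        let start = Instant::now();
        action();
        let elapsed = start.elapsed();
        let millis = elapsed.as_secs() * 1000 + u64::from(elapsed.subsec_nanos() / 1_000_000);
        let _ = histogram.increment(millis);
    }
    println!(
        "{}: mean {} ms, median {} ms, min {} ms, max {} ms, stddev {} ms",
        label,
        histogram.mean().unwrap(),
        histogram.percentile(50.0).unwrap(),
        histogram.minimum().unwrap(),
        histogram.maximum().unwrap(),
        histogram.stddev().unwrap()
    );
}
```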
It expects a label and a closure that can safely be executed repeatedly. It will create a new Histogram and execute the passed-in closure multiple times. Right now, the number of runs is hard-coded to 10, which is not ideal for reliable statistics, but since our mining algorithm takes quite a long time, I decided not to execute it more often. Each execution of the closure is timed, and the resulting value is added to the histogram. At the end, a bunch of interesting values are printed to stdout, like the mean, median, max, and min execution time, as well as the standard deviation.
Now we can write some benchmarking code to compare the two implementations:
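A sketch of that comparison, reusing the hypothetical function names from above:

```rust
fn main() {
    let previous = genesis_block();

    benchmark("mining (mutable)", || {
        let mut candidate = Block::new(1_523_693_123, vec![], &previous);
        mine_mutable(&mut candidate);
    });

    benchmark("mining (immutable)", || {
        let candidate = Block::new(1_523_693_123, vec![], &previous);
        mine_immutable(&candidate);
    });
}
```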
We use cargo run --release in order to run a release build, which makes the compiler use a lot of optimizations that are not used in a development build. Here is the result on my MacBook Pro (Retina, 13-inch, Late 2012, 2.9 GHz Intel Core i7, four cores):
I already knew that the Rust compiler is supposed to do all kinds of optimizations when using iterators. Nevertheless, I was still surprised to see that the two implementations are more or less on the same level – in fact, the functional, immutable implementation is even a tiny bit faster than the imperative one. Based on the stats provided by the OS X Activity Monitor, the memory usage of the functional version is about 25% higher, which is likely due to the additional heap allocations we did in order to stay immutable.
Summary
Implementing the block mining functionality in Rust is pretty straightforward. In this blog post, we saw that, for some CPU-bound tasks, functional implementations can be just as fast as imperative ones that are based on mutable state, because the Rust compiler is remarkably good at optimizing that stuff for you.
Our mining algorithm is pretty naive. Some obvious optimizations were deliberately left out. For example, it’s possible to only serialize a block to JSON once and only replace the value of the proof field in the resulting String.
Another limitation of our two implementations is that they are both leveraging only a single core. Mining a new block, however, is a task that can probably be classified as embarrassingly parallel. In the follow-up to this blog post, we will look at three alternative implementations that utilize all the machine’s cores.