The CBE series #1 - Online rekeying
This article explains in detail how online rekeying works in the Consistent Block Encrypter (CBE). Online rekeying means to re-encrypt a CBE block device completely with a new encryption key and eventually remove the old encryption key while the device remains accessible the whole time.
This series of articles describes the features and integration of the Consistent Block Encrypter (CBE) in detail. The CBE is a block-device encryption component with its core logic entirely written in SPARK. It combines multiple techniques to ensure confidentiality, integrity, consistency, and freshness of the block data efficiently. Furthermore, the whole core logic so far passes gnatprove in mode "flow".
In case you are not familiar with the basic structure of the CBE, I'd recommend you this introduction before reading on:
The other articles in this series so far:
The source code on GitHub:
The goal of a rekeying operation at a CBE device is to generate a new symmetric key for block encryption, apply this new key to the whole CBE device, and eventually delete the previously used key at all entities that store it, i.e., the CBE component, the CBE device, and the Crypto entity.
As rekeying requires the CBE to decrypt and re-encrypt the entire virtual block device, the operation can be very time intensive for large devices or in the presence of many device snapshots. Rekeying in the CBE is therefore implemented in a way that doesn't block all other operations on the device. The CBE will continue handling "Read", "Write", "Sync", and "Discard Snapshot" requests during the rekeying process. I will refer to them as concurrent requests. At the other hand, "Create Snapshot", "Extend Free Tree", "Extend Virtual Block Device", and "Rekeying" requests cannot be executed in parallel to rekeying. I will refer to these as non-concurrent requests.
Directly at the CBE interface, a rekeying operation is communicated like any other operation by submitting a request. The request has the operation type "Rekey" and no other parameters. This request will then be added to the request-scheduling queue of the CBE. The actual execution of the rekeying request, however, doesn't start until all non-concurrent requests that arrived at the scheduler before the rekeying are done. Vice versa, non-concurrent requests that arrived after the rekeying are blocked until the rekeying is finished.
Concurrent requests, at the other hand, are executed pseudo-parallel to rekeying. This means that rekeying is not done in one big operation. It is rather split up into many small atomic operations and each time one of these atomic operations is done, the CBE device is left in a state that allows for intermixing concurrent requests before the next rekeying step is started.
This also has the benefit that there are many consistent intermediate states during rekeying that can be secured to the physical block device and the trust anchor. Should the system be turned off during rekeying, the progress isn't lost (except the last unfinished step of course) and the rekeying can be continued on next startup. Better said, the rekeying has to be continued on next startup, because the virtual block device would otherwise remain in a state that limits the functionality of the CBE.
That said, the CBE has to remember that a rekeying is pending and in which state it is inside the superblock. And it will automatically continue a pending rekeying on startup as the first request in the scheduling queue. Note that this is only the case for a running rekeying operation. A rekeying requests somewhere in the scheduling queue that is not yet started will be - like any other scheduled request - forgotten on system shutdown.
There are two types of atomic rekeying operations: The initialization of rekeying and the rekeying of one virtual block address throughout all stored states of the virtual block device. Naturally, the initialization comes first. Afterwards, the VBAs are rekeyed from VBA 0 up to the highest VBA and always one at a time. The ascending order of the VBAs when rekeying is important as I will explain later. After each of these atomic rekeying operations, the CBE updates the superblock, secures the device state, and tries to prepone up to 8 concurrent requests before starting the next rekeying step.
Let's go more into detail about the atomic rekeying operations. For the rekeying initialization, the CBE first requests the external trust anchor to generate a new symmetric key using a random number generator. As soon as the CBE receives the new key, it is given a unique identifier and sent together with this identifier to the external crypto entity. This way, future encryption requests to the external crypto entity can reference the new key.
Now, the CBE sets the superblock to state "Rekeying". Furthermore, it remembers in the superblock that the current rekeying VBA is 0. Note, that the current rekeying VBA in the superblock always indicates the VBA that must be rekeyed next, i.e., the lowest VBA that isn't rekeyed yet.
Once this is done, the superblock and the corresponding state can be written out to the physical block device and secured at the trust anchor. This procedure looks as follows: First, the CBE issues two requests at the trust anchor to encrypt both the old and the new symmetric key using the private key of the user. Note that the private key of the user is known only to the trust anchor.
Then, the superblock is written out to a new superblock slot on the physical block device which is then synchronized together with the CBE block caches. Finally, the trust anchor is requested to store the hash of the superblock, i.e., the superblock is secured. In case of a system shutdown, this will allow the CBE to find and verify the new superblock slot on next startup. When the superblock was secured successfully, the last secured generation is set to the current generation and the current generation is incremented by one.
The superblock is now in the "Rekeying" state and the current rekeying VBA is set. In order to rekey the VBA, the CBE must iterate over all stored states of the device. It starts with the current VBD state and then continues with the snapshots going from the most recent snapshot (highest generation number) to the oldest (lowest generation number). For each device state, it translates the current VBA by doing a tree walk.
Note that the states are handled one after another. Because of that, the CBE needs to hold only one complete tree walk in memory throughout the whole process. Level by level the tree walk of one state overrides the tree walk of the last examined state from the highest tree level (root level) to the lowest tree level (leaf level). This fact is later used for an optimization of the algorithm.
Now, let's focus on one device state in this process. The CBE has translated the VBA for the device state and has decrypted the leaf node (the actual data block) with the old symmetric key. In a next step, the CBE requests the external crypto entity to re-encrypt the leaf node with the new symmetric key. This causes the hash of the leaf node to change and consequently all inner nodes of the tree walk must be updated.
This is actually the same as when writing to a virtual block address. Walking from the leaf level up the root level again, the hashes in the inner nodes are updated and the levels are written back. Inner nodes that are not yet volatile (i.e., they were allocated during a past generation of the device), can not be updated directly. Instead, new blocks are allocated in order to do a copy-on-write for them. When a rekeying operation allocates blocks at the free tree, the procedure inside the free tree is slightly different from allocations for other operations. However, from outside the free tree, both allocation types look the same which is why I will explain the difference later.
Now the rekeying of the current VBA is done for this specific device state and the CBE can start with the next older device state (if there's any). This is were the above mentioned optimization comes in. The CBE still holds the original blocks of the last tree walk. If, during the new tree walk a level should have the same physical block address as in the last tree walk, the CBE doesn't have to walk further down for this device state. The rest of the walk is the same as in the last, more recent device state and has been rekeyed already. The CBE can turn around here and walk up again doing the allocations and updates for the above levels.
As soon as the VBA was rekeyed for all device states this way, the CBE checks whether this was the highest VBA. If so, the superblock returns to the "Normal" state and the rekeying request is finished. If not, the superblock remains in the "Rekeying" state and the current rekeying VBA in the superblock is incremented by one. Finally, the CBE state is again synchronized and secured as described in the previous section.
Note that, in contrast to other operations that modify the device state, rekeying doesn't create snapshots. It merely modifies the meta-data of the existing device states, not the actual data that is stored on the CBE device. Nonetheless, the rekeying increments the generation of the most recent device state each time it secures the superblock. If it wouldn't, the blocks that it freed during copy-on-write would not become re-allocatable.
As already mentioned, the allocation of blocks for rekeying is a bit special inside the free tree. Normally, on an allocation for copy-on-write, the address of the original physical block, the one that shall be replaced, is given to the free tree. When the free tree has found a new physical block to hand out, it replaces the free-tree entry of the new block with an entry for the old block that indicates that the old block remains reserved. This means although the old block isn't part of the current device state anymore, it is potentially still used by an older state, a snapshot of the device.
Therefore the entry of the old block carries the names of the generations during which the block was allocated respectively freed. As long as there is still a snapshot with a generation number greater or equal the allocation generation and less the free generation of the block, the block stays reserved in the free tree.
Rekeying, however, is different. When rekeying does CoW, it doesn't do it to preserve the old device state for later user access. It doesn't create new snapshots - it merely re-writes the existing ones. So, rekeying does CoW because, in the process of rekeying a VBA for one device state, it has to be considered that the other, yet to be rekeyed older device states still reference the updated blocks with their original hashes. And if rekeying would update the blocks without CoW, it would break the remaining snapshots and run into a hash mismatch before the end of the current VBA-rekeying step.
This makes clear why the common allocation strategy wouldn't work for rekeying. The criterion when to revert the reservation of the old blocks in the free tree is not the vanishing of certain snapshots but whether rekeying has reached a certain point. In order to know this point for a specific block, two situations must be distinguished. Either, the block forms part of the current device state (I'll call this an effective block) or it doesn't and is relevant for snapshots only (I'll call this a superseded block).
Let's look at effective blocks first. When the rekeying replaces them, they can be re-allocated as soon as rekeying is done with the current VBA, because by then the rekeying has replaced them in all snapshots. That sais, a CoW-allocation adds the old block directly as "non-reserved" to the free tree. This causes the block to become re-allocatable as soon as the current generation is secured, which is done at the end of the VBA-rekeying step.
With superseded blocks things are more complicated. When being replaced by rekeying, they could become re-allocatable in the next generation as well. However, in contrast to effective blocks, for a superseded block there is already an entry in the free tree indicating that the block is reserved. Even more, because of the way the free tree is designed, there is no efficient way to find this entry. But we have to do something about this entry. Otherwise it would keep the block reserved until all generations that used to reference it disappeared (despite the fact that they are not referencing it anymore).
This is the point where we can make use of the ascending order in which VBAs are rekeyed. Because pseudo-reserved blocks always belong to a VBA less than the current rekeying VBA. So, in each free tree entry, the CBE additionally stores the VBA the block was used last for. For leaf nodes of the virtual block device, the VBA is clear. For inner nodes, however, the VBA of the left-most descendant is used. Furthermore, the ID of the last symmetric key of the block is remembered. With this additional information, pseudo-reserved entries become detectable. If, during an allocation, the superblock is in the "Rekeying" state, the free tree checks for reserved entries whether they have the old key ID and a VBA less than the current rekeying VBA. If so, a pseudo-reserved block was found that can be treated like a non-reserved block. As a result, such blocks become re-allocatable as soon as the rekeying of their VBA is finished.
Now, this elegant trick works fine for superseded blocks that were allocated during the era of the old symmetric key, i.e., before the rekeying started. But we haven't spoken yet about the blocks that rekeying allocated to replace them. The free tree has to create a new entry for them as well because they are not part of the current generation and therefore sparse aka reserved. Here, we run into trouble.
Let's illustrate this with an example: Rekeying has just rekeyed VBA 0 and thereby replaced a superseded inner node of the virtual block device, physical block 10, with physical block 20. Block 10 is still in the free tree but will be recognized as pseudo-reserved because it is marked with the old key ID. Assume that we were to mark the new entry of block 20 with the new key ID. Later, during the rekeying of VBA 1, block 20 must be replaced again. This time, the pseudo-reserved free-tree entry of block 20 will remain undetected because of the new key ID. Alright. So, let's go back to the rekeying of VBA 0 and use the old key ID for the free-tree entry instead. This won't work neither because now, the entry for block 20 is freed too soon, when the rekeying of VBA 0 is done.
We have to find another criterion for freeing such superseded blocks that were allocated by rekeying itself. Luckily, this is possible because we know that such a block is always replaced in the next VBA-rekeying step, given that the current rekeying VBA is not the last one covered by the corresponding node in the virtual block device. So, we can mark the free tree entry with the old key ID and the next VBA that is to be rekeyed. In the above example, this would cause block 20 to be freed again as soon as the rekeying of VBA 1 is complete, which is exactly what we want.
The only thing left is what happens when the current rekeying VBA is the last one covered by the VBD node. In this case, the block that is allocated for CoW will not become pseudo-reserved because it will contain the last version of the VBD node that is created by the running rekeying process. Its free-tree entry can therefore be a "commonly" reserved one with the new key ID.
One thing that the above description of rekeying leaves out is that different states of the CBE device might have different virtual address rages due to resizing. But this actually integrates well. The CBE knows the virtual address range for each snapshot. So, the VBA-rekeying procedure will be triggered for each VBA referenced by any device state. Then, during the VBA-rekeying, the CBE will simply skip device states that do not cover the current VBA.
The meta data of the CBE is not yet encrypted (except of the symmetric keys in the superblocks of course). Consequently, the rekeying process doesn't have to re-encrypt the meta-data neither. As soon as the CBE meta data is also stored encrypted on the physical block device, rekeying must be enhanced to re-encrypt not only the inner nodes of the virtual block device, but also those of the free tree and the meta tree. However, this isn't a complex task. The virtual block device is entirely re-written anyway. Only that then further decryption and encryprion steps would be added.
When rekeying the inner nodes of the free tree, the CoW-allocation algorithm at the meta tree would be the same as for other free-tree operations (unlike when rekeying the virtual block device). This is due to the fact that of the free tree meta data there are only two versions at a max and the older one is always freed when securing the superblock. Furthermore, the meta tree always contains enough blocks to make the entire free-tree meta-data volatile.
Rekeying the meta tree, finally, is almost the same as rekeying the free tree, only that CoW now allocates from the meta tree itself. More precisely, it allocates from the left-most descendant type 2 node of the inner node that shall be updated. The design of the meta tree ensures that this type 2 node will always provide the required amount of blocks.