SPARK as an extremum: Components in pure SPARK
Some time ago Norman Feske wrote about C++ and SPARK as a continuum. His article shows how SPARK can be used to implement critical functionality in SPARK while implementing the core component in C++ on Genode. He further describes a set of rules to prevent unexpected side effects in SPARK. In short it boils down to stateless libraries written in SPARK that cannot interact with the environment but can consume and produce data. While this might sound restricted at first glance this approach can be used to strengthen the security of existing software such as parsers or protocol checkers that shield the C++ code from tainted data. We also applied this method successfully outside of Genode by replacing the C++ parser of the Fizz TLS library by our own SPARK implementation.
But what if you want to go beyond stateless parts written in SPARK and do a full dataflow analysis and proof over the whole component including its interaction with the platform? For this case I want to present our approach of pure SPARK components that can be created on Genode without needing to touch even a single line of C++. This frees the user from recreating interfaces to C++ every time and helps to avoid unmet preconditions for non-SPARK code. While the general idea was clear the way to its current implementation led over some seemingly insurmountable obstacles.
One critical requirement for this project were Ada bindings for C++ interfaces. Ada itself provides great mechanisms to bind to C++ so taking a C++ class and creating a binding for it is fairly easy. Yet Genode obviously consists of far more than a single C++ class. While its API is manageable creating bindings for all its interfaces is a tedious and repetitive task, one that could be automated. As this insight isn't new GCC comes with an integrated Ada binding generator, problem solved.
Well, not really. While GCC takes care of symbol mangling and memory layouts the code it produces is nothing one would want to work with in terms of an API for building software. Furthermore Genode makes extensive use of advanced C++ features, especially templates including their specialization mechanisms. As templates generate code on use they cannot be bound to Ada as the code one would bind against only exists if the same template instance exists somewhere in C++. GCC solves this problem by simply not supporting templates in its generator at all. Furthermore Ada specifiers have much stricter naming rules than C++ regarding the use and order of symbols. While C++ functions colliding with Ada keywords are correctly escaped by GCC those that start with an underscore, which is invalid in Ada, are simply omitted. Last but not least some constructs in Genodes C++ caused the generator to create code that wouldn't compile. Generating code with this tool would require much manual work to fix the issues while still leading to an unsatisfactory result.
But we thought we can do better and started creating our own binding generator - Cappulada. It aimed to support all features we need to automatically bind Genode C++ to Ada while maintaining the semantic structure of the API. What was A::B::C in C++ should become A.B.C in Ada. Pointers and references should only become access types where absolutely necessary since the API should be as SPARK compatible as possible. And using C++ template instances in Ada is a required feature as without them large parts of the API won't be accessible.
We chose to use Python for this tool as the libclang is already available and makes parsing C++ quite easy and the development productivity is realatively high. We added all the features we thought we would need, namespaces, classes, types and so on. Unfortunately the libclang API we used was on a level that did not provide mangled symbols so we created our own implementation of the name mangling. Also templates were only available as template definitions and their instances only consisted of the used template parameters. But to bind them to Ada their linker symbols and memory representation are required which only appear in the actual instance. So our own C++ template engine written in Python emerged.
As this is a quite complex project we used tests from the start on to keep its functionality monitored. So each added feature got its own test and some of them were tested with multiple features combined. This was done first for the code generation from the intermediate representation, then for parsing the C++ code into the intermediate representation. At the end integration tests were added that went from the C++ code and checked it against the Ada code. And finally validation tests that included compiling the Ada code and running it to check if it yields the correct results. After multiple hundred tests we thought we would be ready for Genode.
The first run on a Genode header made the parser crash immediately. We noticed that we didn't support a particular combination of typedefs of templates yet so we added this. Then again forward declarations were something we didn't even thought about. And function pointers. And function pointers to class member functions. And partial template specializations. And full template specializations. And function templates. Finding semantic and technical mappings for all these C++ features in Ada wasn't always an easy task. And with each single feature we supported a new one emerged. And if we supported enough features a combination of some would result in behaviour we didn't expect. At the end we had a tool in a similar but different half functioning state as the GCC binding generator but we came to the insight why this tool was in this state.
With the realization that generated bindings are not feasible and both binding and API need to be created by hand previous API limitations such as functions that are not allowed in SPARK could be removed. This API should not resemble any characteristics of any language or platform that implements it. The goal was to create a pure SPARK API for asynchronous verified components. The result is the Componolit Ada Interface, an interface collection that provides component startup, shutdown and platform interaction.
The CAI is a collection of so called interfaces. An interface is an abstraction of underlying platform functionality. The Log interface for example provides a way to pass log messages which will then for example print or store them. An interface can provide functionality from the platform to the component but also from the component to the platform.
Subprograms provided by the platform are simply package functions or procedures while subprograms provided by the component are formal parameters of generic packages. The first interface of any component is the Cai.Component interface. It takes a Construct and Destruct procedure as formal generic arguments that are called on component startup and shutdown repectively and provides a Vacate procedure that tells the platform that the component wants to be destructed. A simple component that only exits with a success status code would look as follows:
with Cai.Types; with Cai.Component; package Component with SPARK_Mode is procedure Construct (Cap : Cai.Types.Capability); procedure Destruct; package My_Component is new Cai.Component (Construct, Destruct); end Component;
The spec is unspectacular as it only declares the two procedures and instantiates the component package. A speciality of this package is that it can only be instantiated once while all other generic packages can have an arbitrary number of instances. The Construct procedure also gets a capability object passed that can be used to initialize other interfaces. It is only passed to this method and without this capability no platform interaction can happen. The body of our component just tells the platform in the construct procedure that it wants to exit successfully:
with Cai.Types; package body Component with SPARK_Mode is procedure Construct (Cap : Cai.Types.Capability) is begin My_Component.Vacate (Cap, My_Component.Success); end Construct; procedure Destruct is begin null; end Destruct; end Component;
As shown telling the platform the desire to exit, which is a platform interaction also requires the use of the capability. Let's take our component to the next level of programming tutorials, the hello world. To do this the Log interface needs to be added to the body:
with Cai.Log; with Cai.Log.Client; package body Component with SPARK_Mode is Log : Cai.Log.Client_Session := Cai.Log.Client.Create; procedure Construct (Cap : Cai.Types.Capability) is begin Cai.Log.Client.Initialize (Log, Cap); if Cai.Log.Client.Initialized (Log) then Cai.Log.Client.Info (Log, "Hello World!"); My_Component.Vacate (Cap, My_Component.Success); else My_Component.Vacate (Cap, My_Component.Failure); end if; end Construct; procedure Destruct is begin if Cai.Log.Client.Initialized (Log) then Cai.Log.Client.Finalize (Log); end if; end Destruct; end Component;
Interfaces are called sessions. They also can provide client and server sessions. The Log client connects to the platform and sends its log messages there. A hypothetical Log server would receive those messages if they're not handled by the platform.
The Client_Session is a log session object that resides in the component package. As it denotes platform state it is a limited type which cannot be copied in Ada. As a session object is a reference to a platform state changing its state changes the platform. Yet if copies would be allowed the change on one copy would not be reflected to the original and therefore creating inconsistent system state abstractions. As it is a limited type the data flow initialization must be done by a function which in this case is the Create function. Before the session object can be used it needs to be initialized on the platform. This is only allowed if the capability is supplied. As there is no guarantee that the platform actually initialized the object we still need to check its status before use. If the initialization worked a log message is printed and the component sends a success exit to the platform, if not a failure is sent. The platform, being notified about the desired exit, will at some point call the Destruct procedure which can be used to close all remaining sessions. In this case the log session is finalized if it is still active.
When proving the component with gnatprove the following message will be printed:
component.adb:8: medium: precondition might fail, cannot prove not Initialized (Log)
To prevent resource leaks on the platform a session cannot be initialized twice without being finalized inbetween. Calling Initialize on an already initialized object would make the already allocated resources potentially inaccessible by pointing the reference to a new object. To prevent this Initialize comes with a precondition:
package Cai.Log.Client with SPARK_Mode is procedure Initialize (Client : in out Client_Session; Cap : Cai.Types.Capability) with Pre => not Initialized (Client); ... end Cai.Log.Client;
There are two ways to fix this. Either assume that the Log session is never initialized when Construct is called with a precondition:
procedure Construct (Cap : Cai.Types.Capability) with Pre => not Cai.Log.Client.Initialized (Log);
To do this the Log variable that resides in the package body must be moved to the package spec to be visible at the spec of Construct. Or check that Log only get initialized if it is not:
procedure Contruct (Cap : Capability) is begin if not Cai.Log.Client.Initialized (Log) then Cai.Log.Client.Initialize (Log); end if; ... end Construct;
This implementation provides a pure SPARK environment that allows full interaction with the platform. Platform states and subprogram behaviour is expressed by function contracts and proof functions so that components can prove functional properties over the use of platform interfaces. Although this proof information isn't complete yet it already provides a powerful tool to write complex but robust components.
The high quality of the interface comes at the cost of a long and tedious design process. To provide an interface in SPARK its functionality must be expressed in a usable way without violating SPARK restrictions and its behaviour needs to be defined with function contracts. As this needs to be done for each interface by hand there is currently only a small set of interfaces. Those include a simple logging interface and a timer that provides a monotonic system clock. But also a block interface to interact with block devices such as hard disks and a configuration interface that enables the platform to pass external values to the component. On the other hand any platform integration such as setting up signal handlers and callback functions is done by the library providing a high level of abstraction from the platform. This allows porting to multiple different platforms which has been done for Linux. This enables complex software to run unchanged on multiple platforms. As an example we implemented a block device tester that writes and reads random data in a random order, hashes over it and then checks if both hashes are correct. It runs unmodified on both Genode and Linux. Support for components running on the Muen separation kernel natively is also planned.