Exploring the ARMv8 system level - Part 3
Over the last few weeks I gained some experience with enabling Genode's own hw kernel on ARMv8 hardware platforms, and I would like to share it via genodians.org. In the first post, I described how to create a non-functional dummy system image. In part 2, I showed how easily you can develop and debug early system-level startup code when using QEMU. This time I'll focus on exception-level changes and the flexible page-table layout in ARMv8.
The ARMv8 ISA is compatible with the 32-bit ARM architecture. However, when concentrating on the 64-bit mode of ARMv8 only, things are cleaner and clearer than before. Now there are only exception levels (EL) 0-3, whereby EL0 and EL1 exist in both the secure and the non-secure world (keyword: TrustZone). Formerly, in ARMv7 with security and virtualization extensions available, there were five additional exception modes within the privileged system mode (now: EL1). This is depicted in the following image:
Of course, the complexity is still present in the hardware because the 32-bit compatibility mode still includes all those UND, ABT, FIQ, IRQ, SVC, and SYS system modes. But when considering a 64-bit-only operating system as we do, one can ignore them and care about EL0-3 only. That also means there is no longer any need to save and restore the banked registers of these modes when implementing a world switch for TrustZone or hardware-assisted virtualization.
As already mentioned in my last post, system-control registers can now be accessed via human-readable names. Most of these system registers carry the corresponding exception level as suffix, like ELR_EL1 or ELR_EL2. The nice thing here is that ARM took the opportunity to smooth some register names and make them consistent across the different exception levels. The most prominent example is ESR_ELx. The so-called Exception Syndrome Register was introduced in ARMv7 for the virtualization extension. It provides one common way to distinguish all kinds of exception reasons, like page faults, alignment errors, denied co-processor accesses, and so on. Now, the register is available for EL1-3 and has a common layout, which is good.
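To illustrate what the common layout buys you, here is a minimal sketch of how an ESR_ELx value could be split into its fields. The helper names are made up for this post and are not taken from Genode; the field positions (EC in bits 31:26, IL in bit 25, ISS in bits 24:0) and the listed exception classes follow the architecture manual.

```cpp
#include <stdint.h>

/* hypothetical helpers, the same code works for ESR_EL1, ESR_EL2, and ESR_EL3 */

/* exception class, bits [31:26] */
constexpr unsigned esr_ec(uint32_t esr) { return (esr >> 26) & 0x3f; }

/* instruction length, bit 25 (1 means 32-bit instruction) */
constexpr bool esr_il(uint32_t esr) { return (esr >> 25) & 1; }

/* instruction-specific syndrome, bits [24:0] */
constexpr uint32_t esr_iss(uint32_t esr) { return esr & 0x1ffffff; }

/* a small selection of architecturally defined exception classes */
enum : unsigned {
	EC_UNKNOWN           = 0x00, /* "unknown reason"                   */
	EC_SVC_AARCH64       = 0x15, /* SVC executed in 64-bit mode        */
	EC_INST_ABORT_LOWER  = 0x20, /* instruction abort from a lower EL  */
	EC_DATA_ABORT_LOWER  = 0x24, /* data abort from a lower EL         */
	EC_DATA_ABORT_SAME   = 0x25, /* data abort from the current EL     */
};
```

Higher-level code can then switch on `esr_ec(...)` regardless of the exception level it runs on.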
Sadly, ARM did not take the chance to incorporate more exception information into that register, e.g., by widening it from 32 to 64 bits. Therefore, there are still different offsets within the exception vector that need to be distinguished, e.g., to tell whether an exception was an interrupt or a synchronous exception. In our view, it would have been better to have a single point of exception entry in assembler code, which just saves common registers and leaves the decision making, based on the ESR register, to higher-level code. That said, in general I appreciate the path ARM took with regard to system registers in ARMv8.
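For reference, the offsets in question follow a simple scheme: the vector consists of four groups of four entries, each entry being 0x80 bytes in size. The following sketch computes the offset relative to VBAR_ELx. The enum and function names are my own invention for illustration.

```cpp
#include <stdint.h>

/* where the exception was taken from (hypothetical names) */
enum class Source {
	CURRENT_EL_SP0   = 0, /* same EL, using SP_EL0             */
	CURRENT_EL_SPX   = 1, /* same EL, using SP_ELx             */
	LOWER_EL_AARCH64 = 2, /* lower EL running in 64-bit mode   */
	LOWER_EL_AARCH32 = 3, /* lower EL running in 32-bit mode   */
};

/* what kind of exception was taken */
enum class Kind { SYNCHRONOUS = 0, IRQ = 1, FIQ = 2, SERROR = 3 };

/* offset of the entry relative to the vector base address */
constexpr uint64_t vector_offset(Source s, Kind k)
{
	return static_cast<uint64_t>(s) * 0x200
	     + static_cast<uint64_t>(k) * 0x80;
}
```

A synchronous exception taken from the current level while using SP_ELx thus enters at offset 0x200, and the whole vector spans 0x800 bytes.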
When starting with the kernel work on QEMU/Rpi3, I noticed that the machine starts in EL3 (the former monitor mode). To prepare the kernel in bootstrap, I had to set up its page-tables and switch on the MMU, and therefore first had to leave EL3 for EL1. An exception return is typically done by filling the Saved Program Status Register (SPSR) and the Exception Link Register (ELR) of the current exception level, and finally executing eret. This instruction atomically restores the program status and jumps to the address in the ELR. If one writes a lower exception level into the saved program status, one automatically drops privileges. When I first tried to leave EL3 for EL1 this way, I always landed in the following state, shown by the QEMU monitor:
(qemu) info registers
PC=0000000000000200
X00=0000000000804060 X01=000000000082f1d8 X02=0000000000000005 X03=0000000000000000
X04=0000000000000000 X05=0000000000000000 X06=0000000000000000 X07=0000000000000000
X08=0000000000000000 X09=0000000000000000 X10=0000000000000000 X11=0000000000000000
X12=0000000000000000 X13=0000000000000000 X14=0000000000000000 X15=0000000000000000
X16=0000000000000000 X17=0000000000000000 X18=0000000000000000 X19=0000000000000000
X20=0000000000000000 X21=0000000000000000 X22=0000000000000000 X23=0000000000000000
X24=0000000000000000 X25=0000000000000000 X26=0000000000000000 X27=0000000000000000
X28=0000000000000000 X29=0000000000000000 X30=0000000000800050 SP=0000000000000000
PSTATE=000003cd ---- EL3h
FPCR=00000000 FPSR=00000000
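The PSTATE value of the dump can be read with a few shifts and masks, and the same bit positions apply to the SPSR value written before eret. The following is only a sketch with made-up helper names, assuming a 64-bit target state: bits 3:2 hold the exception level, bit 0 selects SP_ELx ("h") over SP_EL0 ("t"), bit 4 would indicate 32-bit state, and bits 9:6 are the D, A, I, F mask bits.

```cpp
#include <stdint.h>

/* hypothetical helpers for decoding a PSTATE/SPSR value */
constexpr unsigned pstate_el(uint32_t v)          { return (v >> 2) & 0x3; }
constexpr bool     pstate_uses_sp_elx(uint32_t v) { return v & 1; }
constexpr bool     pstate_aarch32(uint32_t v)     { return (v >> 4) & 1; }
constexpr unsigned pstate_daif(uint32_t v)        { return (v >> 6) & 0xf; }

/*
 * SPSR value for returning to exception level 'el' in 64-bit state,
 * using SP_ELx, with debug, SError, IRQ, and FIQ exceptions masked
 */
constexpr uint32_t spsr_for_el(unsigned el)
{
	return (0xfu << 6) | (el << 2) | 1u;
}
```

Applied to the dump, 0x3cd decodes to EL3 with SP_EL3 and all exceptions masked, whereas a return to EL1 with the same masking would need 0x3c5 in SPSR_EL3.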
I could see no change in the exception level, but the program counter was somewhere near zero. Interestingly, the RAM of the Raspberry Pi 3 starts at zero, and as no exception vector address had been set yet, I assumed the vector base to be zero as well. So my guess was that an exception got raised. I decided to set up a very simple exception vector to look for the reason. This is how it's done:
...
ldr x0, =_exception_entry  /* load exception vector address */
msr vbar_el3, x0
...
.p2align 12                /* align to page-size */
_exception_entry:
.rept 512                  /* fill whole exception vector with nops */
nop
.endr
mrs x0, esr_el3            /* finally, copy over the exception syndrome */
1: b 1b
When looking at the content of the ESR register, the reference manual told me: "Unknown reason" :-(.
After a long search for the cause, I finally learned that there is a new bit field (RW) in the Secure Configuration Register (SCR) and in the Hypervisor Configuration Register (HCR) that determines whether code at lower exception levels executes in 32-bit or 64-bit mode. The default value switched the machine to 32-bit.
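As a rough sketch of the fix, the relevant bits can be written down as constants. The names below are chosen for this post, not copied from Genode's sources; the bit positions (RW at bit 10 of SCR_EL3 and at bit 31 of HCR_EL2, NS at bit 0 of SCR_EL3) are taken from the architecture manual. Which other bits a real setup needs depends on the platform and is left out here.

```cpp
#include <stdint.h>

/* SCR_EL3 bits (hypothetical constant names) */
constexpr uint64_t SCR_EL3_NS = 1ull << 0;  /* lower levels are non-secure     */
constexpr uint64_t SCR_EL3_RW = 1ull << 10; /* next lower level runs in 64-bit */

/* HCR_EL2 bits */
constexpr uint64_t HCR_EL2_RW = 1ull << 31; /* EL1 runs in 64-bit              */

/* take a register value and request 64-bit execution for the level below */
constexpr uint64_t scr_el3_lower_64bit(uint64_t scr) { return scr | SCR_EL3_RW; }
constexpr uint64_t hcr_el2_lower_64bit(uint64_t hcr) { return hcr | HCR_EL2_RW; }
```

The resulting values would be written with `msr scr_el3, x0` respectively `msr hcr_el2, x0` before executing eret.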
After solving this little puzzle, further privilege-level changes worked as expected.
The degree of freedom with regard to virtual memory is huge in ARMv8. Even before, there were distinct page-tables for the secure monitor, the hypervisor, and kernel/user land in both the normal world and the secure world. ARMv7 had a fixed-size 2-level page-table format and a 3-level extended format when using the Large Physical Address Extension (LPAE), which introduced another page-table set for translating guest-physical to host-physical memory.
Now, all of these page-table sets of course still exist. But there are more knobs to turn. First, one has to choose between page granularities of 4 KiB, 16 KiB, and 64 KiB. Not all granularities are necessarily supported by all SoCs. Second, one can configure the size of the address range to translate as well as the maximum addressable physical memory size, and thereby effectively define how many page-table levels to use (up to four).
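The relation between granularity, address width, and number of levels can be sketched with a little arithmetic. Each descriptor is 8 bytes, so a table of one page resolves log2(page size) - 3 address bits per level. The function names are invented for this example, and the sketch deliberately ignores special cases such as concatenated tables in the guest-physical translation.

```cpp
/* address bits resolved by one table level, given log2 of the page size */
constexpr unsigned bits_per_level(unsigned page_shift)
{
	return page_shift - 3; /* 8-byte descriptors */
}

/* levels needed to translate 'addr_bits' wide input addresses (rounded up) */
constexpr unsigned table_levels(unsigned addr_bits, unsigned page_shift)
{
	return (addr_bits - page_shift + bits_per_level(page_shift) - 1)
	     / bits_per_level(page_shift);
}
```

With 4 KiB pages, for instance, a 39-bit address range needs three levels and a 48-bit range needs four, while 64 KiB pages cover 42 bits with only two levels.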
Actually, there is no good reason to deviate from 4 KiB pages in Genode. The system is optimized for a page granularity of 4 KiB, as it is available on other architectures as well. Also, because of the limited amount of memory available on the embedded ARM boards we address, I decided to stay compatible with the already existing ARMv7 LPAE implementation in Genode. I merely needed to widen the bit fields for the physical address and for the next page-table pointer in the page-table descriptors a bit.
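To give an idea of what "widening" means here, the following sketch contrasts the address field of a descriptor in both formats. It is not the actual Genode code, and all names are hypothetical. It assumes 4 KiB pages, where ARMv7 LPAE keeps the output address in bits 39:12 and ARMv8 in bits 47:12.

```cpp
#include <stdint.h>

/* mask with bits [hi:lo] set (hypothetical helper) */
constexpr uint64_t bit_range(unsigned hi, unsigned lo)
{
	return (~0ull >> (63 - hi)) & ~((1ull << lo) - 1);
}

constexpr uint64_t LPAE_ADDR_MASK  = bit_range(39, 12); /* 40-bit physical */
constexpr uint64_t ARMV8_ADDR_MASK = bit_range(47, 12); /* 48-bit physical */

/* physical address of the next-level table or page a descriptor refers to */
constexpr uint64_t lpae_address(uint64_t desc)  { return desc & LPAE_ADDR_MASK; }
constexpr uint64_t armv8_address(uint64_t desc) { return desc & ARMV8_ADDR_MASK; }
```

A descriptor pointing beyond the 40-bit boundary would lose its upper address bits with the narrower mask, which is why the field has to grow.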
After that, bootstrap finished its work and the kernel was booting until it tried to switch to another thread.