Porting Linux to ERC-32 Architecture

Sebastián S. Prieto, Ignacio G. Tejedor and Aitor V. Sánchez
Computer Engineering Department (University of Alcalá)
Ctra. Madrid-Barcelona, km 31,600, 28871 Alcalá de Henares (Madrid), Spain
{ssp, ngt, avs}@aut.uah.es

Abstract

In this paper we describe the work involved in porting uCLinux, a variant of the Linux kernel for MMU-less CPUs, to the ERC-32 platform. ERC-32 is an ESA-approved radiation-tolerant SPARC V7 processor developed for space applications. Its interest lies in the open nature of its design and specification, which is intended to increase the availability of development tools, operating systems and application software, and therefore to reduce the cost of space missions. Making an open operating system like Linux available on this platform makes it even easier to develop embedded applications for ERC-32. The paper describes the work involved in this Linux port, the customization possibilities of the final product and the end-user development environment provided, which consists of GNU tools. Finally, it presents performance testing results and discusses future enhancements.
1 Introduction

Traditionally, embedded systems were based on simple microcontrollers, and hand-programming them in assembly language or with simple toolkits was an acceptable practice. However, the increase in performance and capabilities of processors and other system components, together with the flourishing of new and innovative applications for embedded systems, has created a growing demand for operating systems and tools specially tailored to embedded systems. In some cases, stripped-down versions of commodity hardware and slightly customized commercial software can be used. But in most cases it is necessary to use software specifically designed to deal with the peculiarities of embedded platforms. Embedded OSes usually share some common characteristics: a small memory footprint, real-time (RT) capabilities, POSIX and/or proprietary APIs for inter-process communication (IPC) and similar services, and the ability to execute from ROM. Additionally, these OSes are bundled with development toolkits including debuggers, compilers and profilers. They are usually commercial operating systems with very high licensing fees; examples are VxWorks, QNX and PocketPC (formerly Windows CE). However, there are some free and open-source alternatives, such as RTEMS, that are gradually gaining popularity.

In this scenario, it’s no wonder that at some point somebody thought about porting the phenomenally popular Linux operating system to embedded environments. After all, Linux is free, open-source, high-performing and customizable. Additionally, there is a plethora of reliable development tools freely available and, most importantly, massive community support. Development is also greatly accelerated, since the PC platform can be used as part of the development cycle. However, the standard Linux kernel has some important issues when placed in an embedded environment:

• Linux is not suitable for real-time tasks, and embedded systems are frequently used in hard-RT applications.
• Linux requires a hardware architecture with an MMU. Many embedded hardware platforms are MMU-less.

There are ongoing efforts to achieve RT capabilities in the Linux kernel and to adapt it to MMU-less platforms. We will focus on the second of these efforts, uCLinux [4]. uCLinux is a Linux version specially modified to run on MMU-less architectures. There are some tradeoffs, but most of the Linux functionality is implemented. Up to version 2.4 of the Linux kernel, uCLinux was a separate project, but starting with the upcoming 2.6 release, uCLinux will be an integral part of the official Linux kernel source distribution. In our case, standard Linux cannot be ported to the ERC32 architecture since it is an MMU-less CPU, so uCLinux was the way to go.

2 The ERC-32 Architecture

The 32-bit Embedded Real-time Computing core (ERC32 [1]) is the result of the 32-bit Microprocessor and Computer Development programme of the European Space Agency (ESA), a programme specifically aimed at the development of hardware and software tools tailored to embedded space applications. ERC32 was conceived as a platform focused on high performance, radiation hardness, error correction and handling, and low power consumption, while being based on an open standard architecture without licensing issues; the result is a processor based on the SPARC V7 RISC architecture. It consists of three chips: an Integer Unit, a Floating Point Unit and a Memory Controller (MEC). However, it can nowadays be found integrated into a single-chip solution. The ERC32 architecture includes all the functionality necessary to build a 32-bit embedded computer; only external memory and I/O devices need to be added to obtain a fully functional system.

2.1 Some Architectural Details

Since the ERC32 system is based on the SPARC V7 architecture, we will not give an exhaustive description of the microprocessor design and programming model, which is freely available for review. However, we will mention some details related to the architecture-dependent parts of the uCLinux port.

To implement CPU registers, the SPARC architecture uses the well-known sliding-window approach. The dispatching routines of the operating system must take care of this aspect of the architecture, and trap handlers must be implemented to deal with window overflow and underflow events. There are several CPU registers that must be taken into account when writing the architecture-dependent code:

• The Processor Status Register (PSR), which holds, amongst other data, the processor interrupt level (PIL), the processor mode (S), the previous processor mode (PS), the traps enable flag (ET) and the current window pointer (CWP).
• The Window Invalid Mask (WIM) register. This register holds eight useful bits, one for each of the eight possible register windows in the system. A bit set to "0" indicates that the window is valid to use, and a bit set to "1" indicates that the window is invalid, so accesses to it will cause a trap. The operating system must properly use this register and establish an appropriate trap handler for this event.
• The Trap Base Register (TBR). It contains a pointer to the base of the trap table, plus a field (named tt) that indicates the trap number in case a trap is generated.
• The program counter (PC) and the next program counter (nPC). The nPC register points to the next instruction to execute.

2.2 Tharsys SPARC RT SBC

The Tharsys SPARC RT SBC is a rack-mountable system board built around the ERC32 architecture that includes the following features:

• 4 MB RAM and 512 KB ROM.
• VME interface.
• Two serial RS-232 ports.
• One Centronics IEEE 1284 (parallel) interface.
• One Ethernet port supporting the IEEE 802.3 10BaseT interface.
• Watchdog.
• Various programmable timers and several external interrupt inputs.

This is the platform that has been used to help in the development of the uCLinux port.
3 Porting uCLinux to ERC-32

Porting uCLinux to a new architecture involves adding and modifying kernel source files with code that depends on the internal peculiarities of the specific architecture (architecture-dependent code). However, it is important to modify the existing kernel as little as possible. It is therefore necessary:

1. To store all the customized code in a new directory structure for the new architecture. In our particular case there is a directory named arch/sparcv7nommu for the architecture code and a directory named include/asm-sparcv7nommu for include files.
2. To develop the boot code that is necessary to leave the system in a state suitable for booting the kernel. This code is typically written in assembly language.
3. To code all the functions needed for a correct kernel boot. To do this it’s a good idea to follow the start_kernel function, since it is in charge of initializing the kernel and it calls most of the kernel initialization functions.

3.1 The Port

The first step to successfully port uCLinux to a new architecture is to describe all of its specific characteristics in a header file, which in this case has been called erc32.h. This file defines the main elements of the architecture, such as the memory map, the interrupt map, macros, etc. Moreover, it must be structured in such a way that it can be included from both C and assembly language source files. This is possible because C and assembly language sources use the same conditional compilation directives.

The first code to develop is that which correctly initializes the system hardware and leaves it in a correct state for the rest of the kernel loading process. The file that contains this code is crt0_ram.S. The code in this file initializes the trap table (located at the bottom of main memory), sets the values of the PSR, WIM and TBR registers, initializes the stack pointer so that it points to the top of main memory, and configures the UART timing registers and the RAM and ROM wait states. At this point we are ready to jump to the kernel entry point, the main function.

We must keep in mind that we are porting uCLinux to an embedded system. In this kind of system, unattended operation and self-checking capabilities are crucial. For this reason the first kernel routine implements a Built-In Test (BIT) of the physical storage devices (RAM). The kernel boot process is performed afterwards. This task is carried out basically by the start_kernel function. The following are the functions invoked by start_kernel, in the order in which they appear, along with a short description of the operations they perform:

1. setup_arch: registers a console that is used by printk to allow the kernel to print warning and debug messages, and establishes the start and the end of the kernel code section and the address of the available memory that immediately follows the kernel code section. Finally, it sets the root device, although this is not very important in an embedded system.
2. paging_init: initializes all the page frame descriptors with the help of the free_area_init function.
3. mem_init: determines the total number of available page frames in the system (physical memory).
4. trap_init: a dummy function.
5. init_IRQ: initializes the structures necessary for interrupt handling, setting all handlers to NULL. Eventually the request_irq function will associate each interrupt with a specific handler.
6. time_init: registers handlers for the General Purpose Timer and the Watchdog Timer. Once they are registered, it starts the General Purpose Timer.
7. arch_syms_export: a dummy function.
8. kernel_thread: an assembly-coded function. It calls the clone function; as a result the child kernel thread, the init task, is created, and the parent process turns into the idle task.

This last step cedes control to the init task. This thread continues the kernel initialization process. In the general case it is in charge of initializing character and block devices, but in our particular case it just configures the UART device as part of the character device configuration. For this operation we need to modify some architecture-independent code, specifically the drivers/char/tty_io.c file.

All the functions mentioned above are indispensable for the kernel boot, but there are quite a lot of additional, less visible tasks that must be performed and are absolutely necessary for everything to work properly. Amongst them are interrupt and trap handler installation, signal handling, definition of the system call interface, task switching and memory management.

3.2 Interrupt/Trap Handlers

The ERC-32 integer unit supports three types of traps: synchronous, Floating-Point/CoProcessor and asynchronous (also called interrupts). Synchronous traps are caused by the hardware’s response to a particular instruction (e.g. an illegal instruction) or by trap instructions (Ticc instructions, e.g. ta 0x10). Floating-Point/CoProcessor traps are caused by a Floating-Point-operate or CoProcessor-operate instruction and occur before the instruction completes. Asynchronous traps occur when an external event interrupts the processor.
Once a trap is taken, the following operations are performed: (1) further traps are disabled, which means that asynchronous traps are ignored and synchronous traps force the processor into error mode; (2) the CPU enters supervisor mode (the S bit in the PSR is set) and the previous mode is copied into the PS bit of the PSR; (3) the CWP is decremented by one (modulo the number of windows); (4) PC and nPC are saved into the %l1 and %l2 registers respectively; (5) the tt field in the TBR register is set to the appropriate trap value; (6) the PC is written with the contents of the TBR and the nPC is written with TBR+4. At this moment the trap handler is executed. When it finishes, the CWP is incremented by one (modulo the number of windows) to re-activate the previous window, and the return address is calculated.
3.3 Signals

Signals in Linux supply a mechanism to notify processes when asynchronous events occur in the system. Each signal is associated with a different event and has its own name and an associated symbolic constant. The arrival of a signal triggers the execution of the signal handler that the process has associated with that signal. The handler is not called immediately after the signal is received; in fact handlers are called after each system tick, just before returning from the timer interrupt handler. The kernel provides some system calls that allow processes to send signals and to establish the handler for each signal they receive. If the process hasn’t registered any handler for a signal, a default handler is executed.

Even though the concept of a signal is fairly intuitive, its implementation in the kernel is quite complicated, because the kernel must perform operations such as keeping track of which signals are blocked by which processes, checking for the arrival of new signals before every switch to kernel mode, determining which signals can be ignored, and handling the signal when the process has a handler for it. Many of these tasks are performed by the architecture-independent part of the kernel, so we will not elaborate on this subject [2].

3.3.1 Signal Management Dependent Part

The most important function to implement is do_signal, which can be found in the arch/sparcv7nommu/kernel/signal.c file. This function is responsible for architecture-dependent signal handling. The operations that this function performs depend on whether or not the process has a handler registered for the specific signal. If it has not, the default action is executed, which is normally the termination of the process, although it depends on the signal. If it has, the case is more complicated, because the signal handler is located in user space but must be executed using the kernel stack. The problem that arises is that when switching from user mode to kernel mode and back, both hardware contexts are purged, so do_signal must be able to exchange the user and kernel hardware contexts so that the handler code can be correctly executed. Figure 1 shows this process.

[Figure: USER Mode: Normal Program Flow, Signal Handler, Return Code. KERNEL Mode: do_signal(), setup_frame(); system_call(), sys_sigreturn(), restore_sigcontext().]
FIGURE 1: Signal Catching

3.4 Syscall Interface

A user process execution thread can enter the kernel by invoking a software trap (the ta 0x10 instruction). The software trap writes the trap number into the TBR register, which yields the address of the trap handler. The program counter is loaded so that execution continues at the handler code, which acknowledges the system call request and invokes the kernel function associated with it. Figure 2 shows the system call interception.

[Figure: the user process executes ta 0x10; the TBR (base address, tt, 0000) selects the linux_sparc_syscall syscall handler, which carries out the system call execution.]
FIGURE 2: Syscall Interface

3.5 Process Switching
Multitasking in uCLinux is much simpler than in Linux because processes lack page tables. As a result there is no memory protection, and we only have to store and restore the hardware context of the task. The architecture-dependent part of task switching is programmed in assembly language, because it is important to control the process at the register level and to keep the execution latency of this routine as short as possible for the system to be efficient. The actions to be performed on a context switch depend heavily on the architecture; in the case of ERC-32 the context is defined by the register windows in use, the PSR, the WIM, the FPU registers (if used), the task stack space, the PC and the nPC. Figure 3 shows an example of what could be part of the context of a task, in terms of the register windows. Everything described above is performed by the sparc_switch_to function.
[Figure: register windows 0 to 7 and the global registers, divided into windows inside the task context and windows out of the task context.]
FIGURE 3: Task Context

3.6 Memory Management

The fact that ERC-32 is an MMU-less architecture makes memory management particularly simple: physical and virtual address spaces are identical, and the memory protection that would make it possible to implement copy-on-write [3] mechanisms is missing.

3.6.1 Dynamic Memory Allocation

The dynamic memory available to a process is known as the heap. Allocation and deallocation of memory in this area is carried out by calling malloc. In a virtual memory system the sbrk function is used by a process to dynamically change the amount of space allocated for the heap. The ERC-32 architecture has a flat memory model, so it is not possible to use this function for that purpose and mmap is used instead. Figure 4 shows how a system with virtual memory and memory translation mechanisms allows the existence of memory gaps between sections. Since ERC-32 lacks virtual memory or translation mechanisms, any such gaps would also exist in physical memory, where they would be wasted; embedded systems cannot afford this because they do not have a lot of memory.

[Figure: Virtual Memory on Linux: a 256 MB virtual space containing the Code Section, Data Section, Heap and Stack, separated by gaps. Linux without Virtual Memory (uClinux): Code Section, Data Section, a fixed-size stack and heap pages assigned using mmap, with no virtual spaces.]
FIGURE 4: Memory Management

3.6.2 ERC-32 Stack

The stack is a portion of the memory available to a process that is dynamically allocated and deallocated in a predefined fashion. It is normally used for parameter passing, but not on ERC-32, which passes parameters in registers. In systems equipped with an MMU the stack can grow dynamically with the help of the virtual memory system, but on ERC-32 this is not possible. When a process is loaded into memory it is assigned a static stack. For this reason we must design our processes so that their stack never overflows, but without assigning an excessive amount of memory to the stack, since it would be wasted.

4 Cross Development Kit
A Linux system has been installed on a laptop computer to act as a cross-development and debugging machine for users of the ERC-32 target. The machine runs Red Hat Linux 9 and provides the following utilities:

• Cross-compiler from the latest binutils-2.13 / gcc-2.95.3 sources.
• Cross-debugger from the latest insight-5.2 source.
• The SIS [9] SPARC simulator.
• Two tools useful for uCLinux compilation: genromfs, to create a ROM file system image, and elf2flt, to convert ELF to the FLAT binary format.
• SPARC-based startup code (crt0) useful for uCLinux userland applications.
• A reverse engineering, documentation and metrics tool for C code named understand.

5 Benchmarks

The target platform we are working with has the following main characteristics:

• TEMIC ERC-32 SPARC CPU chipset.
• Software-controlled or user clock socket (FUser MHz). Available frequencies (MHz): 4, 8, 16, FUser/8, FUser/4, FUser/2.
• 4 MB onboard 25 ns static RAM, with parity and EDAC protection.

For code coverage we use hbench [6], a benchmarking suite for UNIX.

Benchmark                   Value
lat_syscall sigaction       6,6600 usec
lat_syscall gettimeofday    4,7800 usec
lat_syscall write           4,8700 usec
lat_syscall getpid          3,7600 usec
lat_syscall sbrk            3,9400 usec
lat_syscall getrusage       22,8000 usec
lat_proc null static        551,000 usec
lat_proc null dynamic       551,000 usec
lat_proc simple static      975,000 usec
lat_proc simple dynamic     551,000 usec
bw_bzero                    10,2546 MBps
bw_mem_rd                   113,4409 MBps
bw_mem_wr                   140,4844 MBps

TABLE 1: Benchmark Results

We measured the bandwidth attainable when writing to memory using the libc routines bzero (or memset); when reading from and writing to raw memory through an unrolled loop of memory accesses using array-offset addressing; and when transferring data through a pipe between two processes. We also measured the latency of creating several types of processes using vfork and vfork + exec; of installing a new signal handler and of actually handling raised signals; and finally of several different system calls.

6 Conclusion and Future Work

In this paper we have described the port of the uCLinux system to the ERC-32 architecture. The version ported is Linux 2.0.39, which we chose for its stability. At present, all system calls are included in the ERC-32 kernel and the kernel is fully functional. We now want to give the kernel real-time enhancements by porting RT-Linux [7] or KURT [8] to the ERC-32 architecture. At the moment, the official Linux kernel development team is nearing the completion of the v2.6 kernel. This distribution incorporates the uCLinux project, so we are working to include the ERC-32 kernel in the Linux 2.6 kernel source tree.

References

[1] ESA Contractor Report, SAAB Ericsson Space, Sweden, 32-bit microprocessor and computer development programme - Final report.
[2] Daniel P. Bovet & Marco Cesati, 2000, Understanding the Linux Kernel, O’Reilly, ISBN.
[3] Fitzgerald, R. and Rashid, R.F., May 1986, The Integration of Virtual Memory Management and Interprocess Communication in Accent, ACM Transactions on Computer Systems 4,2, 147-177.
[4] Embedded Linux microcontroller project. http://www.uclinux.org
[5] Sparc ELF Tools. http://www.uclinux.org/pub/uClinux/sparc-elftools
[6] Aaron B. Brown and Margo I. Seltzer, Operating System Benchmarking in the Wake of LmBench: A Case Study of the Performance of NetBSD on the Intel x86 Architecture, Harvard University.
[7] FSMLABS, the RT-Linux Company. http://www.rtlinux.org
[8] KURT: The KU Real Time Linux. http://www.ittc.ku.edu/kurt
[9] ERC-32 free software. http://www.estec.esa.nl/wsmwww/erc32/freesoft.html