FLoW: Achieve Native Multi Threading Support for Embedded System

2015 International Conference on Control, Electronics, Renewable Energy and Communications (ICCEREC)

FLoW: Achieve Native Multi Threading Support for Embedded System Through Microkernel

Adhe Widianjaya†, Tito Pramudana†, Dadet Pramadihanto†, Sritrusta Sukaridhoto*, Achmad S. Khalilullah*

† Department of Information and Computer Engineering
* Department of Creative Multimedia
Electronic Engineering Polytechnic Institute of Surabaya, Surabaya, Indonesia
{adhe, tito}@ce.student.pens.ac.id, {dadet, dhoto, subhankh}@pens.ac.id

were included in the microkernel implementation, while device drivers could be freely implemented by the user. A monolithic kernel structure was large and inflexible. It was also limited in covering applications on lower-end embedded platforms with less memory and fewer features. The design principles of microkernels could help us establish threading and synchronization as the minimum methods of resource management.

Abstract—Multithreading through real time operating systems has been used for several years to optimize resource management in embedded systems. Implementing the multithreading mechanism and its supporting mechanisms brought several side effects, such as large binary size and memory usage. We developed FLoW to reduce those side effects while performing multithreading in embedded systems. FLoW implemented an event based microkernel to minimize the environment and simplify the services. A static array implementation with a reduced number of priorities was chosen over dynamic allocation to reduce platform memory usage while maintaining scheduling performance in FLoW. The results show that native multithreading supported by the microkernel achieves average performance relative to existing real time operating systems, with lower side effects.

The goal of this work was to inspect and analyze the use of a microkernel as the main idea of resource management implementation. We intended to achieve multithreading support for embedded systems without relying on a threading library. We introduced FLoW with the main concept of embedding a microkernel in the hardware platform as thin, small firmware. A user application might be either a bare-metal or a library-equipped program and treat FLoW as a common hardware-native feature, like timers, interrupts, etc. From this point of view, we extended the ability of the target hardware through the FLoW environment. The structure of FLoW was developed to be simple and thus small, although several services were brought from general operating systems through the use of the microkernel.

Keywords—embedded system, multithreading, microkernel, real time operating system

I. INTRODUCTION Real time operating systems for embedded applications have been used for several years to optimize resource management. Threading support and program synchronization, as the minimum methods, could be implemented for at least every embedded system application [1]. Through multithreading techniques, utilization of I/O ports and platform key features might become more effective and efficient.

The rest of this paper is organized as follows: Section II reviews related works on microkernels and embedded environments implemented with real time operating systems. We describe our motivation in Section III. Section IV presents the FLoW system architecture. Implementation results are shown in Section V, with discussion of the affected parts in memory usage and interrupt latency. Section VI concludes this paper with some future works.

Today's embedded systems face the challenge of optimizing hardware-software interaction. On the other side, the growth of embedded device platforms has been fast and significant enough to break the limits of embedded system ability [2]. The development of SoCs and microcontrollers produced high performance, reliable embedded platforms with many features, including operating system compatibility. This condition brought us the opportunity to develop the next technology of embedded resource management.

II. RELATED WORKS A. Microkernel The development of microkernels began more than twenty years ago. Mach and other microkernels before the 1990s were denoted as the first generation of microkernels. They suffered from poor performance of internal kernel mechanisms and slow communication. In the early 1990s, Jochen Liedtke introduced the optimized L3 microkernel as the predecessor of the popular L4 microkernel [3]. A microkernel should be small and efficient. In a multi address space system, communication in the microkernel

Our idea was to restructure the system level design of embedded software and minimize the environment to reduce side effects in memory usage and performance degradation. An embedded environment built on the microkernel concept fulfills these requirements. We did not need to implement an operating system completely. In several implementations and discussions [3][4], minimal operating system mechanisms and concepts such as thread scheduling, communication, and synchronization

978-1-4799-8975-1/15/$31.00 ©2015 IEEE



Figure 1. Global structure of (a) threading library based RTOS, (b) FLoW and (c) Linux based RTOS. Block labels in the diagrams: Bare Metal Apps, OS Apps, Real Time Apps, Threading Library, Peripheral & Threading Library, Peripheral Library, Linux Kernel and Drivers, Microkernel, Hardware.

needed to be fast and became the focus of development. Device drivers were implemented at user level as applications rather than as kernel services [4][5].

larger and affected system memory usage. Real time operating systems delivered as threading libraries were small and closer to the hardware, but not scalable. A microkernel, in contrast, provided a small binary size with a minimal implementation of operating system services. Therefore, the use of a microkernel to achieve multithreading was promising.

In the following years, L4 microkernel variants [4][5][6] were developed for several applications and achieved advantages over monolithic kernels, especially in binary size. The L4 microkernel evolved and included security features by using the capability concept introduced by EROS [5] for virtualization environments [6]. Although the range of microkernel applications became very large, the principle of minimality was maintained. The microkernel was evaluated as having potential for scalable computing systems.

The microkernel treated multithreading as one of the services worth maintaining inside the kernel. By embedding a small, thin microkernel with scheduling and synchronization features in hardware, we could achieve an embedded environment with native multithreading support for user applications and reduced side effects.

B. Multithreading Environment and Real Time Operating System In order to meet the requirements of real time systems, the development of RTOS focused on multithreading and scheduling optimization. There were real time versions of popular OS kernels. Real time Linux and QNX [7] were popular OS kernels restructured for real time systems. A free and easy interface was offered as the advantage of using a Linux environment in embedded systems. Memory footprint remained the main problem and degraded system performance.

IV. SYSTEM ARCHITECTURE A. FLoW Embedded Microkernel FLoW implemented a microkernel on top of the hardware. FLoW utilized the CPU and several resources to achieve native threading support for user applications. As the core component, the microkernel was intended to run after the hardware boot loader and be followed by the user embedded application. Therefore, the FLoW environment was initialized by the microkernel at device boot.

Real time operating systems in the form of threading libraries were then introduced. FreeRTOS [8], aimed especially at small microcontrollers, solved those problems by providing only routines for directly managing resources and an optimized scheduling algorithm. Threading libraries as the main services of an embedded environment were powerful and straightforward, and multithreading through this method was fast. However, they did not support more complex embedded environments, which usually implement multiple address spaces for multiple tasks [1].


III. MOTIVATION We found that there were several methods to achieve multithreading support for embedded systems. The use of a monolithic kernel based operating system, such as Linux, provided ease of use in a complete environment with full-featured services. As shown in Figure 1, a Linux based RTOS had drivers and other operating system components inside the kernel. But consequently, as described in Section II, the size became


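Fig. 2. Architecture of FLoW with Microkernel (user application threads on top; the microkernel with its micro call handler, scheduler and interrupt wrapper in the middle; CPU, timer and I/O hardware below)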



The microkernel consisted of a scheduler, an interrupt wrapper and a micro call handler, as shown in Figure 2. Those three components were formed by simple routines which resided in the machine interrupt request handler. The design was simplified to obtain efficient and fast environment functions to serve user applications. The microkernel formed a thin layer of system abstraction which served user applications as a hardware extension with threading support. Figure 2 shows the relation between embedded hardware, microkernel, and user application. The current figure also shows the comparison between our proposed system, operating system based embedded systems, and bare metal applications. A real time operating system used in an embedded system gave many advantages to the user. It provided high level resource management through services and other mechanisms. But in several cases, it still suffered from a large memory footprint, which was the main disadvantage of an operating system in embedded systems. Therefore, this kind of environment was not suitable for mid to low-end embedded platforms with little memory. A bare metal application, though it saved memory and utilized the hardware directly, still lacked functions for managing resources. Our proposed system gave a new option for the embedded application environment through the use of a microkernel.

Fig. 3. Role of active and expired run queue in Linux O(1) scheduler with 80 available thread priorities

Multithreading in FLoW was implemented with a preemptive, priority driven round robin algorithm. A set of priority queues was kept in memory to accommodate threads of various priorities. The microkernel took the next scheduling context from the queue in a simple way, through an iteration index. This method was similar to the O(1) scheduler [12] implemented in the Linux kernel scheduler in early 2.6 versions. In early development, we provided two kinds of run queue, active and expired. Figure 3 illustrates how a thread switch was performed through those run queues. When there was no scheduling context left in the active run queue, the scheduler swapped the expired run queue with the active run queue.

Our microkernel used a mechanism similar to the one implemented in the L4 microkernel [3][9]. To minimize the binary size, we implemented event based kernel routines. The microkernel only provided service when there was a request from the user, and stayed in a standby state the rest of the time. The scheduler did not schedule kernel routines, so they wasted no CPU time. By this method, we managed the hardware to focus on user activity and treated the microkernel as an expanded hardware resource. The user application could then ask for microkernel services through a micro call.

Algorithm 1 Scheduler Handler
  while qactive[priority].context == null do
    swap(qactive[priority], qexpired[priority])
    priority ← priority + 1
    if priority >= MAX_PRIORITY then
      priority ← priority - MAX_PRIORITY
  new_context ← qactive[priority].context
  quantum ← new_context.quantum
  qactive[priority].remove(current)
  qexpired[priority].insert(current)
  elapsed ← elapsed + quantum
  return new_context
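As a rough illustration of Algorithm 1, the following Python sketch models the active and expired run queues as two fixed-length arrays indexed by priority. It is a simulation under stated assumptions, not the FLoW source: the class and method names are hypothetical, `current` in Algorithm 1 is read as the context being dispatched, and the loop bound is an addition of this sketch so that an empty system terminates.

```python
from collections import deque

MAX_PRIORITY = 5  # reduced priority count reported for FLoW


class Context:
    """Hypothetical scheduling context: a thread name and its time quantum."""

    def __init__(self, name, quantum):
        self.name = name
        self.quantum = quantum


class Scheduler:
    def __init__(self):
        # Two fixed-length arrays indexed by priority, created once at start-up.
        self.active = [deque() for _ in range(MAX_PRIORITY)]
        self.expired = [deque() for _ in range(MAX_PRIORITY)]
        self.priority = 0  # iteration index kept between calls
        self.elapsed = 0   # sum of the quanta handed out so far

    def add(self, context, priority):
        self.active[priority].append(context)

    def next_context(self):
        # Two passes over the array reach any queued context. The bound is an
        # assumption of this sketch; Algorithm 1 has no such guard.
        for _ in range(2 * MAX_PRIORITY):
            p = self.priority
            if self.active[p]:
                break
            # Empty active slot: swap it with the expired slot and move on.
            self.active[p], self.expired[p] = self.expired[p], self.active[p]
            self.priority = (p + 1) % MAX_PRIORITY
        else:
            return None  # no thread exists at any priority
        p = self.priority
        context = self.active[p].popleft()  # leaves the active queue
        self.expired[p].append(context)     # waits in the expired queue
        self.elapsed += context.quantum
        return context
```

With two threads at priority 0 and one at priority 2, repeated calls hand out the contexts in round robin order, and the number of loop iterations per call grows with `MAX_PRIORITY`, which is the lookup cost the text attributes to the static array.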

Some resources were reserved by FLoW for internal mechanisms. The FLoW environment hid platform specific interrupts from the user application. The user was forbidden to gain raw access to interrupts because the microkernel used them. The primary timer and interrupt vector were also forbidden to the user. In order to provide functions similar to the reserved resources, FLoW was equipped with a runtime counting register to count the elapsed time. This register contained a long integer which represented the time in microseconds. The counting register resided in microkernel memory space and was accessible to the user. Interrupts, meanwhile, were replaced by a synchronization semaphore [6] for each interrupt number. Semaphores were denoted as protected kernel objects and directly managed by the microkernel through the Interrupt Wrapper.

FLoW provided an unrestricted environment for the user to control thread scheduling as part of the resources, in order to keep to the microkernel principles of being as small as possible and policy free. Although the scheduler would be the biggest part of the environment, the microkernel would be kept effective and small. We found that the O(1) scheduler was easy to implement through a static array rather than a dynamically allocated run queue, as described in Algorithm 1, Scheduler Handler. Although this method provided low memory usage, the microkernel had to iterate through the array, which could reduce scheduling performance. Therefore we reduced the array size by reducing MAX_PRIORITY, the number of thread priorities, to 5.

B. Scheduler FLoW implemented a policy free, priority driven round robin algorithm. The user had the right to define scheduling parameters, such as priority and time quantum, in the scheduling component. The microkernel scheduler only looked at the higher priority scheduling components and gave the corresponding threads to the CPU until they blocked for some reason, while the time quantum defined how long the CPU executed the thread.


C. Interrupt Wrapper The interrupt wrapper resided in the architecture specific interrupt vector. The interrupt wrapper utilized a semaphore for each interrupt to replace direct user access to interrupts. The microkernel's event based routines avoided blocking as much as possible. In order to lock the current interrupt for interrupt handling, a micro call was used by the user thread bound to a semaphore to



bring down the interrupt semaphore. The micro call evaluated the interrupt semaphore and decided whether the user thread could take over the interrupt or not. The micro call put the user thread in the blocked state if the interrupt resource had previously been brought down by the microkernel internally. When some event triggered the hardware, the microkernel got access to handle the interrupt first, then brought the semaphore up to release the blocked thread, as described in Algorithm 2.

The whole memory space was visible to the user, unlike in operating system based embedded systems. The user application and FLoW kernel routines resided in the same visible memory. Therefore we allocated a special space to keep the regular user program memory separate from the microkernel. We assumed that the user program would grow upward rather than downward, so the microkernel reserved space before the user application, as shown in Figure 4.

Algorithm 2 Interrupt Wrapper
  context_save(context)
  interrupt_id ← read_id()
  if interrupt_id == SCHEDULER_INTERRUPT then
    context ← Scheduler_handler()
  else
    non_blocked_semaphore_up(interrupt_id)
  context_load(context)
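The control flow of Algorithm 2 can be sketched in Python as a single function. This is only a model of the dispatch decision: saving and loading of CPU registers is reduced to passing a context value in and out, the value chosen for `SCHEDULER_INTERRUPT` is hypothetical, and the scheduler and semaphore routines are supplied by the caller.

```python
SCHEDULER_INTERRUPT = 0  # hypothetical interrupt number of the scheduling timer


def interrupt_wrapper(interrupt_id, current_context, schedule, semaphore_up):
    """Return the context that should resume after one interrupt is handled."""
    context = current_context          # stands in for context_save
    if interrupt_id == SCHEDULER_INTERRUPT:
        context = schedule()           # the scheduler selects the next context
    else:
        semaphore_up(interrupt_id)     # must return immediately, never block
    return context                     # stands in for context_load
```

A timer interrupt therefore replaces the running context, while any other interrupt only signals its semaphore and resumes the interrupted thread.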

Some embedded CPUs [10][11] implemented a supervisor call to facilitate the kernel system call handler in a regular operating system. FLoW also used this kind of mechanism to provide the micro call interface for user applications. The user might place the arguments in architecture specific registers and begin the micro call by executing an architecture specific instruction (e.g. svc in the ARM architecture). In other architectures, where a supervisor call was not available, FLoW utilized a regular interrupt to accommodate the micro call. In a micro call, CPU execution was given to the microkernel's micro call handler at the supervisor call interrupt vector. After hardware boot, FLoW reserved the interrupt vector for this mechanism.

The synchronization semaphore separated each interrupt into a different object. Although this was flexible, the implementation had to be careful not to lead the system into deadlock. FLoW allowed only one thread to bind to one interrupt semaphore. This method cut long wait times for both kernel and user, because only two parties (the microkernel and the bound thread) could ever possess one semaphore. If another thread wanted to take the semaphore, FLoW returned a fail status. If a new interrupt occurred while the semaphore was brought down by the thread, the microkernel did nothing until the thread brought the semaphore up. FLoW realized this behavior through a non-blocked semaphore up function, so that the kernel was never blocked.
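The binding and non-blocking rules described here can be illustrated with a small state machine in Python. It is one possible reading of the text, not the FLoW implementation: the class, its method names and the returned status strings are hypothetical, and a pending flag is assumed for interrupts that arrive before the thread waits.

```python
class InterruptSemaphore:
    """Sketch of one interrupt semaphore with a single bound thread."""

    def __init__(self):
        self.owner = None     # the only thread allowed to use this semaphore
        self.held = False     # True while the thread is handling the interrupt
        self.blocked = False  # True while the thread waits for an interrupt
        self.pending = False  # assumed flag: interrupt arrived before the wait

    def bind(self, thread_id):
        # Only one thread may bind; any other thread gets a fail status.
        if self.owner is None or self.owner == thread_id:
            self.owner = thread_id
            return True
        return False

    def down(self, thread_id):
        if thread_id != self.owner:
            return "fail"
        if self.pending:
            self.pending = False
            self.held = True
            return "acquired"
        self.blocked = True   # the thread sleeps until the interrupt fires
        return "blocked"

    def kernel_up(self):
        # Non-blocked semaphore up: every branch returns at once.
        if self.held:
            return "ignored"  # thread is still handling a previous interrupt
        if self.blocked:
            self.blocked = False
            self.held = True
            return "released"
        self.pending = True
        return "pending"

    def up(self, thread_id):
        if thread_id != self.owner or not self.held:
            return "fail"
        self.held = False     # thread finished; the kernel may signal again
        return "ok"
```

Because `kernel_up` never waits on the thread, an interrupt that arrives while the semaphore is held is simply dropped in this model, which matches the "do nothing" behaviour described in the text.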


D. Micro Call for Threading API The user application was equipped with threading support through the Micro Call microkernel services API. The threading mechanism was simple and fully managed by the microkernel scheduler. FLoW threads were treated as components and realized through object creation. A thread component consisted of a separate stack and context saving storage. FLoW reserved some memory space for those two things for each created thread. Capability and component based systems inspired the creation of component based threading support. Each kernel object (e.g. thread) could be addressed by the user through an ID number. A created object had its own ID, which was returned from the micro call on exit. The same ID number could be used in object revocation or control (e.g. semaphore up and down operations).
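The idea of addressing kernel objects by ID can be sketched as a fixed-size object table in Python. All names are hypothetical, the 16-slot context storage is an arbitrary assumption, and returning -1 for a full table is a convention of this sketch rather than a documented FLoW status code.

```python
class ThreadObject:
    """Hypothetical thread component: a separate stack plus context storage."""

    def __init__(self, stack_size):
        self.stack = bytearray(stack_size)  # per-thread stack
        self.context = [0] * 16             # saved registers; 16 is assumed


class ObjectTable:
    def __init__(self, capacity):
        # Fixed number of slots; the slot index doubles as the object ID.
        self.slots = [None] * capacity

    def create_thread(self, stack_size):
        for object_id, slot in enumerate(self.slots):
            if slot is None:
                self.slots[object_id] = ThreadObject(stack_size)
                return object_id            # ID handed back at micro call exit
        return -1                           # table full

    def revoke(self, object_id):
        in_range = 0 <= object_id < len(self.slots)
        if in_range and self.slots[object_id] is not None:
            self.slots[object_id] = None    # the ID can be reused afterwards
            return True
        return False
```

The user only ever sees the integer ID, so the stack and context storage stay inside the table that the microkernel owns.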

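Fig. 5. Micro Call (the user thread enters the microkernel at micro call enter and receives the returned object ID at micro call exit)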

As shown in Figure 5, micro calls were realized through application binary interfaces and were handled as microkernel events. As shown in Figure 4, the user thread put suitable parameters in architecture specific registers and executed the svc instruction, like a system call in a general purpose operating system. This method was similar to the secure monitor call implemented by vendor specific firmware for ARM platforms. Therefore micro calls and FLoW services were natively fused into the hardware platform.
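A register based micro call can be modelled in Python as a table lookup on a call number. The register roles (r0 for the call number and the result, r1 and r2 for arguments) and the `FAIL` value are assumptions made for this sketch; the paper does not specify the FLoW calling convention.

```python
FAIL = -1  # hypothetical status for an unknown micro call number


def micro_call(registers, services):
    """Dispatch one micro call and place the result back in r0."""
    handler = services.get(registers["r0"])
    if handler is None:
        registers["r0"] = FAIL
    else:
        registers["r0"] = handler(registers["r1"], registers["r2"])
    return registers["r0"]
```

On real hardware the same lookup would run inside the supervisor call vector after the svc instruction traps into the microkernel.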

V. RESULTS AND DISCUSSIONS FLoW was implemented and evaluated on a Pandaboard ES with a 32 bit ARM Cortex-A 1.2 GHz processor. We established a single address space environment running on a single core to observe multithreading performance.


A. Threading Performance A single thread was created and tested with various priorities. We implemented thread priorities from 0 to 79. We then examined the time consumed by context switch and context lookup in the static array based run queue compared to the dynamically allocated run queue. The result is shown in Figure 6.


The fastest context switching performance was achieved at priority number 0, at about 4 microseconds. Implementing more thread priorities caused longer context switching time. The scheduling algorithm, which consisted of

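Fig. 4. FLoW memory map of thread stack and context storage (regions shown: microkernel, program TEXT section, heap, and per-thread stack + context storage T0 to Tn, located in flash/RAM and RAM)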



scheduling context searching through the queue, produced incrementally growing time. Therefore the second result, which used dynamic allocation in creating the run queue, was better. On the other hand, the average performance of the dynamic allocation algorithm for the run queue showed an overhead of about one microsecond over the fastest context switching performance of the static array based run queue algorithm.

include an idle thread for applications. The user might place a free routine manually. We considered this the user's right, for the sake of flexibility. B. Binary and Code Size We then analyzed the impact of the microkernel implementation and multithreading on memory usage. The final binary size was about 30 KB with 3.5 thousand lines of code. We then compared this result with several microkernels and other RTOS. As shown in Figure 7, FLoW was relatively small. Equipped with the same functions and abilities, FLoW maintained a thin layer of computing base with multithreading support in embedded systems.

Fig. 7. Comparison of FLoW, FreeRTOS, Fiasco.OC and NOVA based on (a) lines of code (kLoC), and (b) binary size (KB)

FreeRTOS, which had fewer lines of code, produced the smallest binary size among the other environments and microkernels [15]. The relation between lines of code and binary size in Fiasco.OC and NOVA was similar [6]. Even the smallest Linux Tiny [16] could reach megabytes of binary size, a number still far from those in the comparison. By keeping the implementation minimal like those two microkernels, FLoW succeeded in producing a small binary size.

Fig. 6. Context switch + context lookup time using our (a) static run queue, and (b) dynamic allocated run queue algorithm in FLoW

The static array based run queue could be used provided that the number of available thread priorities was significantly reduced. The number of priorities, MAX_PRIORITY, was then reduced to 5. With this method the minimum total time of context switch and context lookup was about 3.95 microseconds, while the maximum was 8.25 microseconds. The details are presented in Table 1. Thread priority distribution could optimize system performance.

C. Memory Footprint FLoW memory usage was evaluated by creating several threads and was then compared to the existing FreeRTOS memory usage of multithreading [15]. Figure 8 shows the relation between the number of implemented threads and memory footprint.

Table 1. Scheduling performance in each thread priority

Detail step       Minimum Time (us)   Maximum Time (us)
Context Save      0.02                0.02
Context Lookup    4.3                 8.2
Context Load      0.03                0.03
Total             4.35                8.25

A Linux implementation on a dual 2.0 GHz Intel Pentium Xeon x86 architecture cost 3 microseconds of context switching time [13]. uCLinux and FreeRTOS implementations on ARM based platforms cost about 10 microseconds [14]. Compared to those operating systems and embedded environments, our implementation was relatively fast.

Fig. 8. FLoW memory usage in multithreading (bytes) compared to FreeRTOS, plotted against the number of threads

General purpose operating systems usually implemented an idle task for efficient resource usage. In FLoW, it was an option to


In a component based system, each component had its own memory footprint and affected memory usage according to how many threads were implemented. A FLoW thread consumed about 64 bytes of memory, which consisted of context storage, stack, and scheduling context. This number was similar to the thread implementation in FreeRTOS. However, FreeRTOS needed 200 bytes for scheduler bootstrap, while FLoW needed less than that. Therefore the memory usage of FLoW was competitively lower.
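The footprint argument amounts to a linear model: a fixed bootstrap cost plus a per-thread cost. The Python sketch below assumes exactly that linear form, which is an illustrative simplification rather than a measured fit; the function name is hypothetical, and since the FLoW bootstrap size is not stated in the text it is left as a parameter.

```python
def footprint(threads, bootstrap, per_thread=64):
    """Estimated memory in bytes under the assumed linear model."""
    return bootstrap + threads * per_thread


# FreeRTOS figures quoted in the text: 200 bytes of scheduler bootstrap and a
# per-thread cost similar to FLoW's 64 bytes.
freertos_ten_threads = footprint(10, bootstrap=200)
```

Under this model any environment with a smaller bootstrap cost stays below the FreeRTOS line by a constant offset for every thread count.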


D. Interrupt Latency By replacing regular interrupts with semaphores, the whole system and the user programming paradigm were affected. As a consequence of the useful synchronization method provided by FLoW through interrupt semaphores, there was an interrupt latency overhead of about 0.45 microseconds. The interrupt and scheduler paths share the same context save routines, and therefore that number was close to the context save or context load time.


Although the overhead was relatively small, interrupt handling performed by a microkernel based environment needs careful implementation. We note this as future work.


VI. CONCLUSIONS This paper presented a microkernel as the core implementation in the FLoW environment with a focus on multithreading support. Experiments held on an ARM based embedded platform show that FLoW was able to provide multithreading services with low side effects in memory usage. The resulting binary size was relatively small, which could potentially accommodate mid to low-end embedded platforms. Memory allocation and the scheduling algorithm could be optimized to achieve better performance and reduce interrupt latency overhead. We also plan to develop a native thread communication function in future works.

REFERENCES
[1] Hoover, G., Brewer, F. and Sherwood, T. A Case Study of Multi-Threading in the Embedded Space. ACM CASES '06. 2006
[2] Blog.vdcresearch.com. VDC Research: Embedded Microprocessor, Board & Systems Market Blog: Market Forecast. [online] Available at: http://blog.vdcresearch.com/embedded_hw/food-and-drink/ [Accessed 1 Apr. 2015].
[3] Liedtke, J. On u-Kernel Construction. 15th ACM Symposium on Operating System Principles. 1995
[4] Elphinstone, K. and Heiser, G. From L3 to seL4 What Have we Learnt in 20 Years of L4 Microkernels?. ACM Symposium on Operating Systems Principles, pp.133-150. 2013
[5] Shapiro, J., Smith, J. and Farber, D. EROS: a fast capability system. 17th ACM Symposium on Operating System Principles. 1999
[6] Steinberg, U. and Kauer, B. NOVA: A Microhypervisor-Based Secure Virtualization Architecture. Proceedings of the 5th European Conference on Computer System. 2010
[7] Hildebrand, D. An Architectural Overview of QNX. Proceedings of the Workshop on Micro-kernels and Other Kernel Architectures: 113-126. 1992


[8] Freertos.org, (n.d.). FreeRTOS - Market leading RTOS (Real Time Operating System) for embedded systems with Internet of Things extensions. [online] Available at: http://freertos.org [Accessed 2 Apr. 2015].
[9] Liedtke, J. Improving IPC by Kernel Design. 14th ACM Symposium on Operating System Principles (SOSP). 1993
[10] ARM Official, (n.d.). The Architecture for Digital World. [online] Available at: http://www.arm.com [Accessed 17 Apr. 2015].
[11] Intel, (n.d.). Intel Embedded Processors and Chipsets. [online] Available at: http://www.intel.com [Accessed 17 Apr. 2015].
[12] Jose, J., Sujisha, O., Gilesh, M. and Bindima, T. On The Fairness of Linux O(1) Scheduler. 5th International Conference on Intelligent System, Modelling and Simulation. 2014
[13] Li C., Ding C., and Shen K. Quantifying The Cost of Context Switch. Proceedings of the 2007 workshop on Experimental computer science (ExpCS ’07). Article No. 2. 2007
[14] Choi H., and Yun H. Context Switching and IPC Performance Comparison between uCLinux and Linux on the ARM9 based Processor. Technical report, Samsung Electronics. 2005
[15] FreeRTOS, (n.d.). Memory Usage, Boot Times & Context Switch Times. [online] Available at: http://www.freertos.org/FAQMem.html [Accessed 14 Apr. 2015].
[16] eLinux.org. Linux Tiny Wiki. Available at: http://eLinux.org/Linux_Tiny [Accessed 14 Apr. 2015].
