Apr 5, 2007 - an application-specific manner ... Goal: provide app-specific policies using a COTS base in ... Case Study: Guest System Call Interposition.
Hijack: Taking Control of COTS Systems for Real-Time User-Level Services Gabriel Parmer and Richard West Computer Science Deparment Boston University Boston, MA 02215 {gabep1, richwest}@cs.bu.edu
April 5, 2007
COTS in RT/Embedded Systems
Commodity Off The Shelf (COTS) general purpose systems provide many advantages for RT/Embedded systems Tested and widely deployed code-base Established development tools/environments Developer familiarity → faster time to market/smaller development costs
Parmer, West, BU CS
Hijack
2/33
COTS in RT/Embedded Systems (2)
General purpose systems have a number of disadvantages General-purpose policies are often insufficient/awkward for needs of RT applications QoS, predictability, policies absent for satisfying app-specific requirements, i.e. EDF Semantic gap between the requirements of the application and the functionality/guarantees of the system
Parmer, West, BU CS
Hijack
3/33
Shrinking the Semantic Gap
Domain-specific OSs created with a focus on one class of applications (RTOSs) Extensible systems allow the modification of system policies in an application-specific manner Generally either not COTS, or not isolation preserving Developing extensions requires skill/experience Goal: provide app-specific policies using a COTS base in a safe and predictable manner
Parmer, West, BU CS
Hijack
4/33
Hijacking your COTS system
Efficient interposition on service requests from specific applications allows the definition at user-level of application-specific policy
Parmer, West, BU CS
Hijack
5/33
Hijack Mechanism Hijack execution environment Guest
Guest
... Background process
Executive
Schedule / dispatch Syscall interception Unintercepted syscalls
IDT
Host Kernel
Kernel module Interrupts
Hijack module receives specific events system calls page faults possibly device interrupts Vector guest service requests to executive
Hardware (I/O devices)
executive controls execution context of guests create/switch address spaces access guest registers event-triggered executive scheduler Parmer, West, BU CS
Hijack
6/33
Hijack Mechanism (2) Hijack execution environment Guest
Guest
... Background process
Executive
Schedule / dispatch Syscall interception Unintercepted syscalls
IDT
Host Kernel
Kernel module Interrupts
executive isolated at user-level executive harnesses base system functionality where appropriate
Hardware (I/O devices)
Does not require changes to the COTS system source-code (no kernel recompilation) One (2000 LOC) hijack module enables flexibility in the definition of user-level app-specific services Parmer, West, BU CS
Hijack
7/33
Case Study: Guest System Call Interposition 1 Guest
Guest
syscall
...
2
saved guest state
Executive executive state (to be restored)
Kernel module
3
4 Host Kernel
5
guest service request intercepted by Hijack module executive region mapped into current guest address space guest registers saved into executive region executive registers restored executive executed
executive not present while guest is executing – mapped in dynamically executive isolated from guests Parmer, West, BU CS
Hijack
8/33
Case Study: Guest System Call Return 1
Guest
Guest
2
...
3 saved guest state (to be restored)
saved executive state
Kernel module
Executive
4
Host Kernel
5
6
executive returns to kernel module executive registers saved in module guest registers restored from executive region executive region unmapped from guest address space executive’s mappings evicted from TLB guest executed
Can use global bits to avoid flushing guest pages from TLB set all guest pages as global Parmer, West, BU CS
Hijack
9/33
Experimental Setup
All experiments conducted on a 2.4 GHz Pentium 4 processor on Linux 2.6.13 with a clock tick every 10 milliseconds
Parmer, West, BU CS
Hijack
10/33
nanosleep Experiments
A goal of Hijack is to offer the ability to enhance default system functionality in an application-specific manner nanosleep: yield for at least a specific number of nanoseconds used in multimedia apps such as mplayer
Wake up time variability/unpredictability clock granularity COTS CPU scheduler
Parmer, West, BU CS
Hijack
11/33
nanosleep Experiments (2)
Hijack-provided extensions: 1 Hijack: Executive can give scheduler preference to tasks waking from nanosleep 2 Hijack Extended: Executive can busy wait for periods less than a clock tick
Parmer, West, BU CS
Hijack
12/33
nanosleep Experiments (3) 100000
Jitter (Tens of Microseconds)
Hijack Linux Task Hijack Extended 10000
1000
100
10
1
0
1
2
3
4
Number of Background CPU Bound Tasks Parmer, West, BU CS
Hijack
13/33
QoS for Packet Stream Delivery Scheduling of Tasks dependent on I/O availability with QoS constraints: models traffic shapers, QoS aware stream processing, etc. . . Four streams of 42,000 16 byte packets/second from separate hosts over GigE Single host with four tasks, each receiving a stream QoS constraints: Task 0: 35,000 p/s
higher QoS
Task 1: 20,000 p/s Task 2: 10,000 p/s
lower QoS
Task 3: best effort
Start tasks every 5 seconds from Task 3 to Task 0 Parmer, West, BU CS
Hijack
14/33
QoS for Packet Stream Delivery (2) Three scenarios: 1 Linux, tasks with same priority 2 Linux, tasks with different priority 3 Hijack, Executive using policy similar to proportional-share Tasks assigned tokens proportional to QoS select used to probe for I/O activity Task with tokens and available I/O executed Tokens refreshed every given period When guest make system call to read data read data into guest buffer until no tokens, or no data
Parmer, West, BU CS
Hijack
15/33
Number of packets delivered to a task
Packet Delivery QoS Results: Linux Same Priority 45000 40000 35000 30000 25000 20000 15000 Task 0 Task 1 Task 2 Task 3
10000 5000 0
0
5
10
15
20
25
30
Time (seconds) Parmer, West, BU CS
Hijack
16/33
Number of packets delivered to a task
Packet Delivery QoS Results: Linux Increasing Priority 45000 40000 35000 30000 25000 20000 15000 Task 0 Task 1 Task 2 Task 3
10000 5000 0
0
5
10
15
20
25
30
Time (seconds) Parmer, West, BU CS
Hijack
17/33
Number of packets delivered to a task
Packet Delivery QoS Results: Hijacked Linux 45000 40000 35000 30000 25000 20000 15000 Task 0 Task 1 Task 2 Task 3
10000 5000 0
0
5
10
15
20
25
30
Time (seconds) Parmer, West, BU CS
Hijack
18/33
Related Work Related work includes: RTLinux Separate system into two functional domains for Hard-RT predictability Focus is on interrupt latency, not app-specific resource management policies VMs Interface provided to guest OSs (executives) is identical to the hardware itself Focus is on HW virtualization, not on providing app-specific services
Parmer, West, BU CS
Hijack
19/33
Conclusions
Hijack enables app-specific, user-level RT policies using a general purpose computing base Use interposition on system service requests to redefine policies executive defined at user-level can leverage underlying system functionality where appropriate Demonstrated that complex policies can be introduced A useful approach towards shrinking the semantic gap
Parmer, West, BU CS
Hijack
20/33
Limitations
global bit trick not ideal for all workloads can revert to simply flushing whole TLB or use other techniques
Certain aspects of the system that cannot be hijacked using these techniques If utilize functionality in base system, generally cannot Hijack that functionality COTS system interrupt handling behavior (prototype limitation)
Parmer, West, BU CS
Hijack
22/33
Using Global-bit Trick to Avoid TLB Flushes Study the effect of TLB flushes on Executive ↔ Guest communication
Parmer, West, BU CS
350
Hijack Guest -> Executive RPC Linux Pipe System Call
300 250 # iTLB Misses
Vary working set size (WSS) of guest by touching data/instruction pages then making system call instruction-TLB has 128 entries data-TLB has 64 entries Global-bit trick avoids TLB flush, thus avoiding misses
200 150 100 50 0
Hijack
0
50
100
150 200 Instruction WSS
250
300
23/33
Using the Global-bit Trick to Avoid TLB Flushes (2) 35000
30000
Hijack Guest -> Executive RPC Linux Pipe
30000
20000
20000
Cycles
Cycles
25000
15000
15000 10000
10000
5000
5000 0
Hijack Guest -> Executive RPC Linux Pipe
25000
0
Parmer, West, BU CS
50
100
150 Data WSS
200
250
300
0
Hijack
0
50
100
150 200 Instruction WSS
250
300
24/33
Timer interrupts in Executive synthesized with signals Predictable notification Executive can define customizable policy for scheduling beyond what is present in the COTS system (EDF, PFAIR, DWCS, etc. . . )
Parmer, West, BU CS
Average Signal Interarrival Time (milliseconds)
Asynchronous Event Notification Experiments
30.0
Hijack Linux Task
25.0
20.0
15.0
10.0
5.0
0.0
0
1
2
3
4
Number of Background CPU Bound Tasks
Hijack
25/33
Hijack Execution Environment Address Space
sigaltstack
read-writable
4KB guard page executive stack executive
4KB guard page signal_handler
Parmer, West, BU CS
0x3FC00000 read-only
Hijack
26/33
QoS Expts. Executive Algorithm main_event_loop () { next = NULL; select on the file descriptors for each task; if (timing period has expired) for (each task in tasks) curr_tokens(task) = init_tokens(task); for (each task in tasks) if (select indicated that task has data && curr_tokens(task) > 0) { next = task; break; } if (next == NULL) next = best_effort_task; execute next; } Parmer, West, BU CS
Hijack
27/33
QoS Expts. Executive Algorithm (2)
guest_syscall_read(guest_fd, guest_buf, guest_size) { fd = translate_to_host_fd(guest_fd); loop until (read doesn’t return data || curr_tokens(task) == 0) { read(fd, guest_buf, guest_size); //nonblocking curr_tokens(task)--; } }
Parmer, West, BU CS
Hijack
28/33
Max. Jitter QoS Results: Linux Same Priority
Maximum stream jitter (cycles)
1e+09
1e+08
1e+07
1e+06
100000
Task 0 Task 1 Task 2 Task 3 0
5
10
15
20
25
30
Time (seconds) Parmer, West, BU CS
Hijack
29/33
Max. Jitter QoS Results: Linux Increasing Priority
Maximum stream jitter (cycles)
1e+09
1e+08
1e+07
1e+06
100000
Task 0 Task 1 Task 2 Task 3 0
5
10
15
20
25
30
Time (seconds) Parmer, West, BU CS
Hijack
30/33
Max. Jitter QoS Results: Hijacked Linux
Maximum stream jitter (cycles)
1e+09
1e+08
1e+07
1e+06
100000
Task 0 Task 1 Task 2 Task 3 0
5
10
15
20
25
30
Time (seconds) Parmer, West, BU CS
Hijack
31/33