System-level IPC on Multi-core Platforms SICS Multicore Day – 2013-09-23

Ola Dahl, CTO Office, Enea
Enea Confidential – Under Copyright © 2013 EneaNDA AB

Before we start
• Enea – founded 1968; ~400 employees; 468 MSEK revenue; products and services: OSE, Linux, Middleware, Services
• Myself – LTH, ST, LiU, Ericsson; now Enea

System-level IPC
• Message-passing between processes – intra-node and inter-node
• Monitoring and event handling – fault tolerance
• OSE operating system – kernel services, file system services, IP communication, program management, run-time loader, LINX
• Number of communicating entities ~ tens of thousands (pid space extended from 16 to 20 bits); number of nodes ~ 100s

System-level IPC
Element Messaging Framework – name server, message dispatch, communication patterns, HA functionality, Linux
[Figure: groups of communicating entities (A–D) distributed across nodes; #nodes ~ 100(s), #threads/node ~ 1000s; deployments range from Fixed Multi-Node on a SoC Platform to Elastic Multi-Node in the Cloud]

IPC
[Figure: entities communicating within and between two operating system instances]
Communicating entities – Linux process, Linux thread, RTOS task, bare-metal executive, user-space thread, other executing entity (e.g. in an event-driven execution model)

IPC and Multicore
[Figure: two SoCs, each with cores (C0–C4, D0–D2) above a bus, interconnect, cache, controllers and I/O layer]
• Multicore – multiple processing entities
• Parallelism on different levels – inside one SoC block, inside a SoC, between SoCs
• Communication on different levels – interconnect, caches, memory, hardware buffers and hardware IPC support

IPC and Multicore
[Figure: as before, with one operating system instance marked Realtime and one Non-Realtime]
• Multicore – multiple processing entities
• Parallelism on different levels – inside one SoC block, inside a SoC, between SoCs
• Communication on different levels – interconnect, caches, memory, hardware buffers and hardware IPC support
• Real-time – core isolation – dedicated cores for real-time response

Heterogeneous Hardware
TCI6638K2K – Multicore DSP+ARM KeyStone II System-on-Chip – http://www.ti.com/product/tci6638k2k
• Processing – 8 C66x DSP cores (up to 1.2 GHz), 4 ARM cores (up to 1.4 GHz), wireless comm (3GPP) coprocessors
• Interconnect and control – Multicore Navigator, TeraNet, Multicore Shared Memory Controller, HyperLink

Heterogeneous Software
Core isolation for real-time response
[Figure: cores C0–C1 in a real-time domain and D0–D2 in a non-real-time domain, above the bus, interconnect, cache, controllers and I/O layer]
• Real-time domain and non-real-time domain
• Run-time categories in the real-time domain:
  – Native threads
  – User-space threads
  – RTOS migration
  – Other execution frameworks, e.g. Open Event Machine
  – Enea LWRT

System-level IPC and Multicore
• Communicating entities – e.g. processes, threads, user-space threads, bare-metal executives
• Levels of parallelism – multicore processor in a SoC, multiple blocks in a SoC, multiple SoCs in a node, multiple nodes
• Communication on different levels (e.g. intra-node and inter-node)
• On each level – establish contact, perform communication, monitor and act on events, close

Where are we heading?
Linux – Hardware – Virtualisation

Linux
EE Times report – http://seminar2.techonline.com/~additionalresources/embedded_mar1913/embedded_mar1913.pdf
Linux usage: 2013 – 50%; 2012 – 46%

Linux
Status of embedded Linux – March 2013 – http://elinux.org/images/c/cf/Status-of-Embedded-Linux-2013-03-JJ44.pdf
• Average time between Linux releases (3.3 – 3.8) – 70 days
• Linux 3.4 – RPMsg for IPC between Linux and e.g. an RTOS
• Linux 3.7 – ARM multi-platform support, ARM 64-bit support
• Linux 3.7 – perf trace (alternative to strace)

Status of Linux – September 2013
• Latest stable kernel – 3.11.1
• Example changes in 3.11 (released September 2, 2013):
  – ARM huge page support, KVM and Xen support for ARM64
  – SysV IPC message queue scalability improvements
• Example changes in 3.10 (released June 30, 2013):
  – Timerless multitasking

Linux and real-time
• Real-time frameworks, e.g. Xenomai – http://www.xenomai.org/
• PREEMPT_RT – https://rt.wiki.kernel.org/index.php/Main_Page
• Core isolation and tickless operation – striving for "Bare-Metal Multicore Performance in a General-Purpose Operating System" – http://www2.rdrop.com/~paulmck/scalability/paper/BareMetalMW.2013.02.25a.pdf
• Timerless multitasking in 3.10 retains a 1 Hz tick also on isolated cores
• Linux 3.12-rc1 (2013-09-16) – an even more tickless kernel (the 1 Hz maintenance tick removed); work remains, e.g. in memory management

Hardware
ITRS – http://public.itrs.net – fifteen-year assessment of the semiconductor industry’s future technology requirements
ITRS 2012 Update – http://public.itrs.net/Links/2012ITRS/Home2012.htm
• System drivers – SOC Networking Driver, SOC Consumer Driver, Microprocessor (MPU) Driver, Mixed-Signal Driver, Embedded Memory Driver
• SOC Networking Driver – moving towards “multicore architectures with heterogeneous on-demand accelerator engines”, with “integration of onboard switch fabric and L3 caches”

Hardware
SOC Networking Driver – MC/AE architecture – from http://public.itrs.net/Links/2011ITRS/2011Chapters/2011SysDrivers.pdf
[Figure]

Hardware
SOC Networking Driver – system performance and number of cores – from http://public.itrs.net/Links/2011ITRS/2011Chapters/2011SysDrivers.pdf
• Assumptions – constant cost (die area); per-year increase of number of cores (1.4x), core frequency (1.05x) and accelerator engine frequency (1.05x); logic, memory, cache hierarchy, switching fabric and system interconnect scale consistently with the number of cores
• System performance – the “product of number of cores, core frequency, and accelerator engine frequency”

Virtualization
NFV – Network Function Virtualization
ETSI – http://portal.etsi.org/NFV/NFV_White_Paper.pdf
“leveraging standard IT virtualisation technology to consolidate many network equipment types onto industry standard high volume servers, switches and storage, which could be located in Datacentres, Network Nodes and in the end user premises”
Virtualization using e.g. KVM or Xen

System-level IPC aspects
• Establishing and performing efficient communication
• Constraints from real-time and from hardware, with an increasing interest in virtualization

IPC and Linux

Is there any remaining work to do?


IPC in Linux (and UNIX)
[Timeline, 1964 to today: pipe (UNIX), SysV IPC, mmap (SVR4), POSIX rt, flock (4.2BSD, Linux 1.0), POSIX shmem (Linux 2.4), POSIX named semaphore (Linux 2.6), POSIX mq (Linux 2.6.6), eventfd (Linux 2.6.22), CMA (Linux 3.2)]
Overview, book, man pages, etc. by Michael Kerrisk – http://man7.org/
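The timeline's oldest mechanism is still a workhorse. As a minimal sketch (not from the talk), message passing between a parent and a forked child over a pair of pipes; the helper name `pipe_ping` is hypothetical:

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send msg to a forked child over one pipe; the child echoes it back
 * over a second pipe. Returns 0 on success, -1 on failure. */
int pipe_ping(const char *msg, char *reply, size_t len)
{
    int down[2], up[2];                  /* parent->child, child->parent */
    if (pipe(down) < 0 || pipe(up) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                      /* child: read, echo, exit */
        char buf[256];
        close(down[1]); close(up[0]);
        ssize_t n = read(down[0], buf, sizeof buf);
        if (n > 0)
            write(up[1], buf, (size_t)n);
        _exit(0);
    }

    close(down[0]); close(up[1]);        /* parent keeps one end of each */
    write(down[1], msg, strlen(msg) + 1);
    ssize_t n = read(up[0], reply, len);
    close(down[1]); close(up[0]);
    waitpid(pid, NULL, 0);
    return n > 0 ? 0 : -1;
}
```

Unidirectional and byte-stream-oriented, as the later functionality table notes – two pipes are needed for a request/reply pattern.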

IPC on Linux
[Timeline, 2000 to today: DBUS, Binder, TIPC, OpenMPI, 0MQ, RPMsg, AF_BUS, kdbus, nanomsg – alongside LINX for Linux and Enea Element]

Work in progress
sysv ipc shared mem optimizations, June 18, 2013 – http://lwn.net/Articles/555469/
“With these patches applied, a custom shm microbenchmark stressing shmctl doing IPC_STAT with 4 threads a million times, reduces the execution time by 50%”
ALS: Linux interprocess communication and kdbus, May 30, 2013 – http://lwn.net/Articles/551969/
“The work on kdbus is progressing well and Kroah-Hartman expressed optimism that it would be merged before the end of the year. Beyond just providing a faster D-Bus (which could be accomplished without moving it into the kernel, he said), it is his hope that kdbus can eventually replace Android's binder IPC mechanism.”

Work in progress
Speeding up D-Bus, February 29, 2012 – http://lwn.net/Articles/484203/
“D-Bus currently relies on a daemon process to authenticate processes and deliver messages that it receives over Unix sockets. Part of the performance problem is caused by the user-space daemon, which means that messages need two trips through the kernel on their way to the destination”

Fast interprocess communication revisited, November 9, 2011 – https://lwn.net/Articles/466304/
“Rather we start with the observation that this many attempts to solve essentially the same problem suggests that something is lacking in Linux. There is, in other words, a real need for fast IPC that Linux doesn't address”

Work in progress
Fast interprocess messaging, September 15, 2010 – http://lwn.net/Articles/405346/
“Rather than copy messages through a shared segment, they would rather deliver messages directly into another process's address space. To this end, Christopher Yeoh has posted a patch implementing what he calls cross memory attach.”
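Cross memory attach was eventually merged in Linux 3.2 as the process_vm_readv()/process_vm_writev() system calls. A minimal sketch of the read side; `cma_read_self` is a hypothetical helper that, for simplicity, reads from the calling process's own address space (attaching to a foreign pid additionally requires ptrace permission):

```c
#define _GNU_SOURCE
#include <sys/uio.h>
#include <unistd.h>

/* Cross memory attach: copy data directly between address spaces with a
 * single copy and no intermediate shared segment. */
ssize_t cma_read_self(void *dst, const void *src, size_t len)
{
    struct iovec local  = { .iov_base = dst,         .iov_len = len };
    struct iovec remote = { .iov_base = (void *)src, .iov_len = len };
    /* Reading our own pid keeps the demo self-contained; a real user
     * would pass the pid of the peer process here. */
    return process_vm_readv(getpid(), &local, 1, &remote, 1, 0);
}
```

The point of the interface is the single copy: data moves source-to-destination in one kernel operation, instead of two copies through a shared buffer.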

Which IPC to use?
Functionality – Performance – Cost – Technology constraints

Choosing an IPC – Functionality

| Functionality | SysV shared memory | POSIX shared memory | FIFO | Stream socket | 0MQ | LINX |
|---|---|---|---|---|---|---|
| End-point addressing | SysV key | Shmem object name | File system node | AF_UNIX – file system node; AF_INET – IP address and port | Transport and address (transport = tcp, ipc, inproc) | Endpoint name specifying path to peer |
| End-point representation | Variable | File descriptor | File descriptor x 2 | Socket descriptor | 0MQ socket | LINX endpoint, spid |
| Channels | A memory area | A memory area | The FIFO (unidirectional) | The socket (bidirectional) | 0MQ socket internal (bidirectional) – e.g. TCP or UNIX domain socket | Buffer associated with LINX endpoint |
| Initialisation | shmget, shmat | shm_open, mmap | mkfifo, open | socket, bind, listen, accept, connect | Create 0MQ context and 0MQ socket | linx_open, linx_hunt |
| Closing | shmdt | munmap, shm_unlink | close, unlink | close | Close 0MQ socket | linx_close |
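The POSIX shared memory column can be sketched end-to-end in a few lines. `shm_roundtrip` and the object name `/ipc_demo_shm` are illustrative only, and – as the table notes – no synchronization is provided, so a real reader/writer pair would add e.g. a process-shared mutex or semaphore:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* POSIX shared memory lifecycle per the table:
 * shm_open + mmap -> write to memory -> munmap + shm_unlink. */
int shm_roundtrip(const char *text)
{
    const char *name = "/ipc_demo_shm";     /* hypothetical object name */
    size_t len = strlen(text) + 1;

    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, (off_t)len) < 0)
        return -1;
    char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED)
        return -1;

    memcpy(mem, text, len);                 /* "write to memory" */
    int ok = strcmp(mem, text) == 0;        /* a peer process would read here */

    munmap(mem, len);
    close(fd);
    shm_unlink(name);                       /* closing, per the table */
    return ok ? 0 : -1;
}
```

On older glibc this needs linking with -lrt.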

Choosing an IPC – Functionality

| Functionality | SysV shared memory | POSIX shared memory | FIFO | Stream socket | 0MQ | LINX |
|---|---|---|---|---|---|---|
| Sending | Write to memory, no synchronization | Write to memory, no synchronization | write | write | Send message or number of bytes to 0MQ socket | Send LINX signal |
| Receiving | Read from memory, no synchronization | Read from memory, no synchronization | read | read | Receive message or number of bytes from 0MQ socket | Receive LINX signal |
| Blocking | No (unless implemented separately) | No (unless implemented separately) | Blocking and non-blocking R/W | Blocking and non-blocking R/W | Blocking and non-blocking R/W | Receive is blocking (non-blocking possible), send is not |
| Monitoring | No (unless implemented separately) | No (unless implemented separately) | select, poll | select, poll | Monitoring callback can be registered with 0MQ context | LINX attach |
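The select/poll monitoring entries for FIFOs and sockets boil down to waiting for POLLIN on a descriptor. A sketch with a hypothetical helper, using a plain pipe as a stand-in for the channel:

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable. Returns 1 if a read
 * would not block, 0 on timeout or error. */
int wait_for_data(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int n = poll(&p, 1, timeout_ms);
    return (n == 1 && (p.revents & POLLIN)) ? 1 : 0;
}
```

A timeout of 0 makes the check non-blocking, mirroring the blocking/non-blocking distinction in the table; the same helper works unchanged for FIFOs, sockets and eventfd descriptors.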

Choosing an IPC – Technology constraints

| Technology | 0MQ | kdbus | LINX |
|---|---|---|---|
| Sockets | Yes | No | Yes, own type |
| Daemons | No | No | Discovery daemon (optional) |
| Kernel modules | No | Yes | Yes |
| Pthread synchronization | Yes | No | Yes |
| Kernel synchronization | No | Yes | Yes |
| Programming languages | C and more | C | C |
| Development status | Latest stable release is 3.2.3, from May 2013 | Estimated to be ready in 2013 | Initial release 2006; current version is 2.6.5, released June 2013 |
| License | LGPLv3 | LGPL | BSD and GPLv2 |

Choosing an IPC – Performance
• ipc-bench: a UNIX inter-process communication benchmark – University of Cambridge – http://www.cl.cam.ac.uk/research/srg/netos/ipc-bench/
• Measures latency, throughput, IPI latency
• Public results dataset: “Since we have found IPC performance to be a complex, multi-variate problem, and because we believe that having an open corpus of performance data will be useful to guide the development of hypervisors, kernels and programming frameworks, we provide a database of aggregated ipc-bench datasets.”
• Enea and ipc-bench – porting to 32-bit, ARM and PowerPC; adding tests for CMA, LINX, ZeroMQ

Measuring IPC performance
Why is this interesting? From The case for reconfigurable I/O channels, S. Smith et al., RESoLVE12, 2012 – http://anil.recoil.org/papers/2012-resolve-fable.pdf
“We show dramatic differences in performance between communication mechanisms depending on locality and machine architecture, and observe that the interactions of communication primitives are often complex and sometimes counter-intuitive”
“Furthermore, we show that virtualisation can cause unexpected effects due to OS ignorance of the underlying, hypervisor-level hardware setup”

Measuring IPC performance
Submitted measurements – http://www.cl.cam.ac.uk/research/srg/netos/ipc-bench/details/tmpn2YlFp.html
[Figure: pairwise IPC latency between cores]
64 cores, AMD Opteron(TM) Processor 6272, 8 NUMA nodes, 125.9 GB, Linux 3.8.5-030805-generic, x86_64

Measuring IPC performance
Submitted measurements – http://www.cl.cam.ac.uk/research/srg/netos/ipc-bench/details/tmpn2YlFp.html
[Figure: pairwise IPC throughput between cores; x-axis is packet size, y-axis is Gbps]
64 cores, AMD Opteron(TM) Processor 6272, 8 NUMA nodes, 125.9 GB, Linux 3.8.5-030805-generic, x86_64

Measuring IPC performance
[Chart: throughput vs message size (64, 4096, 65536 bytes) on an Intel(R) Xeon(R) X3460 @ 2.80 GHz, cores 6 and 7, for mempipe_spin_thr, mempipe_thr, pipe_thr, tcp_thr, unix_thr, vmsplice_coop_pipe_thr and vmsplice_pipe_thr]

Measuring IPC performance
[Chart: throughput vs message size (64, 4096, 65536 bytes) on an ARM Pandaboard @ 1 GHz, cores 0 and 1, for mempipe_spin_thr, mempipe_thr, pipe_thr, tcp_thr, unix_thr, vmsplice_coop_pipe_thr and vmsplice_pipe_thr]

Measuring IPC performance
[Chart: 0MQ vs UNIX sockets – throughput for message sizes 64, 4096 and 65536 bytes on an Intel(R) Xeon(R) X3460 @ 2.80 GHz, cores 6 and 7, comparing zmq_inproc_thr, zmq_ipc_thr, zmq_tcp_thr and unix_thr]

Profiling and Performance
Brendan Gregg – Linux Performance Analysis and Tools – SCaLE 11x, 2013 – http://dtrace.org/blogs/brendan/2013/06/08/linux-performance-analysis-and-tools/
[Figure: Linux stack – apps and libs; system call interface; VFS, file systems, block device interface; sockets, TCP/UDP, IP, Ethernet; scheduler, VM; device drivers]
• perf – https://perf.wiki.kernel.org/index.php/Main_Page
• DTrace – https://github.com/dtrace4linux
• SystemTap – http://sourceware.org/systemtap/

Profiling and Performance
Collecting data with perf – IPC test with pipes
[Screenshot]

Profiling and Performance
Analyzing data recorded with perf
[Screenshot]

Profiling and Performance
Examining where time is spent
[Screenshot]

Profiling and Performance
A lot more to choose from*: strace, netstat, top, pidstat, mpstat, dstat, vmstat, slabtop, free, tcpdump, ip, nicstat, iostat, iotop, blktrace, ps, pmap, traceroute, ntop, ss, lsof, oprofile, gprof, kcachegrind, valgrind, google profiler, nfsiostat, cifsiostat, latencytop, powertop, LTTng, ktap, ...

* http://www.brendangregg.com/Slides/SCaLE_Linux_Performance2013.pdf

Summary
• IPC in Linux – stable but not finished
• IPC on Linux – diversified
• Performance and profiling – ipc-bench (with adaptations and extensions), a large selection of profiling tools

Conclusions
• A variety of IPC mechanisms exist
• There is no clear one-size-fits-all solution
• Performance aspects and functionality aspects (location transparency, robustness) – different trade-offs for different use-cases
• IPC and Linux – many stable mechanisms, but still work-in-progress (e.g. kdbus)
• Performance measurement and profiling are required – ipc-bench (with adaptations and extensions) – perf for performance profiling (one of several tools, with a powerful feature set)

Challenges
• System requirements and design – parallelism, partitioning, heterogeneity, functional requirements, performance requirements – choosing an IPC mechanism
• Programming – frameworks and execution environments – legacy and re-use – choosing a programming paradigm
• Verification – measurements and profiling – are we designing (and implementing) the system as we planned? – choosing the right tools

Enea as an IPC partner – long-term experience; competence for building future IPC systems – development, integration, configuration, performance assessment

SICS Multicore Day – System-level IPC on multicore platforms
Multicore System-on-Chip solutions, offering parallelization and partitioning, are increasingly used in real-time systems. As the number of cores increases, often in combination with increased heterogeneity in the form of hardware-accelerated functionality, we see increased demands on effective communication, inside a multicore node but also at an inter-node, system level. The presentation will outline some of the challenges, as seen from Enea, to be expected when building future communication mechanisms, with requirements on performance and scalability, as well as transparency for applications. We will give examples from ongoing work in the Linux area, from Enea and from other open source contributors.