High-level Portable Programming Language for Optimized Memory
Recommend Documents
Widespread adaptation of shared memory programming for High Performance Computing has been inhibited by a lack of standardization and the resulting ...
integration of such thread services into the programming model. This design .... eral SW-DSM systems, as well as a custom SPMD-style. 4These results are ...
Switches are tools used to split and join pipelines into concurrent streams of data [13] .... component reuse. ...... Finally in line 6, dJoin maps the stream that it.
purpose programming language (such as C or C++) to create a complete implementation by ... In particular, most simulators do not provide good support for algorithms ... (connections of a node, nodes of a node group, replicates of a network, ...
Feb 15, 2015 - When programming a packet-processing program for NPs, ..... Compiler) or Bison parser-generators but is written in and generates Perl code.
Center on Large Grain Data Flow [Bab84], as well as ... we call selfscheduled, .... call statement to the startup routine of every Force subroutine in the program.
Programming Language. • Course Outline. 1. ... Object-Oriented Programming. 9.
Concurrency. 10. ... Robert W. Sebesta, Concepts of Programming. Languages ...
Jan 30, 2014 ... explain the overall structure of a C language program ... Deitel & Deitel, "C: How
to Program", Prentice Hall, 1998. – Kelly & Pohl, "A Book on C: ...
the construction of such proofs easier by embedding the proof system into a compiler. Using the ...... Coq'Art: The Calculus of Inductive Constructions. Texts in ...
Dec 4, 1996 - Another example is the selection of commensurable periods of control loops ... language and use a simple example application known from the .... In the following we will treat the constant true as a boolean expression, so that ..... the
The C programming language has been around for a long time and is still used a lot nowadays. The core of the language is
components and an energy source to save the data, giving them a lower per-DIMM capacity and higher cost per ... mapped i
may also be combined with various programming order techniques that mitigate the ..... http://newsroom.intel.com/community/intel_newsroom/blog/2011/04/14/.
Toni Morrison's The Bluest Eye: Lost and Found in Translation. 33. Sangita
Rayamajhi. Teaching Toni Morrison to a Culturally. Diverse Class: Experiences
from ...
GPUs from the company, and similar efforts are underway to support ... We can develop such an API because, even though there are many competing multicore ...
posed memory models (DRF0, DRF1, PL) that allow the programmer to reason with ... University authors are supported by DARPA contract N00039-91-C-0138.
School of Engineering and Applied Sciences ... tion is the Memory-Bounded Dynamic Programming ..... trade-off between computational complexity and solu-.
Learning Optimal Bayesian Networks. Brandon ... dynamic programming algorithm for learning the op- ... O(C(n, n/2)), where C(n, n/2) is the binomial coeffi-.
The only virtual processor deï¬ned from the beginning of program execu- tion till program termination is the pre-deï¬ned virtual host- processor of the scalar type.
framework for a runtime, user-level library, MMlib, in which DRAM is treated as a dynamic size cache ..... This policy is not at odds with the user ..... ratios; these ratios are dependent on the ambient temperature in the simulation. The total energ
OpenCL for programming shared memory multicore CPUs. Akhtar Ali, Usman
Dastgeer, and Christoph Kessler. PELAB, Dept. of Computer and Information ...
Nov 12, 2006 - [8] I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, ... [9] D. Callahan, B. L. Chamberlain, and H. P. Zima. ... K. Warren.
Oct 28, 2016 - Apache Gi- raph [1] is such an open-source Java implementation with contributions from Yahoo! and Facebook, that operates on top of HDFS.
communication it is only a small constant factor away from optimality. Unless traditional PRAM ... log. ( n) ) , that is a function in the size of the data, and not O. ( log ... After coming back from recursion, the parent pointers and values of the
High-level Portable Programming Language for Optimized Memory
passed to process2. 015 void process2(Packet i) {. // Port 2 to 1 (no VLAN -> no VLAN). 016. Packet o = i.subpacket(14); // remove MAC header (no VLAN). 017.
High-level Portable Programming Language for Optimized Memory Use of Network Processors Yasusi Kanada Hitachi, Ltd., Central Research Laboratory
1. Introduction ► Network-processor (NP) programming is important for high-performance networking and in-network processing. • NPs enable quick development of arbitrary protocols and functions.
► Problems of NP programming: • No portability because of hardware/vendor-dependent proprietary tools are required. • Difficulty in development because of required specialized skill and knowledge (which are not widely available).
► Focus: Programmers must distinguish SRAM and DRAM to achieve 10-100 Gbps wire-rate packet processing. • Whole packet must be stored in DRAM. • Operations on data in DRAM may disable wire-rate processing. • NPs might not have cache, or cache mishit may cause serious problems in NPs.
► Means for solving these problems: • Phonepl: an open, high-level, and portable programming language. • Four packet-data representations, which programmers can be mostly unaware of.
2. Phonepl language design ► Syntax and semantics are close to Java. • Java/C++ programmers can use Phonepl easily.
► Packets are immutable byte-strings. • Basic operations on packets are substring, concatenation, etc. • Non-IP protocols can be programmed easily. • Immutability enables data sharing among (input/output) packets.
► Byte strings and packets are different types of objects. • They are handled by quire different methods.
► Assumptions on packet and string data • Whole packet is stored only in DRAM, but the header is in SRAM (cached). • Whole byte string is cached (in SRAM and maybe stored in DRAM).
► Packet and byte string operations • substring and subpacket operations
• subpacket generates a packet from part of a packet. • substring generates a byte string from part of a packet or string.
• Concatenation operations
• concat generates a byte string from byte strings. • “new Packet” generates a packet from byte strings and a packet. • No methods for concatenating two or more packets.
3. Phonepl program example ► Simple MAC header addition/removal
MAC header addition (process1) MAC header removal (process2)
NetStream2 (10 Gbps)
import NetStream1; Bidirectional packet stream processing import NetStream2; class AddRemMAC { NetStream out1; Input to port1 is Two packet I/O streams NetStream out2; passed to process1 public AddRemMAC(NetStream port1 > process1, NetStream port2 > process2) { out1 = port1; Input to port2 is out2 = port2; passed to process2 } void process1(Packet i) { // Port 1 to 2 (no VLAN -> no VLAN) Packet o = new Packet(i.substring(0,14),i); // MAC header of original packet out2.put(o); } void process2(Packet i) { // Port 2 to 1 (no VLAN -> no VLAN) Packet o = i.subpacket(14); // remove MAC header (no VLAN) out1.put(o); } Object that process packets (singleton) is created. void main() { new AddRemMAC(new NetStream1(), new NetStream2()); } }
4. Implementation method ► A Phonepl implementation selects four packet representations statically or dynamically. ► Four representations of packet type • Cached: whole packet is in SRAM. • No copy is assumed to be in DRAM.
• Mixed: whole packet is in DRAM and packet header is in SRAM. Outline of packet data structure: • Header size is variable.
Four representations
• Gathered: packet consists (1) Cached packet Cached size of multiple fragments. • Represented by an array or list of fragments.
• Uncached: whole packet is in DRAM • No copy is assumed to be in SRAM.
5. Prototyping and Evaluation ► Prototype • Phonepl was implemented for Cavium Octeon® NP.
► Evaluation using two Phonepl programs • Prototype was evaluated by using a program for MAC header addition/deletion and a pass-through program. • An Octeon board with these programs was connected to each node in VNode Infrastructure (a network-virtualization infrastructure). • NP Board: GE WANic 56512
10
• The throughput was over 7.5 Gbps (close to 10-Gbps wire-rate).
Throughput (Gbps)
► Evaluation result
8
6
4
Addition Deletion Pass-through (f orward)
2
Pass-through (backward) 0 0
500
1000
Packet size (Byte)
1500
6. Conclusion ► To make NP programming easier, Phonepl language and a method for implementing Phonepl are proposed. • Programmers can use SRAM and DRAM appropriately without distinguishing them. • Four representations of packet type and usage of them were proposed.
► The evaluation result shows Phonepl for Octeon NP enables high throughput (close to 10-Gbps wire-rate) in simple packet processing.
Appendix: Detailed packet data structures (1) Cached packet 0
SRAM (Descriptor)
size Cached
offset (Cached data)
3 size’ Cached
offset’
offset
(2) Mixed packet 0
SRAM
size Mixed
offset 3 size’ Mixed
offset2
(Descriptor)
size0
(Cached data)
offset’
DRAM
offset = size0 − size (Stored data)
offset = size0 − size
SRAM
(3) Gathered packet
DRAM
(Cached data) 0
size Gather
offset 6 size’ Gather
offset’
size1
(Stored data 2) offset’ = 6
size2 size3
offsets
(Stored data 1)
(4) Uncached packet 0
size Uncached
DRAM
offset 8
size’ Uncached
offset’
(Stored data)
Appendix: Detailed usage of four representations ► Mixed representation is required. • In packet processing, only packet headers are usually processed. • In such cases, it is better to store only packet headers to SRAM and tails to DRAM.
► Gathered representation is required. • This representation is useful when generating a packet by concatenating multiple data in DRAM or SRAM. • If trying to collect whole data into contiguous area in DRAM, DRAM to DRAM copy, which requires much time, is required.
► State transition between four representations subpacket Mixed subpacket