Performance Prediction of Throughput-Centric Pipelined Global ... - SLIP
Recommend Documents
T. N. Mudge, R. B. Brown, W. P. Birmingham, J. A.. Dykstra, A. I. Kayssi, R. J. ... âcheckT: and mint: : Timing veriï¬cation and opti- mal clocking of synchronous ...
2. References. John L. Hennessy and David A. Patterson,. Computer
Architecture a Quantitative. Approach, second edition, Morgan. Kaufmann.
Chapter 3 ...
Such a scheme will not be a viable solution to serve the needs of upcoming ... and (2) by leveraging RDMA to transfer checkpoint data. However it hasn't totally ...
Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 5, Issue 8, August .... Starvation Free: The algorithm should not allow a.
application developer and make it easy to tailor his/her specific problem to a ..... On the practical side, we envision a computer center where applications, or jobs, ...
[10] Jump R. and Ahura S. "Effective Pipeline of Digital. Systems", IEEE Trans. on Computers, Vol. C-27, Nº9, pp.855-865, Sept. 1978. [11] Hatamian M. and G.
a 1-group time independent discrete ordinates (SN ) three- dimensional Cartesian geometry ... ordinates (SN ) three-dimensional Cartesian geometry neu- tron transport problem. ..... receive times and ping-pong times respectively. Parameter.
Feb 1, 2017 - MODEST RECOVERY EXPECTED IN 2017 ... Trends Monitor are based primarily on quarterly FDI data derived from the (extended) directional.
based on meaningful feedback signals may provide a solution to this problem that ... object stabilization failure, has the advantage of allowing a grip stability problem to ..... applying the necessary corrections to the predefined velocity trajector
statistical mechanics, and molecular dynamics simulation techniques relavant to this thesis. ..... A.2 Parameters for the angular contribution to the carbon bond order. . 124 xxii .... lytical solutions when we assume the no-slip boundary condition.
Oct 28, 2010 - rotating disc contactor. The pulsed columns have a clear advantage over other mechanical extractors when processing corrosive or radioactive ...
Inconel-718 using Artificial Neural Network ... Experiments were designed and conducted according to Taguchi's L18 orthogonal array. ... contributing factors and the role of process parameters in the responses ... input variables and their interactin
Yale University School of Medicine and. Yale-New Haven Hospital. Jodi D. Nadler. Cornell University Medical Center. Paul F. Malloy. Butler Hospital and Brown ...
Therefore, determination of which software test phases to focus improvement ..... The variables of interest in this study are divided into four sets (Table 1), i.e., fault- inflow, status ...... SIGKDD Explorations Newsletter, 11(1), 10â18. Harman,
stick-slip procedure can result increasingly unconservative for soft soils and deep ... CP1020, 2008 Seismic Engineering Conference Commemorating the 1908 ...
scenario is the performance of the controller in a vehicle model without any suspension dynamics, the second scenario incorporates linear passive vehicle suspension system (LPVSS) and the ..... K. Dietsche and M. Klingebiel, Automotive Handbook' 7th
Jun 22, 2004 - If it is assumed that the boat maximum velocity is correctly predicted by the ... In this work, it was considered that for calm water, the boat velocity remains almost constant so ... problem describing the motion of the boat when the
overview of techniques utilized to predict application execution time. The whole ... processors, the idle time (Tidle) is when processor stay idle. The last two times ...
Tackling climate change, increasing energy security, engaging North Korea and moving forward Northeast Asian integration â âGreen Growthâ in Korea and the ...
delay variation sets the system's clock cycle time [4]. Linear feedback shift registers (LFSR) in general are constructed with D-type flip-flops in the forward.
Aug 31, 2011 - tures only provide statistical guarantees as there are deterministic packet ..... low complexity optimization procedures in finding the worst case bound ...... Router Search Engine Architecture with Single Port Memories,â. Proc.
synchronous pipelined systems. We present a novel high-speed hybrid wave-pipelined linear feedback shift register that manages clock skew by permitting the.
bridge. mem_RAM ring_device[i] ring_device[i+1] ring_device[i-1] connected to previous DLX bridge connected to ... 16 Fitzgerald Road, Suite 300. Nepean ...
As a consequence, the possibility of an application being interrupted by a ..... build this streamline, but their design is subject to high protocol processing ..... [5] E. N. Elnozahy and J. S. Plank, âCheckpointing for peta-scale systems: A look
Performance Prediction of Throughput-Centric Pipelined Global ... - SLIP
Emerging parallel computing architectures. â¡ More stringent throughput requirement of on-chip interconnects. â« Wires in the NoCs (Networks-on-Chips) ...
Performance Prediction of Throughput-Centric Pipelined Global Interconnects with Voltage Scaling Yulei Zhang1, James F. Buckwalter1, and Chung-Kuan Cheng2 1Dept.
of ECE, 2Dept. of CSE, UC San Diego, La Jolla, CA
12th International Workshop on System Level Interconnect Prediction June 13, 2010 Anaheim, USA
Pipelining effect Voltage/ Technology scaling Design example
Conclusion 2
Wire Scaling Issues and Design Criteria
On-chip global wires become barrier for achieving
High-performance:
Low-power:
542ps (1mm wire) vs. 161ps (10 FO4 inverter) [ITRS 2008] Contribution for 50% dynamic power. [Magen 2004]
Various interconnect schemes proposed
RC wires On-chip T-lines
transceiver design, equalization, etc.
Design criteria
minimum latency
[Zhang 2009]
3
Throughput-Centric Interconnect Design
Throughput-centric interconnect design [Shah 2002] become necessary because
Increasing demand for computing capacity Emerging parallel computing architectures More stringent throughput requirement of on-chip interconnects
Wires in the NoCs (Networks-on-Chips) [Jantsch 2003]
Our work
Wires are pipelined to meet required clock period (throughput) Explore the power-saving of pipelined interconnects with more design freedoms Optimize for different applications
High-Performance / Low-Power / Moderate Cost
4
Overview of Pipelined Global Interconnects
One stage of pipelined interconnect.
Schematic of a latch-based D flip-flop.
Adopt flip-flop based pipelining structure [Heo 2005]
Flip-flop inserted to meet throughput Repeaters inserted for delay optimization
Two-stage latch-based D flip-flop Knobs for manipulating pipelined interconnects
Calculated from ITRS data or based on SPICE characterization
Define delay/energy dissipation of one repeater-wire segment 6
Assumptions and Modeling
Assumptions
Repeaters/flip-flops are inserted evenly along the wire. Repeaters/flip-flops are equally sized. The size of flip-flop is fixed and optimized for the average-sized repeater loading.
Repeated wire modeling [Zhang 2007]
Wire delay: Elmore delay
Wire energy: dynamic + leakage
α sw is the data activity. 7
Modeling of Pipelined Interconnects
Considering delay/energy overhead of flip-flops
Effective capacitance: CFF Delay of flip-flop: dFF
Performance modeling
Delay
Energy
Throughput
Observations
Throughput improved with more FFs but larger delay/energy. With the constraint of target throughput, cost of adding FFs can be minimized. 8
Voltage Scaling Modeling
Leakage current
Exponential function [Rabaey 2009]
Repeater/FF delay
alpha-power current law [Rabaey 2009]
9
Design Objectives
Min-Latency
For conventional low-latency repeated wire design Fewer FFs but larger energy/area overhead
Throughput-Centric Designs [Deodhar 2005]
Max-TPE (low-power application)
Max-TPA (high-performance application)
Optimize throughput-per-bit-energy for single pipelined wire Reduce total energy for set of parallel wires Optimize throughput-per-area for single pipelined wire Reduce total area for set of parallel wires
Max-TPEA (moderate-cost application)
Optimize throughput-per-energy-area for single pipelined wire Reduce the total power-area product for set of parallel wires
10
Performance Metrics
Throughput
Latency
Normalized latency (unit: ps/mm)
Energy per Bit
Maximum clock frequency (unit: GHz or Gbps)
Normalized energy per bit (unit: pJ/mm)
TPEA
Throughput-per-energy-area (unit: Gbps/um/pJ)
Effective pitch is defined as total area divided by wire length 11
Performance Evaluation Flow
Simply the problem by [Nagpal 2007]
Limiting the range of wire geometries (pitch, width) Optimize repeater for given wire geometry
Support different objectives FFs are added incrementally until reaching the throughput constraint Return performance metrics for given supply voltage (VDD) and pipelining stage (N) and corresponding optimal design.
We study the performance of pipelined global interconnects with voltage/process scaling for different applications.
Throughput-centric designs are introduced and compared with min-latency design:
Deeper pipelining to alleviate timing slack and therefore reduce energy/area. 20%-50% overall TPEA improvement by supply voltage scaling. Max-TPEA w/ voltage scaling can improve TPEA by 25x w/ only 4x latency overhead.