APS29
Very High Performance Embedded Microcontroller with Dual Issue Pipeline
Cortus: advanced processing solutions
The APS29 is a very high performance, extendible 32 bit microcontroller core featuring a dual issue pipeline that ensures very high integer throughput. The dual issue pipeline provides instruction level parallelism and increases performance without requiring changes to coding styles or complex compilation schemes. All the performance gains are managed within the processor core and require no effort from the programmer.
The APS29 from Cortus provides a solution to your performance challenges while staying within stringent power and silicon footprint budgets. A dual issue pipeline gives performance close to that of a dual core system with only a modest increase in silicon area and power consumption over a similar single issue core. A static branch predictor significantly improves the performance of loops, and the multiply-accumulate feature increases signal processing speed. These features offer a significant performance boost but require no special programming techniques.
The APS29 has been designed to offer excellent code density with mixed instruction lengths. The dual issue, out-of-order completion pipeline ensures that most instructions (including loads and stores) execute in a single cycle, with up to two instructions issued per cycle.
Features
• Dual issue, 5-7 stage pipeline
• Multiply-accumulate
• 2 high performance integer multipliers
• 2 integer dividers
• Dual and multi-core capable
• Co-processor interface
• AXI4 buses with 64 bit data width
• Coalesced reads/writes
• Optional caches
[Figure: APS29 block diagram. Blocks shown: CPU core (two ALUs, MACC unit with accumulator ACC, registers R0 = 0 to R15, Status), co-processor, I-Cache and D-Cache, 64 bit AXI4 buses, X-Bar, RAM, Flash, DMA, On-Chip Debug, Interrupt Controller, Timer, GPIO, UART, Watchdog, APB Bridge and AHB Bridge; several blocks are marked optional.]
The two functional units each contain a single cycle ALU and a multiplier/divider, and there is one multiply-accumulate unit with a 64 bit accumulator. The load/store unit can coalesce reads and writes onto the 64 bit data bus.
Performance
• CoreMark/MHz: 3.62 (CoreMark 1.0 : 3.62 / GCC5.3.0 20160730 (Cortus 32b) -mcpu=aps29 -g -Ofast -fno-lto -DMEM_METHOD=MEM_MALLOC -fuse-clib=minifp -mstrict-alignment -mmac -DPERFORMANCE_RUN=1 / Heap)
• DMIPS/MHz: 3.09
Implementation Results
• Process: 28 nm (TSMC)
• Fmax: at least 1400 MHz
• Area: 0.037 mm²
• Power: 17.54 µW/MHz
www.cortus.com
Near Dual Core Performance - With the Simplicity of a Single Core
Cortus Version 2 Instruction Set
The APS29 is based on the Cortus v2 instruction set. Following extensive analysis of a wide range of embedded programs, the version 2 processor cores use a careful selection of 16, 24 and 32 bit instructions, chosen to balance the size of the instruction memory against a minimal core size.
[Figure: Typical Embedded Code, APS5 vs APS29: average 18% improvement]
Dual Issue Pipeline
The dual issue pipeline enables the processor to start the execution of up to two instructions per cycle. The processor features two single cycle ALUs, two multipliers and two integer dividers, grouped into two functional units that can each start one instruction per cycle. The load/store unit, which manages data accesses, is also handled by the pipeline.
The pipeline manages internal resources and dependencies between instructions, so there are no conflicts for the programmer to manage. The compiler schedules instructions to optimise throughput. The 64 bit memory interface supplies the processor with up to four instructions per cycle (four 16 bit instructions fill 64 bits), which are held in an 8 word FIFO prior to execution.
Static Branch Predictor
The APS29 features a static branch predictor that significantly improves performance, notably in loops. A simple but effective prediction scheme ensures that in the majority of cases branches execute without the penalty of flushing the pipeline. The compiler ensures that loops and other constructs are optimised to take advantage of the branch predictor.
Ecosystem
The APS29 benefits from the ecosystem shared by the APS2n and APS families. It has a complete software development environment, including a toolchain for C and C++ and an adapted IDE based on Eclipse, one of the most widely used IDEs. Debugging is fully supported by an integrated instruction set simulator, the Cortus on-chip debugging hardware and an Ethernet-connected JTAG interface, the EtherTag. Ports of various RTOSs are available, such as FreeRTOS, Micrium µC/OS and µClinux.
Going Further
If your application stretches the computation performance or throughput of the APS29, there are a number of possible solutions.
Simple dual core systems can significantly increase the processing power of a system for little silicon cost. Processing power can be increased further with multi-core architectures, supported by an available coherent data cache.
Equally, it is easy to build heterogeneous multiprocessor systems, for example pairing an APS29 for time critical data processing with an APS23 to handle I/O and a Bluetooth network stack.
The easy integration of multiple cores also enables secure systems in which one processor supervises and checks the operation of the other, improving the safety or security of an embedded system.
Co-Processors
In a number of cases an algorithm can be accelerated significantly with a co-processor that implements either the entire algorithm in hardware or just its key elements. The APS29 supports the easy to use Cortus co-processor interface, which enables the engineer to extend the APS29 instruction set. As with all Cortus processors that have a co-processor interface, co-processor instructions are first class instructions: they suffer no penalties compared to native instructions and have full access to the register set. The dual issue pipeline can start a co-processor instruction in the same cycle as a native instruction, handling resource conflicts and out-of-order completion without programmer intervention.
Multiply-Accumulate
The Multiply-Accumulate unit of the APS29 performs a single cycle multiply-accumulate operation into a dedicated 64 bit accumulator: two signed or unsigned 32 bit integers are multiplied and the product is either added to or subtracted from the accumulator. These operations enable the efficient implementation of a large number of signal processing algorithms such as FFTs and filters.
The compiler is aware of the Multiply-Accumulate instructions and can optimise typical C/C++ constructs that map efficiently onto them.
Applications
The APS29 is suited to a wide variety of applications, such as:
• Embedded control
• Encryption and decryption
• Wireless and wireline communication
• Sensor fusion
• Machine vision
• Dual and multi-core systems
Copyright Cortus SAS © 2016