PROC. 27th INTERNATIONAL CONFERENCE ON MICROELECTRONICS (MIEL 2010), NIŠ, SERBIA, 16-19 MAY, 2010
LEON2 Processor with High-Speed USB Port: A System-On-Chip for Wireless Applications
M. Erić, G. Panić, and Z. Stamenković
Abstract - The paper presents an SOC intended to serve as the general-purpose processor of a wireless engine. This system-on-chip is based on the LEON2 processor and includes a USB port connected to the high-speed system bus as a slave. The system is implemented and verified in an FPGA as part of a development board for wireless application testing.
M. Erić was with the IHP, Im Technologiepark 25, 15236 Frankfurt (Oder), Germany, during his DAAD student internship from August to October 2009. G. Panić and Z. Stamenković are with the IHP, Im Technologiepark 25, 15236 Frankfurt (Oder), Germany, E-mail: {panic, stamenkovic}@ihp-microelectronics.com
978-1-4244-7199-7/10/$26.00 © 2010 IEEE
I. INTRODUCTION
Complex wireless systems require high-performance, high-speed general-purpose processors and external interfaces. We have adopted the IEEE standard 802.15.3 [1] for our system because it provides QoS, various power management modes, security, ad-hoc networking, and physical layer data rates from 11 to 55 Mbit/s. This paper therefore proposes a system-on-chip based on the LEON2 processor [2] with a USB port [3] that executes the medium access and error control (MAC) protocol of this standard. The overall SOC architecture is shown in Figure 1.
The system components are mutually connected through the AMBA on-chip system bus [4]. This bus architecture includes the AHB (Advanced High-performance Bus) and the APB (Advanced Peripheral Bus): the AHB is intended for high clock frequency system components and enables access to high bandwidth memory devices, while the APB is optimized for minimal power consumption and suits low bandwidth peripheral components. Any data that the LEON2 processor sends or receives passes through these buses.
Figure 1. Implemented SOC architecture
II. SYSTEM COMPONENTS
The LEON2 processor [2] is highly configurable, allowing the user to customize it for a certain application (selecting different cache sizes, multiplier performance, clock generation, etc.) or target technology. It is available in the form of a VHDL model describing the SPARC V8 processor core, system bus and peripheral components. New modules can easily be added using the AMBA. A graphical configuration tool based on UNIX kernel scripts is used to configure the system for implementation in a Xilinx Virtex FPGA. The processor integrates both instruction and data cache memories (I-Cache and D-Cache) and the corresponding cache controllers. It also includes an interface to the AMBA AHB and its controller. A memory controller is attached to the AHB and provides an interface to an external flash memory and static RAMs. A USB port is also connected to the AHB as a slave. The slower AMBA APB is attached to the AHB via a bridge. Two UARTs, timers, an I/O port and an interrupt controller are connected to the APB.
A. Integer Unit
The LEON2 integer unit implements the full SPARC V8 standard, including all multiply and divide instructions. It has a 5-stage instruction pipeline and separate instruction and data cache interfaces. The number of register windows is configurable within the limit of the SPARC standard (2 - 32). We have chosen an inferred register file (built from flip-flops) with 8 register windows.
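The register-window choice in the Integer Unit subsection translates directly into register-file size. The following minimal sketch counts the 32-bit registers for a given number of windows. It assumes the usual SPARC V8 organization of 8 global registers plus 16 registers per window (8 locals and 8 ins, with the outs overlapping the ins of the neighbouring window); the function name is illustrative and does not come from the LEON2 model.

```python
def sparc_register_count(nwindows: int) -> int:
    """Number of 32-bit integer registers for a windowed SPARC V8 register file.

    Assumption: 8 globals + 16 registers (8 locals, 8 ins) per window.
    """
    if not 2 <= nwindows <= 32:
        raise ValueError("SPARC V8 allows 2 to 32 register windows")
    return 8 + 16 * nwindows


if __name__ == "__main__":
    # Configuration described in the text: 8 register windows.
    print(sparc_register_count(8))  # 136
```

Under this assumption, the 8-window configuration corresponds to 136 registers, which is the storage that has to be built from flip-flops in an inferred register file.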
B. Cache Sub-system
Separate instruction and data caches are provided, each configurable to 1 - 64 KB, with 16 - 32 bytes per line. Sub-blocking is implemented with one valid bit per 32-bit word. The instruction cache uses streaming during line refill to minimize refill latency. The data cache uses a write-through policy and implements a double-word write buffer. Both caches can be configured as direct-mapped or as multi-set caches with an associativity of 2 - 4. We have implemented a configuration with 4 KB instruction and data caches and 16 bytes per line. Since the associativity is one, the tag array is 24 bits wide and the data array is 32 bits wide in both caches. Two SRAM blocks with sizes of 3/4 KB and 4 KB implement the tag and data arrays, respectively. The tag arrays are organized as 256 x 24 blocks and the data arrays as 1024 x 32 blocks.
C. Memory Controller
The external memory bus is controlled by a programmable memory controller, which acts as a slave on the AHB. Its function is programmed through memory configuration registers accessed via the APB. The memory bus provides a direct interface to PROM, memory-mapped I/O devices and asynchronous static RAM (SRAM). Chip-select decoding is done for two PROM banks, one I/O bank and five SRAM banks, so the memory controller provides eight chip-select signals.
D. Hardware Debug Support Units
The LEON2 processor system includes hardware debug support to aid software debugging on target hardware. The support is provided through two modules: a debug support unit (DSU) and a debug communication link (DCL). The DSU can put the processor in debug mode, allowing read/write access to all processor registers and cache memories. The DSU also contains a trace buffer which stores executed instructions or data transfers on the AMBA AHB bus. For simplicity and to save area, we have not included this buffer in the implemented configuration. The debug communication link implements a simple read/write protocol and uses standard asynchronous UART communications.
The debug support unit is used to control the processor debug mode. The DSU is attached to the AHB bus as a slave, occupying a 2 MB address space. Through this address space, any AHB master can access the processor registers. The DSU control registers can be accessed at any time, while the processor registers and caches can only be accessed when the processor has entered debug mode. In debug mode, the processor pipeline is held and the processor is controlled by the DSU.
The debug communication link consists of a dedicated UART connected to the AHB bus as a master.
A simple communication protocol is supported to transmit access parameters and data. A link command consists of a control byte, followed by a 32-bit address and optional write data. Through the communication link, a read or write transfer can be generated to any address on the AHB bus.
E. AMBA On-chip Buses
Two on-chip buses are provided: AMBA AHB and APB. The APB is used to access peripherals and on-chip registers, while the AHB is used for high-speed data transfers. The full AHB/APB standard is implemented. The processor is connected to the AHB through the instruction and data cache controllers. Access conflicts between the two cache controllers are resolved locally. The processor performs burst transfers to fetch instruction cache lines or to read/write data as a result of double load/store instructions. Byte, half-word and word load/store instructions perform single (non-sequential) accesses. Locked transfers are only performed on LDST and SWAP instructions. Double load/store transfers are nevertheless also guaranteed to be atomic, since the arbiter does not re-arbitrate the bus during burst transfers.
The AHB is designed for high-performance, high-clock-frequency system modules. It acts as a high-performance system backbone bus and supports the efficient connection of processors, on-chip memories and off-chip external memory interfaces with low-power peripheral functions. LEON2 uses the AMBA-2.0 AHB to connect the processor cache controllers to the memory controller and other high-speed units. In our configuration, two masters are attached to the bus, the processor and the UART of the debug communication link, and three slaves are provided: the memory controller, the debug support unit and the AHB/APB bridge.
The AHB/APB bridge acts as the only master on the APB. All communication between masters on the AHB and slaves on the APB passes through this bridge. The APB is optimized for minimal power consumption and reduced interface complexity to support peripheral functions. It is configured to connect five slaves: interrupt controller, timer, two UARTs, and parallel I/O port.
F. Interrupt Controller
The interrupt controller is used to prioritize and propagate interrupt requests from internal or external devices to the integer unit. In total, 15 interrupts are handled, divided into two priority levels.
G. Timer Unit
The timer unit implements two 24-bit timers, one 24-bit watchdog and one 10-bit shared prescaler. We do not use the watchdog.
H. UARTs
Two identical UARTs are used for serial communications. The UARTs support data frames with 8 data bits, one optional parity bit and one stop bit.
To generate the bit rate, each UART has a programmable 12-bit clock divider. Hardware flow control is supported through the RTSN/CTSN handshake signals.
I. Parallel I/O Port
A partially bit-wise programmable 32-bit I/O port is provided on the chip. The port is split into two parts: the lower 16 bits are accessible via the PIO[15:0] signal, while the upper 16 bits use DATA[15:0] and can only be used when all areas (ROM, RAM and I/O) of the memory bus are in 8- or 16-bit mode. We use the lower 16 bits of the I/O port, each of which can be individually programmed as an output or input.
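The per-bit programmability of the lower 16 port bits can be illustrated with a small behavioural model. This is only a sketch: the class, its attribute names and the convention that a direction bit of 1 selects output are assumptions made for illustration, not a description of the LEON2 register map.

```python
PIO_MASK = 0xFFFF  # the lower 16 port bits used in this design


class PioModel:
    """Toy model of the lower 16 I/O port bits; all names are hypothetical."""

    def __init__(self) -> None:
        self.direction = 0  # assumed convention: bit = 1 means output
        self.output = 0     # value driven on bits configured as outputs

    def set_direction(self, bit: int, is_output: bool) -> None:
        """Program a single port bit as output (True) or input (False)."""
        if not 0 <= bit < 16:
            raise ValueError("bit index must be in 0..15")
        if is_output:
            self.direction |= 1 << bit
        else:
            self.direction &= ~(1 << bit) & PIO_MASK

    def read(self, pins: int) -> int:
        """Outputs read back the driven value; inputs reflect the external pins."""
        driven = self.output & self.direction
        sampled = pins & ~self.direction & PIO_MASK
        return driven | sampled


if __name__ == "__main__":
    port = PioModel()
    port.set_direction(0, True)      # bit 0 becomes an output
    port.output = 0x0001             # drive bit 0 high
    print(hex(port.read(0xFF00)))    # 0xff01
```

Bit 0 is driven by the port while the remaining fifteen bits continue to follow the external pins, which is the behaviour the text describes as individually programmable.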
J. Configuration Register
Since the LEON2 processor system is synthesized from an extensively configurable VHDL model, a read-only configuration register indicates which options were enabled during synthesis. For each option present, the corresponding register bit is hardwired to ‘1’.
K. Power-down Register
The processor can be powered down by writing an arbitrary value to the power-down register. Power-down mode is entered on the next load or store instruction. To enter power-down mode immediately, a store to the power-down register should be followed by a ‘dummy’ load. During power-down mode, the integer unit is effectively halted. Power-down mode is terminated (and the integer unit re-enabled) when an unmasked interrupt with a higher level than the current processor interrupt level becomes pending. All other functions and peripherals operate as normal during power-down mode.
L. USB Port
The USB 2.0 On-The-Go Single Device Controller [3] is a dual-role device interface that meets revision 2.0 of the USB specification and the On-The-Go supplement. It handles byte transfers autonomously and bridges the USB interface to a PVCI interface. The USBHS-OTG-SD can be customized and optimized for a specific application. The design is strictly synchronous, with positive-edge clocking, a synchronous reset, and no internal tri-states. The architecture of the complete USB port is presented in Figure 2.
Figure 2. USB port architecture
The USBHS-OTG-SD core itself includes several modules:
• UTMI+ Interface generates control signals for the UTMI+ transceiver according to the state of the OTG CONTROL FSM.
• OTG Controller implements downstream and upstream ports. The USBHS-OTG-SD supports the host negotiation protocol and the session request protocol. Protocol control is provided by Special Function Registers. The core can act as a USB host or as a USB peripheral device. The ID input pin controls the default role of the USBHS-OTG-SD. If id=1, a mini-B plug is connected and the core becomes a B-device. If id=0, a mini-A plug is connected and the USBHS-OTG-SD becomes an A-device.
• Host Controller is used when the USBHS-OTG-SD works as a USB host. Its main tasks are: generation of suspend/resume signaling and USB reset, generation of SOF tokens, USB data transactions, and generation of host interrupts. The Host Controller also contains the Host Transaction Scheduler (HTS), which analyzes how many endpoints are waiting for service and decides which endpoints will be serviced in the current frame.
• Device Controller implements the following tasks of the USB port: USB data transactions, suspend/resume control, and generation of interrupts.
• Application Interface contains the interrupt controller, which generates interrupt signals for the microprocessor, the PVCI microprocessor interface, and the Slave FIFO interface that provides direct access to the endpoint buffers.
• Endpoints Logic generates read/write signals to a dual-port synchronous RAM. The size of the required on-chip RAM depends on endpoint parameters such as number, direction, size, and applied buffering scheme.
• SFR Module contains a set of Special Function Registers that are used to control the USBHS-OTG-SD operation.
The USB port has a single reset input which is used to reset all clock domains (system, USB, and wake-up clocks). The reset controller (RSTCTRL) module generates the appropriate resets for each clock domain. External logic can force a reset by pulling the reset input high. The flip-flops located in the RSTCTRL module are reset asynchronously. The wake-up detector (WUDET) wakes the core from power-saving mode when the UTMI valid signals change. The AHB wrapper (AHBWRAP) provides access to the endpoint buffers and SFRs.
The endpoint buffers (OUT Buffer and IN Buffer) can be accessed by the processor at the addresses of the fifoxdat registers (x is the endpoint number). A single AHB address is assigned to each endpoint buffer. When the processor reads/writes data from/to the endpoint buffers, it uses a fifoxdat address without address increment. If the processor does increment the address during a burst, the first access of the burst must carry the correct fifoxdat address. The OUT and IN buffers are dual-port synchronous RAMs of 4 KB each. The USB core writes data into the OUT Buffer RAM and reads data from the IN Buffer RAM. The processor writes data into the IN Buffer RAM and reads data from the OUT Buffer RAM.
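The access scheme for the endpoint buffers, a single fixed fifoxdat address per endpoint with the processor and the USB core on opposite sides of each buffer, can be sketched as a simple behavioural model. The class name, the method names and the example address are hypothetical; only the 4 KB buffer size and the data directions are taken from the description above.

```python
from collections import deque

BUFFER_BYTES = 4 * 1024  # each buffer is a 4 KB dual-port RAM (from the text)


class EndpointBuffers:
    """Toy model of one endpoint's IN and OUT buffers (names are hypothetical)."""

    def __init__(self, fifo_addr: int) -> None:
        self.fifo_addr = fifo_addr  # the single AHB address of fifoxdat
        self.in_buf = deque()       # processor writes, USB core reads
        self.out_buf = deque()      # USB core writes, processor reads

    def _check(self, addr: int) -> None:
        if addr != self.fifo_addr:
            raise ValueError("access must use the fifoxdat address")

    def cpu_write(self, addr: int, byte: int) -> None:
        """Processor side: push one byte into the IN buffer."""
        self._check(addr)
        if len(self.in_buf) >= BUFFER_BYTES:
            raise BufferError("IN buffer full")
        self.in_buf.append(byte & 0xFF)

    def cpu_read(self, addr: int) -> int:
        """Processor side: pop one byte from the OUT buffer."""
        self._check(addr)
        return self.out_buf.popleft()

    def usb_write(self, byte: int) -> None:
        """USB core side: push one received byte into the OUT buffer."""
        if len(self.out_buf) >= BUFFER_BYTES:
            raise BufferError("OUT buffer full")
        self.out_buf.append(byte & 0xFF)

    def usb_read(self) -> int:
        """USB core side: pop the next byte to transmit from the IN buffer."""
        return self.in_buf.popleft()


if __name__ == "__main__":
    ep = EndpointBuffers(fifo_addr=0x84)      # example address, not from the core
    for value in (0xBB, 0xBA, 0xB9, 0xB8):
        ep.cpu_write(0x84, value)             # same address for every byte
    print([hex(ep.usb_read()) for _ in range(4)])
```

Every byte is written to the same address and the USB side drains the bytes in the order they were written, which mirrors access without address increment.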
III. SOC IMPLEMENTATION FLOW
The complete system, including the LEON2 processor and the USB port, was simulated at RTL level with the Mentor Graphics ModelSim simulator [5]. A generic LEON2 testbench [2] supports several testbench configurations: the FUNC testbench performs a quick check of most on-chip functions, the MEM testbench tests all on-chip memory with the patterns 0x55 and 0xAA, and the FULL testbench combines the memory and functional tests. Numerous simulations using these testbenches have been carried out to verify the functionality of the complete system-on-chip. To test the USB function of the system, we have modified the top-level HDL testbench of LEON2 as shown in Figure 3.
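The idea behind a memory test with the patterns 0x55 and 0xAA is that the two values are bitwise complements, so every bit of every location is exercised at both 0 and 1. The sketch below shows this on a byte-addressed software model of memory; it is not the LEON2 testbench code, and the function and class names are illustrative assumptions.

```python
def memory_pattern_test(mem, patterns=(0x55, 0xAA)):
    """Write each pattern to every location, read it back, and collect
    (address, expected, observed) tuples for mismatches."""
    failures = []
    for pattern in patterns:
        for addr in range(len(mem)):
            mem[addr] = pattern
        for addr in range(len(mem)):
            observed = mem[addr]
            if observed != pattern:
                failures.append((addr, pattern, observed))
    return failures


class StuckBitMemory:
    """Hypothetical faulty memory: one bit of one address is stuck at 0."""

    def __init__(self, size, bad_addr, bad_bit):
        self._data = bytearray(size)
        self._bad_addr = bad_addr
        self._bad_mask = 1 << bad_bit

    def __len__(self):
        return len(self._data)

    def __setitem__(self, addr, value):
        self._data[addr] = value

    def __getitem__(self, addr):
        value = self._data[addr]
        if addr == self._bad_addr:
            value &= ~self._bad_mask & 0xFF  # the stuck bit always reads 0
        return value


if __name__ == "__main__":
    print(memory_pattern_test(bytearray(16)))            # []
    print(memory_pattern_test(StuckBitMemory(16, 3, 0)))  # [(3, 85, 84)]
```

A bit stuck at 0 is caught by the pattern that expects a 1 in that position (0x55 for bit 0 in this example), while the complementary pattern would catch a bit stuck at 1.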
Figure 4. Data transactions on UTMI+ interface
The design has been synthesized for a Virtex-II Pro FPGA using the Xilinx ISE Design Suite [6] at a system operating frequency of 60 MHz. After verification of the synthesized net-list, we generated the design files needed for FPGA programming and implemented the system on a Virtex-II Pro development board (Figure 5).
Figure 3. System testbench architecture
We have also written a test program in SPARC assembler that has been compiled into a bit stream and loaded into the Instruction RAM. The program performs the USB initialization and tests all of the basic data transactions on the UTMI+ ports. It also tests the system behavior in the case of USB interrupt requests. The UTMI+ Stimulator is used to stimulate and read the UTMI+ ports. It reads the utmistim.txt file, which contains the expected events, the commands, and their timing. The LEON2 processor fetches data and instructions from RAM and sends them to the AHB. To send data, the processor must initialize an endpoint (specify the type of transfer, multiple buffering, the number of packets per frame for ISO endpoints, and the endpoint number in the USB peripheral device), then write the data to the fifoxdat register and arm the endpoint. Figure 4 shows the most important UTMI+ interface signals when four bytes (BB, BA, B9 and B8) of the BULK type are sent in high-speed mode. The waveforms on the right side of the figure show the start of a new frame. A transaction contains three packets: token (device address, endpoint number and transfer type), data (the data to be sent) and handshake (acknowledge). The USB port activates the TxValid signal when it is ready to transmit. The stimulator then asserts the TxReady signal for each byte that arrives on the UTMI+ interface. The USB port receives the RxActive signal when the data transaction is finished. Finally, it sends an interrupt request to the processor.
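The byte-wise handshake between the USB port and the stimulator can be sketched with a cycle-based model. The sketch assumes the usual UTMI transmit convention in which a byte is accepted in every clock cycle where TxValid and TxReady are both high; it ignores sync, PID and CRC bytes, and the function name and the TxReady pattern are made up for illustration.

```python
def utmi_transmit(data, tx_ready_pattern):
    """Return the (cycle, byte) pairs accepted by the transceiver side.

    data             -- payload bytes the USB port wants to send
    tx_ready_pattern -- TxReady value (0 or 1) driven in each clock cycle
    """
    accepted = []
    index = 0
    for cycle, tx_ready in enumerate(tx_ready_pattern):
        tx_valid = index < len(data)   # port keeps TxValid high while bytes remain
        if not tx_valid:
            break
        if tx_ready:                   # byte is taken only when both are high
            accepted.append((cycle, data[index]))
            index += 1
    return accepted


if __name__ == "__main__":
    payload = bytes([0xBB, 0xBA, 0xB9, 0xB8])   # the four bytes named in the text
    for cycle, byte in utmi_transmit(payload, [1, 0, 1, 1, 0, 1]):
        print(f"cycle {cycle}: {byte:02X}")
```

With the example pattern, the bytes BB, BA, B9 and B8 are accepted in cycles 0, 2, 3 and 5; in the cycles where TxReady is low the port simply holds the current byte.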
Figure 5. Virtex-II Pro development board (FFP Basic)
IV. CONCLUSION
We have described the main hardware components of a LEON2-based SOC for wireless application testing. Most of the components are extensible or configurable functional building blocks that are generated automatically as VHDL descriptions once their parameters have been chosen. The system is implemented and verified on a Virtex-II Pro FPGA board.
ACKNOWLEDGEMENT
Our great respect and sincere thanks go to Jiri Gaisler, the creator of the LEON processor.
REFERENCES
[1] IEEE standard 802, Part 15.3: Wireless medium access control (MAC) and physical layer (PHY) specifications for high rate wireless personal area networks, 2003.
[2] LEON2 Processor User’s Manual, http://www.ihpmicroelectronics.com/~stamenko/LEON2-1.0.30-xst.pdf
[3] USB2.0 On-The-Go Controller, http://www.evatronix.pl/products/usb_solutions.html
[4] AMBA On-Chip Bus Standard, ARM Inc., http://www.arm.com/products/solutions/AMBA.html
[5] ModelSim, http://www.model.com
[6] Xilinx ISE Design Suite, http://www.xilinx.com