An Embedded Hardware Architecture for GPC-on ... - Semantic Scholar

An Embedded Hardware Architecture for GPC-on-Chip Applied to Automotive Active Suspension Systems

Yasser Shoukry, M. Watheq El-Kharashi
Ain Shams University, Computer and Systems Engineering Department, Cairo, 11517, Egypt
{yasser shoukry, watheq.elkharashi}@eng.asu.edu.eg

Hesham Shokry, Sherif Hammad
Mentor Graphics Egypt, Heliopolis, Cairo, 11471, Egypt
{hesham shokry, sherif hammad}@mentor.com

Abstract: This paper presents a hardware architecture for an embedded real-time generalized predictive control (GPC) algorithm, based on the state-of-the-art Customizable Advanced Processor (CAP9) technology from Atmel and targeting automotive active suspension systems. The GPC algorithm relies on the solution of an optimization problem at every sampling period. Profiling shows that matrix operations consume the largest portion of the computation requirements of the algorithm. The proposed embedded system uses a systolic-array based matrix co-processor to accelerate matrix operations. It is designed to fit within the proposed platform while meeting the tight real-time constraints imposed by automotive active suspension systems.

Index Terms: Active suspension systems, generalized predictive control, embedded control systems, software-hardware codesign.

I. INTRODUCTION

Advances in the automotive and aerospace industries require the application of complex closed-loop control algorithms. Owing to its ability to handle constrained optimization control problems for multi-input multi-output (MIMO) systems and to its disturbance rejection properties, Model Predictive Control (MPC) [1] is widely used in industry for safety-critical control applications such as chemical refineries, where the system exhibits slow dynamics and the required MPC computations can therefore be completed within the real-time constraints. Applying a computationally intensive control algorithm such as MPC inside an embedded, time- and safety-critical system represents a cyber-physical challenge.

MPC refers to a class of computer closed-loop control algorithms that use an explicit mathematical model to predict the future response of a physical process. At each sampling interval, an MPC algorithm attempts to optimize the future controller output by computing a sequence of future manipulated-variable adjustments. The first input in the optimal sequence is then applied to the process, and the entire calculation is repeated at subsequent control intervals.

The computational requirements of MPC are the main obstacle to using it in embedded control applications. MPC-on-Chip is a term first introduced in 2005 [2] in an attempt to use embedded MPC for applications of small physical size that exhibit safety-critical behavior. Examples of such applications are medical equipment, automotive, aerospace, and defense applications. The computational requirements of MPC exceed the capabilities of embedded software implementations [3], [4]. Profiling different MPC algorithms shows that matrix operations consume most of the computation time [3], [4], so accelerating matrix operations is at the core of embedded implementations of the various types of MPC. Bleris et al. [4] accelerated matrix operations using the Logarithmic Number System (LNS) in order to meet the real-time constraints of a medical device that doses diabetic patients based on the predicted blood glucose level.

The main focus of this paper is the successful design and deployment of a matrix co-processor on the state-of-the-art CAP9 [5] technology for one of the most interesting MPC algorithms, namely the Generalized Predictive Controller (GPC). The proposed system will be verified to obtain the required control filters within the tight real-time constraints imposed by automotive active suspension systems [6].

In an automotive active suspension system, the control algorithm should use a slow hydraulic actuator [6] to reject fast disturbances coming from deficiencies in the road profile. At the same time, the control algorithm should minimize the energy consumed by the actuator. This conflict between a slow actuator, fast disturbances, and minimum energy is the main reason for using GPC to control this system [7], since GPC is designed to minimize a quadratic criterion that contains both the control signal and the error between the actual system output and the required trajectory. The main challenge in using GPC in this system is the required bandwidth [8].
In order to achieve acceptable disturbance rejection and energy consumption, a fast sampling rate and large minimization horizons are required. Satisfying these physical requirements using the available cyber components constitutes a cyber-physical challenge.

This paper is motivated by the authors' previous work [7]–[9], where the automotive active suspension system is studied and the effect of the different GPC tuning parameters on the system response is analyzed, along with a proposal of a hardware matrix co-processor to implement a GPC algorithm that meets the real-time constraints. The embedded hardware architecture proposed in this paper is based on the recommendations given in [10], although it approaches matrix co-processing with a completely different methodology.

This paper is organized as follows. Section II gives background on GPC along with our previous work on enhancing the real-time properties of GPC for automotive active suspension systems. Section III details the overall architecture of the proposed embedded system based on CAP9 technology. Internal details of the most important parts of the datapath and the control unit are presented in Sections IV and V, respectively. Implementation results in terms of the maximum controller sampling frequency are then given in Section VI. Finally, Section VII concludes this work.

II. BACKGROUND

This section presents an overview of generalized predictive control (GPC), followed by a review of our previous work on accelerating GPC in order to meet the real-time constraints of automotive active suspension systems.

A. Generalized Predictive Control (GPC)

GPC belongs to the class of long-range model-based predictive control (MPC) algorithms. GPC was introduced by Clarke, Mohtadi, and Tuffs in 1987 [11], [12]. The control objective is the minimization of a quadratic criterion involving future inputs and outputs. The minimization is done in a receding-horizon sense, which results in a different digital filter at each sampling period.
To formalize the resulting digital filter, the GPC algorithm first computes the elements of an internal matrix (G) using a recursive rule known as the Diophantine equation. The controller then computes the first row of the alpha matrix given by the following formula [11], [12]:

$$\alpha = -(G^{T} G + \lambda I)^{-1} G^{T} \tag{1}$$

which is then used to formulate the final digital filter. Fig. 1 shows a schematic of the GPC control strategy within a closed-loop control system. The physical process under control is characterized by its digital model B(z)/A(z). Three digital filters are used to obtain a process output equal to the input set-point signal. As the process parameters and/or nonlinear terms change over time, an online system identification technique is used to obtain a new digital model of the process. This digital model is then given to the GPC algorithm, which generates new R-S-T filters. The mathematical operations of system identification and control filter generation are performed every sampling period. As shown in Fig. 1 and (1), generating new R-S-T filters requires two matrix multiplications and one matrix inversion. The height and width of the internal matrix (G) depend on two of the tuning parameters of the GPC algorithm, called the prediction horizon and the control horizon, respectively.

Fig. 1. Steps of GPC algorithm.

B. Controlling Automotive Active Suspension Systems using GPC

Automotive active suspension systems use electro-hydraulic actuators to reject the vertical movements coming from deficiencies in road profiles. The wheel is connected to the unsprung mass through a spring/damper pair. The hydraulic actuator force F_A is used to control the sprung mass (chassis) vibrations x_s. A pure damping element exists in parallel with the hydraulic actuator in order to remove shock vibrations. Active suspension systems exhibit inherently nonlinear behavior as well as a distinct high-frequency response to the actuator force. Analyzing this system, one can observe that the output is affected by two inputs: the voltage input command v to the actuator and the road profile r. The system can therefore be represented as shown in Fig. 2, where r − x_s is the system output, and H_r and H_v are transfer functions describing the dynamics between the inputs and the output.

Fig. 2. Quarter Car Model.

We applied GPC to automotive active suspension systems in previous work [7], [8], where we used the quarter car model detailed in [6]. The modeled vibration modes and the required closed-loop bandwidth lead to the choice of a sampling period of 40 msec. This choice is based on Nyquist's sampling theorem. Experiments carried out using real-time CAN-based simulation environments, aimed at finding appropriate GPC tuning parameters, show that large prediction and control horizons are necessary to keep the passenger comfort index within the standard acceptable thresholds. These results require the internal matrix (G) to be on the order of 85x30 elements. The preparation of the internal matrix plus all of these matrix computations must be done on the embedded target once every 40 msec.

Embedded software profiling of GPC and a numerical error sensitivity analysis are presented in [9]. The numerical error analysis shows that all mathematical operations must be done using 32-bit floating-point arithmetic, whereas profiling studies carried out on the CAP9 platform show that the preparation of the internal matrix (G) requires 15 msec, while the other matrix operations require more than 310 msec. The overall execution time is far above the real-time constraint imposed by the controller sampling period. Accordingly, we proposed using a systolic-array co-processor for an embedded implementation of GPC for automotive active suspension systems.

Using systolic arrays to perform GPC operations was first proposed in [13]. That architecture depends on implementing systolic arrays with a size equal to the internal matrix size, so it is not adequate for large GPC tuning parameters, which result in large matrix sizes. We previously proposed a linear set of systolic arrays to be used instead of a complete systolic-array processor [9]. The performance of the proposed small-sized processor depends on the chosen length of the systolic array. The problem of embedding a systolic-array processor into a smaller-sized array is studied in [9], along with a mathematical formulation of how to choose the length of the small-sized systolic array.
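To make the per-sample workload in (1) concrete, the following pure-Python sketch computes only the first row of the alpha matrix for a G of the size quoted above. It is an illustration under stated assumptions, not the authors' implementation: `build_g` assumes the common GPC form in which G is lower-triangular Toeplitz in the plant step-response coefficients, and the first-order step response and the value of lambda used in the demo are hypothetical.

```python
def build_g(step_response, n_pred, n_ctrl):
    """Assumed standard GPC layout: G[i][j] = g[i - j] for i >= j, else 0."""
    return [[step_response[i - j] if i >= j else 0.0 for j in range(n_ctrl)]
            for i in range(n_pred)]


def solve(m, rhs):
    """Solve m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(m)
    a = [row[:] + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x


def gpc_gain_first_row(g_mat, lam):
    """First row of alpha = -(G^T G + lam I)^-1 G^T, as in (1)."""
    n_pred, n_ctrl = len(g_mat), len(g_mat[0])
    # M = G^T G + lam I is symmetric (n_ctrl x n_ctrl).
    m = [[sum(g_mat[r][i] * g_mat[r][j] for r in range(n_pred))
          + (lam if i == j else 0.0)
          for j in range(n_ctrl)] for i in range(n_ctrl)]
    # Because M is symmetric, the first row of M^-1 G^T equals (G x)^T,
    # where x solves M x = e_1. This avoids forming the full inverse.
    e1 = [1.0] + [0.0] * (n_ctrl - 1)
    x = solve(m, e1)
    return [-sum(g_mat[r][j] * x[j] for j in range(n_ctrl))
            for r in range(n_pred)]


if __name__ == "__main__":
    g = [1.0 - 0.9 ** (i + 1) for i in range(85)]   # hypothetical step response
    row = gpc_gain_first_row(build_g(g, 85, 30), lam=0.1)
    print(len(row))                                  # one gain per prediction step
```

Solving a single linear system instead of inverting the whole matrix is a software convenience chosen for this sketch; the co-processor described in this paper performs the multiplications and the inversion with dedicated systolic cores.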

III. OVERALL ARCHITECTURE

The proposed embedded system is based on the state-of-the-art CAP9 technology. CAP9 is an ARM9-based technology in which 85% of the silicon is fixed, while the remaining 15% can be customized according to the application. The customizable part is implemented using Metal Programmable (MP) Blocks. Fig. 3 shows the overall architecture of the proposed system. The customizable part communicates with the main ARM9 processor through the AHB bus matrix, which holds three bus masters and two slaves. The first two bus slaves are used to communicate data, status, and control information between the ARM9 processor and the MP-Block, whereas the first bus master is used by the debug unit. Shadow registers are used to pre-fetch data from the memories while the matrix co-processor operates on the data inside the register file. This minimizes the number of datapath stalls caused by memory read/write cycles. A dual-port RAM holds the data computed on the ARM9 processor (the internal matrix G). It also holds the final data computed by the co-processor (the alpha matrix). A single-port RAM holds internal computations. The algorithm used for matrix inversion (known as Givens Rotations [14]) decomposes each single matrix value into two 32-bit numbers (known as w, v). In order to keep memory transactions to a minimum, both w and v are stored at the same memory address.

IV. MATRIX CO-PROCESSOR

The co-processor datapath consists of a matrix multiplication core (shown in Fig. 4) and a matrix inversion core (shown in Fig. 5). Both cores contain a set of systolic-array cells. A systolic array is a specialized form of parallel computing structure. It consists of a pipeline of independent cells. Each cell performs one computation step, saves its data, and then passes its inputs on to the next cell.

Fig. 4. Multiplication systolic array.

Fig. 5. Inversion systolic array.
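The systolic-array idea, a pipeline of cells that each compute, latch their input, and pass it on to the next cell, can be illustrated with a small software model of a linear array of multiply-and-accumulate cells. This is only a behavioral sketch: the class and function names are hypothetical, each cell is assumed to hold one column of the right-hand matrix, and the (row, index) tag carried with each streamed value is a modeling shortcut that stands in for the timing a hardware control unit would provide.

```python
class MacCell:
    """One multiply-and-accumulate cell holding a single column of B."""

    def __init__(self, column, n_rows):
        self.column = column            # column of B stored locally in the cell
        self.acc = [0.0] * n_rows       # one accumulator per output row
        self.reg = None                 # pipeline register feeding the next cell

    def tick(self, token):
        """One clock: accumulate the incoming token, latch it, and hand the
        previously latched token to the next cell."""
        outgoing = self.reg
        if token is not None:
            row, idx, value = token
            self.acc[row] += value * self.column[idx]
        self.reg = token
        return outgoing


def systolic_matmul(a, b):
    """Return (A * B, number of clock ticks) using a linear array of cells."""
    m, k, n = len(a), len(b), len(b[0])
    cells = [MacCell([b[t][j] for t in range(k)], m) for j in range(n)]
    # Stream the elements of A row by row, then flush the pipeline.
    stream = [(i, t, a[i][t]) for i in range(m) for t in range(k)]
    stream += [None] * (n - 1)
    for token in stream:
        for cell in cells:
            token = cell.tick(token)    # each cell sees its neighbor's old register
    product = [[cells[j].acc[i] for j in range(n)] for i in range(m)]
    return product, len(stream)


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In this model an m x k by k x n product takes m*k + n - 1 ticks, because every cell starts one tick after its left neighbor; the cycle counts of the real cells are those reported in Table I.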

A matrix multiplication systolic array consists of a set of Multiply-And-Accumulate (MAC) cells. Data propagates from the top-left-most cell toward the right-most cells. The signals between cells carry the input matrix elements. The matrix inversion core uses LQ decomposition, which is more convenient for GPC calculations. Data propagates from the bottom-right-most cell toward the left. LQ decomposition based on the Givens Rotation algorithm uses two different cell types. The first one, called the Boundary Cell (B-Cell), calculates the rotation angles that annihilate one matrix element in the upper half of the matrix based on the following equations [15]:

$$\bar{u}_k = u_k + w v_k v \tag{2}$$

$$\bar{w} = w u_k / \bar{u}_k \tag{3}$$

$$c = v_k / u_k \tag{4}$$

where the input to the B-Cell is a matrix element from the upper half (represented by a (w_k, v_k) pair). This element is required to calculate the new value $\bar{w}$, which is used to update the w value of all other (w, v) pairs on the same matrix row. It is also required to calculate the rotation angle c, which is used along with the upper-half matrix element v_k to update the v values of the other matrix elements on the same row. Internal cells (I-Cells) use the calculated $\bar{w}$ and c values to perform the required updates of the other matrix elements based on the equations:

$$\bar{u} = u + w v_k v \tag{5}$$

$$\bar{v} = v - c u \tag{6}$$

Fig. 3. Overall architecture of the proposed embedded system.

Fig. 6. Schematic of multiplier cell.

Fig. 7. Schematic of B-cell.
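The B-Cell and I-Cell updates in (2)–(6) can be sketched in software as follows. This is a hedged illustration rather than the authors' datapath: the function names are hypothetical, (2) is read with v = v_k (the usual squared Givens rotation form, which is an assumption about the intended equation), and u_k is assumed to be nonzero.

```python
def boundary_cell(u_k, v_k, w):
    """B-Cell: equations (2)-(4), reading (2) with v = v_k (assumption)."""
    u_bar_k = u_k + w * v_k * v_k      # (2)
    w_bar = w * u_k / u_bar_k          # (3)
    c = v_k / u_k                      # (4), assumes u_k != 0
    return u_bar_k, w_bar, c


def internal_cell(u, v, v_k, w, c):
    """I-Cell: equations (5)-(6), using the old w and the angle c."""
    u_bar = u + w * v_k * v            # (5)
    v_bar = v - c * u                  # (6)
    return u_bar, v_bar


def rotate_row_pair(u_row, v_row, w, k):
    """Annihilate v_row[k] against u_row; returns (new_u, new_v, w_bar)."""
    u_bar_k, w_bar, c = boundary_cell(u_row[k], v_row[k], w)
    new_u, new_v = list(u_row), list(v_row)
    for j in range(len(u_row)):
        if j == k:
            new_u[j], new_v[j] = u_bar_k, 0.0   # pivot column: element annihilated
        else:
            new_u[j], new_v[j] = internal_cell(u_row[j], v_row[j],
                                               v_row[k], w, c)
    return new_u, new_v, w_bar


if __name__ == "__main__":
    print(rotate_row_pair([4.0, 2.0], [3.0, 1.0], 1.0, 0))
```

Note that the update needs no square root: only multiplications, additions, and divisions appear, which is what makes this family of rotations attractive for a cell-based hardware implementation.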

Hardware schematics of the multiplier cell, B-Cell, and I-Cell are shown in Fig. 6, Fig. 7, and Fig. 8, respectively.

Fig. 8. Schematic of Internal cell.

V. CONTROL UNIT

Fig. 9 shows the hierarchical control unit structure used in the proposed design. Three levels of control exist, namely algorithm operations, matrix operations, and cell operations. Communication between these levels takes the form of start/ready signals. This architecture makes it easy to switch between different MPC algorithms. Single-Port (SP) and Dual-Port (DP) RAMs are used to hold matrix elements. Internal operations require three different matrices. Each matrix can be held in the SP RAM, in the DP RAM, or distributed across both. To partition the problem of fetching the right matrix element from RAM, two layers of memory controllers are used. The top layer decodes the row and column numbers along with the required matrix number to obtain the right memory address, and then initiates either the SP RAM or the DP RAM controller. Accordingly, the internal details of the memory architecture are hidden from the control units. The DP RAM is also used to separate the two frequency domains present inside the MP-Block. The first frequency domain uses the 20 MHz clock of the AHB bus matrix. The internal datapath uses a higher frequency in order to meet the required real-time constraints.

Fig. 9. Control unit used in the proposed embedded system.

VI. IMPLEMENTATION RESULTS

The proposed system was successfully deployed on the CAP9 emulation platform, where the MP-Block is emulated on a Virtex StratixII FPGA. Hardware resources for the different cell types are shown in Table I. The maximum frequency is affected by the placement of the RAM blocks inside the FPGA. To counter the sparse distribution of RAM blocks across the FPGA, the memory control logic is duplicated in order to shorten the critical path and satisfy the hold and slack time requirements. This logic duplication increases the clock frequency from 52 MHz to 74 MHz. Although the datapath itself can run at a frequency of up to 100 MHz, long routing paths inside the FPGA reduce the overall clock frequency.

TABLE I
HARDWARE RESOURCES AND CYCLES REQUIRED FOR EACH TYPE OF CELL IN THE PROPOSED MATRIX CO-PROCESSOR

Cell type         Combinational LUTs   Logic registers   Number of cycles
Multiplier cell   1537                 1141              19
B-Cell            4579                 3490              40
I-Cell            3977                 2972              39
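As a rough worked example, the cycle counts in Table I can be converted into per-cell latencies at the three clock frequencies quoted above (52 MHz, 74 MHz, and 100 MHz). The sketch assumes that the number of cycles per cell does not change with the clock frequency; the names are hypothetical.

```python
# Cycle counts per cell type, taken from Table I.
CELL_CYCLES = {"Multiplier cell": 19, "B-Cell": 40, "I-Cell": 39}


def cell_latency_us(cycles, clock_mhz):
    """Latency in microseconds: cycles divided by the clock in MHz."""
    return cycles / clock_mhz


def latency_table(clocks_mhz):
    """Map each cell type to its latency (in us) at every given clock."""
    return {cell: {clk: cell_latency_us(cycles, clk) for clk in clocks_mhz}
            for cell, cycles in CELL_CYCLES.items()}


if __name__ == "__main__":
    for cell, row in latency_table((52, 74, 100)).items():
        print(cell, {clk: round(us, 3) for clk, us in row.items()})
```

At the achieved 74 MHz clock this gives roughly 0.26 us for a multiplier cell and about 0.54 us and 0.53 us for the B-Cell and I-Cell, respectively.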

TABLE II
COMPARISON OF PROFILING RESULTS FOR THE GPC CONTROLLER WORKING AS EMBEDDED SOFTWARE AND AS EMBEDDED SOFTWARE + HARDWARE

Operation                              Software    Software + Hardware
Solving Diophantine equation           14 msec     14 msec
Matrix multiplication (occurs twice)   107 msec    5 msec
Matrix inversion                       143 msec    12 msec
Formalizing digital filter             1 msec      1 msec
Total execution time                   310 msec    37 msec
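The figures in Table II can be turned into per-operation speedups and compared against the 40 msec sampling period stated in Section II. The short sketch below only restates the table arithmetic; the variable and function names are hypothetical.

```python
# Table II timings in msec: (software, software + hardware).
TABLE_II = {
    "Solving Diophantine equation": (14, 14),
    "Matrix multiplication": (107, 5),
    "Matrix inversion": (143, 12),
    "Formalizing digital filter": (1, 1),
    "Total execution time": (310, 37),
}
SAMPLING_PERIOD_MSEC = 40   # sampling period from Section II


def speedups(table):
    """Software time divided by software + hardware time, per operation."""
    return {op: sw / hw for op, (sw, hw) in table.items()}


def slack_msec(table, period_msec):
    """Time left in one sampling period after the accelerated GPC run."""
    return period_msec - table["Total execution time"][1]


if __name__ == "__main__":
    for op, factor in speedups(TABLE_II).items():
        print(f"{op}: {factor:.1f}x")
    print("slack:", slack_msec(TABLE_II, SAMPLING_PERIOD_MSEC), "msec")
```

With these numbers the matrix multiplication is about 21.4 times faster, the inversion about 11.9 times faster, and the total about 8.4 times faster, which leaves 3 msec of slack in each 40 msec period. The Diophantine step, which is unchanged at 14 msec, becomes the largest single contribution to the accelerated total.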

The main result of this work can be stated in terms of meeting the real-time constraints imposed by using long prediction and control horizons in order to increase the performance of safety-critical systems. For instance, for the automotive active suspension system presented in Section II, a prediction horizon of 80 and a control horizon of 35 are required [7], [8]. Using the proposed system, the GPC algorithm requires 37 msec to control the process, whereas it takes more than 310 msec using embedded software running on the same ARM9 target used in this platform. Table II shows detailed profiling results for every step of the GPC algorithm for embedded software versus the proposed software and hardware implementation.

VII. CONCLUSION

In this paper, a systolic-array based co-processor built on CAP9 technology is presented. The proposed system is verified to meet the real-time constraints imposed by using GPC in a time- and safety-critical automotive application. Efficient hardware utilization is achieved by merging the arithmetic units. Different aspects of the design and implementation of the system are presented. The next step is to use the proposed embedded system to obtain real-time results from controlling automotive active suspension systems or any other time- and safety-critical embedded system.

REFERENCES

[1] E. Camacho and C. Bordons, Model Predictive Control. Berlin, Germany: Springer Verlag, 2004.
[2] M. He and K. Ling, "Model predictive control on a chip," in Proceedings of International Conference on Control and Automation (ICCA) 2005, Budapest, Hungary, Jun. 26–29 2005, pp. 528–532.
[3] L. G. Bleris and M. V. Kothare, "Real-time implementation of model predictive control," in Proceedings of American Control Conference (ACC) 2005, Portland, Oregon, USA, Jun. 8–10 2005, pp. 4166–4171.
[4] L. G. Bleris, M. V. Kothare, J. G. Garcia, and M. G. Arnold, "Towards embedded model predictive control for system-on-a-chip applications," Journal of Process Control, vol. 16, no. 3, pp. 255–264, Mar. 2006.
[5] Atmel Corporation. CAP9 product manual. [Online]. Available: http://www.atmel.com/products/AT91CAP/
[6] M. D. Donahue, "Implementation of an active suspension, preview controller for improved ride comfort," Master's thesis, The University of California at Berkeley, Berkeley, California, USA, Apr. 2001.
[7] Y. Shoukry, M. El-Shafey, and S. Hammad, "Networked embedded generalized predictive controller for active suspension system," in Proceedings of American Control Conference (ACC) 2010, Baltimore, Maryland, USA, Jun. 29–Jul. 2 2010, pp. 4510–4575.
[8] Y. Shoukry, M. W. El-Kharashi, M. El-Shafey, and S. Hammad, "Towards real-time networked embedded generalized predictive control for automotive active suspension system," in Proceedings of IFAC Symposium on Advances in Automotive Technology (IFAC AAC) 2010, Munich, Germany, Jul. 12–14 2010.
[9] Y. Shoukry, M. W. El-Kharashi, and S. Hammad, "MPC-on-Chip: An embedded GPC co-processor for automotive active suspension system," IEEE Embedded Systems Letters, vol. 2, no. 2, pp. 31–34, Jun. 2010.
[10] P. Vouzis, M. Kothare, L. Bleris, and M. Arnold, "A system-on-a-chip implementation for embedded real-time model predictive control," IEEE Transactions on Control Systems Technology, vol. 17, no. 5, pp. 1006–1017, 2009.
[11] D. W. Clarke, C. Mohtadi, and P. C. Tuffs, "Generalized predictive control-part 1: The basic algorithm," Automatica, vol. 23, no. 2, pp. 137–148, 1987.
[12] D. W. Clarke, C. Mohtadi, and P. C. Tuffs, "Generalized predictive control-part 2: Extensions and interpretations," Automatica, vol. 23, no. 2, pp. 149–160, 1987.
[13] K. Karagianni, T. Chronopoulos, A. Tzes, N. Koussoulas, and T. Stouraitis, "Efficient processor arrays for the implementation of the generalised predictive-control algorithm," IEE Proceedings - Control Theory and Applications, vol. 145, no. 1, pp. 47–54, 1998.
[14] R. Dohler, "Squared Givens rotation," IMA Journal of Numerical Analysis, vol. 11, pp. 1–5, 1991.
[15] M. Karkooti, J. R. Cavallaro, and C. Dick, "FPGA implementation of matrix inversion using QRD-RLS algorithm," in Proceedings of Asilomar Conference on Signals, Systems and Computers 2005, Asilomar Grounds, Pacific Grove, California, USA, Oct. 28–Nov. 1 2005, pp. 1625–1629.
