Fast Linear Model Predictive Control via Custom Integrated Circuit Architecture A. G. Wills, G. Knagge, B. Ninness
Abstract— This paper addresses the implementation of linear model predictive control (MPC) at millisecond range, or faster, sampling rates. This is achieved by designing a custom integrated circuit architecture that is specifically targeted to the MPC problem. As opposed to the more usual approach using a generic serial architecture processor, the design here is implemented using a field programmable gate array (FPGA) and employs parallelism, pipelining and specialized numerical formats. The performance of this approach is profiled via the control of a 14th order resonant structure with 12 sample prediction horizon at 200µs sampling rate. The results indicate that no more than 30µs are required to compute the control action. A feasibility study indicates that the design can also be implemented in 130nm CMOS technology, with a core area of 2.5mm2 . These results illustrate the feasibility of MPC for reasonably complex systems, using relatively cheap, small, and low-power computing hardware. Index Terms— Model Predictive Control, Field Programmable Gate Array (FPGA), Very Large Scale Integration (VLSI)
I. INTRODUCTION

Model predictive control (MPC) has emerged as a highly attractive control strategy due to its ability to explicitly accommodate all of the multivariable (multiple-input, multiple-output) and non-linear aspects of a system in the subsequent control signal design [17], [22], [20], [16]. However, a fundamental difficulty of this MPC approach is that the implementation platform must be capable of solving a constrained optimization problem within the sampling period, which decreases as the speed of the dynamics to be controlled increases. As a result, the implementation of MPC has generally been limited to plants with slow, or otherwise very simple dynamics so that the time constraints in computing a solution are relaxed [16].

Surmounting this difficulty of computational overhead, to achieve the benefits of MPC for systems with more complex dynamics, has recently attracted significant research attention. In particular, special sessions have been devoted to the topic at major international control conferences [1], [2], [3]. This has resulted in the examination of a number of solution strategies.

One particularly well studied approach, known as “explicit MPC”, involves offline computation of a set of piecewise affine control laws, together with the polyhedral regions in which each applies [24], [13]. However, this suffers from a key disadvantage. Namely, the calculation time and memory requirements to compute and store the regions and controllers grow exponentially with the number of states, control variables, and constraints. This limits the applicability of explicit MPC to relatively small dimension problems [29], [13].

This work was supported by the Australian Research Council. All authors are with the School of Electrical Engineering and Computer Science, University of Newcastle, NSW Australia. Corresponding author is Adrian Wills. Email:
[email protected].
Another approach to expanding the applicability of MPC has involved careful design of the optimisation algorithm for online calculation. For example, the recent work [29] examines the efficacy of exploiting sparsity and structure, “warm starting” and early quadratic program termination (among others). Impressive speed dividends are reported via simulation study on a generic serial architecture processor (3 GHz AMD Athlon). While this paper also solves the optimisation problem online, the essential innovation here is to examine the utility of designing a custom architecture for MPC implementation in integrated circuits. This approach is feasible due to the relatively recent availability of high capability field programmable gate arrays (FPGA’s) with modest cost, size and power burdens [26], [4], [25]. Motivated by the potential of this approach, other researchers have reported impressive results for solving quadratic programs (QP’s) using custom FPGA implemented architectures [15], [5], [14], [28]. Recognising this, the work here implements a full MPC solution that includes, inter alia, data acquisition, state estimation via an observer, and QP solution. The performance of this MPC solution is profiled on a real physical plant involving vibration control of a 14th order resonant flexible structure, using piezo transducers. The control is implemented at 200µs sample interval with prediction horizons up to 12 samples. Extensive testing provides evidence that in fact only 30µs are required to compute the control action. The performance reported here is achieved with an architecture implemented on an Altera Stratix III EPSL150F115C2 FPGA, clocked at 70 MHz. The design employs some standard computer engineering techniques such as parallelization and pipelining. However, a key optimisation of the design is the implementation of floating point calculation units with tuned bit-widths. 
Initial results indicate that the custom architecture design profiled here could be implemented on a 130nm CMOS technology integrated circuit, using a core area of only 2.5 mm$^2$. This provides evidence that the advantages of MPC are potentially available in many portable, low-power, embedded, mass-produced and high bandwidth applications that have previously only been amenable to less sophisticated control approaches.

II. PROBLEM FORMULATION

This paper considers the control of linear, time invariant, discrete time systems, which are modelled in state-space form, according to

$$x_{t+1} = A x_t + B u_t + w_t, \tag{1}$$
$$y_t = C x_t + D u_t + v_t. \tag{2}$$

Here $u_t \in \mathbb{R}^m$ is the system input (control variable), $y_t \in \mathbb{R}^p$ is the system output, $x_t \in \mathbb{R}^n$ is the state variable, and $w_t \in \mathbb{R}^n$ and $v_t \in \mathbb{R}^p$ are unknown disturbances on the state and output, respectively.
A. Model Predictive Control

The model predictive control (MPC) approach considered here delivers the control variable $u_{t+1|t}$ by solving at time $t$ a constrained optimisation problem of the form

$$\mathbf{u}_t^\star = \arg\min_{\mathbf{u}_t} V(\mathbf{u}_t, x_{t+1}) \quad \text{s.t.} \quad \Gamma \mathbf{u}_t \le b(x_{t+1}) \tag{3}$$

where, for some user chosen prediction horizon $N$,

$$\mathbf{u}_t \triangleq [u_{t+1|t}, u_{t+2|t}, \cdots, u_{t+N|t}]. \tag{4}$$

In the above, the subscript $t+k|t$ is used to denote a future control action at time $t+k$, which is based on measurements at time $t$. MPC operates by using the first element $u^\star_{t+1|t}$ of $\mathbf{u}^\star_t$ as the control action to be applied at the next time interval. It then moves on to the next time instant, $t+1$, and solves (3) again to deliver $u^\star_{t+2|t+1}$ as the first element of $\mathbf{u}^\star_{t+1}$, and so on. Accordingly, there is a delay of one sample between measurement and control action that is intrinsic to the MPC approach (3); see [16, Section 2.5] for further discussion of this point.

In this paper, the control cost $V(\mathbf{u}_t, x_{t+1})$ is assumed to have the following quadratic form

$$V(\mathbf{u}_t, x_{t+1}) \triangleq \mathbf{u}_t^T H \mathbf{u}_t + \mathbf{u}_t^T f(x_{t+1}) \tag{5}$$

where the matrix $H \in \mathbb{R}^{Nm \times Nm}$ is assumed to be positive definite, and $f(x_{t+1}) : \mathbb{R}^n \to \mathbb{R}^{Nm}$ is assumed to have the affine form

$$f(x_{t+1}) = f_0 + \Phi x_{t+1} \tag{6}$$

for some user chosen matrix $\Phi \in \mathbb{R}^{Nm \times n}$. The constraints in (3) also involve a user chosen matrix $\Gamma \in \mathbb{R}^{M \times Nm}$ together with a further affine mapping $b(x_{t+1}) : \mathbb{R}^n \to \mathbb{R}^M$ of the form

$$b(x_{t+1}) = b_0 + \Psi x_{t+1} \tag{7}$$

for some user chosen matrix $\Psi \in \mathbb{R}^{M \times n}$. The utility of the formulation (5)-(7), together with details on how $H$, $\Gamma$, $\Phi$, $\Psi$, $b_0$ and $f_0$ may be specified, will be illustrated in detail in section IV by way of an application example.

Importantly, this MPC approach requires knowledge of the future system state $x_{t+k}$, for $k = 1, \ldots, N$. It must therefore be predicted based on information available at the present time $t$, and these predictions substituted for $x_{t+k}$. We will denote this prediction as $\hat{x}_{t+k|t}$. When $k = 1$, the predictor employed here will be of the linear observer form

$$\hat{x}_{t+1|t} = A \hat{x}_{t|t-1} + B u^\star_{t|t-1} + L e_t, \tag{8}$$
$$e_t = y_t - C \hat{x}_{t|t-1} - D u^\star_{t|t-1}. \tag{9}$$

The observer gain matrix $L$ is user-specified. In Section IV, it will be selected as the steady state Kalman filter gain. This is a common choice due to the predicted state then being of minimum variance. This minimum variance property can be preserved for $k > 1$ by employing the predictor

$$\hat{x}_{t+k|t} = A \hat{x}_{t+k-1|t} + B u_{t+k|t}, \quad \text{for } k > 1 \tag{10}$$

which is “initialised” for $k = 2$ by using the predictor (8),(9) in the right hand side of (10).

In what follows in section IV-D, a prediction $\hat{y}_{t+k|t}$ of future system responses will also be required. This can be derived from the predicted state in the obvious fashion

$$\hat{y}_{t+k|t} = C \hat{x}_{t+k|t} + D u_{t+k|t}. \tag{11}$$
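As an illustration of the one-step predictor (8)-(9), the following is a minimal NumPy sketch. The function name `observer_step`, the two-state example system and the gain `L` are assumptions made purely for illustration; in particular, `L` is not a Kalman gain but simply a gain for which $A - LC$ has eigenvalues 0.5 and 0.7, so that the prediction error decays.

```python
import numpy as np

def observer_step(A, B, C, D, L, x_hat, u_prev, y):
    """One update of the linear observer (8)-(9).

    x_hat  : current prediction x_hat_{t|t-1}
    u_prev : input applied at time t
    y      : measurement y_t
    Returns the next prediction x_hat_{t+1|t}.
    """
    e = y - C @ x_hat - D @ u_prev          # innovation, eq. (9)
    return A @ x_hat + B @ u_prev + L @ e   # state update, eq. (8)

# Hypothetical 2-state, single-input, single-output system (not from the paper).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
L = np.array([[0.8], [1.5]])   # illustrative gain: eig(A - L C) = {0.5, 0.7}

x = np.array([1.0, -0.5])      # true state, unknown to the observer
x_hat = np.zeros(2)            # initial prediction
for t in range(60):
    u = np.array([np.sin(0.1 * t)])
    y = C @ x + D @ u          # noise-free measurement, eq. (2) with v_t = 0
    x_hat = observer_step(A, B, C, D, L, x_hat, u, y)
    x = A @ x + B @ u          # plant update, eq. (1) with w_t = 0

estimation_error = np.linalg.norm(x - x_hat)
```

With no disturbances the error obeys $e_{t+1} = (A - LC)e_t$, so `estimation_error` shrinks geometrically in this example.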
B. Computing the Control Action

The MPC approach described above requires the solution of the constrained optimisation problem (3). Via the cost specification (5), this is in the form of a strictly convex quadratic program that may be effectively solved using standard algorithms [32]. The two dominant approaches in this context are so-called “active-set” and “interior-point” methods.

This paper elects to employ an active-set technique that is based on the work of [9], [21] and [23], and the software tools [32]. Our rationale for this involves the attractive features of the method, including its ability to handle constraints of the form specified in (3), its ability to start from an infeasible initial point, and its success in other application areas [23], [30]. We refer the reader to the recent work [14] for a detailed discussion of the relative merits of active-set versus interior-point approaches from the perspective of efficient and fast embedded implementation.

The essentials of this active-set approach will prove necessary in order to elucidate our custom architecture MPC design in subsequent sections, and hence are now briefly presented. The algorithm starts at the unconstrained minimum $u_0 = -H^{-1} f$ of (5), at which point some of the constraints $\Gamma u \le b$ may be in violation, i.e. there may be one or more integers $i$ such that $\Gamma_i u > b_i$, where $\Gamma_i$ denotes the $i$'th row of $\Gamma$, and $b_i$ denotes the $i$'th element of $b$. For the purposes of explanation, assume that constraint $i$ is in violation. The algorithm then computes a new vector $u_1 = u_0 + p_0$ that satisfies the $i$'th constraint with equality, i.e. $\Gamma_i u_1 = b_i$. This is achieved by solving an equality constrained QP (ECQP) for $p_0$, i.e. $p_0 = \arg\min_p\, (u + p)^T H (u + p) + f^T (u + p)$ subject to $\Gamma_i (u + p) = b_i$. Associated with this solution is a Lagrange multiplier $\lambda$, and a property of the dual active-set method [9], [21] is that it must be strictly positive, i.e. $\lambda > 0$.
The resulting u1 is guaranteed to satisfy the violated constraint, but there may be others that remain in violation. If there are, then a new ECQP problem can be solved, where both the old and new constraints form the set of equalities that must be satisfied at the solution. In general, the list of constraints that must be satisfied with equality is known as the active-set. This list will grow as the algorithm progresses and more constraints are added to it. It may also be necessary to drop constraints from the active-set in order to maintain strict dual feasibility (i.e. the associated Lagrange multipliers λ for the active constraints must be positive). The algorithm terminates when there are no more constraints in violation, or when it determines that the problem is infeasible. These ideas are made more precise in the following definition of Algorithm II.1, which details the dual active-set method used in this paper. It is intended to establish the numerical operations and control structures which are required. These are considered further in Section III in the context of a hardware implementation. For further details of the underlying
dual active-set algorithm itself, the interested reader is referred to [9], [21]. In what follows, a set of active-constraint indices is maintained, denoted by $\mathcal{A}$. This is a set of positive integers that indicate which constraints are currently active, i.e. the integers $i$ such that $\Gamma_i u = b_i$. Further, the notation $\Gamma_{\mathcal{A}} \in \mathbb{R}^{n_a \times Nm}$ is used to refer to the matrix whose rows $\Gamma_i$ correspond to the integers $i \in \mathcal{A}$, where $n_a$ is used to denote the number of active constraints.
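The first move described above (start at the unconstrained minimum, then make one violated constraint active) can be sketched numerically as follows. This is only an illustration under stated assumptions: the function name and the data are hypothetical, the cost is taken as $\tfrac{1}{2}u^T H u + f^T u$ so that $-H^{-1}f$ is its minimiser, the most violated constraint is picked (the text only requires some violated constraint), and the multiplier sign convention is chosen so that it comes out positive.

```python
import numpy as np

def first_active_set_step(H, f, Gamma, b):
    """Start at the unconstrained minimiser and, if some constraint
    Gamma[i] @ u <= b[i] is violated, move to the point that satisfies
    the most violated one with equality (one equality-constrained QP).

    Assumes the cost 0.5*u'Hu + f'u, whose minimiser is -H^{-1} f.
    Returns (u, index made active or None, Lagrange multiplier).
    """
    u0 = -np.linalg.solve(H, f)
    violation = Gamma @ u0 - b
    i = int(np.argmax(violation))
    if violation[i] <= 0.0:
        return u0, None, 0.0                     # u0 is already feasible
    gamma = Gamma[i]
    Hinv_gamma = np.linalg.solve(H, gamma)
    lam = violation[i] / (gamma @ Hinv_gamma)    # multiplier, positive here
    u1 = u0 - Hinv_gamma * lam                   # now gamma @ u1 == b[i]
    return u1, i, lam

# Hypothetical data: two inputs with upper bounds u_1 <= 1 and u_2 <= 1.
H = np.array([[2.0, 0.5], [0.5, 1.0]])
f = np.array([-4.0, -0.5])
Gamma = np.eye(2)
b = np.array([1.0, 1.0])
u1, active, lam = first_active_set_step(H, f, Gamma, b)
```

In this example a single step already yields a feasible point; in general further constraints may remain violated and the full algorithm below is needed.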
Algorithm II.1: Dual Active-Set Algorithm based on [9], [21], [23], [32]

Given $H$, $\Gamma$ and $x_{t+1}$, define $f \triangleq f(x_{t+1})$ and $b \triangleq b(x_{t+1})$ and perform the following steps:

1) Set the initial solution $u = -H^{-1} f$;

2) Set the number of active constraints $n_a = 0$, and let $\mathcal{A} = \emptyset$ denote the set of active constraint indices, and let $\mathcal{I} \triangleq \{1, \ldots, M\}$ denote the set of possible constraint indices;

3) If there are no violated constraints then STOP, the algorithm has terminated successfully;

4) Otherwise, choose a violated constraint index $p$ from the set of inactive constraints $\mathcal{I} \setminus \mathcal{A}$ and set the column vector $\gamma = \Gamma_p^T$;

5) Set the variable $\sigma = 0$ (used to accumulate the new Lagrange multiplier for the violated constraint);

6) If $n_a < Nm$, set

$$\eta = \frac{b_p - \gamma^T u}{\gamma^T \left[ H^{-1} - H^{-1} \Gamma_{\mathcal{A}}^T (\Gamma_{\mathcal{A}} H^{-1} \Gamma_{\mathcal{A}}^T)^{-1} \Gamma_{\mathcal{A}} H^{-1} \right] \gamma},$$
$$d_\lambda = -(\Gamma_{\mathcal{A}} H^{-1} \Gamma_{\mathcal{A}}^T)^{-1} \Gamma_{\mathcal{A}} H^{-1} \gamma \eta, \qquad d_u = -H^{-1} (\Gamma_{\mathcal{A}}^T d_\lambda + \gamma \eta).$$

Otherwise, set

$$\eta = 1.0, \qquad d_\lambda = -(\Gamma_{\mathcal{A}}^T)^{-1} \gamma, \qquad d_u = 0;$$

7) Find the maximum step length $\beta$ such that Lagrange multipliers remain positive, and record the index $\ell$ for which this occurs (if it does), via

$$\beta = \begin{cases} 1 & : n_a = 0 \\ \min\left\{ 1, \min_i \left\{ \dfrac{-\lambda(i)}{d_\lambda(i)} : d_\lambda(i) < 0 \right\} \right\} & : 0 < n_a < Nm \\ \min_i \left\{ \dfrac{-\lambda(i)}{d_\lambda(i)} : d_\lambda(i) < 0 \right\} & : n_a = Nm \end{cases}$$

8) If $n_a = Nm$ and $\beta$ is not defined from the above step (i.e. there are no entries $d_\lambda(i) < 0$ for any $i$), then the problem is INFEASIBLE;

9) If $0 < n_a < Nm$ and $\beta = 1$ then goto Step 11, otherwise set

$$\lambda = \lambda + \beta d_\lambda, \qquad \sigma = \sigma + \beta \eta$$

and if $n_a < Nm$ then update the control sequence $u = u + \beta d_u$;

10) Drop the $\ell$'th constraint from the active-set so that $\mathcal{A} \leftarrow \mathcal{A} \setminus \mathcal{A}(\ell)$ and remove the $\ell$'th Lagrange multiplier $\lambda(\ell)$. Finally, reduce by one the number of active constraints $n_a \leftarrow n_a - 1$, and goto Step 6;

11) Update the control action sequence and the Lagrange multipliers via

$$u = u + \beta d_u, \qquad \lambda = \lambda + \beta d_\lambda, \qquad \lambda(n_a + 1) = \sigma + \eta$$

12) Add the index $p$ (from Step 4) to the active-set so that $\mathcal{A} \leftarrow \{\mathcal{A}, p\}$, and increase the number of active constraints by one, $n_a \leftarrow n_a + 1$;

13) Goto Step 3.

In the seminal work [9] it has been recognised that a vitally important consideration is the efficient computation of $\eta$, $d_\lambda$ and $d_u$. In the same paper it is shown that this may be achieved by maintaining two matrices: the first is here denoted as $Z \in \mathbb{R}^{Nm \times Nm}$ which has the property

$$Z Z^T = H^{-1}. \tag{12}$$

The second is an upper triangular matrix $R \in \mathbb{R}^{n_a \times n_a}$ that satisfies

$$Z^T \Gamma_{\mathcal{A}}^T = \begin{bmatrix} R \\ \emptyset_{(Nm - n_a) \times n_a} \end{bmatrix} \tag{13}$$

where the notation $\emptyset_{a \times b}$ denotes an $a \times b$ matrix with all entries equal to zero. The matrices $Z$ and $R$ need to be updated whenever the active-set $\mathcal{A}$ is changed so that properties (12) and (13) are preserved. The work reported here adopts this framework and employs Givens rotations [10] to robustly and efficiently compute the necessary updates.

While a detailed description of this is beyond the focus of the current paper, it is important for understanding the proposed implementation to present the essential idea. Therefore, suppose that $Z$ and $R$ already satisfy (12) and (13), and that a new constraint $i$ is being added to the active-set. Then

$$Z^T [\Gamma_{\mathcal{A}}^T \;\; \Gamma_i^T] = \begin{bmatrix} R & r_1 \\ \emptyset_{(Nm - n_a) \times n_a} & r_2 \end{bmatrix} \tag{14}$$

where $r_1 \in \mathbb{R}^{n_a}$ and $r_2 \in \mathbb{R}^{Nm - n_a}$. The idea is to apply a series of Givens matrices $Q_1, \ldots, Q_{Nm - n_a - 1}$ to rotate the vector $r_2$ so that all its elements are zero, except for the first one. For the sake of exposition, let us denote this first element as $a$. That is

$$Q_{Nm - n_a - 1} \cdots Q_1 Z^T [\Gamma_{\mathcal{A}}^T \;\; \Gamma_i^T] = Q_{Nm - n_a - 1} \cdots Q_1 \begin{bmatrix} R & r_1 \\ \emptyset_{(Nm - n_a) \times n_a} & r_2 \end{bmatrix} = \begin{bmatrix} R^+ \\ \emptyset_{(Nm - n_a - 1) \times (n_a + 1)} \end{bmatrix}, \qquad R^+ \triangleq \begin{bmatrix} R & r_1 \\ \emptyset_{1 \times n_a} & a \end{bmatrix}.$$

Note that the new upper triangular matrix $R^+$ satisfies property (13). Further, by dint of their orthogonality, applying the Givens matrices to $Z^T$ does not affect the property in (12).
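The update of $Z$ and $R$ when a constraint is added, as described around (12)-(14), can be sketched in NumPy as below. This is a software illustration only, not the hardware procedure: the helper names `givens` and `add_constraint`, the random test data, and the choice to rotate adjacent entries from the bottom up are all assumptions of this sketch.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0.0 else (a / r, b / r)

def add_constraint(Zt, R, gamma):
    """Update Zt = Z^T and the upper triangular R when the constraint
    row gamma joins the active set, preserving (12) and (13)."""
    na = R.shape[0]
    Zt = Zt.copy()
    d = Zt @ gamma                        # stacked [r1; r2] from (14)
    for k in range(len(d) - 1, na, -1):   # Nm - na - 1 rotations
        c, s = givens(d[k - 1], d[k])
        G = np.array([[c, s], [-s, c]])
        d[k - 1:k + 1] = G @ d[k - 1:k + 1]         # zeroes d[k]
        Zt[k - 1:k + 1, :] = G @ Zt[k - 1:k + 1, :]  # same rotation on Z^T
    R_plus = np.zeros((na + 1, na + 1))
    R_plus[:na, :na] = R
    R_plus[:, na] = d[:na + 1]            # new last column [r1; a]
    return Zt, R_plus

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
H = M @ M.T + 4.0 * np.eye(4)             # hypothetical positive definite H
Zt = np.linalg.inv(np.linalg.cholesky(H))  # Z^T = L^{-1}, so Z Z^T = H^{-1}
Gamma_A = rng.standard_normal((2, 4))      # two hypothetical constraint rows
R = np.zeros((0, 0))
for row in Gamma_A:
    Zt, R = add_constraint(Zt, R, row)
```

Because each rotation only touches rows at or below index $n_a$, the existing block $R$ is left unchanged and only one new column has to be produced per added constraint.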
III. MPC HARDWARE SOLUTION

This section details the custom architecture developed to implement the MPC approach discussed in the previous section. Key features of this architecture include:
• It provides a complete controller implementation, including state estimation;
• All calculations are performed using a custom floating point number representation;
• Tri-diagonal matrix structures are exploited;
• Parallel computation is employed wherever possible, and in particular when computing inner products and Givens rotations.

The rationale and details underlying these and other aspects of the custom architecture design are now detailed.
A. Overview

A high-level layout of the custom hardware solution profiled here is shown in Figure 1. It consists of an Altera Stratix III EPSL150F115C2 FPGA, in which the custom control architecture is implemented. This is interfaced to external A/D and D/A circuits in order to measure system responses and output control actions. Furthermore, the FPGA is connected to a serial interface solely for the purposes of initial configuration, and for monitoring performance of the controller implemented in the FPGA.

Fig. 1. Block Diagram of FPGA-based custom architecture model predictive controller. [Blocks shown: Stratix III FPGA top level entity with PLL, interface logic, MPC circuit, uart0 (RS232 command link for initialisation) and uart1 (optional RS232 status/data feed from circuit to monitoring PC); ADC board with 10-bit data to DAC and 10-bit data from ADC; control action to beam (0-2.5V) and reading from beam (0-5V).]

B. Custom Numerical Format

An important flexibility afforded by an FPGA custom-architecture approach is the opportunity to employ custom numerical formats. This is an important advantage because arithmetic circuit size, power consumption, and computation time, all decrease as the number of bits used in number representations fall. The work here exploits this opportunity by employing a custom floating point format, in which a quantity $z$ is represented according to

$$z = (-1)^s \times m \times 2^x \tag{15}$$

where $s$ is the sign bit, $m$ is the $n_m$-bit unsigned mantissa and $x$ is the $n_x$-bit exponent. If a standard microprocessor is employed using IEEE 754 32-bit single precision floating point computation, then the associated circuit involves $n_m = 23$ bits to represent $m$ and $n_x = 8$ bits for $x$. In what follows, we will illustrate that high quality control can be achieved with far fewer bits than this. More specifically, while the number of exponent bits will be fixed at $n_x = 7$ in what follows, the number $n_m$ of mantissa bits will be in the range $n_m \in [7, 15]$.

The use of a fixed point format was considered. However, the floating point format allows for a larger dynamic range to be represented with a minimal number of bits. Furthermore, while floating point systems require additional complexity for the rescaling of the mantissa to the $[1, 2)$ range after an operation, this is not dissimilar in complexity to the range checking and shifting required in generalised components of a fixed point system.

C. MPC Circuit Structure

We now address the high level structure of the custom architecture MPC solution implemented via FPGA, which is illustrated in Figure 2. An essential feature is that the state observer is separated from the QP solver. This is in order to implement the two components with different numerical precision. The rationale for this is that extensive simulation established that the dynamic range and precision necessary for observer stability required a scope of $n_m \in [11, 15]$ mantissa bits for the floating point representation (15), while the QP solver required less than this.

Importantly, the state vector $\hat{x}_{t+1|t}$ is required to form $f(\hat{x}_{t+1|t})$ via (6) and the constraint vector $b(\hat{x}_{t+1|t})$ via (7) before the QP solver can begin operation. Hence, the high precision observer circuit could potentially become a computational bottleneck. However, the design here separates the state update (8) from the calculation of $f(\cdot)$ and $b(\cdot)$ by exploiting the fact that (a similar argument holds for $b(\cdot)$)

$$f(\hat{x}_{t+1|t}) = f_0 + \Phi \hat{x}_{t+1|t} \tag{16}$$
$$\phantom{f(\hat{x}_{t+1|t})} = f_0 + \Phi (A - LC) \hat{x}_{t|t-1} + \Phi (B - LD) u^\star_{t|t-1} + \Phi L y_t. \tag{17}$$

Therefore, $f(\hat{x}_{t+1|t})$ can be computed solely on basis of the previous state estimate $\hat{x}_{t|t-1}$. The significance is that $f(\cdot)$ and $b(\cdot)$ can be computed in parallel to the state update in (8) and importantly, they can both be computed using lower precision floating point arithmetic. The latter observation came from extensive simulation studies where it was confirmed that small errors in $f$ and $b$ did not significantly affect the QP result. Additional components illustrated in Figure 2 include converter logic for translating the raw external data to and from the internal floating point representations.
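The identity between (16) and (17) follows from substituting (8)-(9) into (6), and it can be checked numerically. The sketch below uses random matrices of hypothetical dimensions; the names `F_x`, `F_u` and `F_y` are introduced here only to denote the three constant matrices that could be precomputed offline.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p, Nm = 4, 1, 1, 3                 # hypothetical dimensions
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
D = rng.standard_normal((p, m))
L = rng.standard_normal((n, p))
Phi = rng.standard_normal((Nm, n))
f0 = rng.standard_normal(Nm)
x_hat = rng.standard_normal(n)           # previous estimate x_hat_{t|t-1}
u = rng.standard_normal(m)               # applied input
y = rng.standard_normal(p)               # new measurement y_t

# Serial route: observer update (8)-(9) first, then f via (6).
x_next = A @ x_hat + B @ u + L @ (y - C @ x_hat - D @ u)
f_serial = f0 + Phi @ x_next

# Parallel route (17): constant matrices act directly on the old
# estimate, the input and the measurement, bypassing the state update.
F_x = Phi @ (A - L @ C)
F_u = Phi @ (B - L @ D)
F_y = Phi @ L
f_parallel = f0 + F_x @ x_hat + F_u @ u + F_y @ y
```

Since `f_parallel` never reads `x_next`, the two computations have no data dependency and can proceed concurrently, which is the property the circuit structure relies on.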
Fig. 2. High level MPC architecture representation. [Blocks shown: input converter, state observer, QP solver, output converter.]
D. State Observer

The implementation of the observer described here assumes that the state transition matrix $A$ in (1), (8) is specified in tri-diagonal form. Importantly, this is always achievable via a straightforward similarity transform [19], and hence in no way limits the class of systems.

The rationale for the tri-diagonal transformation is the dividend it yields of much decreased storage and computation requirement. This arises since the number of non-zero state transition matrix elements is $3n - 2$ in the tri-diagonal situation, compared with possibly $n^2$ in the non-transformed case. With this tri-diagonal form, the $i$'th element $\hat{x}^i_{t+1}$ of the state update may be computed as

$$\hat{x}^i_{t+1} = \sum_{j = \max(i-1,\,1)}^{\min(n,\, i+1)} A_{i,j}\, \hat{x}^j_t + B_i u_t + L_i e_t. \tag{18}$$
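A minimal sketch of the tri-diagonal update (18) follows, storing only the $3n - 2$ non-zero entries of $A$ as three bands. The function name, the band layout (`lower`, `diag`, `upper`) and the random test data are assumptions of this sketch rather than details of the circuit.

```python
import numpy as np

def tridiag_observer_update(lower, diag, upper, B, L, x_hat, u, e):
    """Element-wise state update for tri-diagonal A.

    lower[i-1] = A[i, i-1], diag[i] = A[i, i], upper[i] = A[i, i+1],
    i.e. 3n - 2 stored values instead of n*n.
    """
    n = len(diag)
    x_next = np.empty(n)
    for i in range(n):
        acc = diag[i] * x_hat[i]
        if i > 0:
            acc += lower[i - 1] * x_hat[i - 1]
        if i < n - 1:
            acc += upper[i] * x_hat[i + 1]
        x_next[i] = acc + B[i] @ u + L[i] @ e   # row i of B and of L
    return x_next

rng = np.random.default_rng(2)
n, m, p = 5, 1, 1                         # hypothetical dimensions
lower = rng.standard_normal(n - 1)
diag = rng.standard_normal(n)
upper = rng.standard_normal(n - 1)
B = rng.standard_normal((n, m))
L = rng.standard_normal((n, p))
x_hat = rng.standard_normal(n)
u = rng.standard_normal(m)
e = rng.standard_normal(p)

x_banded = tridiag_observer_update(lower, diag, upper, B, L, x_hat, u, e)
A_dense = np.diag(diag) + np.diag(lower, -1) + np.diag(upper, 1)
x_dense = A_dense @ x_hat + B @ u + L @ e
```

Each state element needs at most three multiply-accumulate operations for the $A$ term, which suits the single multiplier and accumulator described for the observer circuit.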
Since the results of these calculations are not required until the next sample is processed by the QP Solver, speed is not a priority and the observer circuit can be implemented with minimal hardware to minimize the impact of the high precision requirement on circuit size and power usage. Such an implementation is shown in Figure 3, using only one 16-bit multiplier instance and a carry-save accumulator. These are provided with data from a memory block and are controlled by a simple state machine.
Fig. 3. Architecture of state observer implementation.

Fig. 4. Architecture of the QP solver circuit.
Data is routed via multiplexors to the multiplier inputs, from a minimal set of registers, and a memory block. The memory contains the constant system matrices and observer gain, and its contents are optimally arranged in a manner that simplifies the control logic.
E. QP Solver Circuit

The bulk of the computational load in implementing MPC lies in solving the quadratic program (QP) defined in (3). Consequently, the design profiled here has concentrated on maximising the speed of the circuit implementing the QP solver, which is shown in block diagram form in Figure 4. As illustrated, it consists of a state machine controller that is configured to reuse a number of special purpose arithmetic components.

In the interests of speed, a large degree of parallelism is employed. This comes at the expense of increased circuit complexity. An important feature of this architecture is that it is entirely scalable. For the proof of concept implementations discussed here, a configuration has been chosen that minimizes the processing time, while maintaining feasible circuit complexity. However, the number of elements in the parallel arrays may be decreased to reduce the circuit size, at the expense of processing time. Alternatively, the number of parallel elements may be increased to handle larger dimension systems, resulting in a larger circuit. The key elements within this design are now explained in more detail.
1) Multiplier Arrays: Two arrays of floating point multipliers are used to perform the bulk of the arithmetic, with a key element being the use of parallelism to reduce execution time. The multiplier array is reconfigurable, and can operate as 12 individual multipliers with an offset, as six parallel 2-element inner product calculators, or as a single 12-element inner product with an optional offset. To reduce latency and avoid unnecessary computation, the multiplier results are retained in carry-save format until the final sum is obtained.

2) Givens Array and QR Factorisation: As discussed in Section II-B, the proposed MPC control algorithm studied here employs an active-set method for QP solution. In turn, this requires the computation of a QR factorisation, which can be achieved by repeated application of Givens rotations or Householder reflections [10]. Generally, Householder reflections are suited to the case of zeroing many entries in a vector, while Givens rotations are more suited to zeroing out selected entries.

Fast QR factorisation in hardware has been explored in the literature, with a broad range of implementation features. Many implementations are based on the Coordinate Rotation DIgital Computer (CORDIC) algorithm (see e.g. [27]) for implementing both Givens rotations [7], [11] and Householder reflections [12].

The design described in this work employs decoupled Givens rotations, as discussed in [6]. This is chosen since it allows for the square root and division operations to be minimized by decoupling the numerator and denominator calculations and applying a scaling factor to ensure numerical stability. To be more precise, the calculation of a Givens rotation of $[a, b]^T$ results in a Givens matrix

$$G = \begin{bmatrix} \dfrac{a}{\sqrt{a^2 + b^2}} & \dfrac{b}{\sqrt{a^2 + b^2}} \\ \dfrac{-b}{\sqrt{a^2 + b^2}} & \dfrac{a}{\sqrt{a^2 + b^2}} \end{bmatrix} \quad \text{so that} \quad G \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} r \\ 0 \end{bmatrix}. \tag{19}$$

However, by isolating the denominator terms it is possible to express this as [6]

$$G = \hat{K}^{-1/2} \hat{G}, \qquad \hat{G} = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}, \qquad \hat{K} = \begin{bmatrix} \hat{K}_{1,1} & 0 \\ 0 & \hat{K}_{2,2} \end{bmatrix}, \qquad \hat{K}_{1,1} = a^2 + b^2, \quad \hat{K}_{2,2} = a^2 + b^2. \tag{20}$$

The significance of this is that the above expressions involve only multiplication and addition operations, which may be computed relatively quickly in a hardware implementation [6]. To obtain the original value of $G$, reciprocal square roots need to be computed in order to find $\hat{K}^{-1/2}$. However, it is not necessary to perform this operation immediately since

$$G \begin{bmatrix} a \\ b \end{bmatrix} = \hat{K}^{-1/2} \hat{G} \begin{bmatrix} a \\ b \end{bmatrix} = \hat{K}^{-1/2} \begin{bmatrix} c \\ 0 \end{bmatrix} \tag{21}$$

where $c = a^2 + b^2$. Therefore, with some care, it is possible to “save up” the scaling operations $\hat{K}^{-1/2}$ for a later time. Furthermore, in the active-set algorithm, this scaling is employed as part of a back substitution step which requires $(\hat{K}^{-1/2} \hat{K}^{-1/2})^{-1}$, which is equal to $\hat{K}$. Hence, square-root circuits are not required in the active-set implementation.

3) Parallel Computation of Givens Rotations: An important feature of the design is the introduction of parallelism in the computation of Givens rotations. To explain this, Figure 5 is a graphical representation of a standard serial approach, which is the only one possible on a standard architecture processor. In this situation, pairs of cells are chosen for Givens rotations and one result depends on the result from the previous calculation. Combined with the post-rotation adjustment of affected entries, completing this serial chain represents a significant proportion of the processing time requirements.
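The decoupled form (20)-(21) can be checked with a few lines of NumPy. The function name `decoupled_givens` and the values $a = 3$, $b = 4$ are chosen here purely for illustration; the point of the sketch is that the unscaled rotation uses only multiplications and additions, with the square root deferred to a separate scaling.

```python
import numpy as np

def decoupled_givens(a, b):
    """Return (G_hat, diagonal of K_hat) with G = K_hat^{-1/2} @ G_hat.
    Only multiplications and additions are used here."""
    G_hat = np.array([[a, b], [-b, a]])
    k = a * a + b * b
    return G_hat, np.array([k, k])

a, b = 3.0, 4.0
G_hat, K_diag = decoupled_givens(a, b)

# Unscaled rotation: second entry is zeroed without any square root.
unscaled = G_hat @ np.array([a, b])        # [a^2 + b^2, 0]

# Deferred scaling recovers the classical orthogonal rotation (19).
G = np.diag(K_diag ** -0.5) @ G_hat
rotated = G @ np.array([a, b])             # [sqrt(a^2 + b^2), 0]
```

In the sketch the scaling is applied explicitly only to confirm equivalence with (19); the text explains that the active-set implementation avoids forming it at all.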
Fig. 5. Diagrammatic representation of Givens rotation on standard serial computer architecture.
The proposed architecture exploits the fact that, by careful selection of rotation points, multiple rotations may be performed in parallel. This is illustrated diagrammatically in Figure 6. Via this, a column rotation, requiring $n$ steps on a serial processor, only requires $\log_2 n$ steps in the custom architecture design reported in this paper. Post-rotation adjustment of the affected matrices may also be performed in parallel to subsequent rotation operations.

Fig. 6. Diagrammatic representation of employing Givens rotations in parallel.
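The logarithmic step count can be illustrated with a small software model. The pairing pattern below (a binary tree with doubling stride) is an assumption of this sketch and may differ from the rotation points actually selected in Figure 6; the function name `tree_reduce` is likewise hypothetical. Rotations within one round act on disjoint pairs, so a parallel circuit could execute each round in a single step.

```python
import numpy as np

def tree_reduce(v):
    """Zero every entry of v except v[0] using Givens rotations grouped
    into rounds of mutually independent pairs.
    Returns (reduced vector, number of rounds)."""
    v = np.array(v, dtype=float)
    n = len(v)
    rounds = 0
    stride = 1
    while stride < n:
        # Pairs (i, i + stride) in this round share no elements.
        for i in range(0, n - stride, 2 * stride):
            a, b = v[i], v[i + stride]
            # Net effect of the Givens rotation that folds b into a.
            v[i], v[i + stride] = np.hypot(a, b), 0.0
        stride *= 2
        rounds += 1
    return v, rounds

column = np.arange(1.0, 9.0)               # hypothetical 8-element column
reduced, n_rounds = tree_reduce(column)
```

For this 8-element column the model needs three rounds, whereas a serial chain would apply its seven rotations one after another.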
Figure 4 shows the implementation of these principles. The “Givens Array” unit implements an array of dedicated calculation elements that perform the scaled and decoupled Givens rotations. Multiplexors are used to select the appropriate pairs of data elements for each rotation, and also implement the necessary matrix permutations.

4) Inverse of Constraints Matrix: Computing the QP solution requires calculation of the inverse of the upper triangular matrix $R$ in (13). Rather than perform a time consuming, non-parallelisable, back-substitution operation for this purpose, a secondary state machine (the “RMIController” in Figure 4) is used to maintain the necessary inverse. The data required to perform this calculation is available on the prior iteration. Hence this operation can minimize impact on other parts of the circuit by operating in parallel, and at a time when the second multiplier array is otherwise unused. When the data is required, this inverse is then submitted to the multiplier arrays to perform a matrix operation using fast parallel inner products.

Updating the inverse in this manner is relatively simple when adding constraints, since the constraints matrix is upper triangular, and the only change is to add an additional column to the right and a row of zeros (except for the diagonal) to the bottom. Consequently, the only change to the inverse is to add a column to the right, and only that needs to be calculated. When dropping a constraint, the operation is more complex, as the inverse needs to be rebuilt from the point where a constraint was removed.

F. Additional Components

For completeness, we briefly describe some remaining components in Figure 4.

1) 1/x Blocks: Time consuming division, where necessary, is accommodated by precomputing the relevant denominators in a parallel circuit, and applying a multiplication at the appropriate point in time. The reciprocal calculators implement the bisection algorithm in a small, but slow, circuit. The slow speed is acceptable, since they operate in parallel to the rest of the circuit and complete before the results are required.

2) ULHandler: This is a dedicated circuit for rapidly checking the upper and lower bounds, to determine candidate constraints for inclusion in the next iteration of the algorithm.

3) Candidate Finder: This is another small, speed optimised, dedicated sub-circuit, to detect which existing constraints need to be dropped in the next iteration.

G. FPGA Implementation

The above design was coded using the VHDL hardware description language, and implemented using two different silicon chip technologies to obtain estimates of speed and size requirements. Firstly, the circuit was demonstrated using an Altera Stratix III EPSL150F115C2 FPGA, configured with a maximum prediction horizon of $N = 12$ and maximum state dimension of $n = 24$. This involved the use of a software tool to synthesize the VHDL, and then mapping it to the pre-defined logic elements embedded in the FPGA. Importantly, the design requires no specialised components or other functionality specific to this particular FPGA. Hence it is not restricted to using the device chosen for this demonstration. The design was synthesized in a number of alternative configurations to demonstrate the effects of varying precision.
The results are outlined in Table I, with all versions meeting the timing target of a 70MHz clock cycle. The results show that increasing or decreasing the precision in the QP solver has a significant effect on circuit size, which results from the scaling of the numerous computation elements. However, altering the precision of the state observer component has negligible impact on circuit complexity, since only one multiply-and-accumulate unit and one memory are affected. Note that on first reading, the number of bits used in the QP solver may seem surprisingly small, but, as will be illustrated in the next section, these choices can be consistent with good performance. Table I also compares the results between using generalised logic to form the memories, and using special purpose predefined memory structures within the FPGA. Usual practice is to implement memories using optimised predefined blocks where available, due to their more efficient implementation. However, we provide the generalised results to demonstrate the portability of the circuit and its non-reliance on these specialised blocks.

QP bits | Observer bits | ALUTs | Registers | Logic usage
Using VHDL Registers as Memories
5 | 15 | 42526 | 34143 | 48%
7 | 11 | 56734 | 37547 | 59%
7 | 15 | 58440 | 39745 | 62%
7 | 23 | 59923 | 41860 | 64%
Using Altera Predefined Memories
5 | 15 | 25910 | 12039 | 26%
7 | 11 | 31208 | 13150 | 32%
7 | 15 | 32120 | 13532 | 33%
7 | 23 | 33335 | 14292 | 34%

TABLE I Synthesis results showing the number of combinatorial Adaptive Lookup Tables (ALUTs) and registers, and the overall usage of FPGA logic, for different values $n_m$ of mantissa bits. In all cases, $n_x = 7$ bits of exponent are used.
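To give a feel for what a short mantissa means numerically, the following minimal Python sketch rounds a value to a reduced-precision floating-point number. It is an illustration only: the function names, the round-to-nearest rule, the symmetric exponent range and the flush-to-zero behaviour are assumptions, since the exact details of the custom format are not restated in this section.

```python
import math

def word_length(mantissa_bits, exponent_bits=7):
    """Total bits per word: mantissa + exponent + one sign bit."""
    return mantissa_bits + exponent_bits + 1

def quantize(value, mantissa_bits, exponent_bits=7):
    """Round `value` to the nearest number with a `mantissa_bits`-bit mantissa.

    Assumed (hypothetical) format: mantissa in [0.5, 1), round to nearest,
    symmetric exponent range, values below the range flushed to zero.
    """
    if value == 0.0:
        return 0.0
    mantissa, exponent = math.frexp(value)        # value = mantissa * 2**exponent
    steps = 1 << mantissa_bits
    mantissa = round(mantissa * steps) / steps    # keep mantissa_bits fractional bits
    limit = 1 << (exponent_bits - 1)              # assumed exponent range [-limit, limit)
    if exponent >= limit:
        raise OverflowError("exponent out of range for the assumed format")
    if exponent < -limit:
        return 0.0
    return math.ldexp(mantissa, exponent)

q7 = quantize(0.7, 7)   # 0.703125 with a 7-bit mantissa
q5 = quantize(0.7, 5)   # 0.6875 with a 5-bit mantissa
```

Under these assumptions the relative rounding error is bounded by $2^{-n_m}$, so dropping from 7 to 5 mantissa bits makes the worst-case rounding error four times larger.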
H. ASIC Feasibility
The second implementation target was an Application Specific Integrated Circuit (ASIC) layout, using 130nm, 8 metal layer, CMOS fabrication technology. Since ASICs contain no predefined blocks on the silicon, the design layout can be completely customized to optimize and trade off speed, power, and size. To form the circuit, a predefined library of standard cells was used, specifying the layout of circuit elements for the basic common logic functions. In addition, a specialised generator tool was used to produce optimised macro blocks for the memories. The same VHDL as for the FPGA implementation was used as input to an ASIC synthesis tool, which converted the VHDL into a netlist of connected standard cells and macro cells. This netlist formed the input to placement and routing tools which, when provided with details of target size and positioning of macro blocks, produced the layout shown in Fig. 7. To obtain an indicative result, we considered a configuration employing an $n_m = 7$-bit QP circuit and an $n_m = 15$-bit state observer with a target clock frequency of 90MHz. In this case, the layout software was able to fit the circuit into a core area of 2.5mm². Additional circuit elements, such as the power rings and connection pads surrounding the core, increase the total die area to approximately 5mm².

Fig. 7. Example chip layout in 130nm CMOS technology, with a core area of 2.5mm² for the $n_m = 7$-bit QP and $n_m = 15$-bit state observer configuration.
While not a finalized layout for fabrication, it concretely indicates what is achievable using this technology. With further optimizations and refinement, we expect that this initial size and clock frequency estimate can be significantly improved.

IV. EXPERIMENTAL RESULTS
The performance of the custom architecture MPC design just profiled is illustrated in this section via the control of a lightly damped resonant structure.

A. Apparatus Description and Control Objective
The experimental setup comprises a uniform aluminium beam, clamped at one end, and free at the other, as illustrated diagrammatically in Figure 8. It is a representation of many systems encountered in the field of active vibration control [8]. The beam is 970mm in length, 5mm in thickness, and 25mm in width. Control and disturbance forces may be applied via Physik Instrumente PIC151 piezoelectric ceramic transducers which are 70mm in length, 25mm in width, and 0.25mm in thickness. The transducer centers are mounted 105mm (control) and 195mm (disturbance) from the clamped base. They are activated by 200V PDL200 high voltage amplifiers. These induce lateral control $u_t$ and disturbance $d_t$ beam displacements that are proportional to the applied voltage. The resulting displacement $y_t$ that occurs 105mm from the base is proportional to the mechanical strain at that point, and this is measured by buffering and acquiring the induced open-circuit voltage of a further piezoelectric transducer mounted there, on the other side of the beam to the actuation transducers.
The control objective is a disturbance rejection one. Namely, the control action $u_t$ is to be used to minimize displacement $y_t$ resulting from the disturbance $d_t$. The supply rail limits of the voltage amplifiers imply hard constraints on the control authority $u_t$, which should be respected in the control design.
Fig. 8. Plan view schematic of the experimental apparatus.

B. Apparatus Model
The MPC strategy considered in this paper is dependent on a model for the system to be controlled. For the flexible beam apparatus just described, this model may be obtained by first principles physical laws [8]. The success of this approach depends on very careful and accurate physical measurement. Hence here we elect to obtain the model via system identification techniques. For this purpose, the frequency response between the actuations $u_t$, $d_t$ and displacement $y_t$ was measured at 3201 non-equidistant points in the range 1–500Hz. These were used together with the subspace-based identification method developed in [18] to provide an initial $n = 14$th order state-space system estimate of the form

\[ \xi_{t+1} = A_p \xi_t + B_p u_t + B_d d_t, \tag{22} \]
\[ y_t = C_p \xi_t + D_p u_t + D_d d_t + e_t, \tag{23} \]
\[ \begin{bmatrix} d_t \\ e_t \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_d^2 & 0 \\ 0 & \sigma_e^2 \end{bmatrix} \right), \tag{24} \]

where the subscript “p” denotes “plant” and $A_p \in \mathbb{R}^{14 \times 14}$. This model is then used as an initialisation that is further refined to deliver a final maximum-likelihood estimate using the techniques developed in [34]. This dual stage approach was implemented using the freely available system identification toolbox [33], [31]. Note that according to Section III-D, the matrix $A_p$ is required to be tri-diagonal, which was achieved by employing the methods in [19]. The frequency response of the resulting model from $u_t$ to $y_t$ is shown as the solid line in Figure 9, which can be compared to the measured frequency response shown as a dash dot line. The close agreement suggests accurate modeling. Similar comments apply to the modeling from disturbance $d_t$ to $y_t$ illustrated in Figure 10.

Fig. 9. Magnitude (dB) $u_t$ to $y_t$ frequency response of identified beam model (solid line) versus measured frequency response (dash dot line).

Fig. 10. Magnitude (dB) $d_t$ to $y_t$ frequency response of identified beam model (solid line) versus measured frequency response (dash dot line).

This model is now augmented due to some practical considerations. The first is that the buffer amplifier connected to the piezoelectric sensor can induce a constant offset of the displacement measurement $y_t$. Ignoring this issue will result in a MPC solution with artificially high DC gain. We address this by augmenting the estimated dynamics model (22),(23) so as to induce integral action in the MPC strategy. This is achieved by adding a new state $\zeta_t$ as follows

\[ \begin{bmatrix} \xi_{t+1} \\ \zeta_{t+1} \end{bmatrix} = \begin{bmatrix} A_p & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \xi_t \\ \zeta_t \end{bmatrix} + \begin{bmatrix} B_p \\ 0 \end{bmatrix} u_t + \begin{bmatrix} B_d & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} d_t \\ \mu_t \end{bmatrix}, \tag{25} \]
\[ y_t = \begin{bmatrix} C_p & 1 \end{bmatrix} \begin{bmatrix} \xi_t \\ \zeta_t \end{bmatrix} + D_p u_t + D_d d_t + e_t \tag{26} \]

where $\mu_t$ is the noise associated with the unknown DC component from the buffer, which is modeled as

\[ \mu_t \sim \mathcal{N}(0, \sigma_\mu^2). \tag{27} \]
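The structure of the augmentation (25)-(26) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the toy 2-state plant values are hypothetical, and the noise terms are omitted so that the behaviour of the added state is easy to see.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def augment_with_offset_state(Ap, Bp, Cp):
    """Return (A, B, C) with one extra state zeta that models a constant output offset."""
    n = len(Ap)
    A = [row[:] + [0.0] for row in Ap] + [[0.0] * n + [1.0]]   # [[Ap, 0], [0, 1]]
    B = [row[:] for row in Bp] + [[0.0] * len(Bp[0])]           # [[Bp], [0]]
    C = [row[:] + [1.0] for row in Cp]                          # [Cp, 1]
    return A, B, C

# Toy 2-state plant (hypothetical numbers, not the identified beam model).
Ap = [[0.9, 0.1], [-0.1, 0.9]]
Bp = [[0.0], [1.0]]
Cp = [[1.0, 0.0]]
A, B, C = augment_with_offset_state(Ap, Bp, Cp)

# Noise-free, zero-input simulation starting from xi = 0 and zeta = 0.3:
# the offset state holds its value, so the output stays at the offset.
x = [0.0, 0.0, 0.3]
outputs = []
for _ in range(5):
    outputs.append(matvec(C, x)[0])
    x = matvec(A, x)
```

Because the added state has a unit eigenvalue, an observer built on this model estimates the offset as a slowly varying bias rather than attributing it to the beam dynamics.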
The second important modeling consideration for this application is that of high frequency modes not captured by the
description (25),(26). Any high frequency control action that excites such modes will have a devastating effect on control performance. This can be addressed by penalizing control action at any frequency above that of the highest modeled modes shown in Figures 9 and 10. This is achieved via a standard technique involving augmenting the model (25),(26) to deliver a new signal $u^{hf}_t$, which is a high-pass filtered version of $u_t$. The purpose of this is that $u^{hf}_t$ may then be included as part of the penalty term $V$ for the MPC action (3). In this paper, a state space model

\[ \eta_{t+1} = A_f \eta_t + B_f u_t \tag{28} \]
\[ u^{hf}_t = C_f \eta_t + D_f u_t \tag{29} \]

was computed to correspond to an eighth order Butterworth high-pass filter with 3dB cut-off point at 500Hz. Again, $A_f$ is expressed in tri-diagonal form by employing the methods in [19]. Adding this to the model (22), (23) delivers a final augmented model

\[ x_{t+1} = A x_t + B u_t + w_t, \tag{30} \]
\[ z_t = C x_t + D u_t + v_t \tag{31} \]

where

\[ x_t = \begin{bmatrix} \xi_t \\ \zeta_t \\ \eta_t \end{bmatrix}, \qquad z_t = \begin{bmatrix} y_t \\ u^{hf}_t \end{bmatrix}, \tag{32} \]
\[ w_t = \begin{bmatrix} B_d & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} d_t \\ \mu_t \end{bmatrix}, \qquad v_t = \begin{bmatrix} D_d & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} d_t \\ e_t \end{bmatrix}, \tag{33} \]
\[ A = \begin{bmatrix} A_p & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & A_f \end{bmatrix}, \qquad B = \begin{bmatrix} B_p \\ 0 \\ B_f \end{bmatrix}, \tag{34} \]
\[ C = \begin{bmatrix} C_p & 1 & 0 \\ 0 & 0 & C_f \end{bmatrix}, \qquad D = \begin{bmatrix} D_p \\ D_f \end{bmatrix}. \tag{35} \]

The above model has 23-states comprised of 14 for the beam dynamics, 8 for the high-pass filter and 1 for the DC component. Since $A_p$ and $A_f$ are tridiagonal, then so is $A$, which satisfies the requirements of the Section III-D. Finally, it is important to address the fact that due to the finite 200V amplifier supply rails, this linear model is only valid for input amplitudes satisfying $|u_t| \le 0.5$ Volts. This is modeled via the constraint for $\Gamma u_{t+1} \le b(x_{t+1})$ used in the MPC formulation (3) with the choices

\[ \Gamma = \begin{bmatrix} I_N \\ -I_N \end{bmatrix}, \qquad b(\cdot) = 0.5 \cdot 1_{2N}, \qquad b_0 = 0, \qquad \Psi = 0. \tag{36} \]

In the above, $I_N$ denotes the $N \times N$ identity matrix and $1_{2N}$ is used to denote a $2N \times 1$ column vector with all entries equal to 1.

C. Observer Design
An essential use for the model just derived is the computation of an estimate $\hat{x}_{t+1|t}$ of the system state $x_{t+1}$ via an observer of the form (8). This involves choosing the observer gain $L$ in (8), and in this paper we use the steady state Kalman gain. This approach depends on the state and measurement noise in the model (30),(31) obeying the Gaussian distribution

\[ \begin{bmatrix} w_t \\ v_t \end{bmatrix} \sim \mathcal{N}\left( 0, \begin{bmatrix} Q_o & S_o \\ S_o^T & R_o \end{bmatrix} \right). \tag{37} \]

In the above, the subscript “o” denotes observer. Considering the disturbance models (24),(27) the covariance matrices in (37) are given by

\[ Q_o = \begin{bmatrix} B_d & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \sigma_d & 0 \\ 0 & \sigma_\mu \end{bmatrix} \begin{bmatrix} B_d & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}^T, \tag{38} \]
\[ S_o = \begin{bmatrix} B_d & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \sigma_d & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} D_d & 1 \\ 0 & 0 \end{bmatrix}^T, \tag{39} \]
\[ R_o = \begin{bmatrix} D_d & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \sigma_d & 0 \\ 0 & \sigma_e \end{bmatrix} \begin{bmatrix} D_d & 1 \\ 0 & 0 \end{bmatrix}^T. \tag{40} \]

The steady state Kalman gain matrix $L$ is then given as

\[ L = \left( A \Sigma C^T + S_o \right) \left( C \Sigma C^T + R_o \right)^{-1}, \tag{41} \]

where $\Sigma$ is computed as the solution to the discrete algebraic Riccati equation (DARE)

\[ \Sigma = Q_o + A \Sigma A^T - L \left( C \Sigma C^T + R_o \right) L^T. \tag{42} \]

D. Quadratic Program Formulation
Central to the MPC strategy is the formulation of the cost function $V$ employed in (4). In this application, it is chosen as

\[ V(u_t, x_{t+1}) = \sum_{k=1}^{N} \left( z_{t+k}^T Q z_{t+k} + u_{t+k|t}^T R u_{t+k|t} \right) + x_{t+N+1}^T P x_{t+N+1}. \tag{43} \]
In this expression, the symmetric and positive definite matrices $P$, $Q$ and $R$ can be considered as controller tuning parameters that will be further commented on shortly. The penalty term involving $z_{t+k}$ is included in (43) for two reasons. To explain them, recall that $z_{t+k}$ is defined in (32) to be a two element vector composed of the beam displacement $y_{t+k}$ and a high pass filtered version $u^{hf}_{t+k|t}$ of the control action. Therefore, the penalty on $z_{t+k}$ involves a penalty on the $y_{t+k}$ component, which is chosen to reflect that we are seeking to solve a disturbance rejection problem - namely that beam displacement $y_t$ be as little affected as possible by an external disturbance $d_t$. Similarly, the penalty on the $u^{hf}_{t+k|t}$ in $z_{t+k}$ is imposed to limit the bandwidth of the control action in the interests of not exciting unmodeled high frequency modes. The penalty term involving $u_{t+k|t}$ in (43) is included to allow tuning of the control input energy. Finally, the penalty term involving $x_{t+N+1}$ is included here to ensure nominal stability by choosing the matrix $P$ as the positive definite solution to the following DARE [17]

\[ P = C^T Q C + A^T P A - K \left( B^T P B + R + D^T Q D \right) K^T, \]
\[ K = -\left( A^T P B + C^T Q D \right) \left( B^T P B + R + D^T Q D \right)^{-1}. \]

While this type (43) of cost formulation is of a common sort in MPC applications, it involves future responses $y_{t+k}$ and future states $x_{t+k}$, which are unknown at time $t$ when the control $u_{t+1|t}$ is to be computed. Here we employ a standard approach to this problem by using predicted values $\hat{y}_{t+k|t}$ and $\hat{x}_{t+k|t}$ in place of $y_{t+k}$ and
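$x_{t+k}$. These may be computed using the observer mentioned in the previous section, and defined via (8)-(10). In turn, due to the linear structure of this observer, it may be expressed via the following linear algebra formulation

\[ z_{t+1} = \Lambda \hat{x}_{t+1|t} + \Pi u_t, \tag{44} \]

where

\[ z_{t+1} \triangleq \begin{bmatrix} \hat{z}_{t+1|t} \\ \hat{z}_{t+2|t} \\ \vdots \\ \hat{z}_{t+N|t} \\ \hat{x}_{t+N+1|t} \end{bmatrix}, \qquad \Lambda \triangleq \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{N-1} \\ A^N \end{bmatrix} \tag{45} \]

and

\[ \Pi \triangleq \begin{bmatrix} D & & & & \\ CB & D & & & \\ CAB & CB & D & & \\ \vdots & & \ddots & \ddots & \\ CA^{N-2}B & \cdots & \cdots & CB & D \\ A^{N-1}B & \cdots & \cdots & AB & B \end{bmatrix}. \tag{46} \]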
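As an illustration of how the stacked matrices $\Lambda$ and $\Pi$ in (45)-(46) are assembled, the following Python sketch builds them for a small system and checks the prediction against a step-by-step simulation. The function names and the scalar toy system are hypothetical; this is a plain reference construction, not the structure used inside the circuit.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def prediction_matrices(A, B, C, D, N):
    """Build Lambda and Pi of (45)-(46) for prediction horizon N."""
    n, m, p = len(A), len(B[0]), len(C)
    powers = [[[float(i == j) for j in range(n)] for i in range(n)]]  # powers[k] = A^k
    for _ in range(N):
        powers.append(matmul(powers[-1], A))
    Lam, Pi = [], []
    for i in range(N):                       # block row i predicts z_{t+1+i|t}
        Lam.extend(matmul(C, powers[i]))
        blocks = []
        for j in range(N):                   # block column j multiplies u_{t+1+j|t}
            if j < i:
                blocks.append(matmul(matmul(C, powers[i - 1 - j]), B))
            elif j == i:
                blocks.append(D)
            else:
                blocks.append([[0.0] * m for _ in range(p)])
        for r in range(p):
            Pi.append([v for blk in blocks for v in blk[r]])
    Lam.extend(powers[N])                    # final block row: terminal state x_{t+N+1|t}
    terminal = [matmul(powers[N - 1 - j], B) for j in range(N)]
    for r in range(n):
        Pi.append([v for blk in terminal for v in blk[r]])
    return Lam, Pi

# Scalar toy system (hypothetical values) with horizon N = 3.
Lam, Pi = prediction_matrices([[0.5]], [[1.0]], [[2.0]], [[0.0]], 3)

x1, u_seq = 1.0, [1.0, -1.0, 2.0]
predicted = [lam[0] * x1 + sum(p * u for p, u in zip(row, u_seq))
             for lam, row in zip(Lam, Pi)]   # [2.0, 3.0, -0.5, 1.875]
```

Simulating the toy system one step at a time from $x = 1$ with the same inputs gives outputs 2, 3, -0.5 and a final state of 1.875, which matches the stacked prediction.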
In this case, the cost function $V$ defined in (43) with predicted values substituted as appropriate can be expressed in the required form (5),(6) as

\[ V(u_t, \hat{x}_{t+1|t}) = u_t^T H u_t + u_t^T f(\hat{x}_{t+1|t}) + c \tag{47} \]

where $c$ is a constant term, $H$ and $f$ are given by

\[ H \triangleq \Pi^T \bar{Q} \Pi + \bar{R}, \qquad f(\hat{x}_{t+1|t}) \triangleq 2 \Phi \hat{x}_{t+1|t}, \qquad \Phi \triangleq \Pi^T \bar{Q} \Lambda \tag{48} \]

and $\bar{Q}$, $\bar{R}$ are block diagonal matrices given by

\[ \bar{Q} \triangleq \begin{bmatrix} Q & & & \\ & \ddots & & \\ & & Q & \\ & & & P \end{bmatrix}, \qquad \bar{R} \triangleq \begin{bmatrix} R & & \\ & \ddots & \\ & & R \end{bmatrix}. \]

E. Controller Performance
We now profile the performance of the custom architecture MPC controller described in sections II and III on the apparatus described in section IV-A using the model and control design just presented in sections IV-B to IV-D. The emphasis here is to establish the efficacy of the custom architecture implementation. It is not to argue the superiority or otherwise of MPC itself as a suitable strategy for vibration control purposes. This latter topic has been addressed in [30]. In the results to follow, the control sample interval was chosen as 200µs, which corresponds to a 5kHz sample rate, and this is approximately ten times the highest frequency mode we are seeking to control. The prediction horizon was selected as $N = 12$. The state and measurement noise covariances employed in the observer design are given by (37)-(40) with the choices

\[ \sigma_\mu = 1, \qquad \sigma_d = 10^{-6}, \qquad \sigma_e = 10^{-12}. \tag{49} \]

The $Q$ and $R$ matrices in the MPC cost (43) were chosen as

\[ Q = \begin{bmatrix} 1 & 0 \\ 0 & 100 \end{bmatrix}, \qquad R = 10^{-2}. \tag{50} \]

Finally, until indicated otherwise, the disturbance $d_t$ was chosen as a periodic linear swept sine-wave signal (chirp) with a period of 4 seconds starting at 200Hz and finishing at 400Hz. The amplitude of $d_t$ was set at 1.6 Volts, which was chosen to ensure that the input $u_t$ would encounter the constraint limits ±0.5 Volts. Based on this configuration, the first experiment was designed to confirm that the custom architecture FPGA implementation does in fact achieve the desired disturbance rejection objective. Figure 11 shows a section of the open (grey outer envelope) and closed-loop (blue inner envelope) responses. The precision used in the FPGA implementation was chosen as $n_m = 7$ mantissa bits for the QP solver and $n_m = 15$ mantissa bits for the observer calculations. It can be seen from the top plot in Figure 11 that the beam vibrations due to the chirp disturbance have been significantly reduced. The bottom plot illustrates that the controller is hitting constraint limits in order to achieve the reduced vibrations.

Fig. 11. Comparison of open-loop versus closed-loop control using the FPGA with 7-bit QP and 15-bit observer. Top: measured output response where the grey line indicates open loop response. Bottom: control action for closed-loop response with input limits shown as red dashed lines.
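The chirp disturbance described in this section (period 4 seconds, 200Hz to 400Hz, amplitude 1.6 Volts, sampled every 200µs) can be generated as in the following Python sketch. The function names are hypothetical, and the choice of a sine with zero starting phase in each period is an assumption, since the text does not specify the phase.

```python
import math

def instantaneous_frequency(t, f0=200.0, f1=400.0, period=4.0):
    """Frequency (Hz) of the periodic linear sweep at time t (seconds)."""
    tau = t % period
    return f0 + (f1 - f0) * tau / period

def chirp_sample(t, f0=200.0, f1=400.0, period=4.0, amplitude=1.6):
    """Value (Volts) of the periodic linear chirp at time t (seconds)."""
    tau = t % period
    # Phase is 2*pi times the integral of the instantaneous frequency over [0, tau].
    phase = 2.0 * math.pi * (f0 * tau + 0.5 * (f1 - f0) / period * tau * tau)
    return amplitude * math.sin(phase)

SAMPLE_INTERVAL = 200e-6                      # 5 kHz sample rate
disturbance = [chirp_sample(k * SAMPLE_INTERVAL) for k in range(20000)]  # one 4 s period
```

One period contains 20000 samples at this rate, and the sweep is at 300Hz halfway through each period.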
While the above experiment and choice of numerical precision confirms the FPGA implementation is sensible in this case, it does not indicate its level of performance relative to a known standard. In order to gauge this, we employed an independent hardware platform comprising an Analog Devices ADSP-21262 32-bit 200 MHz floating-point Digital Signal Processor (DSP). This DSP is programmed using the same active-set algorithm II.1 and observer structure (8) that are used in the FPGA implementation. The same prediction horizon of N = 12 is used, and the same tuning parameters (50) and (49) are used to formulate the QP problem and observer, respectively. Where the DSP and FPGA platforms differ is in their implementations of the observer and the active-set method. The DSP uses serial computations based on full IEEE-754 32-bit floating point operations, which offers greater precision than the custom format employed in the FPGA design. With this in mind, the DSP implementation is considered here to be the standard by which the performance of the FPGA circuit will be measured. A comparison of the DSP implementation, and the FPGA
implementations for various choices of mantissa width is provided in Figures 12 and 13. Concentrating first on Figure 12, the top plot illustrates the disturbance rejection performance of the DSP IEEE-754 32-bit floating point implementation. The plot below that shows the performance of the custom architecture FPGA implementation with $n_m = 7$ mantissa bits for the QP solver, and hence 15 bits overall when including $n_x = 7$ exponent bits and a sign bit. Interestingly, while using only half the number of bits, the FPGA performance is very close to the DSP performance. As explained before, the observer requires more precision than the QP solver, and it is run with $n_m = 15$ mantissa bits in the second from top plot just discussed. The performance obtained with observer mantissa bits decreased to $n_m = 11$ is shown in the third from top plot, and again is little different to the DSP performance. If the QP solver mantissa width is decreased to $n_m = 5$ with this $n_m = 11$ observer precision, the control becomes unstable. However, the combination of QP $n_m = 5$ with observer $n_m = 15$ is stable with performance shown in the bottom plot of Figure 12. The control action $u_t$ corresponding to these scenarios is shown in Figure 13. Recognising the limits of the visual evidence supplied in the plots, Table II presents the Root-Mean-Squared output and input, together with the peak output for the various controllers being profiled here. These values were based on the same data used in Figures 12–13.

Fig. 12. Comparison of the output response as determined by the DSP and FPGA implementations (of various bit widths, indicated by QP-Observer mantissa combinations).

Fig. 13. Comparison of control actions generated by the DSP and FPGA implementations (of various bit widths, indicated by QP-Observer mantissa combinations).

Precision | RMS output | RMS input | Peak output
IEEE-754 | 10.00 | 8.22 | 0.11
FPGA-7-15 | 11.62 | 7.31 | 0.11
FPGA-7-11 | 11.14 | 7.03 | 0.10
FPGA-5-15 | 10.83 | 6.62 | 0.09

TABLE II Root-Mean-Square values of the output $y_t$, input $u_t$, and peak output $y_t$ values for different controller precisions.

These experiments establish the efficacy of FPGA implementation, as it achieves the control objective and performs well relative to the full IEEE-754 32-bit DSP implementation. However, these experiments do not provide information about the FPGA efficiency, i.e. how fast it solves the MPC problem. When considering this, it is important to remember that the observer calculations are performed in parallel to the QP solver, and by comparison they are negligible. Therefore, Figure 14 presents the QP solution times for approximately 35000 different QP problems. In this experiment, the disturbance was changed to a large manually applied displacement of the beam tip in order to excite the low frequency modes, thus ensuring that constraints are encountered for the whole prediction horizon. This was applied many times and the worst case (i.e. largest solve times) results are presented in Figure 14. Clearly the FPGA implementation is quite efficient and the QP solution required no more than 30µs in order to terminate.
Fig. 14. Solve times for the QP. Note that the maximum allowed (200µ seconds) is indicated via the red dashed line.
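The timing margin implied by these figures can be made concrete with a short calculation. The sketch below assumes the experiment ran at the 70MHz clock target quoted for the synthesis results, which is not stated explicitly for the timing experiment, and the variable names are illustrative.

```python
CLOCK_HZ = 70e6              # assumed: the synthesis timing target from Table I
SAMPLE_PERIOD_S = 200e-6     # control sample interval
WORST_CASE_SOLVE_S = 30e-6   # largest observed QP solve time

cycles_per_sample = round(CLOCK_HZ * SAMPLE_PERIOD_S)        # 14000 clock cycles
cycles_for_worst_qp = round(CLOCK_HZ * WORST_CASE_SOLVE_S)   # 2100 clock cycles
utilisation = WORST_CASE_SOLVE_S / SAMPLE_PERIOD_S           # about 0.15

# Sample rate that the solve time alone would permit, ignoring I/O and other overheads.
solve_limited_rate_hz = 1.0 / WORST_CASE_SOLVE_S             # about 33.3 kHz
```

Under this assumption the worst-case solve uses roughly 15% of each sample period.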
The above results indicate that the FPGA implementation presented in this paper may be applicable to even more demanding control applications requiring faster sample rates.
Alternatively, a smaller and slower version of the circuit could be employed to more fully utilize the available time.

F. Tradeoffs
As one moves through the DSP, FPGA and ASIC implementation options, there are therefore associated benefits in terms of decreasing circuit size, power consumption and potentially higher speed control. Associated with this are processing unit costs, which increase by an order of magnitude at each progression through the DSP, FPGA and ASIC options in one-off volumes. However, in high volume, the per-unit ASIC cost can decrease dramatically. Furthermore, the FPGA cost can be negligible if it is realised via spare capacity on an existing chip. The choice of platform will therefore depend on a range of considerations.

V. CONCLUSION
This paper has investigated the utility of custom architecture computing platforms for MPC implementation. This has led to a demonstration of control using a 23 state model and length 12 prediction horizon, with 200µs sampling period, and only 30µs of this available time being necessary for computing the control action. While these results were obtained using a mid-range FPGA to implement the custom architecture, a preliminary feasibility study indicates that an ASIC implementation could be achieved using only 2.5mm². This indicates the potential for the benefits of MPC to be more widely available.

Acknowledgement
The authors would like to sincerely thank Dr. Andrew Fleming for his generous assistance in the building and commissioning of the experimental apparatus used in this paper.

REFERENCES
[1] Implementation of optimization-based control on a chip. Regular Session at the American Control Conference, June 14-16 2006. Minneapolis, Minnesota USA.
[2] Model predictive control for fast nonlinear systems: existing approaches, challenges, and applications. 1-day pre-congress workshop at the 45th IEEE Conference on Decision and Control, December 12, 2006. San Diego, CA, USA.
[3] Invited session: Hardware implementation of model predictive control. European Control Conference, Budapest, Hungary, August 2009.
[4] M.J. Beauchamp, S. Hauck, K.D. Underwood, and K.S. Hemmert. Architectural modifications to enhance the floating-point performance of fpgas. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 16(2):177–187, 2008.
[5] G.A. Constantinides. Tutorial paper: Parallel architectures for model predictive control. In Proceedings of the European Control Conference, Budapest, pages 138–143, 2009.
[6] L.M. Davis. Scaled and decoupled Cholesky and QR decompositions with application to spherical mimo detection. IEEE Wireless Communications and Networking, 1:326–331, March 2003.
[7] K. Dickson, Z. Liu, and J.V. McCanny. Programmable processor design for givens rotations based applications. Proc. 4th IEEE Workshop on Sensor Array and Multichannel Processing, pages 84–87, 2006.
[8] C.R. Fuller, S.J. Elliott, and P.A. Nelson. Active Control of Vibration. Academic Press, 1996.
[9] D. Goldfarb and A. Idnani. A numerically stable dual method for solving strictly convex quadratic programs. Mathematical Programming, 27:1–33, 1983.
[10] G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins University Press, 1989.
[11] J. Gotze. Parallel methods for iterative matrix decompositions. Proc. IEEE International Symposium on Circuits and Systems, 1:232–235, June 1991.
[12] S.F. Hsiao and J.M. Delosme. Householder CORDIC Algorithms. IEEE Transactions on Computers, 44(8):990–1001, August 1995.
[13] T.A. Johansen, W. Jackson, R. Schreiber, and P. Tondel. Hardware synthesis of explicit model predictive controllers. IEEE Transactions on Control System Technology, 15(1):191–197, January 2007.
[14] M.S.K. Lau, S.P. Yue, K.V. Ling, and J.M. Maciejowski. A comparison of interior point and active set methods for fpga implementation of model predictive control. In Proc. European Control Conference, Budapest, August 2009. European Union Control Association.
[15] K.V. Ling, S.P. Yue, and J.M. Maciejowski. A FPGA Implementation of Model Predictive Control. In Proceedings of the 2006 American Control Conference, pages 1930–1935, Minneapolis, Minnesota, USA, June 14-16 2006.
[16] J.M. Maciejowski. Predictive Control with Constraints. Pearson Education Limited, Harlow, Essex, 2002.
[17] D.Q. Mayne, J.B. Rawlings, C.V. Rao, and P.O.M. Scokaert. Constrained model predictive control: Stability and optimality. Automatica, 36(6):789–814, 2000.
[18] T. McKelvey, H. Akçay, and L. Ljung. Subspace-based multivariable system identification from frequency response data. IEEE Transactions on Automatic Control, 41:960–979, 1996.
[19] T. McKelvey and A. Helmersson. State-space parametrizations of multivariable linear systems using tridiagonal matrix forms. In Proceedings of the 35th IEEE Conference on Decision and Control, volume 4, pages 3654–3659, December 1996.
[20] M. Morari and J.H. Lee. Model predictive control: Past, present and future. Computers and Chemical Engineering, 23(4-5):667–682, 1999.
[21] M.J.D. Powell. On the quadratic programming algorithm of goldfarb and idnani. Mathematical Programming Study, 25:46–61, 1985.
[22] S. Joe Qin and T.A. Badgwell. A survey of industrial model predictive control technology. Control Engineering Practice, 11:733–764, 2003.
[23] K. Schittkowski. QL: A Fortran code for convex quadratic programming - User's Guide. Report, Department of Mathematics, University of Bayreuth, 2003.
[24] P. Tøndel, T.A. Johansen, and A. Bemporad. An algorithm for multiparametric quadratic programming and explicit MPC solutions. Automatica, 39(3):489–497, March 2003.
[25] J.L. Tripp, M.B. Gokhale, and K.D. Peterson. Trident: From high-level language to hardware circuitry. Computer, 40(3):28–37, 2007.
[26] K. Underwood. FPGAs vs. CPUs: trends in peak floating-point performance. In FPGA ’04: Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays, pages 171–180, New York, NY, USA, 2004. ACM.
[27] J.E. Volder. The Birth of CORDIC. Journal of VLSI Signal Processing, 25(3):101–105, June 2000.
[28] P.D. Vouzis, M.V. Kothare, L.G. Bleris, and M.G. Arnold. A system-on-a-chip implementation for embedded real-time model predictive control. IEEE Transactions on Control Systems Technology, 17(5):1006–1017, Sept. 2009.
[29] Y. Wang and S. Boyd. Fast model predictive control using online optimization. IEEE Transactions on Control Systems Technology, 18(2):267–278, March 2010.
[30] A.G. Wills, D. Bates, A. Fleming, B. Ninness, and S.O.R. Moheimani. Model predictive control applied to constraint handling in active noise and vibration control. IEEE Transactions on Control Systems Technology, 16(1):3–12, 2008.
[31] A.G. Wills, A. Mills, and B. Ninness. A matlab software environment for system identification. In Proceedings of the 15th IFAC Symposium on System Identification, St. Malo, France, 2009.
[32] A.G. Wills and B. Ninness. QPC - Quadratic Programming in C. Webpage. http://sigpromu.org/quadprog/.
[33] A.G. Wills and B. Ninness. UNIT - University of Newcastle Identification Toolbox. Webpage. http://sigpromu.org/idtoolbox/.
[34] A.G. Wills, B. Ninness, and S.H. Gibson. Maximum likelihood estimation of state space models from frequency domain data. IEEE Transactions on Automatic Control, 54(1):19–33, 2009.