and clocking (Cady 1997; Predko 1999). The primary role of ...... N. S. (1995). Control Systems Engineering, The Benjamin/Cummings Publishing Company, Inc.
Design and Implementation of Model Predictive Control Algorithms for Small Satellite Three-Axis Stabilization By Xi Chen
A thesis submitted in fulfilment of the requirements for the degree of Master of Philosophy
School of Aerospace, Mechanical and Mechatronic Engineering The University of Sydney
March 2011
Abstract There is intense interest in applying complex controllers to ‘fast’ systems, including aerospace, automotive, large electric motors, and paper-making machines. The bottleneck problem which impedes application in such areas is often the computational speed of the on-line computation demand. There is therefore growing research activity into implementation of feedback controllers on specialised hardware, such as FPGAs. Much of the interest comes from the demand for Model Predictive Control (MPC). This thesis presents a FPGA based processor dedicated for the Model predictive control (MPC) algorithm. MPCDP (Model Predictive Control Dedicated Processor) is the intended application of MPC algorithm on FPGA based embedded processor. It is used as an on-board attitude controller for a small satellite. This research demonstrates the possibility of applying MPC algorithm on fast-response and resource-constrained systems. The main obstacle of applying MPC algorithm on the satellite is the power source of that satellite is quite limited because of its small volume. The high computation demand of MPC will consume a large amount of energy. To solve this problem, this thesis present a way to simplify the MPC algorithm by reducing the number of coefficients involved in the on-line computation, thereby reducing the power consumption. A hardware implementation of MPC algorithm is also realized. The MPCDP is designed and optimized according to the simplified MPC algorithm. The source codes of MPCDP` are written in Verilog HDL and the primary implementation I
technology is in a field programmable gate array. A C-like high level programming language for this processor is designed as well as its compiler and assembler. Many Matlab and Simulink simulation results are given in this thesis to verify the validity of the simplified MPC algorithm. A Simulink and Modelsim co-simulation is done to test the performance of MPCDP on a real physical system. The Verilog codes are synthesized by using ISE synthesis tool and comparisons are made between MPCDP and other embedded processors, in terms of power consumption, CPU frequency and so forth.
II
Acknowledgements I would like to express my gratitude to my supervisor, Dr. Xiaofeng Wu, for his support and guidance during my research. I sincerely thank all undergraduate and postgraduate students in our research group for their kindly help and advice when I carried out this research. These people include Xueliang Bai, Rui Pang, Frank Fangxin Zou and Hamed. My thanks also go to my family for their support and understanding. Finally, I would like to acknowledge all academic staff in Dalian University of Technology who taught in my bachelor degree.
III
Statement of Originality:
The material presented in this report contains all of my own work. To the best of my knowledge and belief, this thesis contains no material previously published or written by another person except where due acknowledgment is made in the report itself.
Name: Xi Chen
Signature:
Data:
IV
Contents Abstract .......................................................................................................................... I Acknowledgements ...................................................................................................... III Statement of Originality...............................................................................................IV Chapter 1. 1.1.
Introduction .............................................................................................. 1
Background ..................................................................................................... 1
1.1.1.
Research motivation: Satellite attitude control ........................................ 1
1.1.2.
Model Predictive Control (MPC)............................................................. 2
1.1.3.
Control system design .............................................................................. 3
1.1.4.
MPC hardware implementation ............................................................... 4
1.2.
Research objectives and contributions ............................................................ 5
1.3.
Dissertation overview ...................................................................................... 6
Chapter 2. 2.1.
Satellite dynamics and MPC algorithm ................................................... 8
Satellite dynamics ........................................................................................... 8
2.1.1.
Reference frames ..................................................................................... 8
2.1.2.
Spacecraft nonlinear attitude model......................................................... 9
2.1.3.
Linearized attitude dynamics model ...................................................... 10
2.1.4.
Magnetic attitude control ....................................................................... 12
2.1.5.
Constraints in magnetic attitude control ................................................ 13
V
2.2.
MPC .............................................................................................................. 14
2.2.1.
Introduction to MPC .............................................................................. 14
2.2.2.
Traditional MPC .................................................................................... 18
2.2.3.
Laguerre MPC ........................................................................................ 20
2.3.
Simulations .................................................................................................... 24
2.4.
Summary ....................................................................................................... 29
Chapter 3. 3.1.
MPCDP .................................................................................................. 30
Digital Controllers ......................................................................................... 30
3.1.1.
General purpose processor ..................................................................... 31
3.1.2.
Microcontroller ...................................................................................... 32
3.1.3.
Digital signal processor.......................................................................... 33
3.1.4.
Special-purpose processor ..................................................................... 33
3.1.5.
Soft processors ....................................................................................... 34
3.1.6.
Field-programmable gate array (FPGA) ................................................ 35
3.1.7.
Other architectures ................................................................................. 37
3.2.
Numerical issues ........................................................................................... 37
3.3.
Instruction set design ..................................................................................... 40
3.3.1.
Defining instruction set .......................................................................... 40
3.3.2 Specifying instruction set ............................................................................. 50 3.4.
Microarchitectures ......................................................................................... 56
3.5.
Pipelining ...................................................................................................... 62 VI
3.5.1.
Overview of the pipeline structure ......................................................... 62
3.5.2.
A detailed description of the pipeline stages ......................................... 66
3.6.
MPCDP compiler and assembler .................................................................. 71
3.6.1 MPC assembler ............................................................................................ 71 3.6.2 Compiler ....................................................................................................... 72 3.7.
Software architecture for control program .................................................... 75
3.7.1.
Introduction to control loops in MPCDP ............................................... 75
3.7.2.
Simulation examples .............................................................................. 77
3.8.
Summary ....................................................................................................... 79
Chapter 4.
Simulation .............................................................................................. 81
4.1.
Hardware-in-the-loop simulation .................................................................. 81
4.2.
EDA Simulator Link tool .............................................................................. 84
4.3.
Satellite attitude control simulations ............................................................. 86
4.4.
Synthesis reports ........................................................................................... 93
4.5.
Summary ....................................................................................................... 96
Chapter 5. 5.1.
Conclusions and Future Work ............................................................... 98
Conclusions ................................................................................................... 98
5.1.1.
Review of the thesis ............................................................................... 98
5.1.2.
Achievements ......................................................................................... 98
5.1.3.
Limitations ............................................................................................. 99
5.2.
Future work ................................................................................................. 100 VII
APPENDIX A
The MPCDP instruction set architecture (ISA)......... 102
APPENDIX B
Verilog HDL Code for Multiplier ............................ 105
APPENDIX C
Abbreviations ........................................................... 108
APPENDIX D
Publications .............................................................. 110
Bibliography .............................................................................................................. 111
VIII
List of Tables Table 2.1: Satellite parameters ..................................................................................... 27 Table 3.1: The conditional branch instructions available in MPCDP, and the conditions they test. ..................................................................................................... 47 Table 3.2: The operations and corresponding opcodes for the MPCDP processor ..... 50 Table 3.3: I/O address .................................................................................................. 58 Table 3.4: keywords supported by MPCDP-C ............................................................ 74 Table 3.5: The operators supported by MPCDP-C ...................................................... 74 Table 4.1: MPCDP-1 ports description ........................................................................ 87 Table 4.2: One-channel satellite simulation parameters .............................................. 87 Table 4.3: Number conversion in Example A. ............................................................ 88 Table 4.4: Three-channel satellite simulation parameters ........................................... 91
IX
List of Figures Figure 2.1: Block diagram of SISO MPC .................................................................... 15 Figure 2.2: Controller state at the kth sampling instant ............................................... 16 Figure 2.3: Example A. Closed-loop response with forward operator (constrained case): Nc =100. Top plot: system output; bottom plot: control signal. ........................ 25 Figure 2.4: Example A. Closed-loop response with forward operator (constrained case): Nc=40. Top plot: system output; bottom plot: control signal. ........................... 25 Figure 2.5: Example A. Closed-loop response with Laguerre functions (constrained case): a=0.9, N=10. Top plot: system output; bottom plot: control signal. ................. 26 Figure 2.6: Example B: satellite attitude angles .......................................................... 27 Figure 2.7: Example B: satellite angular rates ............................................................. 28 Figure 2.8: Example B: control torques ....................................................................... 28 Figure 3.1: Numerical specification for state variables ............................................... 40 Figure 3.2: Numerical specification for coefficients ................................................... 40 Figure 3.3: The instruction format for the arithmetic and logical operations. ............. 52 Figure 3.4: The instruction format for the conditional branch instruction. ................. 53 Figure 3.5: The instruction format for the two jump instructions................................ 53 Figure 3.6: The instruction format for the the LDI and LD instructions. .................... 54 Figure 3.7: The instruction format for the LDX instruction. ....................................... 55 Figure 3.8: The instruction format for the ST instruction. ........................................... 55 Figure 3.9: The instruction format for the STX instruction. ........................................ 55 Figure 3.10: MPCDP memory structure ...................................................................... 57 Figure 3.11: Memory and I/O architecture .................................................................. 58 Figure 3.12: Condition testing block: CHECKCC ...................................................... 60 X
Figure 3.13: Register file ............................................................................................. 60 Figure 3.14: Sign extension block ............................................................................... 61 Figure 3.15: An overview of the structure of the pipeline, showing the stages, registers, and forwarded signals. .................................................................................. 64 Figure 3.16: The IF stage of the instruction pipeline. .................................................. 67 Figure 3.17: The ID stage of the instruction pipeline. ................................................. 68 Figure 3.18: The EX stage of the instruction pipeline. ................................................ 69 Figure 3.19: The MEM and WB stages of the instruction pipeline. ............................ 71 Figure 3.20: assembly code example ........................................................................... 72 Figure 3.21: Control program fowchart ....................................................................... 76 Figure 3.22: MPCDP-C code for initialisation stage ................................................... 78 Figure 3.23: MPCDP-C code for I/O operation ........................................................... 78 Figure 3.24: MPCDP-C code for execution stage and I/O output ............................... 79 Figure 4.1: MPCDP verification flowchart .................................................................. 82 Figure 4.2: Hardware-in-the-loop simulation scheme ................................................. 83 Figure 4.3: Linking with Simulink and the HDL Simulator ........................................ 84 Figure 4.4: Co-simulation block diagram .................................................................... 85 Figure 4.5: MPCDP-1connection diagram .................................................................. 86 Figure 4.6: Example A: one-channel satellite attitude control (angle). Top plot: Matlab simulation; bottom plot: Simulink and Modelsim co-simulaiton. ................... 88 Figure 4.7: Example A: one-channel satellite attitude control (angular rate). Top plot: Matlab simulation; bottom plot: Simulink and Modelsim co-simulaiton. ................... 89 Figure 4.8: Example A: one-channel satellite attitude control (Torque). Top plot: Matlab simulation; bottom plot: Simulink and Modelsim co-simulaiton. ................... 89 Figure 4.9: RTL simulation result for one-channel satellite ........................................ 90 XI
Figure 4.10: The simulation block diagram in Simulink (MPCDP-2) ........................ 92 Figure 4.11: Example B: three-channel satellite attitude control (angle). Left plot: Matlab simulation; bottom plot: Simulink and Modelsim co-simulaiton. ................... 92 Figure 4.12: Example B: three-channel satellite attitude control (angular rate). Left plot: Matlab simulation; bottom plot: Simulink and Modelsim co-simulaiton. ......... 93 Figure 4.13 Synthesis report for MPCDP-1 ................................................................. 94 Figure 4.14: Power Requirements Analysis for the MPCDP-1 ................................... 94 Figure 4.15: Synthesis report for CSP based on 4-clock cycle state machine ............. 95 Figure 4.16Power Requirements Analysis for the CSP ............................................... 96
XII
Chapter 1. Introduction In this chapter we present the background and the motivation for the research addressed in this thesis. In this research, we want to apply MPC algorithm in real-time systems. Our application background is the satellite attitude control. But it should be noted that the MPC dedicated processor can be also used in other real-time systems such as motor speed and position control. In section 1.1 we first introduce the importance of satellite attitude control, various kinds of actuators, the Model Predictive Control (MPC) algorithm, control system design basics and the some previous researches in applying MPC in real-time systems. In section 1.2 we discuss the objectives and contributions of this thesis. Section 1.3 we conclude this chapter with an overview of this thesis.
1.1. Background 1.1.1. Research motivation: Satellite attitude control Attitude control subsystem is one the most important subsystems in satellites. Although the need for attitude control varies for different missions, it is present in almost every satellite in orbit. For many reasons, the attitude of satellites directly determines the outcome of missions. For example, satellites must be oriented correctly so that the antenna is pointing towards the right direction for communication. Also, in order to produce the maximum amount of energy, it is desired to face the solar panels directly towards the 1
1.1 Background Sun. Furthermore, satellites that are performing remote sensing missions depend even more on attitude control because the camera needs to be pointing accurately towards the target and the satellite body has to be stabilised for image taking (Hannay 2009). Satellite attitudes can be controlled by various actuation methods, including thrusters, reaction wheels, magnetic torque rods, or a combination of above. Electromagnetic actuator is a particularly effective and reliable one. It interacts with the earth’s own magnetic field in order to generate a control torque acting on spacecraft. An advantage of magnetic torquers is that they require no fuel and so have virtually unlimited life. They do of course require electrical power, but there is no exhaust pollutant and by providing a couple they are not sensitive to movement of the centre of mass. One drawback of this control technique is that the torques which can be applied to the spacecraft for attitude control purposes are constrained to lie in the plane orthogonal to earth magnetic field, and hence the satellite is under-actuated. In the equatorial plane, the magnetic field line always lie horizontally, north-south. Consequently a spacecraft whose orbit lies in this plane cannot use magnetic torquers to counteract the north-south component of their disturbance torque, or to dump this component of momentum. For an inclined orbit, suitable variation of the magnetic field allows controllability in the long term, but presents a significant challenge from a control perspective (Fortescue, Stark et al.). A novel approach to the magnetic attitude control problem, based on Model predictive control has been demonstrated as a suitable candidate in many literatures. (Silani and Lovera 2005) presents a thorough review of the area of magnetic control, and develops a closed form solution for MPC based controller under the constraints of earth’s magnetic field. The author also points out that the optimal solution cannot be given in closed form if the constraints on the coils’ magnetic dipoles become active. 1
1.1 Background The systemic method to deal with these equality and inequality constraints in MPC is to operate online quadratic programming (Wang 2009). (Wood, Chen et al. 2006) investigates MPC for attitude control using magnetic torque rod. The design is carried out based on the closed form solution for MPC based attitude controller when considering the earth’s magnetic field. (Wood and Wen-Hua 2008) further develops a MPC based magnetic controller to deal with the problem of Low Earth orbiting satellites attitude control under significant disturbances from the external environment including aerodynamic forces, gravity gradient torques and residual magnetic dipoles. A comparison between PD control and MPC control for a three-axis satellite attitude control shows that using PD control along is very difficult to achieve good performance on all three axes while MPC approaches lead to significantly better performance. 1.1.2. Model Predictive Control (MPC) Model Predictive Control (MPC) refers to a class of algorithms that compute the trajectory of control signals to optimize the future behaviour of a plant. MPC is currently widely used in process industries such as petrochemical industry. It is also suitable for a larger range of applications such as aerospace and road vehicles. The main feature of MPC is its capability to deal with constraints such as the actuator saturation and output limits. The advantages of MPC such as the ability to handle constraints, the applicability to nonlinear processes and to multivariable problems, make this control method a necessary choice for many control problems. For example, there are numerous potential applications for biomedical control including: control of physiological processes, glucose control, drug infusion control, cardiac pacemakers and defibrillators, blood flow and pressure control, and neurological implants. On the 2
1.1 Background other hand, several small-scale industrial application areas that have fast dynamics will benefit from a real-time implementation of MPC. These areas include MEMS, automotive/aerospace control, power electronics, and microchemical systems(Wang 2004). 1.1.3. Control system design The design process of a control system is described by (Nise 1995). At the beginning, the plant must be modelled in mathematics. The control system therefore can be designed and analyzed according to the design specifications such as desired transient response and steady-state accuracy. Many control design methods are available today. These methods are divided into two branches (Goodall 2002): classical and modern control. Classical control applies time or frequency domain techniques to analyse the plant and design the compensator. This approach is based on converting a system's differential equation to a transfer function, thus generating a mathematical model that algebraically relates a representation of the output to a representation of the input. An advantage of these techniques is that they rapidly provide stability and transient response information. The primary disadvantage of the classical approach is its limited applicability, being practicable only for linear, time-invariant systems of relatively low complexity and usually having a single input and output. The languages for classical control are Laplace transform or z-transform, which describe the relationship between the input and output of a control system. Modern control, however, benefits from the advances in computer technology (Brogan 1985). First, the physical system can be modelled into a more complex one, which means a large number of variables, nonlinearities and time-varying parameters 3
1.1 Background must be included in the model. Second, the computer technology is well suited for the need of greater accuracy and efficiency, which has changed the emphasis on control system performance. Third, computers are now so commonly used as just another component in the control system, which means that the discrete-time and digital system control now deserves much more attention than it did in the past. In addition, the foundation of modern control theory is the state space models, being ideal for calculations in digital control. 1.1.4. MPC hardware implementation A growing number of researching groups is working on applying MPC to dynamical systems with fast response times and which are traditionally considered unsuitable for MPC implementation. A real-time implementation using an off-the-shelf processor is demonstrated in(Bleris and Kothare 2005), where the Motorola 32-bit MPC 555 core containing a 64-bit Floating Point unit is utilized. An alternative approach for determining the optimal control moves for a system is proposed in (Bemporad, Morari et al. 2002), where the control law is piece-wise affine and continuous. The optimization problem is solved offline, and during run time the solutions are invoked from the memory. Although, this technique is proven to be very efficient in terms of performance for small scale systems, the memory growth is superexponential with respect to the controlled variables, thus rendering the system prohibitive for embedded applications, where the data transfers to and from the memory is the dominant factor of power consumption. In (Minghua and Keck-Voon 2005), an FPGA implementation of MPC using a Quadratic Programming (QP) optimization algorithm is presented. For fast prototyping, Handel-C is used to describe the optimization algorithm in order to convert it to a hardware description format, which is simulated 4
1.2 Research objectives and contributions in conjunction with Matlab, before being downloaded onto an FPGA. Alternatively, in (Vouzis, Bleris et al. 2006), a custom hardware architecture consisting of a generalpurpose microprocessor and an auxiliary unit was proposed. The general-purpose microprocessor acts as the master in the system and the auxiliary unit acts as a matrix coprocessor by carrying out matrix operations. The matrix processor uses a 16-bit Logarithmic Number System (LNS) arithmetic unit to carry out the required arithmetic operations. By using LNS, the multiplication and division are simplified considerably, compared to a fixed-point representation, since they are converted to addition and subtract. The operations of logarithmic addition and subtraction are more expensive and account for most of the delay and area cost. However, as it requires two processors to fulfil the control task, the hardware implementation is relatively complex and power consumption is higher.
1.2. Research objectives and contributions The main Objective of this research is to design a MPC controller to control the satellite attitude. Since the realization of MPC algorithm requires a mathematic model of the real physical system, a linearized state space model of a satellite and the magnetic field model of the earth are developed. As it has been introduced before, the main barrier of applying MPC algorithm on a small satellite is the power limitation. To solve this problem, the current MPC algorithm is simplified. To verify the validity of the simplified control algorithm, a series of simulations are conducted on the satellite model in Matlab and Simulink environment. A five-stage pipeline processor dedicated for MPC algorithm is designed. Both the structure and the instruction set of MPCDP are optimized according to the needs of MPC algorithm. A C-like programming language as well as its compiler and assembler are developed. To test 5
1.3 Dissertation overview the performance of MPCDP on the satellite model, a Simulink and Modelsim cosimulation is performed. This simulation is conducted by using Simulink EDA simulator link toolbox and Modelsim. By using this simulation methodology, the testing period of an embedded processor can be greatly shortened. The main contribution of this thesis is the MPC algorithm is successfully applied on a resourceconstrained real-time system. Besides, the simulation using Simulink EDA simulator link toolbox is a novel method in embedded processor verification, which can be used to replace the traditional HIL simulation to shorten the product testing period.
1.3. Dissertation overview This thesis presents the research work which applies Laguerre MPC algorithm in realtime control. This work results in simplified MPC algorithm, an MPC dedicated fivestage processor and a novel simulation methodology for validating customized processors. The remainder is organised as follows. Chapter 2 presents the satellite dynamics and MPC algorithm. One novel MPC algorithm based on Laguerre functions are introduced. Matlab simulation results are given to test the performance of the new algorithm. Chapter 3 focuses on the processor solution for MPC algorithm applications in VLSI. The numerical representations, instruction set, essential components and the pipeline structure for the processor are explained. Besides, the software scheme of programming MPCDP is presented. The C-like programming language and compiler for MPCDP are also explained. Chapter 4 verifies the processor architecture and Laguerre MPC algorithm by using Simulink and Modelsim co-simulation. Two satellite attitude control examples are given. 6
1.3 Dissertation overview Chapter 5 presents the synthesis results of MPCDP. These results are compared with a non-pipelined processor, in terms of speed and power consumption. Chapter 6 concludes and some future work is presented.
7
Chapter 2. Satellite dynamics and MPC algorithm In this chapter we consider the MPC algorithm for satellite attitude control. The performance of MPC controller depends on the accuracy of dynamic model. So in section 2.1 a linearized state space dynamic model of a small satellite is developed. Besides, since magnetic rods have been chosen as the attitude actuators, the earth’s magnetic field model and the actuators’ constraints are analysed in the section. In section 2.2 Laguerre MPC is introduced and we focus on its feature to reduce the computational demand. Section 2.3 gives some simulation results of Laguerre MPC on a small satellite and a motor.
2.1. Satellite dynamics 2.1.1. Reference frames
Earth centred inertial (ECI) reference frame: The origin of these axes is in the Earth’s centre. The X axis is parallel to the line of nodes, which is the intersection between the Earth’s equatorial plane and the plane of the ecliptic, and is positive in the Vernal equinox direction (Aries point). The Z axis is defined as being parallel to the Earth’s Geographic north-south axis and pointing north. The Y axis completes the right-handed orthogonal triad.
Orbit reference frame: The origin of this orbit reference frame moves with the cm (centre of mass) of the satellite in the orbit. The Z axis points toward the cm of the earth. The X axis is in the plane of the orbit, perpendicular to the Z 8
2.1 Satellite dynamics axis, in the direction of the velocity of the satellite. The Y axis is normal to the local plane of the orbit, and completes a three-axis right-hand orthogonal system.
Satellite body reference frame: The origin of this reference frame is in the satellite centre of mass; the axes are assumed to coincide with the body’s principal inertial axes (Psiaki 2000 ).
2.1.2. Spacecraft nonlinear attitude model The nonlinear attitude model of satellite includes kinematic and dynamic equations of motion. The attitude dynamic equations are obtained from Euler’s moment equation as follows (Sidi 1997 ). (2.1) where
is angular velocity vector of satellite body reference frame with
respect to inertial reference frame, expressed in satellite body reference frame. h is angular momentum vector of the entire system, expressed in satellite body system. denotes differentiation of h in the body frame and T is the sum of external torques. For a rigid satellite, h=I , where I is the satellite inertia matrix expressed in the body frame, so the attitude dynamic equations reduce to well-known form: (2.2) where (2.3) is the orbital angular rate, A(q) is the attitude matrix with respect to the orbital reference frame and
is angular velocity vector of orbit reference frame 9
2.1 Satellite dynamics with respect to inertial reference frame, expressed in satellite body reference frame. For earth pointing satellite, we always consider the orbit reference frame as an attitude reference, so that
(2.4)
Where
, is angular velocity vector of satellite body reference frame with
respect to orbit reference frame, expressed in satellite body reference frame. When the satellite has the desired attitude, the satellite body reference frame must remain aligned with orbit reference frame. The satellite attitude kinematic equations are as follows (Sidi 1997 ).
(2.5)
The quaternion
represents the rotation from the orbit
reference frame to the satellite body reference frame. 2.1.3. Linearized attitude dynamics model The nonlinear model is too complicated for use in control design or in closed-loop stability analysis. Besides quaternion method, attitude transformation matrix can be expressed with Euler angle ( ,θ,ψ). The roll angle ( ) is defined as a rotation about the x body axis, the pitch angle (θ) about the y body axis, and the yaw angle (ψ) 10
2.1 Satellite dynamics about the z body axis. Assuming small variation of the Euler angles, the attitude transformation matrix becomes:
(2.6)
and one also obtains that: , So (2.4) becomes (2.7) With these assumptions in (2.6), (2.7) becomes
(2.8) Finally, the satellite attitude dynamics are approximated well by a linear model (Psiaki 2000 ).
(2.9) Where,
11
2.1 Satellite dynamics are the attitude angle of satellite about roll, pitch and yaw axes respectively, is disturbance torque which is considered as zero in this paper, and
,
,
is control torque
.
2.1.4. Magnetic attitude control The controlling torque acting on spacecraft is produced by the interaction between a magnetic moment generated within a spacecraft and the earth’s magnetic field:
(2.10)
Where m is the generated magnetic moment inside the body and b is the earth’s magnetic field intensity. (2.10) can be rewritten as,
(2.11)
A dipole approximation of the Earth’s magnetic field is given in the following equation expressed in orbit reference frame (Sidi 1997 ).
(2.12)
Where
is the inclination of the satellite’s orbit with respect to the magnetic equator
and a is the orbit’s semi-major axis. Time is measured from t=0 at the ascending-node 12
2.1 Satellite dynamics crossing of the magnetic equator. The field’s dipole strength is
Wb-
m. The magnetic field in satellite body reference frame is given by,
(2.13) 2.1.5. Constraints in magnetic attitude control In (2.10), m must satisfy orthogonality conditions with respect to
and b, so one
must have, (2.14) Beside, constraints on the coils’ magnetic dipoles (actuators’ saturation) play an important role in the formulation of the magnetic attitude control problem. Such constraints are of the form
(2.15)
The magnetic dipole m is given by (Silani and Lovera 2005).
(2.16) where,
(2.17)
13
2.2 MPC So the constraints on coils’ magnetic dipoles can be translated into constraints on the control variable
. Using (2.17), it follows that
(2.18)
Which in turn implies (2.19)
2.2. MPC 2.2.1. Introduction to MPC The most attractive feature of Model Predictive Control (MPC) is its ability to successfully deal with constraints. The control signals need to be computed online in every sampling time by taking all constraints into account. In traditional MPC design, forward shift operators are used to model the trajectory of future control signal. For a complex dynamic system, large control horizon is required, which will greatly increase the number of factors involved in on-line computation. MPC’s large on-line computation demand is considered as a major obstacle in applying it in a system with limited computational capability such as a cube satellite. (Wang 2009) proposes an approach to use Laguerre networks to capture the future control trajectory, which can greatly decrease the factors involved in on-line computation, thus reducing computation demand.
14
2.2 MPC In this section, we first introduce the basic concepts of MPC for a SISO system, followed by detailed mathematic description of MPC for MIMO system and Laguerre MPC. The usual model predictive control application involves a plant having multiple inputs and multiple outputs (a MIMO system). Here the SISO application is considered and shown in Figure 2.1 (Bemporad, Morari et al.). Constraints Setpoint
MPC
Plant u
r
Y Plant
y
output
Figure 2.1: Block diagram of SISO MPC This plant could be a manufacturing process, such as a unit operation in an oil refinery, or a device, such as an electric motor, or a satellite dynamic model in our case. The main objective is to hold a single output, y, at a reference value (or setpoint), r, by adjusting a single manipulated output variable (or actuator) u. This is what is generally termed a SISO (single-input single-output) plant. The block labelled MPC represents a Model Predictive Controller designed to achieve the control objective. Model predictive control design requires a model of the impact that u has on y. It uses this plant model to calculate the u adjustments needed to keep y at its setpoint. This calculation considers the effect of any known constraints on the adjustments (typically an actuator upper or lower bound, or a constraint on how rapidly u can
15
2.2 MPC vary). One may also specify bounds on y. These constraint specifications are a distinguishing feature of Model Predictive Control. If the plant model is accurate, the plant responds quickly to adjustments in u, and no constraints are encountered. The tracking accuracy depends on the plant characteristics (including constraints), the accuracy of the plant model. If the plant model is a discrete one, MPC controller will also generate a discrete control signal, which takes action at regularly spaced, discrete time instants. The sampling instants are the times at which the controller acts. The interval separating successive sampling instants is the sampling period, Δt (also called the control interval). The Typical sampling instant in discrete MPC is illustrated in Figure 2.2. Past
Future
ymax (a) Setpoint
Measured ymin Prediction Horizon Estimated Np
umax (b)
Past Moves Control umin
Horizon -4
-3
-2
-1
k
1
N2c
3
Planned Moves 4
5
6
7
8
9
Figure 2.2:Sampling Controllerinstants state at the kth sampling instant 16
2.2 MPC Figure 2.2 shows the state of a hypothetical SISO model preditive control system that has been operating for many sampling instants. Integer k represents the current instant. The latest measured output, yk, and previous measurements, yk-1, yk-2..., are known and are the filled circles in Figure 2.2(a). If there is a measured disturbance, its current and past values would be known (not shown). Figure 2.2(b) shows the controller’s previous moves, uk-41… uk-1, as filled circles. As is usually the case, a zero-order hold receives each move from the controller and holds it until the next sampling instant, causing the step-wise variations shown in Figure 2.2(b). To calculate its next move, uk the controller operates in two phases: a) Feedback: In order to make an intelligent move, the controller needs t know the current state yk, which is the output signal of the plant. b) Optimization: Values of setpoints and constraints are specified over a finite horizon of future sampling instants, k+1, k+2...k+Np, where Np (a finite integer ≥1) is the prediction horizon. The controller computes Nc moves Uk, Uk+1... Uk+Nc-1, where Nc (≥1, ≤P) is the control. In the hypothetical example shown in the figure, Np = 9 and Nc = 4. The moves are the solution of a constrained optimization problem. In the example, the optimal moves are the four open circles in Figure 2.2(b), and the controller predicts that the resulting output values will be the nine open circles in Figure 2.2(a). Notice that both are within their constraints,
17
2.2 MPC When it’s finished calculating, the controller sends move uk to the plant. The plant operates with this constant input until the next sampling instant, Δt time units later. The controller then obtains new measurements and totally revises its plan. This cycle repeats indefinitely. Reformulation at each sampling instant is essential for good control. The predictions made during the optimization stage are imperfect. Periodic measurement feedback allows the controller to correct for this error and for unexpected disturbances (Bemporad, Morari et al.). 2.2.2. Traditional MPC Here the mathematic expressions and derivations of MPC algorithm are given. It should be pointed out that the idea of Laguerre MPC is firstly proposed by Wang (Wang 2001; Wang 2004; Wang 2009). The plant to be controlled is described by the discrete time model of the form as
.
(2.20)
Assume that the plant has p inputs, q outputs and n states. Let us denote the difference of the state variable by (2.21) and the difference of the control variable by (2.22) Then we augment the original state space model (2.21) and (2.22) as
18
2.2 MPC
(2.23) Now we model the control signal by forward operators. Let us denote the vector
The cost function of Traditional MPC is expressed as
(2.24) Where,
,
is a
qidentity matrix ,
for control signal,
is the set point vector,
is control horizon,
is the weighting matrix
is prediction horizon. The triplet (A, B,
C) is called the augmented model. Now we calculate derivation of the cost function J: (2.2.5) The necessary condition of the minimum J is obtained (2.2.6) The optimal solution for the control signal is (2.2.7) Now we derive the close-form control law. Because of the receding horizon control principle, we only take the first element of control, thus 19
at time ki as the incremental
2.2 MPC
(2.28) where
is the first element of
and
is the first row of
F. (2.28) is in a standard form of linear time-invariant state feedback control. The state feedback control gain vector is Kmpc. Therefore, with the augmented design model
the closed-loop system is obtained by substituting (2.28) into the augmented system equation; changing index ki to k, leading to the closed-loop equation (2.29) Thus we can see when constraints have not been taken in to consideration, the MPC control algorithm can be written as a closed-loop form. All parameters required in MPC control algorithm can be calculated off-line and the on-line computational demand is relatively low. 2.2.3. Laguerre MPC Now we model the control signal with Laguerre functions. Laguerre functions are a set of discrete orthonormal basis functions that can be expressed as (2.30)
(2.31) Letting
denote the inverse z-transform of . 20
,
2.2 MPC The set of discrete-time Laguerre functions satisfies the following difference equation: (2.32) where the initial condition is given by (2.33) is the Laguerre scaling factor,
, and
is a N×N matrix. For example
N=5,
. The Laguerre functions can be used to describe the impulse response of the stable system. If we consider the future control signals in model predictive control as the impulse response of a SISO system,
can be described as (2.34)
where
.
For a MIMO system,
is given in following equation:
(2.35)
where
is the current incremental control,
incremental control at sample k and
is the future
represents the Laguerre network
description for the ith control. The input matrix can be partitioned to . 21
2.2 MPC Then the cost function becomes (2.36) where the matrix
and
are
] is the ith column of the B matrix, Q and weighting matrix for state variables,
are weighting matrix,
is
is a diagonal matrix (N × N) for
control variables . The object of model predictive control is to solve an optimization problem that takes into account the constraints. Although model predictive control is able to deal with many kinds of constraints either on control signals or on output signals, here we will only focus on constraints on the Amplitude of the Control Variables. Optimization in MPC is realized by minimizing the object function subject to some constraints, which can be considered as a quadratic programming problem. The objective function J and the constraints in Quadratic Programming are expressed as
(2.37) (2.38) where E, F, M and γ are compatible matrices and vectors in the quadratic programming problem.
22
2.2 MPC A simple algorithm called Hildreth’s Quadratic Programming was proposed to solve Quadratic Programming problem. The iteration expression of Hildreth’s Quadratic Programming Procedure is given in following equation: (2.39) where (2.40)
m means the mth iteration, the scalar and
is the ijth element in the matrix
is the ith element in the vector.
,
is a column
vector called Lagrange multiplier. The number of its components is equal to the number of inequality equations for input constraints and
is the ith component of the
Lagrange multiplier. When the iteration is completed, the converged Lagrange multiplier
contains
either zero or positive values. The constrained minimization over x is given by (2.41) As it can be seen from (2.40), the on-line computation demand of MPC is mainly determined by the size of matrix
. The size of matrix M is determined
by the number of inequality equations, which can’t be changed; therefore the size of matrix E is the critical factor for decreasing the on-line computation requirement. For traditional MPC, the cost function is given in (2.24) . To perform optimization, we substitute the matix E in (2.37) with matix size of matrix
is (
. It should be noted that the and hence it can be concluded that
the number factors involved in on-line computation is determined by the control
23
2.3 Simulations horizon
. For some complex dynamic system,
has to be selected greater than 80
to stabilize the system, which will lead to a large computation demand. For Laguerre MPC, the matrix E in (2.37) can be substituted by , which is an matrix. The number of factors involved in on-line computation is decided by the Parameter N. It should be noted that N can be a factor of Laguerre scaling factor
when the
is selected greater than zero, which means the factors
involved in on-line computation will be decreased greatly when Laguerre functions are utilized in MPC design. If a is equal to zero, N has to be chosen to be equal to and under this situation the Laguerre MPC becomes identical to traditional MPC. This is an very attractive property of Laguerre MPC because we can easily compare the performances of these two MPC algorithms by change the value of Laguerre scaling factor .
2.3. Simulations Example A A mechanical system that is highly oscillatory and nonminimum-phase is given by the transfer function
The input amplitude in our simulation is within the limits of ±0.3. For traditional MPC design using forward operator, Nc is chosen to be 100 and Nc is chosen to be 100 to get the satisfactory performance (Figure 2.3). When Nc is selected as 40, obvious degradation occurs (Figure 2.4). For this SISO system, the number of parameters involved in on-line computation is 100.
24
2.3 Simulations For Laguerre MPC, when scaling factor a is chosen to be 0.9, only 10 parameters are required to get satisfactory performance (Figure 2.5). 1.5
1
0.5
0
-0.5
-1
0
20
40
60
80
100
120
140
160
180
200
0
20
40
60
80
100
120
140
160
180
200
0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05
Figure 2.3: Example A. Closed-loop response with forward operator (constrained case): Nc =100. Top plot: system output; bottom plot: control signal. 1.5 1 0.5 0 -0.5 -1
0
20
40
60
80
100
120
140
160
180
200
0
20
40
60
80
100
120
140
160
180
200
0.4 0.3 0.2 0.1 0
Figure 2.4: Example A. Closed-loop response with forward operator (constrained case): Nc=40. Top plot: system output; bottom plot: control signal.
25
2.3 Simulations 1.5 1 0.5 0 -0.5
0
20
40
60
80
100
120
140
160
180
200
0
20
40
60
80
100
120
140
160
180
200
0.4 0.3 0.2 0.1 0
Figure 2.5: Example A. Closed-loop response with Laguerre functions (constrained case): a=0.9, N=10. Top plot: system output; bottom plot: control signal. Example B This is a MIMO example which is using MPC controller to control the attitudes of a low earth satellite. The satellite attitude dynamics are approximated well by a linear model:
Where,
Where
are the attitude angle of satellite about roll, pitch and yaw axes
respectively,
is disturbance torque which is considered as zero in this paper,
26
is
2.3 Simulations
control torque and
,
,
,
is the orbital velocity,
are moments of inertia about x, y and z axis respectively. All involved parameters are listed in Table 2.1. 6.858e-3, 7.858e-3, 8.164e-4
Moments of inertia about x, y, z (
Amplitudes constraints on actuators (
3e-6, 3e-6, 3e-6
)
Sampling interval (s)
10
Prediction horizon
60
Scaling factor for three control signals
0.5,0.5,0.5
Number of terms for three control signals
5,5,5
Control weighting R
0.1,0.1,0.06,
Yaw angle (deg)
Pitch angle (deg)
Roll angle (deg)
Table 2.1: Satellite parameters 20 10 0 -10
0
10
20
30
40
50
60
70
0
10
20
30
40
50
60
70
0
10
20
30 40 Time (mintues)
50
60
70
20 10 0 -10 20 10 0 -10
Figure 2.6: Example B: satellite attitude angles 27
Yaw angle rate(rad/s) Pitch angle rate(rad/s) Roll angle rate(rad/s)
2.3 Simulations
0.01 0 -0.01 -0.02
0
10
20
30
40
50
60
70
0
10
20
30
40
50
60
70
0
10
20
50
60
70
0.01 0 -0.01 -0.02 0.01
0
-0.01
30 40 Time (mintues)
Yaw torque (Nm)
Pitch torque (Nm)
Roll torque (Nm)
Figure 2.7: Example B: satellite angular rates 0.05
0
-0.05
0
10
20
30
40
50
60
70
0
10
20
30
40
50
60
70
0
10
20
50
60
70
0.05
0
-0.05 0.05
0
-0.05
30 40 Time (mintues)
Figure 2.8: Example B: control torques 28
2.4 Summary What we want to emphasize here is that in traditional MPC design, the control horizon
is normally chosen to be over 30 to get a sound performance for satellite
attitude control (Wood, Chen et al. 2006) while by using Laguerre function, N can be selected as 5, which will lead to 6 times less parameters involved in on-line computation.
2.4. Summary In this chapter a simplified MPC algorithm based on Laguerre functions is considered. By using Laguerre functions, the MPC structure is reconstructed and the number of coefficients is reduced. A simulation on a motor illustrates 6 times less parameter can be used in Laguerre MPC and lead to the same or better control performance compared with traditional MPC. The simulations on the three-channel satellite further verify its applicability.
29
Chapter 3. MPCDP In chapter 2 we have derived the Laguerre MPC algorithm and proven that it can reduce the on-line computation demand while maintaining a sound control performance. In the chapter we consider the design of a MPC dedicated processor call MPCDP (MPC dedicated processor). In section 3.1, an introduction of various kinds of digital control devices is given. In section 3.2, the numerical issue is considered and a unique multiplier design is introduced. In section 3.3 the instruction set of MPCDP is described. An overview of the general structure of our instruction set is given first, followed be the detailed instruction set specification. In section 3.4, we focus on the microarchitectures design. In section 3.5, the five-stage pipeline structure of MPCDP is considered. In section 3.6 we focus on the assembler and compiler design for MPCDP. A C-like programming language for MPCDP is also designed. In section 3.7, we focus on the software of MPCDP. One complete C-like program is given to demonstrate the software control architecture of MPCDP.
3.1. Digital Controllers To implement a control algorithm, it is necessary to map the control law into some kinds of hardware architecture that will actually perform the task. There are many alternatives, it might be implemented in software on general-purpose processors, microcontrollers, digital signal processors or it might be implemented in specialpurpose processors. 30
3.1 Digital Controllers Control applications may also take advantage of entire platforms built around generalpurpose processors like personal computers, workstations and stand-alone boards. 3.1.1. General purpose processor General purpose processors are not a cost-effective solution in many applications, and often the performance requirements in terms of throughput, power consumption and size cannot be met. The reason for this is the mismatch between general-purpose processor architectures and most control algorithms that require a large number of repeated arithmetic operations of a relatively simple nature and a small number of input/output operations. General-purpose processors are designed to perform a multitude of functions to support applications which rely almost entirely on manipulation of data; this involves storing, organizing, sorting and retrieving information. To perform those tasks, the processors provide a number of functions that allow wide-ranging mixtures of operations and control flow that can be data dependent, making large jumps from one area of the program memory to another. Thus, the ability to move data from one location to another and testing for inequalities (A = B, A < B, etc.) becomes essential(Lapsley, Bier et al. 1997). These processors were not originally designed for multiplication-intensive tasks, even some modern processors would require several instruction cycles to complete a multiplication because they do not have dedicated hardware for single-cycle multiplication, and as a consequence they are not well suited to perform control algorithms. To solve this problem, highend processors have been enhanced to increase the computation of arithmeticintensive tasks. A common modification is the addition of SIMD-based instruction set extensions that take advantage of wide resources such as buses, registers and ALUs, 31
3.1 Digital Controllers which can be seen as multiple smaller resources. However, despite the high performance operation offered by these processors, they are not widely used in embedded applications due to their cost (Eyre and Bier 1999). 3.1.2. Microcontroller A microcontroller design is focused on integrating the peripherals needed to provide control within an embedded environment and a microprocessor core. Commonly, a microcontroller incorporates in a single chip at least the necessary components of a complete computer system: CPU, memory, clock oscillator and input/output ports, plus some additional elements such as timers, serial units, and analogue-todigital/digital-to-analogue converters. These features allow them to be simply wired into a circuit with very little support requirements; usually, they only require power and clocking (Cady 1997; Predko 1999). The primary role of microcontrollers is to provide inexpensive, programmable logic control and interfacing to external devices. Thus, they are not expected to provide arithmetic-intensive functions. When included within complex systems applications, they are used to interpret input, communicate with other devices, and output data to a variety of different devices. Microcontrollers add a great deal of flexibility in the product development process as they can be used for a variety of applications. Another advantage is the fact that microcontrollers are member of families that present many different combinations of hardware features, so the most suitable device for a specific application can be selected where possible. The programs to be executed are stored in the internal memory (ROM or RAM) to provide a single chip solution.
32
3.1 Digital Controllers 3.1.3. Digital signal processor Digital signal processors (DSP) have been designed to overcome some of the limitations found in general-purpose processors. DSPs introduce some architectural features that accelerate the execution of repetitive multiply-accumulate operations of digital control algorithms (Eyre and Bier 1999). DSPs can be used for controlling external digital hardware as well as processing the input signals and formulating appropriate output signals. Although most real-time digital control applications require a large amount of data calculations, the programs that implement them are normally very simple. As a result, these programs can be stored in internal memory to reduce the transfer time. The design process involves mainly coding the control algorithm either using a high-level language or directly in assembly language. Then the source code is compiled into an object code that can be executed by the processor. This approach allows rapid prototyping, but unfortunately it is not always possible to meet the requirements of power consumption, size or cost. The main reason is that the standard DSP is designed to be flexible in order to support a wide range of digital signal processing algorithms that use only a few of instructions provided (Lapsley, Bier et al. 1997). 3.1.4. Special-purpose processor Special-purpose processors, with a particular combination of registers, logic elements and interconnections, open the possibility of achieving in one clock cycle what a traditional programmable processor requires tens or even hundreds of clock cycle(Cumplido-Parra 2001). The term special-purpose processor has been used to define a wide range of degrees of dedication and specialization. We can say that a 33
3.1 Digital Controllers special-purpose digital control processor is a dedicated hardware entity whose function is to perform a specific, well defined, set of digital control algorithms in realtime. Just as DSPs are more efficient and cost-effective than general-purpose processors to execute high-speed arithmetic operations, special-purpose processors have the potential of overpower DSPs due to their specialized nature. As only the required functions are placed in hardware, special-purpose processors can be less expensive than other processors, especially for high volume products. The possibility of integrating a whole control system into one chip has several effects. It increases the processing capacity and simultaneously reduces the size of the system, power consumption, and pin restriction problems. Additionally, it improves system reliability and offers protection of intellectual property. Of course, developing specialpurpose architectures presents some drawbacks. Among them are the effort and expense associated with custom hardware development, especially for custom chip design. However, the problems associated with custom hardware can be partially solved using high-level hardware design languages such as VHDL and logic synthesis CAD suits allied to large low-cost reprogrammable FPGAs (Cumplido-Parra 2001) . A major advantage of this approach is that the word length can be adjusted to the system's requirements. Thus the size of the architecture can be kept to a minimum. However, the performance improvements come with the cost of larger design effort. 3.1.5. Soft processors A soft microprocessor (also called soft core microprocessor or a soft processor) is a microprocessor core that can be wholly implemented using logic synthesis. It can be implemented via different semi-conductor devices containing programmable logic (e.g., ASIC, FPGA, CPLD). One advantage of a Soft Core Processor (SCP) is its 34
3.1 Digital Controllers flexibility: it uses only the processor features required for a specific application. Another advantage is its ability to integrate customized user Intellectual Property (IP) cores, which can result in a dramatic acceleration in software execution time due to algorithms being executed in parallel in hardware and not sequentially in software. Soft processors for the FPGA platform have become a focus of research. Currently, the major FPGA companies, Altera and Xilinx, provide the in-order soft processors Nios/ NiosII (Altera ; Altera 2005) and Microblaze (Xilinx 2005), respectively. In the design of SoPC systems, these processors have rapidly grown in popularity due to their flexibility and ease of integration into the system. On the other hand, the trend of supporting multithreaded execution has been taking place not only in highperformance and desktop systems but in embedded systems as well. In both worlds the motivation is to leverage increased presence of multithreaded workloads. Several embedded processors with multithreading support are commercially available. While workloads for embedded systems based on ASIC and FPGA platforms tend to be similar, hardware-based multithreading capabilities on FPGA platforms targeting mutlithreaded workloads have not yet been fully explored. 3.1.6. Field-programmable gate array (FPGA) A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by the customer or designer after manufacturing—hence "fieldprogrammable". The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC) (circuit diagrams were previously used to specify the configuration, as they were for ASICs, but this is increasingly rare). FPGAs can be used to implement any logical function that an ASIC could perform. The ability to update the 35
3.1 Digital Controllers functionality after shipping, partial re-configuration of the portion of the design and the low non-recurring engineering costs relative to an ASIC design (not withstanding the generally higher unit cost), offer advantages for many applications. FPGAs contain programmable logic components called "logic blocks", and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together"—somewhat like many (changeable) logic gates that can be inter-wired in (many) different configurations. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory. In addition to digital functions, some FPGAs have analog features. The most common analog feature is programmable slew rate and drive strength on each output pin, allowing the engineer to set slow rates on lightly loaded pins that would otherwise ring unacceptably, and to set stronger, faster rates on heavily loaded pins on high-speed channels that would otherwise run too slow. Another relatively common analog feature is differential comparators on input pins designed to be connected to differential signaling channels. A few "mixed signal FPGAs" have integrated peripheral Analog-to-Digital Converters (ADCs) and Digital-to-Analog Converters (DACs) with analog signal conditioning blocks allowing them to operate as a systemon-a-chip. Such devices blur the line between an FPGA, which carries digital ones and zeros on its internal programmable interconnect fabric, and field-programmable analog array (FPAA), which carries analog values on its internal programmable interconnect fabric (Altera ; Xilinx).
36
3.2 Numerical issues 3.1.7. Other architectures Other architectures include general purpose parallel processors which are based on multi-processor or multi-computer systems(Wanhammer 1999), fuzzy logic controllers which can be applied to systems with undefined boundaries that are difficult to represent using explicit difference or differential equations (Costa, Gloria et al. 1997) and combined approaches which look into the integration of the DSP functionality with the microcontroller to offer the benefits of both the architectures (Eyre and Bier 1999).
3.2. Numerical issues In digital control systems implementation, it is an important issue to determine the type of binary numeric representation for implementation of a digital controller. The numeric representation and the type of arithmetic used can have a profound influence on the behaviour and performance of the controller. Before the control engineers implement a control law, they need to choose a fixedpoint or a floating-point arithmetic for the representation of coefficients and state variables. The decision largely depends on the budget of the project and the size of the targeted controller. Most microcontrollers and some DSPs use the fixed-point arithmetic in which only a finite word length with a fixed scaling is available to represent the state variables and coefficients. It is always possible to introduce floating-point calculations, for example by means of a compiler, but each calculation will then take many processor cycles. For computational efficiency, state variables and coefficients must be scaled to fit the word length provided by the processor. The fixed point arithmetic is a low-cost solution for digital controller implementations
37
3.2 Numerical issues compared to the floating-point arithmetic, and is widely applied in cost-sensitive applications (Schlett 1998; Goodall 2001). The fixed-point arithmetic represents the number in a fixed range with a finite number of bits (word width). Numbers outside the specified range can only be represented if they are scaled, in which case the scalings must be allowed in the computations. The floating-point arithmetic still provides a fixed word length, but expands the available range of values. It represents the number in two parts: a mantissa and an exponent. The mantissa value lies between -1.0 and 1.0, while exponent scales (in terms of powers of two) the mantissa value in order to create the actual value represented(Cumplido-Parra 2001). Note, however, that the mantissa and exponent will have a fixed word length. The floating-point arithmetic offers an easeof-use advantage due to the fact that it provides wider dynamic range and usually gives higher precision than fixed-point arithmetic does. The increase of dynamic range also allows a designer to ignore scaling problems because it reduces the probability of overflow. In contrast, with the fixed-point arithmetic, sometimes it is necessary to scale signals at various stages of the program to ensure adequate numeric performance. Unfortunately, the floating-point arithmetic is generally slower, more expensive and more difficult to implement in hard- ware. The increased cost results from the more complex circuitry required. In addition, the larger word sizes of floating-point processors often means the memory and buses are wider, raising the overall system cost. Thus, the choice of the fixed-point or floating-point arithmetic is determined by the system requirements in terms of dynamic range and precision as well as price and size. The dynamic range is the ratio, usually expressed in dB, between the largest and smallest numbers that can
38
3.2 Numerical issues be represented. The precision of a digital system is dependent upon the word-length that the arithmetic uses. In MPC algorithm, the dynamic range of the coefficients can be very large, which requires a floating-point binary representation. In contrast, the variables are actually in a narrow range but require a high accuracy, that is, a relative large fractional part. On the other hand, the arithmetic blocks for floating-point operations, especially the adding block, would be very complex and consume too much system resource. As a consequence, instead of adopting a fixed point format system or a floating format, MPCDP combines these two formats together, so there are two types of binary representations in MPCDP. All binary numbers are expressed in 2’s complement format. The state variables are held in a 32-bit fixed-point format, with the input values brought in as signed fractional form, a 15-bit allowance for integer bits and 16 fractional bits for underflow. The coefficients are held in a floating-point format, with 14-bits for the signed mantissa and 7-bits for the signed exponent. In this case, the range of our exponent is -63to 63. In order to get the high accuracy, the range of the absolute value of mantissa is set between 0.5 and 1. As it has been discussed, fixed-point number format makes hardware easy to implement while floating-point format can provide large dynamic range. This combined number format in MPCDP not only ensures a large dynamic range for variables, but also keeps the hardware relatively simple to implement. One problem is the how the arithmetic operations can be performed between two operands in different number formats. Actually, in MPC algorithm, the multiplications are always between a variable and a coefficient, which means the multiplication operations in MPCDP are always between a fixed-point number and a floating-point number. As a consequence, one unique multiplier block is designed to 39
3.3 Instruction set design perform the operations mentioned above. The multiplier block takes one fixed-point number as its multiplicand and one floating-point number as its multiplier to produce a fixed-point product. We will describe this in detail in section 3.4. Besides, the adding operations are always between two numbers which are the products from the multiplication operations. Thus, both operands of adding operations are the fixed-pointed numbers, which makes the adding blocks in MPCDP quite simple to implement. 32-bit
state
Intege variable
Sign
Fraction
r 15 bits
16 bits
Figure 3.1: Numerical specification for state variables
Signed Signed mantissa Sign Exponent
Sign
x2
7 bits
14 bits
Figure 3.2: Numerical specification for coefficients
3.3. Instruction set design 3.3.1. Defining instruction set The first step in our processor design is to define an instruction set. We try to make this Instruction set simple to implement and has the smallest number of instructions necessary to efficiently execute the MPC algorithm. We also need to ensure it is a 40
3.3 Instruction set design logically complete set. Four types of instructions are included in MPCDP Instruction set: arithmetic instructions, control instructions, data transfer instructions and miscellaneous instructions. Arithmetic instructions Before determining arithmetic instructions, the number of operands needs to be specified. Basically, there are four possible options (Lilja and Sapatnekar 2005).
Three-address machine: A three-address machine explicitly specifies both input operands and the destination for the output value in each arithmetic instruction. Two-address machine: In a two-address machine, only the two source operands are specified explicitly. The destination operand is implicitly assumed to be the same as one of the source operands.
One-address machine: A one-address machine specifies only a single source operand. The other source operand and the destination are implicitly assumed to be a special register, typically called the accumulator.
Zero-address machine: The simplest form of an arithmetic instruction specifies only the operation to be performed while implicitly assuming the locations of both input operands and the result destination. This is precisely what happens in a stackbased, or zero-address machine. In this type of machine, the two input operands are assumed to be the top two elements of a special storage structure called the stack.
An add instruction causes these two elements to be popped off the stack and added together with the result being pushed back on to the top of the stack. A stack is a very powerful data structure that is used frequently in various aspects of processor design. However, a stack-based machine is somewhat limited in the way it can access the 41
3.3 Instruction set design necessary operands. As a result, stack-based machines are not particularly common. A notable exception, however, is the Java Virtual Machine (JVM) that is used to execute programs compiled from the Java programming language. A three-address machine obviously will require more bits in each instruction to explicitly specify its operands compared to any of the other choices. However, a three-address machine may require fewer instructions to execute a series of operations from a high-level language than the other types of machines. In addition, arithmetic operations are most naturally expressed when explicitly specifying two operands and the destination. This feature tends to make the compiler writer’s job easier than when writing a compiler for one of the choices that implicitly specifies one or more of the operands. Because the MPC algorithm is relatively complex and requires a lot of loops and branches, a three-address machine will make the implementation of MPC algorithm much easier. Taking these various trade-offs into account, we have chosen to use the threeaddress format for the arithmetic operations in our MPCDP. Because MPC algorithm is based on the state space model of dynamic systems, most operations are matrix multiplications
and
additions.
For
matrix
multiplications,
additions
and
multiplications between two matrix elements are two essential operations. Besides, ADDs are also used extensively for address computations. When we use two’s-complement arithmetic, a subtract instruction is not actually necessary since subtraction can be performed by adding the negation of the second operand to the first. Negating the second operand requires at least one more instruction in addition to the add operation. Also, when comparing two values, it is convenient to have an explicit subtract operation in the instruction set. As a result, we have decided to include the SUB operation in our instruction set. 42
3.3 Instruction set design Since the MPC algorithm can be modified to exclude division, a DIV instruction is not necessary. One subtlety that arises in the arithmetic operations of a computer system is that, with a fixed number of bits available, there is a maximum absolute value that can be computed and stored. Trying to store a value larger than what will fit within the available number of bits will cause an overflow. Our system uses two’s complement arithmetic with 32 bits and obviously one of these bits will be dedicated to the sign bit. This then leaves the remaining 31 bits to store the actual value. Thus, the largest value that can be stored is
. If the
absolute value of a result produced by one of the arithmetic instructions is larger than , an error has occurred which must somehow be reported. Actually, in our MPC algorithm, the overflow is most likely to occur in the multiplication of two numbers since both operands are 32-bit fixed point numbers. To solve this problem, we design a multiplier with 21-bit floating point multiplier and 32-bit fixed point multiplicand to generate a 32-bit fixed point product, which will be explained in detail section 3.5, chapter 3. For addition and subtraction operations, overflow is not likely to occur since 32-bit fixed point number is quite enough for our particular algorithm. However, we do have an overflow detection unit to set the value of one bit register to be 1 if overflow occurs. The only reason to include the overflow detection hardware is that overflow bit is used as one test condition for some branch conditions. Logical operation instructions The logical operation instructions, such as the Boolean AND, OR, exclusive-OR (XOR), and NOT, operate on the individual bits within a binary value. Although logical operations are important for many other algorithms, we actually do not need
43
3.3 Instruction set design any logical operations in our implementation. For our dedicated processor, none of logical operations is included. Control instructions The control instructions alter the normal sequential execution of instructions that occurs with the steady incrementing of the PC. There are two basic types of control instructions: 1. Conditional branch, which alters the PC depending on the outcome of a test 2. Unconditional branch, or jump that always alters the PC The conditional branch must specify the operands to be tested, plus the condition for which the operands are being tested. In addition, both types of branches must specify the target address to which they will transfer control. In our instruction set we decide to implement the conditional branch instructions using PC-relative addressing to specify the branch target. PC-relative approach allows every instruction to branch forwards or backwards the same amount from the branch instruction’s location in memory. It also allows a program to be relocated in memory without having to recalculate all of the target addresses. However, it does require an adder to calculate the target address, PC + offset. The unconditional branch, which we will call a jump instruction, will use the absolute addressing mode for specifying the branch target. This approach of defining two types of control instructions is commonly used in the design of processors as it provides a good balance between the two methods of specifying the target address. In addition to the jump instruction, we also include a jump-and-link instruction, which saves the address of the instruction immediately following the jump into a designated register. This instruction is quite useful for high level languages, which have features such as procedure calls and functions require the processor to 44
3.3 Instruction set design implement some mechanism for jumping back to the instruction that follows the initial jump. For a control instruction, a condition to be tested and the operands to be used in the test have to be specified. However, it is relatively complex to specify both of them in one instruction because it performs two essentially independent operations, the compare and the test of the condition. This complexity can lead to the processor’s critical path delays and the overall clock time. As a result, we use an alternative to separate the compare from test. With this approach, we end up with two different types of instructions. The compare instruction requires two input operands. Its output is one or more bits whose values are determined by the result of the comparison. The separate conditional branch instruction then tests the result of the compare instruction to decide whether or not to branch. The outcome of the compare operation typically is stored in the processor’s condition code bits. These bits are part of the processor’s state. They are set by the compare instruction, and subsequently tested by the conditional branch instructions. The condition codes necessary to implement the various conditional tests are described next. There are four condition code bits that are set as a result of an explicit compare instruction.
Carry bit – C: The C bit is set to 1 to indicate that a carry out has occurred from the most significant bit of an unsigned operation. It is set to 0 if no carry occurs. Zero bit – Z: The Z bit is set to 1 if all of the bits in the result are 0. If at least a single bit is a 1, the Z bit is set to 0. Thus, this condition bit indicates that the arithmetic value of the result was zero. Negative bit – N: This bit is set to 1 when the result is negative. A 0 value indicates that the result was positive or zero.
45
3.3 Instruction set design
Overflow bit – V: The overflow bit is set to 1 if the result produces a two’s complement arithmetic overflow. The V bit is set to 0 when no overflow has occurred.
These specific conditions are chosen so that a rich variety of conditional branches can be computed using these condition code bits. These codes are set by a compare instruction, CMP. This instruction is identical to the SUB instruction, except that it does not save the results of the subtraction. The only results it saves are the condition code bits. The whole point of defining the condition code bits is so that the processor can test them and make decisions based on their values. This is precisely the function of the conditional branch instructions. A typical use of these instructions would be first to compare two values using the CMP instruction. This comparison would do nothing more than set the condition code bits. One of the conditional branch operations then would read the appropriate condition code bits as inputs and branch or not, depending on whether the specific condition being tested evaluated to true or not. Given that we have defined three condition code bits, it is logical to have three different conditional branches that check the value of each bit individually. Thus, we define the branch-on-equal (BEQ), and branch-on-minus (BMI) instructions to branch if the corresponding condition code bit is set. Note that the BEQ is testing if the Z bit is set and that the BMI is testing if the N bit is set. Furthermore, the additional logic needed to test the complementary condition for each condition code bit is trivial, consisting of nothing more than a single inverter. This then leads to the branch-onnot-equal (BNE), and branch-on-plus (BPL) conditional branch instructions. It should be noted that although it is easy to implement Branch on overflow clear (BVC) by
46
3.3 Instruction set design checking the Carry bit, this branch is useless for our particular MPC algorithm and thus not included in our instruction set. More branch conditions are included by testing more complex relationships than simply whether a condition bit is set or reset. In addition to above branches, we also include branch-on-less-than (BLT), branch-on-less-than-or-equal-to (BLE), Branch on greater than (BGT), Branch on less than or equal (BLE). These braches test more than one condition bits, which are summarized by table 3.1. For completeness, we have included a branch-always (BRA) instruction.
Branch always
BRA
1
Branch never
BNV
0
Branch on overflow clear
BVC
~V
Branch on overflow set
BVS
V
Branch on equal
BEQ
Z
Branch on not equal
BNE
~Z
Branch on greater than or equal
BGE
( ˜N & ˜ V) | (N & V)
Branch on less than
BLT
(N & ˜ V) | ( ˜N & V)
Branch on greater than
BGT
˜ Z & (( ˜N & ˜ V)| (N & V))
Branch on less than or equal
BLE
Z | ((N & ˜ V) | ( ˜N & V))
Table 3.1: The conditional branch instructions available in MPCDP, and the conditions they test. The load-store instructions and I/O
47
3.3 Instruction set design We define three different load instructions and two different store instructions. The differences in these instructions are in the addressing modes they use to specify a particular location in memory. The load-immediate instruction (LDI) uses an immediate addressing mode in which the actual value to be loaded has been previously stored by the assembler within the instruction itself. Thus, the LDI instruction does not actually initiate a new memory access. Since the LDI instruction has already been fetched from memory, the LDI simply copies the value from the appropriate field in the instruction word and stores it in the designated destination register. This instruction is commonly used to initialize variables with constant values, for instance. The load-direct (LD) instruction performs a basic read of a memory location and stores the value read into the designated destination register. This instruction uses a direct addressing mode in which the address of the desired location in memory is encoded directly into the instruction. With the load-indexed (LDX) instruction, in contrast, a register address is encoded into the instruction as an operand, in addition to the address of the destination register. The contents of this source operand register are assumed to have been previously initialized with the address of the desired location in memory by other instructions executed by the program. This register is called an index register and the addressing mode is called indexed addressing. We further extend this indexed addressing mode by allowing an offset value to be added to the contents of the index register. The offset value is stored as an immediate value within the instruction. The sum of the offset and the address in the index register then is the actual address in memory to be read. This offset-plusindex addressing mode is quite useful to compilers when generating the code needed to access parameters and local variables allocated on a run-time stack for procedure and function calls in high-level languages such as C. We also define store instructions 48
3.3 Instruction set design that correspond to the LD and LDX instructions. The store-direct (ST) instruction uses a direct addressing mode to specify the address of the location in memory to which the value in a specified source register should be written. Similarly, the storeindexed (STX) instruction specifies the destination address using the same offsetplus-index addressing mode as the LDX instruction. In addition to moving values between registers and memory, it is convenient to be able to move values just between registers. Consequently, we define a special move (MOV) instruction that copies the contents of one register into another register. We do not actually implement this instruction in hardware, however. Instead, the assembler automatically converts this instruction into an ADD instruction where the second operand is the value 0. Adding zero to the contents of a register and storing the result is equivalent to copying the contents of the source operand register into the destination register. One important feature of our instruction set is it does not include explicit Input/output (I/O) instructions. Instead, I/O function can be performed by using loadstore instructions since the I/O ports address are uniformly encoded with the memory address. Opcode Mnemonic
Decimal
Binary
Operation
NOP
0
00000
No operation
ADD
1
00001
Addition
SUB
2
00010
Subtraction
CMP
7
00111
Arithmetic comparison
Bxx
8
01000
Conditional branch (xx = specific condition)
JMP
9
01001
Jump indirectly through a register + offset.
49
3.3 Instruction set design JMPL
9
01001
Jump and link indirectly through a register + offset.
LD
10
01010
Load direct from memory.
LDI
11
01011
Load an immediate value.
LDX
12
01100
Load indirect through index
ST
13
01101
Store direct to memory.
STX
14
01110
Store indirect through index register + offset.
HLT
31
11111
Halt execution.
MOV
Move (actually copy) one register to another.
Table 3.2: The operations and corresponding opcodes for the MPCDP processor
Miscellaneous instructions To insert a fixed delay into a program, we determine to have an instruction that dose no work. The NOP instruction does no work and has no operands, outputs, or side effects. Finally, it is useful to have an instruction to indicate that the processor should terminate its processing. It is worth noting that our processor is designed to run continuously in an endless loop. However, it is convenient to have a halt instruction (HLT) in terminating a Verilog simulation. 3.3.2 Specifying instruction set Our processor is a 32-bit machine. The ALU and data paths are 32-bit wide as well. For consistency, we decide that memory address will be 32-bit, and the memory is byte-addressable. That is, each 32-bit memory address points to a unique eight-bit location in the memory. The number of registers is arbitrarily decided to be 32. Since
50
3.3 Instruction set design we have 32-bit basic architecture, we decide to make all instructions fit into four bytes to make a 32-bit instruction word. We use five bits as opcode bits to uniquely specify each operation available in the instruction set since five bits provides 32 unique bit patterns. We arbitrarily assign opcode values to the instruction it is executing, which is shown in Table 3.2. Now we give the precisely definitions of the operation of each instruction. Arithmetic instructions Each of the arithmetic instructions must specify two source operand registers and a destination register in addition to the opcode. Since we have 32 registers, five bits will be needed to specify each operand and the destination for a total of 15 bits. Including the five opcode bits, 20 bits will be needed to completely specify these instructions, which easily fit within the 32 bits available. Instead of requiring both operands to always be in registers, we can allow one of the operands to be specified using an immediate addressing mode. In this mode, the actual value of the operand, instead of its address, can be encoded into the extra bits available in these instructions. Note that we will need one of these bits to indicate the addressing mode for this operand, that is, whether the corresponding instruction bits should be interpreted as a register number or as an immediate value. In Figure 3.5, ro1 and ro2 are two register containing two operands. rdst is the destination register to store the corresponding result. Immed16 is a 16-bit immediate number. 31…27
26 … 22
21 … 17
16
15 …. 11
10 … 0
opcode
rd
ro1
0
ro2
00 0000 0000
31…27
26 … 22
21 … 17
16
15 … 0
opcode
rd
ro1
1
immed16
51
3.3 Instruction set design Figure 3.3: The instruction format for the arithmetic and logical operations. Control instructions In addition to the standard opcode field, the conditional branch instruction needs to specify the condition to be checked and the address to which the program should branch if the condition is true. If the condition evaluates to false, the program continues executing with the instruction immediately following the branch. Note that only 16 bits are available to store this value. However, our processor operates on 32-bit quantities. So this 16-bit field must be extended to a 32-bit quantity. Here we treat it as a signed integer and do sign extension to convert it to a 32-bit signed integer. This immediate field thus can contain a value in the range . CPM instruction format is similar to SUB instruction, except that the destination register field will be ignored by the processor. As we talked before, the multiplication requires one 32-bit fixed point number and 21 bit floating point number. Consequently, we only use the lower 21 bits of the 32 bits in operand field for multiplication operation. The condition filed 23 to 26 specify which branch condition should be tested. Table 3.1 lists 14 conditions that we can test with the conditional branch instruction. Then we decide to use bits 22 to 0 to encode the branch target address. Obviously we cannot fully encode a complete 32-bit memory address into the 23-bit field available in the instruction. So we determine to interpret this 23-bit immediate field as an offset value from the current PC value. That is, the branch target address is determined by adding the sign-extended value stored in this immediate field to the current value of the PC. This allows the conditional branch instructions to branch forwards or
52
3.3 Instruction set design backwards to any address within
addresses of the branch instruction itself.
In Figure 3.4, cond is 4-bit condition code and immed23 is a 23-bit number. Bxx
31 … 27
26 … 23
22 … 0
01000
cond
immed23
Figure 3.4: The instruction format for the conditional branch instruction. In contrast to the conditional branch instructions, we also have jump instructions to be able to directly jump to any location in the address space. In order to specify a complete 32-bit target address while keeping the instruction size fixed at 32 bits, JMP and JMPL instructions specify the jump target address by encoding a register address into the instruction. The JMP instruction needs to specify only a single operand, that is the address of the register that contains the target address. The JMPL also needs to specify the jump address plus a destination register in which to store the return address. There is also one bit indicating whether this is a JMP or a JMPL operation. JMP
JMPL
31 … 27
26 … 22
01001
00000
31 … 27
26 … 22
01001
rd
21 … 17 ro1 21 … 17 ro1
16
15 … 0
0
immed16
16
15 … 0
1
immed16
Figure 3.5: The instruction format for the two jump instructions. Data transfer and I/O instructions The load-immediate (LDI) instruction must specify the opcode, the immediate value to be stored, and the destination register into which the value is to be stored. Since five bits are needed for the opcode and another five bits are needed to specify the destination register, 22 bits remain for the immediate value, as shown in Figure 3.6.
53
3.3 Instruction set design Interpreting this value as a signed two’s-complement integer allows the LDI instruction to load values in the range LD/LDI
31 … 27
26 … 22
21 … 0
opcode
rd
Immed22
Figure 3.6: The instruction format for the the LDI and LD instructions. The format for the load-direct (LD) instruction is identical to that of the LDI instruction. The only difference is that for the LD, the value in the immediate field is interpreted as an address that contains the value to be read, instead of the value itself. Notice that this interpretation limits the range of addresses that can be accessed by the LD instruction. In particular, since this field is only 22 bits, the LD can access only 222 unique locations out of the 232 available in the address space. Furthermore, we decide to interpret this address using sign extension. Positive values then will translate into addresses in the range 00 0000 – 1FFFFF while negative values will be interpreted as addresses in the range FFE0 0000 – FFFFFFFF. LD is also used as the data input instruction. Our processor has five 32-bit input ports, which addresses are encoded as 40000 to 40016. If the value in LD’s address filed is between 40000 to 40016, LD instruction will load the data form input port registers to one of the 32 registers in register file. The format for the load-indexed (LDX) instruction is similar to the LDI and LD instruction. The difference is that we must use five bits from the immediate field to specify the index register. This leads to the format shown in Figure 3.7, where the rs1 field specifies the index register. LDX
31 … 27 01100
26 … 22 rd
21 … 17 ro1
54
16 … 0 Immed17
3.3 Instruction set design Figure 3.7: The instruction format for the LDX instruction. The format for the store-direct instruction (ST) is the same as that for the LD instruction, except that the rd field in bits 26 to 22 of the LD instruction is now interpreted as the number of the source register. That is, in the format shown in Figure 3.8 for the ST instruction, the substituted rst field is the address of the register whose value is to be written to memory at the address specified by the immediate field in bits 21 to 0. As in the LD instruction, the immediate value is sign-extended so that it accesses the same range of addresses as the LD. The format for the store-indexed (STX), shown in Figure 3.9, is chosen to follow the format of the LDX instruction as closely as possible. However, as with the ST instruction, the rst field in bits 26 to 22 is interpreted as the source operand register. Recall that the MOV instruction is a pseudo-instruction that gets converted into an ADD by the assembler. Therefore, the encoding of the MOV instruction will be the same as the ADD. In Figure 3.9, rst is register to be written to memory. rs1 is one of the general registers that specifies the memory address. 31 … 27
ST
26 … 22
21 … 0
rst
immed22
01101
Figure 3.8: The instruction format for the ST instruction. STX
31 … 27 01110
26 … 22
21 … 17
rst
rs1
16 … 0 Immed17
Figure 3.9: The instruction format for the STX instruction. Miscellaneous instructions Both the HLT and NOP instructions require no operands to be specified. The instruction encodings for these two instructions require only the opcode field. 55
3.4 Microarchitectures The remaining bits can be any value since they will be ignored by the processor. However, it is conventional to set all unused bits to zero.
3.4. Microarchitectures The MPCDP adopts a general five-stage pipelining processor architecture but optimised for MPC algorithm. All the calculations are carried out in the arithmetic and logic unit (ALU) with memories being used to store coefficients and instructions. One feature of MPCDP is it contains two number formats: 32-bit fixed pointed number for variables, which are the feedback signals from the plant’s outputs, and 21bit floating number, which are the MPC coefficients. There is only one data bus in MPCDP and the 21-bit floating number uses the lower 21 bits of this data bus. One unique design is the 32-bit signed multiplier which takes a 32-bit fixed-point number as its multiplicand and a 21-bit floating-point number as its multiplier to produce a 32-bit fixed point product. MPCDP input ports read the feedback signals from the plant’s outputs and MPCDP output ports send control signals to the plant’s inputs. Memory and I/O In MPCDP the address of memory and I/O blocks are uniformly encoded. As a result, same instructions can be used to read/write an I/O port or load/store an operand. Whether I/O blocks or memory block should be addressed is determined by the particular addresses specified in the Load/Store instructions. Memory block an I/O In MPCDP, instead of using separate read only memory (ROM) and random access memory (RAM), all instructions and data are stored in one general-purpose ram. Since all ALU operations require two operands, the ram has been designed to possess two data ports with distinct write clocks and write enables. It should be noted that 56
3.4 Microarchitectures normally, the initial states and coefficients are not stored in this ram. Instead, these data can be read form the immediate field in load instructions and stored in register files. However, if initial states or coefficients are stored as arrays, there is one 1MB heap in memory to store these data. Intermediate data are stored in register files instead of ram, because the speed to visit register files is much faster than visiting ram. The ram space is divided into four parts. The ram with address 0-399999 is used to store instructions. Since one instruction requires 4 bytes, the maximum number of instructions in our processor is 40000, which is enough for the implementation of MPC algorithm. Address 400000-400063 is assigned to I/O ports. Actually, there is no data stored in the ram with address 400000-400063. Since the address is occupied by the I/O ports, this part of memory space cannot be used for other purposes. There is an unused part in this ram with the address 400128-524287. This part is preserved for the future extension of I/O ports. The last part is 1 MB heap. The maximum number of elements of one array is 128, which means at least 1024 arrays can be stored in the ram. 400000-400063 Input
1 MB Array field
ports
0-399999 address Instruction 400064-400127 field Output
ports
400128-524287 address Unused field Figure 3.10: MPCDP memory structure
57
3.4 Microarchitectures In load instructions, the data is fetched from memory or I/O input ports and then sent to register files. The data sent to register is selected by a MUX according to the address. If the address is from 400000 to 400060, the MUX will send the I/O block data to register files. Otherwise it will send the data from the memory. Similarly, in store instructions, the data can be sent to the ram of to the I/O output ports according to the address. If the data is within the range 400064 to 400124, the data form register files will be sent to the output ports of the I/O block. Otherwise, the data will be stored in the ram and the values in I/O output ports remain unchanged.
Figure 3.11: Memory and I/O architecture Address
400000
400004
400008
400012
400016
400020
400024
400028
400032
I/O port
In_0
In_1
In_2
In_3
In_4
In_5
In_6
In_7
In_8
Address
400036
400040
400044
400048
400052
400056
400060
400064
400068
I/O port
In_9
In_10
In_11
In_12
In_13
In_14
In_15
Out_0
Out_1
Address
400072
400076
400080
400084
400088
400092
400096
4000100
400104
I/O port
Out_2
Out_3
Out_4
Out_5
Out_6
Out_7
Out_8
Out_9
Out_10
Address
400108
400112
400116
400120
400124
I/O port
Out_11
Out_12
Out_13
Out_14
Out_15
Table 3.3: I/O address 58
3.4 Microarchitectures Condition setting and checking blocks As we have discussed before, there are four condition bits in MPCDP, which are used to test whether a branch is taken. The condition setting block, which is labelled CHECKCC in Figure 3.12, sets the four condition code bits according to the logic equations described in section. As we can see from these equations, the carry-out bit is determined by the most-significant bit of the 33-bit result register; the z bit is the NOR of all of the bits in the result except the carry-out bit; n is the sign bit of the result; v is determined by the sign bits of two ALU operators and the result. The ports op1 and op2 are connected with two ALU operators separately. Port result is connected with ALU output. Ports add, sub and cmp are connected to the decoder output ports. The output ports of the CHECKCC block are four conditions bits: carry-out bit, zero bit, negative bit and overflow bit. While the SETCC block sets the condition code bits, the CHECKCC block is used to test these bits based on the condition specified in a conditional branch instruction. The input ports to this block are four condition bits and the 4-bit condition field of the instruction. The condition field determines what branch condition needs to be tested. The output port is set to be 1 when the selected branch is true and the branch will be taken. Otherwise the output is set to be zero and the branch will not be taken and the instruction following the branch instruction will be executed.
59
3.4 Microarchitectures Figure 3.12: Condition testing block: CHECKCC Register file As we mentioned before, both the operands and the result are stored in a triple-ported register file with 32 general-purpose registers. This register file permits two simultaneous reads along with a write operation. Ports R1, R2, R3 are two read ports and one write port. The addresses of two read ports are specified by a1 a2 and the write port address is specified by a3. It should be noted that the input signal of write port R3 is from Z5 register in the fifth pipeline stage. reg_write is the write-enable port. To address one of 32 registers, a 5-bit address is required.
Figure 3.13: Register file Sign extension block Some instructions include an immediate field which need to be sign-extended to be 32-bit to fit our 32-bit data path. The instructions having an immediate field are ALU instructions, jump instructions, branch instructions, store and load instructions. The immediate number in their immediate fields is treated as a signed number. Thus the sign extension block simply fills the most significant bits with the sign bit.
60
3.4 Microarchitectures The instruction is input to the IR port. Ports alu, jmp, br,st, ld, ldi, stx,ldx are connected the decoder. Port out outputs the sign-extended 32-bit number.
Figure 3.14: Sign extension block Multiplier The multiply operation for the control system processor has two data types: the multiplicand is a state variable which is in a 32-bit, fixed-point format; and the multiplier is an 8-bit mantissa, 8-bit signed exponent coefficient in floating-point format. The conventional multiply operation in Verilog is unable to multiply these two different data types due to the mixed format, and hence a special multiply operation is required to implement this operation successfully. Firstly, the 32-bit fixed point number is multiplied by the 7-bit mantissa. They are converted to their absolute values, resulting in a 31-bit unsigned number and a 6-bit unsigned number. Then, the multiplication can be conducted by the Verilog’s multiplier, resulting in a 37-bit unsigned fixed point number. This 37-bit binary value is then shifted towards the left or the right according to the value of the 7-bit exponent. When the exponent is positive, the variable is shifted to the left by the number of bits specified by the value of the exponent. When the exponent is negative, the variable is shifted to the right. Then the rightmost 6 bits are abandoned, resulting in a 31-bit unsigned number.
61
3.5 Pipelining Next, the rightmost 6-bits are eliminated to produce a 31-bit unsigned number. Finally, the sign bit needs to be added. The sign of the final result is determined by the leftmost sign bits of two original operands. If they have the same sign bits, the final result should be positive and a zero is added its sign bit. If two original operands have different signs, one zero should be added at the sign bit and then the 32-bit result should be negated. The final result is still a 32-bit signed number with a 16-bits fractional part. Due to the complexity of the multiply operation, it is the slowest arithmetic operation in the control system processor, occupying the largest amount of silicon area and consuming the most power. The Verilog code for the multiplier block can be found in Appendix B.
3.5. Pipelining 3.5.1. Overview of the pipeline structure In the simplest implementation of a processor, the complete fetch-execute cycle would be completed for one instruction before the next one begins. To speed up the execution of instructions, we break the fetch-execute cycle into five simpler suboperations and execute them in the assembly line. This assembly line processing is called pipelining. In our MPCDP, one instruction is executed in the following five-stage pipeline: 1. 2. 3. 4. 5.
Instruction fetch stage (IF). Instruction decode/register fetch stage (ID). Execution stage (EX). Memory access stage (MEM). Write-back stage (WB).
Pipelined hazards 62
3.5 Pipelining There are situations in pipelining when the next instruction cannot execute in the following clock cycle. These events are called hazards, and there are three different types: structural hazards, data hazards and control hazards. In our processor, by construction, we ensure that structural hazards are not an issue, and therefore we ignore them in this section. Data hazards occur when the pipeline must be stalled because one step must wait for another to complete. For example, suppose we have an add instruction followed immediately by a subtract instruction that uses the register 1 in register file. The add instruction does not write its result until the fifth stage, meaning that we would have to waste three clock cycles in the pipeline. The solution to data hazards is adding hardware to retrieve the missing item early from the internal resources, which is called forwarding. One hazard detection block may be achieved by examining the instruction set and determining: 1. The earliest pipeline stage at which the results of an instruction are available 2. The latest stage at which the contents of a register are used in the pipeline Suppose instruction I1 is immediately followed by I2. Normally, I2 lags I1 by one stage. If I1 and I2 have data dependence but the result of I1 is ready just one clock cycle before it needs to be used in I1, the data hazard can be removed by forwarding. However, forwarding cannot prevent all pipeline stalls. If the data being loaded by a load instruction has not yet become available when it is needed by another instruction, forwarding cannot prevent the data hazard. Instead, the pipeline must be stalled. Pipeline stall is also used to deal with control hazards or branch hazards. The control hazards happen because we do not know whether the branch will be taken or not until the branch condition is tested. One solution for condition hazards is that 63
3.5 Pipelining always assume that the branch condition is taken and execute the instruction following the branch instruction. And if the conditional branch at the end turns out to be untaken, the instruction following the branch instruction should be flushed and the target instruction specified by the branch instruction will be fetched and executed. In conclusion, the hazard detection is designed to check for potential hazards, and to determine when the pipeline is to be stalled, when bubbles are to be introduced, and when data must be forward in each stage of the pipeline (Lilja and Sapatnekar 2005; Patterson and Hennessy 2008).
Stage 1: Instruction fetch stage (IF) IR2 PC2 Stage 2: Instruction decode/register fetch stage IR3
PC3
33
3
Cond
X3 (ID)
Y3
MD3 33
Stage 3: Execution stage (EX) IR4
PC4
4
4
Cond
CC4
Z4
MD4
4
4
Stage 4: Memory access stage (MEM) IR5
Z5
5
Stage 5: Write-back stage (WB) Figure 3.15: An overview of the structure of the pipeline, showing the stages, registers, and forwarded signals.
64
3.5 Pipelining Figure 3.15 illustrates the five stages of the pipeline, the intermediate registers, and the forwarded data. Notationally, the numeral at the end of each register indicates the stage number of the pipeline that it feeds. The IF stage propagates the instruction and program counter to the ID stage through the IR2 and PC2 registers, respectively. From the ID stage to the EX stage, a larger number of intermediate registers is introduced. We propagate the instruction and program counter, this time through IR3 and PC3. As we will see later, this includes mechanisms to handle pipeline stalls and bubbles. Notably, any pipeline stalls are performed at this stage. In addition, we also propagate
a condition bit that is set to one if the instruction in the ID stage is a branch instruction, and if the branch condition holds
the X3 and Y3 registers that feed the ALU inputs
the MD3 register that carries data to be written to the memory
During memory write instructions, the ALU inputs may be used for address computation, and hence the data to be written is stored in the MD3 register. The EX stage propagates the condition bit from the ID stage, and also has an IR4 and PC4 for the instruction register and the program counter, respectively. As before, these may be configured for introducing pipeline bubbles, but configuration for pipeline stalls is not necessary since these are processed in the ID stage. Besides these registers, the memory write data is stored in the MD4 register, the ALU results are stored in the Z4 register, and the condition codes in the CC4 register. Finally, the MEM stage only propagates the instruction register to IR5, and the register write data to the Z5 register. As shown in the figure, data forwarding is implemented by feeding back the Z4, Z5 and CC4 bits to various pipeline stages. 65
3.5 Pipelining The data read from memory goes into the Z5 register. Note that since Z5 is used as a forwarding register, it may also take other inputs in case the instruction that is being executed does not involve a memory read or write. Specifically, an input MUX into Z5 chooses one among the data from memory, the contents of PC4, and the contents of Z4. The final stage executes any writes that must be performed in the register file. Specifically, if the destination in an instruction is a register, then a register address and data are sent to the register file, along with a write signal. 3.5.2. A detailed description of the pipeline stages The instruction fetch (IF) stage The address of the current instruction is stored in the program counter (PC). During normal operation, the instruction fetch stage addresses the instruction memory using the program counter, so that the current instruction may be brought in and load into the instruction register (IR). The program counter is then updated to point to the next instruction to be introduced into the pipeline. When the pipeline is stalled, the next instruction in the program is placed on hold, and NOP operations are introduced into the pipeline. A block-level diagram of the circuitry that implements this stage is shown in Figure 3.16. The PC is updated through a MUX. The IF stage communicates with the ID stage by updating the IR2 and PC2 registers, respectively. The former is updated by a MUX that passes through either the instruction from the instruction memory, or a NOP, or recirculates the current value of IR2 in case of a pipeline stall. The latter is updated through another MUX that passes either PC+4 or retains the current value of PC2, depending on whether the pipeline is to be stalled or not.
66
3.5 Pipelining
Figure 3.16: The IF stage of the instruction pipeline. The instruction decode (ID) stage In stage 2, shown in Figure 3.17, the contents of the instruction register and the program counter are passed on to stage 3 through the IR3 and PC3 registers, respectively. The value of IR3 is chosen by a MUX to be either the value of IR2, a NOP instruction (in case of a pipeline flush), or a recirculation of the present value of IR3 (in case of a stall). The contents of PC3 are either taken from PC2 or by recirculating the current value (in case of a pipeline stall). The registers X3 and Y3 are loaded with the operands to the ALU for the EX stage. The first operand, X3, is obtained by addressing the register file to obtain the contents of rs1, or from the program counter (in the case of an address computation for an indexed instruction), or from a forwarded output, Z5, from stage 5, or by recirculating the current status of the register in case of a stall, as depicted by the MUX that feeds X3.
67
3.5 Pipelining Similarly, the second operand, Y3, is chosen from among the contents of rs2, an immediate operand, Z5, or the recirculated value of Y3. The immediate address is less than 32 bits, and must pass through sign-extension circuitry to create a 32-bit operand. Finally, for store instructions, the operand to be stored is placed in the MD3 register. The MUX input to MD3 selects one of rs2 or Z5 in normal operation; additionally, in case of a pipeline stall, the value of MD3 is retained.
Figure 3.17: The ID stage of the instruction pipeline. The execute stage The EX stage performs arithmetic computations, and sets the condition code bits appropriately. The structure of this stage is shown in Figure 3.18. As before, the contents of the IR and PC and transmitted to the next stage, this time through the IR4 and PC4 registers. The IR4 register may be fed by either the IR3 register, or with a NOP in case the pipeline is to be flushed, while the PC4 register simply transmits the contents of PC3.
68
3.5 Pipelining The ALU outputs are transmitted to the next stage through the Z4 register; in addition, this output goes to a condition flag generating circuit that transmits the condition codes to the next stage through the CC register. The inputs to the ALU, X and Y, are selected by a pair of MUXs. The MUX input to X may choose either X3, the forwarded operand Z5, PC4 or Z4. The MUX input to Y is similar, except that Y3 is used instead of X3.
Figure 3.18: The EX stage of the instruction pipeline. The memory access (MEM) and write back (WB) stages The last two stages are illustrated in Figure 3.20. The MEM stage performs the task of read/write operations from and to the memory. Also, the I/O task is also performed in this stage. The address is taken from the Z4 register, and read (Fortescue, Stark et al.) and write (WR) signals are generated and sent to memory: if RD is one, then the memory access is a read, and if WR is one, then the access is a write. If both signals are zero, no memory transactions take place in this stage. The data to be written may either be 69
3.5 Pipelining taken from the MD4 register, or from the Z5 register (in case of data forwarding). The data read from memory goes into the Z5 register. Note that since Z5 is used as a forwarding register, it may also take other inputs in case the instruction that is being executed does not involve a memory read or write. Specifically, an input MUX into Z5 chooses one among the data from memory, the contents of PC4, and the contents of Z4. The address is taken from the Z4 register and sent to the data memory and I/O block. If the instruction is LD, LDI or LDX the I/O block will read the specified input port and send the input signal to a MUX through d_in port. At the meantime, the memory will also send a data signal to the MUX. The data read from memory or I/O block goes into the Z5 register. Note that since Z5 is used as a forwarding register, it may also take other inputs in case the instruction that is being executed does not involve a memory read or write. Specifically, an input MUX into Z5 chooses one among the data from memory, I/O block, the contents of PC4, and the contents of Z4. If the instruction is ST, the data will be sent to d_out port of the I/O block and the write data port of the data memory. For I/O block, if the address specified in the instruction is within the range of 400000 to 400126, the value of one output port will be changed. Otherwise the values of output ports remain unchanged. For the memory block the data to be written may either be taken from the MD4 register, or from the Z5 register (in case of data forwarding). One problem here is the data will be written in data memory even if the address if within the range of 400000 to 400126.But since the address from 400000 to 400126 is reserved for the I/O port, no useful data is stored in the data memory within this address range.
70
3.6 MPCDP compiler and assembler
Figure 3.19: The MEM and WB stages of the instruction pipeline. The final stage executes any writes that must be performed in the register file. Specifically, if the destination in an instruction is a register, then a register address and data are sent to the register file, along with a write signal.
3.6. MPCDP compiler and assembler The MPC algorithm is firstly written in M-file and tested in Matlab and Simulink environment. After the algorithm is verified, it is written in C. In this section, the C-like programming language designed for MPCDP is presented. To shorten the testing period and avoid errors in hand-writing binary codes, a compiler and an assembly are designed for MPCDP. 3.6.1 MPC assembler MPCDP needs to read the executable binary code to it memory before the instructions are actually executed. In the last section, we explained how C code can be translated into assembly code by compiler. In this section, we will describe how the assembly code is translated to processor’s machine language. One piece of generated machine code is shown in Figure 3.20: assembly code example . All numbers are in hex 71
3.6 MPCDP compiler and assembler format. The address of each instruction is shown in each line using the Verilog command @. Following the address is the 32-bit instruction code. The object code is ready to be read by an appropriate Verilog $readmemh statement. Once the code is read into the memory, the instructions can be executed by MPCDP. @0008 5f 80 08 00 @000c 5f 08 00 00 @0010 5f 40 0a 00 @0014 58 00 00 08 @0018 4f c1 00 18 @001c f8 00 00 00 @0020 09 39 00 00 @0024 0f 39 00 10 Figure 3.20: assembly code example It should be noted that the system statement $readmemh statement cannot be synthesised by ISE synthesis tool. But it is quite useful for simulation in Simulink. For hardware implementation, the executable code needs to be read into FPGA’s block memory before MPCDP starts running. 3.6.2 Compiler The MPCDP compiler is used to translate the source code of a program written C into the assembly language of the processor on which the program is executed. The processor’s assembly language is called the target language. The generated target language will be translated to final executable object code. Of course, it is possible for the compiler to generate the binary machine language directly from the C source code. 72
3.6 MPCDP compiler and assembler But it is much easier for the compiler to translate from the C code to assembly code rather than binary code. Besides, sometimes we want to make some changes on the generated assembly code and use the modified assembly code to generate binary code. Consequently, it is more convenient to separate the complier from the assembler. The reasons why we decide to have an independent compiler will be explained in detail in the following sections. Our compiler uses a very restricted but quite functional subset of the C programming language, which we will call MPCDP-C from this point on. It supports functions and arrays and arrays can be used as parameters for functions. The only number type supported by MPCDP-C is unsigned integer. However, as we mentioned before, the variables in MPCDP are signed 32-bit fixed point numbers with 16 bits fraction part and the coefficients are signed 21-bit floating point numbers. To input coefficients into our MPCDP-C source code, these floating point numbers have to be converted to integers. For example, we assume one coefficient is -2.5. Firstly, it needs to be converted to 21-bit floating point numbers in binary, that is
Both mantissa and exponent are 2’s component. Then, we need to connect the mantissa and exponent together, which gives 101100000000000000010 Then this binary number is converted to an unsigned integer number, that is
At the end, this number is input into to our MPCDP-C source code as one MPC coefficient. In MPCDP, this integer number will be decoded and recovered to the 21bit signed floating point format. 73
3.6 MPCDP compiler and assembler Table 3.5 lists all of the standard C keywords supported by MPCDP-C.
void
int
while
for
if
else
break
continue
return
Table 3.4: keywords supported by MPCDP-C 1
( ), [ ]
2
++, --
3
+, -
4
, =
5
==, !=
6
=, +=, -=, *=, /=
Table 3.5: The operators supported by MPCDP-C I/O operations MPCDP-C supports two simple input/output statements: getc ( ) and putc (). These two functions are built into the language, rather than being external functions in a library. The parameter for getc( ) is the input port number. Actually, our MPCDP has 16 input ports and 16 output ports, so the parameter for getc( ) is 0~15. For putc ( ), its parameters are the number to output and the out and the output port number. Sometimes we do not need the assembly code generate by the assembler. To get the binary object code directly, we can simply let our compiler to invoke the assembler to translate the assembly code to binary code. The temporary assembly code can be deleted or saved by the programmer. 74
3.7 Software architecture for control program
3.7. Software architecture for control program 3.7.1. Introduction to control loops in MPCDP As it has been mentioned before, since the MPCDP compiler and assembler can translate a subset of standard C code to executable machine code directly, we can focus on the C implementation of MPC algorithm. The validity of the algorithm itself is no longer an issue, because it has already been verified in Matlab and Simulink environment. Actually a control program can be written in many programming languages and implemented in different types of hardware; for example the C language is widely accepted by control engineers in embedded systems(Barr 1999). However, no matter what hardware and programming language a control engineer uses, the programming scheme of most control laws is quite similar. Figure 3.21 shows such a program flowchart, which can be summarised to three stages: initialisation, synchronisation and execution.
75
3.7 Software architecture for control program
Start Initialisation
Is there a reset
No
signal (a pulse)? Synchronisati YES
on Reset PC
Reset all registers Sampling data
Arithmetic operations
Writing output
Execution
Figure 3.21: Control program fowchart In the initialisation stage, all the initial states and the coefficients are loaded. If an internal clock is used to define the sampling clock, the counter value is loaded. In some high level languages, the start register and the stop register of the program counter don't have to be stated as these values will be automatically recognised by the loop instructions after compilation, for example the `while' and `for' commands in C. However, in assembly or assembly-like language, these registers must be pre-defined, being applicable to most ASIS-based languages. In MPCDP, an external sampling clock is used to determine the sampling frequency. This sampling clock is a pulse generator. At the beginning of every control loop, it generates a pulse to reset the MPCDP, especially setting the PC to zero. In the 76
3.7 Software architecture for control program synchronisation stage, the hardware that runs the control program must be synchronised with other peripherals, especially the data acquisition card (DAC). For most control programs, a sampling clock is used for synchronisation, which is realised through a handshaking mechanism. At every rising edge of the sampling clock, the program sends a `request' signal to a peripheral. The peripheral corresponds with an `acknowledge' signal. Then the data are passed from the peripheral to the I/O registers, being ready for the main control operations. Before the beginning of the control loop or after the end of the main control operations, the program idles till the next pulse from the sampling clock. In the final stage, the program executes all the calculations along with necessary data exchanges with the I/O registers. In the diagram, sampling data happens at the very beginning and writing output does at the end of the control loop. 3.7.2. Simulation examples This example is used to show the basic C code and assembly code structure for the MPC algorithm on the MPCDP. For simplicity, one SISO system example is shown here. This system is one of three channels of a satellite attitude dynamic model. Figure 3.22 shows the variables declaration of arrays and variables, which is very similar to standard C. As we mentioned before, all coefficients in MPCDP are in the 21-bit floating point format, represented by an integer in MPCDP-C code. That is why all coefficients in this piece of code are very large integer numbers. As we can see here, the parameters for MPC algorithm are initialized in an array. These numbers will be stored in the memory of the CPU in the initialization stage.
77
3.7 Software architecture for control program
Figure 3.22: MPCDP-C code for initialisation stage Figure 3.23 presents the code for I/O operation. I/O function getc() is used here to read the corresponding I/O ports. Then four of input values are stored in an array Xf[4] and the last input value is stored in variable u_old. It should be noted here that although in this example a SISO system is used, five I/O input ports are needed in MPCDP because in MPC algorithm augment state space model is used so totally five feedback variables are input to the controller, i.e. the MPCDP.
Figure 3.23: MPCDP-C code for I/O operation Figure 3.24 shows the code for execution stage. It is worth noting that in each multiplication operation, the multiplier is a 21-bit floating point number, which has been initialized in the initialization stage, and the multiplicand is the 32-bit fixed point number, which is read from one I/O port. Except multiplication, the operands of all arithmetic operations are 32-bit fixed point numbers. Function Putc() is used to output control signal delta_u to the system. For this SISO system, only one I/O output port is needed. This MPCDP-C code can be translated to corresponding assembly code and binary machine code. 78
3.8 Summary
Figure 3.24: MPCDP-C code for execution stage and I/O output
3.8. Summary In this chapter we discuss the processor design for MPC algorithm implementation. The instruction set, hardware architecture are designed according to the requirements of MPC algorithm. All variables in MPCDE are expressed in 32-bit fixed point number while coefficients are in 21-bit floating point system. A unique multiplier takes one coefficients and one variable as its operands to generate a 32-bit fixed point product, thus ensuring a large dynamic range and a relatively simple hardware structure. One C-like programming language is defined. For most applications, programs can be written in that C-like language and converted to executable binary codes by using the MPCDE compiler. Also, the MPCDE assembler makes it possible 79
3.8 Summary to write assemble language directly to accurately control the hardware usages. MPCDP is designed for MPC algorithm but is also capable of implementing most of other control algorithms.
80
Chapter 4. Simulation In Chapter 3, a five-stage pipeline processor MPCDP has been developed for our control task. In this chapter we test the performance of MPCDP on our satellite dynamic model. A traditional way to test an embedded processor’s performance is to conduct a Hardware-in-the-loop (HIL) simulation. In Section 4.1, the Hardware-inthe-loop simulation is reviewed. In order to shorten the testing period of HIL simulation, in section 4.2 a novel verification methodology based on EDA Simulator Link tool is developed. Section 4.3 presents two simulations on one-channel and three-channel satellites respectively, using the verification methodology introduced in section 4.2. Section 4.4 presents the synthesis results for MPCDP. Both the frequency and the power consumption are analyzed. One comparison is made between MPCDP and another embedded processor called CSP in terms of resources occupied, power consumption and frequency.
4.1. Hardware-in-the-loop simulation Our object of designing the MPCDP is to apply the MPC control algorithm on a three channel satellite model. Before the MPCDP is integrated in the real physical system, each individual part of the MPCDP and the performance of MPCDP itself need to be tested. The verification and simulation process thus consist of three parts: individual component verification, MPCDP verification, and Hardware-in-the-loop simulation. This process can be illustrated in Figure 4.1. 81
4.1 Hardware-in-the-loop simulation
Verilog test bench Individual component verification
Binary machine code self-test program MPCDP verification
Binary machine code test with C physical model Hardware-in-the-loop simulation
Figure 4.1: MPCDP verification flowchart Normally, a test bench, which is a program written in Verilog, is needed to stimulate the inputs of Verilog hardware module and compare the outputs produced to the value expected. The goal of this verification process is to ensure that each individual component of MPCDP behaves as specified in the ISA under all possible conditions and all possible input combinations. If tests for all components are successful, we can start verifying the performance of a processor as a whole.
Here an idea called self-test programs needs to be
introduced. These tests are written in the processor’s assembly language, or in a highlevel programming language that is compiled into the processor’s assembly language. When the self-test program is executed, it verifies that the different operations it performs are actually being executed correctly. This testing technique is particularly 82
4.1 Hardware-in-the-loop simulation useful for verifying the operation of the entire processor as each individual instruction is executed. In Hardware-in-the-loop simulation, the Verilog codes for a processor are synthesized by ISE synthesis tool (Xilinx) and downloaded to a FPGA chip. The physical system is simulated by C language or in Simulink environment on a personal computer. For
Hardware-in-the-loop
simulation,
the
most
difficult
thing
is
the
communication between the FPGA hardware and the host computer, in which a C programming is running to simulate the real physical system. The Hardware-in-theloop simulation scheme is shown in Figure 4.2.
Serial
Physical System (Host PC)
communication
Serial communication
Xilinx FPGA (MPC controller)
Figure 4.2: Hardware-in-the-loop simulation scheme The output from a physical system (Host PC) is fed into the CPU through a serial communication. Then the CPU generates a digital control signal, feeding into the physical system. Here the physical system can be replaced by a C program on the host computer. To make the system user-friendly, a GUI is also needed to change the simulation parameters and recording the simulation results.
83
4.2 EDA Simulator Link tool
4.2. EDA Simulator Link tool In section 4.1 the traditional process of testing the performance of a processor has been introduced. As it can be seen, in the first step, to verify each individual part, Verilog test benches are needed. On one hand, writing test benches for all CPU parts is quite a time consuming task. On the other hand, the result of test bench verification is not quite straightforward. Since only the waveform can be displayed in Modelsim, it is not easy to compare the result of the verification with the result that is expected. An alternative solution is to use EDA Simulator Link to realize the Hardware-in-theloop simulation, which is used in our design. EDA Simulator Link integrates MATLAB and Simulink with hardware design flows. It supports verification of FPGA and ASIC hardware designs by providing a cosimulation interface to HDL simulators, virtual platform development by generating a SystemC Transaction Level Model (TLM 2.0) component and standalone test bench and FPGA deployment of automatically generated HDL by creating and managing Xilinx ISE projects . HDL Simulator Server
Request
Simulink Client
Response
(Modelsim)
Out
In In Simulator Figure 4.3: Linking with Simulink and the HDL The HDL simulator responds to simulation requests it receives from cosimulation Out blocks in a Simulink model. After a cosimulation session starts, the simulation progress and results can be monitored from Simulink and HDL simulator, i.e.
84
4.2 EDA Simulator Link tool Modelsim. For example, signals can be added to Modelsim Wave window to monitor simulation timing diagrams. In MPCDP design, EDA Simulator Link is used for the module-level verifications and the Hardware-in-the-loop simulation. The detailed technical description of EDA Simulator Link can be found in Mathworks website. Since the module-level verification is relatively simple, here we only describe how this tool works in our Hardware-in-the-loop simulation. The block diagram of Hardware-in-the-loop simulation is shown in Figure 4.4. As it can be seen, there are five input signals for MPCDP and only one output signal. For this specific satellite model, there are two state variables and two outputs. y(n)-r is the difference between the setpoint and the plant output. Δx(n) is the increment of the state variable. The control signal at last sample time is also input the MPCDP to calculate the new control signal, by using Δ The output of MPCDP is the control signal for the satellite, i.e. the torque, rather than the increment of control signal. Setpoint r y(n)-
Satellite
MPCDP
rrom
u(n)
enable reset
dynamic model
Δ
+
y(n)
y(n)-r
x(n)
Δ +
x(n)u(n-1)
z
-
-
x(n-1) x(n)
z-1
-1
Figure 4.4: Co-simulation block diagram 85
4.3 Satellite attitude control simulations
4.3. Satellite attitude control simulations Here two Simulink and Modelsim co-simulation examples are given. They are used to verify the performance of MPCDP on satellite attitude control. Two MPCDPs with different numbers of I/O ports are designed. MPCDP-1 has five 32-bit input ports for feedback signals, one reset port, one RAM enable port and only one 32-bit output port. MPCDP-2 has 16 32-bit ports for feedback signals, one reset port, one RAM enable port and 16 32-bit output ports. For simplicity, the Hardware-in-the-loop simulation is first carried out on the one-channel satellite dynamic model and MPCDP-1. Thus only one output port is required in the MPCDP block. The MPCDP block in EDA Simulator Link is shown in figure. Then the result will be extended the three-channel satellite and MPCDP-2. Example A: one-channel satellite The MPCDP-1 block in EDA Simulator Link is shown in figure. The port names and
corresponding
input
and
output
signals
are
shown
Figure 4.5: MPCDP-1connection diagram
86
in
Table
4.1.
4.3 Satellite attitude control simulations port name
signal
I/O_in_0
y[0]-r[0]
I/O_in_1
y[1]-r[1]
I/O_in_2
Δ
I/O_in_3
Δ
I/O_in_4
u(n-1)
clr
reset sigal
rom_en
rom enable signal
Table 4.1: MPCDP-1 ports description 68.58
Moments of inertia about x (
Amplitudes constraints on actuators (
)
0.05
Sampling interval (s)
10
Prediction horizon
60
Scaling factor for three control signals
0.4
Number of terms for three control signals
1
Control weighting R
0.1
Table 4.2: One-channel satellite simulation parameters The satellite and MPC parameters are given in Table 4.2. Its discrete state space model is given by
87
4.3 Satellite attitude control simulations For N is chosen to be 1, only four parameters are involved in the online computation. These four parameters and their corresponding integers in MPCDP-C are shown in Table 4.3. original
-0.3262
-5.6518
-0.1719
-0.0249
1356419
1376382
1262075
parameters integers
in
1413247
MPCDP-C
Table 4.3: Number conversion in Example A. The Matlab simulation results are given in figure and the corresponding Hardware-inthe-loop results are given in figure
Figure 4.6: Example A: one-channel satellite attitude control (angle). Top plot: Matlab simulation; bottom plot: Simulink and Modelsim co-simulaiton.
88
4.3 Satellite attitude control simulations
Figure 4.7: Example A: one-channel satellite attitude control (angular rate). Top plot: Matlab simulation; bottom plot: Simulink and Modelsim co-simulaiton.
Figure 4.8: Example A: one-channel satellite attitude control (Torque). Top plot: Matlab simulation; bottom plot: Simulink and Modelsim co-simulaiton. 89
4.3 Satellite attitude control simulations It can be seen from Figure 4.6, Figure 4.7, and Figure 4.8, the Hardware-in-the-loop simulation results are almost identical to the Matlab simulation results. The control signal, i.e. the torque, in both simulations, is limited between -0.05 to 0.05.
Figure 4.9: RTL simulation result for one-channel satellite In Figure 4.9 the RTL simulation result is shown. I/O_in_0 to I/O_in_4 are five input signals. I/O_out is the output signal. clr is reset signal. The period of a control loop is set to be 20000ns and only a short period of wave form can be shown in this graph. As a result in this picture clr is always zero in this short period. Actually, the clr should become one at the end of a control loop to reset the PC. clk is the clock signal and its period is arbitrarily set to be 200ns. hlt5_found is used to indicated the instant when the MPCDP generates the new control signal. After that, the control signal will be hold be a zero-order holder until a new signal is generated. The other signals shown in the wave window are all internal signals in MPCDE, e.g. IR2 is the instruction register on stage two. 90
4.3 Satellite attitude control simulations Example B: three-channel satellite In order to test the capabilities of MPCDP in performing on MIMO system, another Hardware-in-the-loop simulation is performed on a three-channel satellite. In this simulation MPCDP-2 is used, which has 16 inputs and 16 outputs. This simulation also verifies the MPCDP’s capability in handling large matrix operations.
The
satellite dynamic parameters and MPC parameters are summarized in Table 4.4. 68.58e, 78.58, 81.64
Moments of inertia about x, y, z ( Amplitudes constraints on actuators (
3e-6, 3e-6, 3e-6
)
Sampling interval (s)
10
Prediction horizon
60
Scaling factor a for three control signals
0.4,0.4,0.4
Number of terms N for three control signals
1,1,1
Control weighting R
0.1,0.1,0.06,
Table 4.4: Three-channel satellite simulation parameters Its discrete state space model is
u
91
4.3 Satellite attitude control simulations
Figure 4.10: The simulation block diagram in Simulink (MPCDP-2) In this simulation, no constraints are implemented on the control signals. 0.3 0.2 0.1 0 -0.1
0
10
20
30
40
50
60
70
0
10
20
30
40
50
60
70
0
10
20
30
40
50
60
70
0.6 0.4 0.2 0 -0.2
0.15 0.1 0.05 0 -0.05
Time (mintues)
Figure 4.11: Example B: three-channel satellite attitude control (angle). Left plot: Matlab simulation; bottom plot: Simulink and Modelsim co-simulaiton.
92
4.4 Synthesis reports -3
5
x 10
0
-5
-10
0
10
20
30
40
50
60
70
10
20
30
40
50
60
70
10
20
30
40
50
60
70
-3
5
x 10
0 -5
-10
0 -3
5
x 10
0
-5
0
Time (mintues)
Figure 4.12: Example B: three-channel satellite attitude control (angular rate). Left plot: Matlab simulation; bottom plot: Simulink and Modelsim co-simulaiton.
4.4. Synthesis reports MPCDP’s verilog codes are synthesized for the implementation on a FPGA board. Synthesizing the CSP (SAM 2010) is performed using Xilinx ISE Design Suite 10.1, targeting at the Virtex4 FX 20 FPGA board (Xilinx 2008). The synthesis report for MPCDP-1 is given in Figure 4.13. It can be seen from the report that 1581 slices out of a possible 8544 are used, which is an occupancy percentage of 18%. The synthesis report also shows that the MPCDP is able to attain a maximum frequency of 256.14MHz at a minimum period of 3.904ns, which would be sufficient to implement most control processes.
93
4.4 Synthesis reports
Figure 4.13 Synthesis report for MPCDP-1 The power requirements are analysed using the XPower Analyser tool in Xilinx ISE Design Suite as shown in Figure 4.14.The total power consumption is 0.359 W, which is acceptable for a small satellite.
Figure 4.14: Power Requirements Analysis for the MPCDP-1
94
4.4 Synthesis reports In order to compare the speed and the power consumption of MPCDP with other processors, one 4-clock cycle state machine processor called CSP (SAM 2010) is synthesized with the same tool and same FPGA board. CSP does not adopt a pipeline structure and it requires 4 clock cycles to complete one instruction. Since MPCDP-1 employs a five-stage pipeline structure, it actually completes one instruction in only one clock cycle. The synthesis report and power consumption report for CSP are shown in Figure 4.15 and Figure 4.16 respectively. CSP uses 688 slices out of 8544, which occupies 8% of the all slices available. Its maximum frequency is 40.256 MHz at a period of 24.841 ns. Its power consumption is 0.306 W.
Figure 4.15: Synthesis report for CSP based on 4-clock cycle state machine
95
4.5 Summary
Figure 4.16Power Requirements Analysis for the CSP Obviously, MPCDP’ frequency is more than 6 times higher than CSP. It is worth noting that MPCDP requires only one clock cycle to complete one instruction which CSP requires four, which means that efficiency of MPCDP is more than 24 times higher than CSP. On the other hand, MPCDP uses more than 2 times slices than CSP. The power consumptions of two processors are quite close. In order to compare the overall performances of two processors, we define a processor performance coefficient (PPC) in (4.1).
PPC=power consumption × minimum period× percentage of slices occupied × number of clocks for one instruction
(4.1)
As PPC takes both the power consumption and frequency of a processor into account, a processor with a lower PPC is considered to be able to perform more complex algorithm with less power consumption, which is a key requirement of our MPC controller. The PPC of MPCDP-1 is 0.2523 while the PPC of CSP is 2.4324.
4.5. Summary In this chapter, the control performance of MPCDP is verified by using Simulink and Modelsim co-simulation. The simulation results show that MPCDP has sound 96
4.5 Summary performance on both SISO and MIMO systems. The novel verification method can be used to replace traditional Hardware-in-the-loop simulation to shorten the processor testing period. The synthesis report shows that MPCDP obtains a relatively high frequency with a low power consumption and low percentage of slices occupied. The overall performance of MPCDP is quantified by a coefficient called PPC. Calculation results show that MPCDP has of PPC of 0.2523 while CSP’s PPC is 2.4324. The comparison with CSP further demonstrates the superior performance of MPCDP.
97
5.1 Conclusions
Chapter 5. Conclusions and Future Work This chapter reviews the thesis and concludes. Future research directions are also discussed.
5.1. Conclusions 5.1.1. Review of the thesis This thesis presents two areas of research work: Laguerre MPC algorithm for realtime control and the customized hardware for applying MPC algorithm in real-time system. In chapter 2 the dynamic models of a small satellite and the earth’ magnetic field are successful developed. Laguerre MPC are analysed to reduce the MPC’s computational demand. In chapter 3, a MPC dedicated processor MPCDP is developed. In chapter 4, the MPCDP’s performance in satellite attitude control is verified by using Simulink and Modelsim Co-simulation. 5.1.2. Achievements The major achievement of this research is successfully applying MPC algorithm in real-time systems. This achievement includes two parts: the simplification of MPC algorithm and the customized hardware design for MPC. To simplify MPC algorithm, We adopt Wang’s idea (Wang 2009) of using Laguerre Functions to rebuild the MPC structure to reduce its coefficients involved in on-line computation. In a motor control
98
5.1 Conclusions simulation in chapter 2, compared with traditional MPC, Laguerre MPC uses onetenth coefficients but gain the same control performance. In order to increase the efficiency of implementing MPC algorithm in hardware, a MPC dedicated processor (MPCDP) is developed on a Xilinx FPGA board. To the best of author’s knowledge, it is the first time that Laguerre MPC is implemented on an FPGA based embedded processor. A five-stage pipeline structure processor is designed and tailored according to the requirements of Laguerre MPC algorithm. A subset of standard C is defined as the programming language of MPCDP. A compiler and an assembler can convert the C codes to binary executable codes for MPCDP. To test the performance of MPCDP on our satellite model, a novel verification methodology based on EDA Simulator Link tool is developed. Compared with traditional HIL simulation, this novel method can greatly shorten the testing period and reduce the components involved in simulation, such as A/D and D/A converters. From the final simulation results, it can be seen that MPCDP is capable of handling the attitude control for both one-channel and three-channel satellites. A comparison made between CSP and MPCDP-1 in terms of power consumption, frequency and slices occupied shows the feasibility and superiority of MPCDP design. 5.1.3. Limitations Both the Laguerre MPC algorithm and its hardware support have been proven practicable for real-time control (satellite control in our case). But here MPC algorithm is based on linearized dynamic model. It is still unknown whether Laguerre MPC is applicable on non-linear systems. In Chapter 3, we have introduced a dedicated processor architecture for implementing the MPC algorithm. And Chapter 4 shows that the MPCDP has been 99
5.2 Future work successfully synthesized, targeting the Xilinx Virtex-4 FPGA. However, this is only realized with the help of some EDA tools. Although the processor is verified by using Matlab EDA Simulator Link tool and Modelsim co-simulation, it has never been tested in its real circuit. The co-simulation provides a way to validate the processor’s practicability in a virtual control environment, but the MPCDP has not been verified in real control environments due to the lack of real physical systems.
5.2. Future work Although MPCDP is designed to control the attitude a small satellite, duo to the cost of launching a satellite to low earth orbit, in the foreseeable future this experiment may not be conducted. Instead, some applications in unmanned vehicles (UAVs) are much easier to be achieved. For example, MPCDP can be applied in stabilizing the flight of a quadrotor, which is a miniature aircraft lifted and propelled by four rotors. This is a form of vertical-take-off-and-landing (VTOL) aircraft, employing four fixedpitch blades, and the motion of the quadrotor is controlled by varying the speed of each rotor. Quadrotors are generally used as unmanned aerial vehicles (UAVs) for surveillance purposes and require the use of an electronic control system to control its flight. A quadrotor is currently being built in the University of Sydney laboratory and would be an ideal application to test the operation of the MPCDP. Other forms of physical applications would be a Maglev controller which takes input in the form of an air-gap control signal and outputs 4 signals which are air-gap, acceleration, flux and current for the control of the a maglev vehicle. A physical system would not only be able to verify the operation of the CSP but would also be able to find the limits of the controller in terms of the complexity of the system implemented and the degree of order in the control law implemented. 100
5.2 Future work Besides, although MPCDP has been compared with CSP in terms of frequency and power consumption, it has not been benchmarked against other processors available on the market, such as DSPs, general-purpose processors and processors implemented on desktop computers and workstations. Hence future variants of MPCDP should be tested against other processors to ensure that it is competitive in performance.
101
APPENDIX A The MPCDP instruction set architecture (ISA) ADD – addition The ADD instruction adds two source operands and stores the result in the destination register. One of the operands is always taken from the register pointed by ro1. If the sixteenth bit is zero, the second operand is taken from the register pointed to by ro2. Otherwise the second operand is the sign-extended 32-bit immediate filed. Bxx- conditional branch If the specified condition evaluates to be true, this instruction will branch to the target address. Otherwise the instruction following the branch is executed. The target address is the sum of the current PC value and the sign-extended value of the immediate field. To evaluate the condition, the CHECKCC block checks the condition code bits and forms the corresponding logical expression shown in the above table. These condition codes are set by the SETCC block. CMP- compare The CMP instruction is used to set the condition code bits by subtracting the second operand from the first operand. So the result itself is not stored. One of the operand is always taken from the register pointed to by ro1. If the sixteenth bit of the instruction is zero, the second operand is taken from the register pointed to by ro2. Otherwise the second operand is the sign-extended 32-bit immediate field. 102
HLT- Halt Stop execution JMP- Jump indirect+ offset JMPL- Jump and link indirect + offset Unconditionally jump to the address specified by the sum of the contents of register rs1 and the sign-extended immediate field. Is the sixteenth bit is one, the current value of PC will be stored in register rd. LD- Load direct Read the memory location specified by the sign-extended immediate field and store the result the register rd. LDI – Load immediate Read the sign-extended immediate field into register rd. LDX – Load indexed Read the memory location specified by adding the contents of register ro1 and the sign-extended immediate field. Store this value in register rd. MOV- move register to register MOV instruction will be converted to ADD instruction ADD rd, ro1, #0
by
assembler. NOP – No-operation Perform no-operation, except for incrementing the PC to point to the next instruction to execute. ST- store direct Store register rd1 into the memory location specified by the sign-extended immediate field. 103
STX- store indexed Store the value in register rd into the memory location specified by adding the contents of register ro1 and the sign-extended immediate field. SUB- subtraction The SUB instruction subtracts the two source operands and stores the result in the destination register specified by rd. The first operand is always taken from the register pointed to by ro1. If the sixteenth bit is zero, the second operand is taken from the register pointed to by ro2. Otherwise the second operand is the value of sign-extended immediate field.
104
APPENDIX B Verilog HDL Code for Multiplier task Multiply; //(Mplr_m,Mplr_e,Mcnd_l,Mcnd_h,Product); /******************************************** This module is a 32-bit multiplier. The multiplicant is a 32-bit signed fixed point number with 16-bit fraction. The multiplier is 32-bit signed floating point number with 16-bit exponent.
*********************************************/
input [13:0] Mplr_m; input [6:0] Mplr_e; input [15:0] Mcnd_l; input [15:0] Mcnd_h; output [31:0] Product; reg [13:0] temp_Mplr_m; reg [6:0] temp_Mplr_e; reg [31:0] Mcnd; reg [31:0] temp_Mcnd; reg [31:0] Product; reg [31:0] abs_Mcnd; reg [45:0] abs_Product; 105
reg [45:0] shift_Product; reg [45:0] real_product; reg [13:0] abs_Mplr_m; //always @(Mplr_m or Mplr_e or Mcnd_l or Mcnd_h) begin Mcnd={Mcnd_h,Mcnd_l}; temp_Mcnd=Mcnd; temp_Mplr_m=Mplr_m; temp_Mplr_e=Mplr_e; //temp_Mcnd=Mcnd; if (Mplr_m[13]) abs_Mplr_m=-temp_Mplr_m; else abs_Mplr_m=temp_Mplr_m;
//positive valute for mantissa 16 bits
if(Mcnd[31]) abs_Mcnd=-temp_Mcnd; else abs_Mcnd=temp_Mcnd; //positive valute for multiplicant 32 bits abs_Product=abs_Mplr_m*abs_Mcnd; //absolute valute for product 48 bits. 31-bit fraction, 15-bit integer and 2-bit sign. if (!Mplr_e[6]) shift_Product=(abs_Product-temp_Mplr_e); 106
if(Mplr_m[13]==Mcnd[31]) real_product=shift_Product; else real_product=-shift_Product; if ({Mplr_m,Mplr_e}==4) Product=Mcnd*4; else Product=real_product[44:13]; end endtask
107
APPENDIX C Abbreviations ALU
Arithmetic logic unit
DSP
Digital signal processors
ECI
Earth centred inertial
EX
Execution stage
FPGA
Field-programmable gate array
HDL
Hardware description language
I/O
input/output
ID
Instruction decode/register fetch stage
IF
Instruction fetch stage
LNS
Logarithmic Number System
MEM
Memory access stage
MEMS
Microelectromechanical systems
MIMO
multiple inputs and multiple outputs
MPC
Model Predictive Control
MPCDP
Model Predictive Control Dedicated Processor
MUX
multiplexer
PC
program counter
QP
Quadratic Programming
SIMD
Single instruction, multiple data
SISO
single input single output 108
VLSI
Very-large-scale integration
WB
Write-back stage
DAC
data acquisition card
MPCDP-C
C-like programming language dedicated for MPCDP
HIL
Hardware-in-the-loop
109
APPENDIX D Publications Xi, C. and W. Xiaofeng (2010)c. Model predictive control of cube satellite with magneto-torquers. Information and Automation (ICIA), 2010 IEEE International Conference on Information and Automation (ICIA 2010) Chen, X. and X. Wu (2010). "Target Processing of a Model Predictive Attitude Control System for Magnetic Actuated Small Satellites " IET Control Theory & Applications [submitted] Chen, X. and X. Wu (2011). Design and Implementation of Model Predictive Control Algorithms for Small Satellite Three-Axis Stabilization. IEEE International Conference on Informaiton and Automation (ICIA 2011) Chen, X. and X. Wu (2009). A Distributed Control Architecture for Formation Flying Space Sensor Network. 9th Australian Space Science Conference (ASSC)
110
Bibliography EDA Simulator Link™ 3 Getting Started Guide. Altera. Altera "FPGA." Altera (2005). from http://www.altera.com/products/ip/processors/nios/. Barr, M. (1999). Programming embedded systems in C and C++, O'Reilly. Bemporad, A., M. Morari, et al. (2002). "The Explicit Linear Quadratic Regulator for Constrained Systems." Automatica 38(1): 3-20. Bemporad, A., M. Morari, et al. Model Predictive Control Toolbox™ 3 Getting Started Guide. Bleris, L. G. and M. V. Kothare (2005). Real-time implementation of model predictive control. American Control Conference, 2005. Proceedings of the 2005. Brogan, W. (1985). Modern Control Theory, Prentice-Hall Inc. Cady, F. M. (1997). Microcontrollers and microcomputers, principles of software and hardware engineering, Oxford university press. Costa, A., A. D. Gloria, et al. (1997). "Fuzzy logic microcontroller." IEEE Micro 17(1): 66-74
Cumplido-Parra, R. A. L. U. (2001). On the design and implementation of a control system processor, Ph.D. thesis, Loughborough University. Eyre, J. and J. Bier (1999). "The evolution of DSP processors." IEEE signal processing magazine. Fortescue, P., k. Stark, et al. Spacecraft Systems Engineering, Wiley Press. Goodall, R. (2001). "Perspectives on processing for real-time control." Annual Reviews in Control 25: 123-131. Goodall, R. (2002). A Roadmap of Control Design Methods. Proc. UKACC 2002. Sheffeld, UK. Hannay, S. (2009). "Modelling of the Attitude Control of the Nanosatellite OUFTI-1." Liege: University of Liege. Lapsley, P., J. Bier, et al. (1997). DSP processor fundamentals, IEEE press. Lilja, D. and S. Sapatnekar (2005). Designing Digital Computer Systems with Verilog, Cambridge University Press. Minghua, H. and L. Keck-Voon (2005). Model predictive control on a chip. Control and Automation, 2005. ICCA '05. International Conference on. Nise, N. S. (1995). Control Systems Engineering, The Benjamin/Cummings Publishing Company, Inc.
111
Patterson, D. A. and J. L. Hennessy (2008). Computer Organization and Design, Fourth Edition: The Hardware/Software Interface, MORGAN-KAUFMANN. Predko, M. (1999). Title Handbook of microcontrollers, McGraw-Hill. Psiaki, M. L. (2000 ). Magnetic torquer attitude control via asymptotic periodic linear quadratic regulation. Proceedings of AIAA Guidance, Navigation and Control Conference, Denver, Colorado. SAM, M. Z. Y. (2010). Hardware Architecture for Control System Processing. School of Aerospace, Mechanical and Mechatronic Engineering. Sydney, The University of Sydney. Bachelor. Schlett, M. (1998). "Trends in embedded-microprocessor design." IEEE Computer. Sidi, M. J. (1997 ). Spacecraft Dynamics and Control CAMBRIDGE UNIVERSITY PRESS Silani, E. and M. Lovera (2005). "Magnetic spacecraft attitude control: a survey and some new results." Control Engineering Practice 13(3): 357-371. Vouzis, P. D., L. G. Bleris, et al. (2006). A Custom-made Algorithm-Specific Processor for Model Predictive Control. Industrial Electronics, 2006 IEEE International Symposium on. Wang, L. (2001). Discrete time model predictive control design using Laguerre functions. American Control Conference, 2001. Proceedings of the 2001. Wang, L. (2004). "Discrete model predictive controller design using Laguerre functions." Journal of Process Control 14(2): 131-142. Wang, L. (2009). Model Predictive Control System Design and Implementation Using MATLAB Springer Wanhammer, L. (1999). DSP integrated circuits, Academic Press. Wood, M., W.-H. Chen, et al. (2006). Model predictive control of low earth orbiting spacecraft with magneto-torquers. Computer Aided Control System Design, 2006 IEEE International Conference on Control Applications, 2006 IEEE International Symposium on Intelligent Control, 2006 IEEE. Wood, M. and C. Wen-Hua (2008). Regulation of Magnetically Actuated Satellites using Model Predictive Control with Disturbance Modelling. Networking, Sensing and Control, 2008. ICNSC 2008. IEEE International Conference on. Xilinx. "FPGA." from http://www.xilinx.com/. Xilinx. "ISE Design Suite." from http://www.xilinx.com/products/design-tools/ise-design-suite/. Xilinx (2005). Microblaze processor reference guide embedded development kit edk 7.1i. Xilinx (2008). Virtex-4 FPGA Packaging and Pinout Specification.
112