A Network Tearing Technique for FPGA-Based Real-Time Simulation ...

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIE.2014.2365752, IEEE Transactions on Industrial Electronics IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS


A Network Tearing Technique for FPGA-Based Real-Time Simulation of Power Converters

Tarek Ould-Bachir, Member, IEEE, Handy Fortin Blanchette, Member, IEEE, and Kamal Al-Haddad, Fellow Member, IEEE

Abstract—The realm of Hardware-in-the-Loop (HIL) simulation resorts to Field Programmable Gate Arrays (FPGAs) to achieve time-steps below 1 µs. Such low time-steps are of importance for the aerospace and automotive industries, where power converters have their switching frequencies in the 10-200 kHz range. This article proposes a Network Tearing Technique (NTT) that allows subsets of switches to be treated independently, alleviates embedded memory requirements, and reduces the computational burden. An iterative algorithm is used to determine the state of naturally commutated switches, thus offering a realistic model of the power converter, independently of its operation mode or topology. A Gauss-Jordan processing unit is implemented to solve interface voltages/currents from the torn circuit. Custom floating-point operators are used to ensure good accuracy, high frequency operation as well as low computational latency. A neutral-point-clamped (NPC) converter case study is presented to demonstrate the effectiveness of the method. Simulation results are validated against a reference model at a 750 ns time-step and 30 kHz Sine Pulse Width Modulation (SPWM) switching frequency. Index Terms—Real-time simulation, network tearing technique, FPGA, floating-point arithmetic.

I. INTRODUCTION

HARDWARE-IN-THE-LOOP simulation is a prototyping approach that aims to reduce the design cost of power systems by running comprehensive tests early in the development stage [1]. However, modelling a power converter is a challenging task because of the varying topology of the circuit. Moreover, modern technologies seek to increase the power density of the converter and to reduce the total harmonic distortion (THD) by targeting switching frequencies in the 10-200 kHz range [2]–[4]. Such frequencies require time-steps below 1 µs, which are hardly achievable in the context of CPU-based simulation, where time-steps are in the 5-10 µs range at best [5]. FPGA-based simulation solves this issue by allowing very low time-steps to be reached [6]–[8].

The simulation of switching networks resorts to one of the following switch models [9]: 1) the ideal model; 2) the switching function model; or 3) the average model. When detailed transients at switching events are of interest, the ideal switch model is the preferred approach. However, factorization is hardly conceivable in an FPGA-based solution, so the ideal switch model approach usually implies storing all network equations for every switch status combination [10]. Such an approach limits the number of switches to 6-8. In the early 1990s, a switch model that keeps the network equations fixed regardless of switch statuses was proposed [11], [12]. The fixed-matrix modelling approach has successfully been implemented on FPGA [6], [8], [13], but it is known to introduce false numerical transients due to its convergence process, which restricts its operation at high switching frequencies [14].

This paper proposes a Network Tearing Technique (NTT) that splits a given circuit into separate parts, thus allowing subsets of switches to be treated independently. Such an approach increases the potential number of switches present in the circuit, alleviates on-chip memory requirements, and reduces the computational burden. None of these benefits is obtained by compromising the accuracy of the simulation. The proposed method is comparable in its principles to [15]–[19]; the originality of the presented research resides in the fact that it is tailored for FPGA-based simulations. Contributions of the paper also comprise an original FPGA-based computing engine composed of multiple processing units (PUs), among which is a new and highly pipelined Gauss-Jordan processor that is responsible for solving the interface variables of the torn circuit. All PUs use custom floating-point operators to ensure computational accuracy and a large dynamic range.

The remainder of this paper is organized as follows. The NTT-based power converter modelling is presented in Section II. The status update logic for forced and naturally commutated switches is discussed in Section III. The content of Sections II and III is used in Section IV to model a Neutral Point Clamped (NPC) converter. Section V presents the hardware architecture for solving the NPC equations in real time and reports its area utilization and speed performance. Section V also provides and discusses FPGA-based simulation results. Section VI concludes this work.

Manuscript received March 12, 2014; revised July 14, 2014 and August 26, 2014; accepted October 4, 2014. Copyright (c) 2014 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]. Tarek Ould-Bachir, Handy Fortin Blanchette, and Kamal Al-Haddad are with the Department of Electrical Engineering, École de Technologie Supérieure, Montreal, Canada (e-mail: [email protected], [email protected], [email protected]).

II. MODELLING APPROACH

A. Illustrative Example

The single-phase inverter connected to an inductive load shown in Fig. 1.a is used as an example to introduce the proposed network tearing method, which proceeds as follows. The companion-circuit-based nodal analysis [20] is used for obtaining network equations. The circuit is then split into multiple sub-circuits to reduce the computational burden and memory requirements. Each sub-circuit is therefore reduced

0278-0046 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


to an equivalent Norton/Thévenin circuit and interface currents/voltages connecting the equivalents are evaluated. Each sub-circuit is then connected to voltage and current sources with values from the previous step to solve for nodal voltages, branch currents as well as the various state variables. This final step can be performed in parallel.

The procedure is illustrated in Fig. 1. Fig. 1.a presents the original circuit. Fig. 1.b presents the same circuit where components of the circuit are replaced by their companion circuit: capacitors and inductors are replaced by a conductance with a parallel current source known as the history term, responsible for modelling the energy stored in the component. Switches are replaced by equivalent conductances (1/Ron or 1/Roff, depending on switch statuses). Fig. 1.b also shows the proposed partitioning of the converter into two separate sub-circuits. Fig. 1.c shows the two two-port Norton/Thévenin equivalents used to solve for interface voltage ($v_i^1$) and current ($i_i^1$). Finally, Fig. 1.d illustrates the last step that consists in connecting each sub-circuit (only sub-circuit (1) is shown) with voltage and current sources, whose values were obtained from the previous step. The current and voltage sources replicate the contribution of the missing parts from the circuit.

[Figure 1: circuit schematics, panels (a) to (d)]

Fig. 1. Single phase inverter example: (a) Original circuit; (b) Original circuit with components replaced by their companion circuits, partitioning is highlighted; (c) Two two-port hybrid equivalents connected to solve interface voltage/currents; (d) sub-circuit (1) connected to voltage and current sources.

The reduction of sub-circuits (1) and (2) to their hybrid Norton/Thévenin equivalents, as shown in Fig. 1.c, means that sub-circuit (1) is reduced to:

\[
\begin{bmatrix} i_o^1 \\ v_o^1 \end{bmatrix}
=
\begin{bmatrix} i_{eq}^1 \\ v_{eq}^1 \end{bmatrix}
-
\begin{bmatrix} g_N^1 & k_I^1 \\ k_V^1 & r_{Th}^1 \end{bmatrix}
\begin{bmatrix} v_i^1 \\ i_i^1 \end{bmatrix},
\tag{1}
\]

whereas sub-circuit (2) is reduced to:

\[
\begin{bmatrix} i_o^2 \\ v_o^2 \end{bmatrix}
=
\begin{bmatrix} i_{eq}^2 \\ v_{eq}^2 \end{bmatrix}
-
\begin{bmatrix} g_N^2 & k_I^2 \\ k_V^2 & r_{Th}^2 \end{bmatrix}
\begin{bmatrix} v_i^2 \\ i_i^2 \end{bmatrix}.
\tag{2}
\]

Obviously, $[i_{eq}^2, v_{eq}^2] = [0, 0]$, since sub-circuit (2) is purely resistive. The hybrid nature of the equivalents is dictated by the fact that it is convenient to connect a given port to a voltage or a current source, knowing that the voltage or the current at that port is associated with the circuit’s state variables. This allows relaxing the switch updating algorithm, to reduce the computational burden and the simulation time-step as well. These considerations are discussed further in this paper. Each sub-circuit has at its ports two input variables (subscript i is used) and two output variables (subscript o is used), which yields a total of 8 unknowns. The network equations are built using the Tableau Approach [21]:

\[
M w = z,
\tag{3}
\]

where:

\[
M =
\begin{bmatrix}
+1 & 0 & 0 & 0 & g_N^1 & k_I^1 & 0 & 0 \\
0 & +1 & 0 & 0 & k_V^1 & r_{Th}^1 & 0 & 0 \\
0 & 0 & +1 & 0 & 0 & 0 & g_N^2 & k_I^2 \\
0 & 0 & 0 & +1 & 0 & 0 & k_V^2 & r_{Th}^2 \\
0 & 0 & 0 & 0 & -1 & 0 & +1 & 0 \\
0 & 0 & 0 & 0 & 0 & +1 & 0 & +1 \\
-1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & -1 & 0 & +1 & 0 & 0 & 0 & 0
\end{bmatrix}
\tag{4}
\]

and:

\[
\begin{aligned}
w &= \left[\, i_o^1,\ v_o^1,\ i_o^2,\ v_o^2,\ v_i^1,\ i_i^1,\ v_i^2,\ i_i^2 \,\right]^T, \\
z &= \left[\, i_{eq}^1,\ v_{eq}^1,\ i_{eq}^2,\ v_{eq}^2,\ 0,\ 0,\ 0,\ 0 \,\right]^T.
\end{aligned}
\tag{5}
\]

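The unknowns in Eq. 3 are then reduced to a set of independent variables, in this specific case $w^{ind} = [v_i^1, i_i^1]^T$. After permuting columns 5, 6 and 7, 8, a partial Gauss-Jordan reduction to the last two lines yields:

\[
\begin{bmatrix}
g_N^1 + g_N^2 & k_I^1 - k_I^2 \\
k_V^1 - k_V^2 & r_{Th}^1 + r_{Th}^2
\end{bmatrix}
\begin{bmatrix} v_i^1 \\ i_i^1 \end{bmatrix}
=
\begin{bmatrix} i_{eq}^1 + i_{eq}^2 \\ v_{eq}^1 - v_{eq}^2 \end{bmatrix}.
\tag{6}
\]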

One of the main contributions of this paper is the fact that Eq. 6 is a reduction of Eq. 4 in which all the entries are additions of terms from Eq. 1 and Eq. 2. This property allows for on-the-fly matrix assembly, which is an important asset when dealing with such problems on FPGA, more so when the computational time budget is as limited as in real-time simulation problems.
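As a hedged illustration of this property, the following Python sketch builds the 8 × 8 tableau of Eq. 3 to Eq. 5 for two sub-circuits and checks that the interface variables it yields coincide with those of the 2 × 2 system of Eq. 6, whose entries are assembled with additions and subtractions only. All numerical parameter values, the dictionary layout and the helper name `gauss_jordan` are assumptions made for illustration; they are not taken from the inverter of Fig. 1.

```python
from fractions import Fraction as Fr

def gauss_jordan(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination with row pivoting."""
    n = len(A)
    aug = [[Fr(v) for v in row] + [Fr(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)  # first non-zero pivot
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * u for v, u in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

# Hypothetical hybrid-equivalent parameters of the two sub-circuits (Eq. 1, Eq. 2).
s1 = dict(gN=Fr(1, 2), kI=Fr(1, 4), kV=Fr(-1, 4), rTh=Fr(3), ieq=Fr(2), veq=Fr(5))
s2 = dict(gN=Fr(1, 5), kI=Fr(-1, 3), kV=Fr(1, 3), rTh=Fr(2), ieq=Fr(0), veq=Fr(0))

# Full tableau, unknowns ordered as w = [io1, vo1, io2, vo2, vi1, ii1, vi2, ii2].
M = [
    [1, 0, 0, 0, s1["gN"], s1["kI"], 0, 0],
    [0, 1, 0, 0, s1["kV"], s1["rTh"], 0, 0],
    [0, 0, 1, 0, 0, 0, s2["gN"], s2["kI"]],
    [0, 0, 0, 1, 0, 0, s2["kV"], s2["rTh"]],
    [0, 0, 0, 0, -1, 0, 1, 0],    # both sub-circuits see the same port voltage
    [0, 0, 0, 0, 0, 1, 0, 1],     # port currents are opposite
    [-1, 0, -1, 0, 0, 0, 0, 0],   # output currents sum to zero
    [0, -1, 0, 1, 0, 0, 0, 0],    # output voltages are equal
]
z = [s1["ieq"], s1["veq"], s2["ieq"], s2["veq"], 0, 0, 0, 0]
w_full = gauss_jordan(M, z)

# Reduced system of Eq. 6: every entry is a sum or difference of stored parameters.
G_ind = [[s1["gN"] + s2["gN"], s1["kI"] - s2["kI"]],
         [s1["kV"] - s2["kV"], s1["rTh"] + s2["rTh"]]]
g_ind = [s1["ieq"] + s2["ieq"], s1["veq"] - s2["veq"]]
w_ind = gauss_jordan(G_ind, g_ind)

print(w_full[4:6] == w_ind)
```

Exact fractions are used so that the comparison is not blurred by rounding; an FPGA implementation would of course work in floating point.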




B. FPGA-based Approach for Solving the MANA Equations

Detailed network equations are obtained using the Modified-Augmented Nodal Analysis (MANA) [20]. As do other companion-based circuit analysis methods, MANA proceeds by assembling network equations after discretizing all circuit devices using a numerical integration rule; the trapezoidal and the Backward Euler (BE) rules are typically used. The BE rule is preferable for the simulation of power electronic circuits because it avoids the numerical oscillations caused by the trapezoidal rule under switching conditions [12], [22]. With all circuit components replaced by their companion circuit, the nodal system of equations is assembled and given by [20]:

\[
A x = b,
\tag{7}
\]

where A is the augmented nodal matrix, x is a vector of unknown nodal voltages and branch currents, and b holds known voltage and current sources (including history terms). To solve MANA equations on FPGA, Eq. 7 is reformulated in a state-space-like form [8]: let $j_n$ denote the vector containing the history terms in $b_n$, and $u_n$ the vector containing the remaining independent current/voltage sources; let $y_n$ be the output vector containing branch currents and the voltages of interest; and let $A_\sigma$ be the matrix for a given combination σ of switch statuses. With $b_n = K_c [u_n \; j_n]^T$, where $K_c$ is a connectivity matrix, one finds:

\[
x_n = (A_\sigma)^{-1} K_c \begin{bmatrix} u_n \\ j_n \end{bmatrix},
\tag{8}
\]

and may obtain the following:

\[
\begin{bmatrix} j_{n+1} \\ y_n \end{bmatrix}
=
\begin{bmatrix} W_{ju}^\sigma & W_{jj}^\sigma \\ W_{yu}^\sigma & W_{yj}^\sigma \end{bmatrix}
\begin{bmatrix} u_n \\ j_n \end{bmatrix},
\tag{9}
\]

where $W_{ju}^\sigma$, $W_{jj}^\sigma$, $W_{yu}^\sigma$, and $W_{yj}^\sigma$ are inline formulations derived from Eq. 8 [8].

C. The Multi-Port Hybrid Equivalent

The Multi-Port Hybrid Equivalent (MPHE) is a general formulation of the Norton/Thévenin equivalents. The hybrid nature of the MPHE results from the fact that the equivalent circuit encompasses voltage ports (Norton part) as well as current ports (Thévenin part). The circuit is supposed to contain resistances, current and voltage sources only, which is a reasonable assumption when discrete companion analysis (MANA) is used. For an N port circuit with $N_V$ voltage ports and $N_I$ current ports ($N_V + N_I = N$), the MPHE of the circuit is given by:

\[
\begin{bmatrix} i_o \\ v_o \end{bmatrix}
=
\begin{bmatrix} i_{eq} \\ v_{eq} \end{bmatrix}
-
\begin{bmatrix} G_N & K_I \\ K_V & R_{Th} \end{bmatrix}
\begin{bmatrix} v_i \\ i_i \end{bmatrix},
\tag{10}
\]

where $i_o$ and $v_o$ are vectors of respectively output currents and voltages sensed at the ports; $v_i$ and $i_i$ are vectors of imposed voltages and currents at these ports; $i_{eq}$ and $v_{eq}$ are vectors of equivalent current and voltage sources of the MPHE; $G_N$, $R_{Th}$ are matrices of equivalent conductances and resistances respectively, $K_I$ and $K_V$ are dimensionless matrices.

D. Combining multiple MPHE Sub-Circuits

Interface currents/voltages at the ports of the MPHEs are computed by connecting the sub-circuits in accordance with the topology of the original circuit. Network equations are composed using the Tableau Approach [21]:

\[
\begin{bmatrix} I & M_{oi} \\ M_{io} & M_{ii} \end{bmatrix}
\begin{bmatrix} w_o \\ w_i \end{bmatrix}
=
\begin{bmatrix} w_{eq} \\ 0 \end{bmatrix},
\tag{11}
\]

where $w_o$ is the vector of output voltages/currents sensed at the ports of the N MPHEs, $w_i$ is the vector of input currents/voltages imposed at these ports, and $w_{eq}$ is the vector of the equivalent voltage/current sources associated with the MPHEs. $M_{oi}$ is a block matrix composed of the $R_{Th}^k$, $K_V^k$, $K_I^k$, and $G_N^k$ associated with the MPHEs (1 ≤ k ≤ N). $M_{io}$ and $M_{ii}$ are connectivity matrices. I is the identity matrix. The generalized post-compensation method [16] states that solving for $w_i$ gives, for each sub-circuit, a solution that provides the contribution of the remainder of the network to that sub-circuit. Hence, Eq. 11 offers a simultaneous and exact solution to the whole circuit at the interface ports of the various MPHE sub-circuits. The imposed currents and voltages from the solution $w_i$ are then applied to the detailed model of each sub-circuit. Eq. 11 (after appropriate column permutations) can then be reduced using partial Gaussian elimination to form:

\[
G^{ind} w^{ind} = g^{ind},
\tag{12}
\]

where $w^{ind}$ is a sub-vector with independent variables from $w_i$ (redundant terms in $w_i$ are omitted). It can be shown that $G^{ind}$ and $g^{ind}$ are respectively composed of terms from $M_{oi}$ and $w_{eq}$ that are summed/subtracted. For certain converter topologies, Eq. 12 is sufficiently small to be solved in real time on FPGA. Moreover, since its terms are simply sums of pre-computed values, the matrix $G^{ind}$ is easily assembled online. Hence, the simple composition of $G^{ind}$ and its relatively small size make the proposed NTT well tailored to the FPGA context. A short proof of this claim is given in the Appendix.

III. DEALING WITH SWITCHES IN THE MODEL

All switches in the network are modelled using the resistive switch model (Ron /Roff approach). This model allows the circuit to have a proper tree in any combination of its switch statuses [23], and is known to have a better convergence for naturally commutating switches [24]. It is straightforward to use when dealing with forced commutated switches (IGBTs, for instance), but necessitates an iterative algorithm when confronted with naturally commutating switches, such as diodes.
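To make the distinction concrete, the sketch below applies the resistive switch model to a hypothetical series loop (source, diode, resistive load) and refreshes the diode status a fixed number of times. The component values, the function names and the update rule used here (the diode is ON when its terminal voltage is positive, OFF otherwise) are assumptions chosen for illustration, not the converter studied in this paper.

```python
R_ON, R_OFF = 1e-3, 1e5   # assumed switch resistances in ohms (illustrative only)
R_LOAD = 10.0             # assumed load resistance in ohms

def switch_voltage(v_src, is_on):
    """Voltage across the resistive switch in a series source-switch-load loop."""
    r_sw = R_ON if is_on else R_OFF
    i_loop = v_src / (r_sw + R_LOAD)   # the network solution depends on the status
    return r_sw * i_loop

def update_diode(v_src, is_on, max_iter=2):
    """Re-solve the loop and refresh the diode status a fixed number of times."""
    for _ in range(max_iter):
        is_on = switch_voltage(v_src, is_on) > 0.0   # ON if v_sw > 0, OFF otherwise
    return is_on

print(update_diode(100.0, False), update_diode(-100.0, True))
```

A gate-controlled switch would simply take its status from the gate signal; the diode status, in contrast, depends on a network solution that itself depends on the status, which is why a (bounded) iteration is needed.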

A. Iterative Solution to Natural Commutation in Switches

Diodes are widely used in power converters to provide rectification and free-wheeling capabilities. In this paper, natural commutation is simulated using the state machine technique illustrated in Fig. 2.a [24], [25]. Hence, updating a diode’s status necessitates the sign of the current and voltage at the switch terminals. By using a resistive model of the switch (Ron/Roff approach), computing the sign of the voltage is sufficient, since we have $v^{sw} = R_{sw} i^{sw}$, with $R_{sw} > 0$. This leads to the state machine of Fig. 2.b, and adds a new set of equations to Eq. 9:

\[
v_n^{sw} = W_{swu}^\sigma u_n + W_{swj}^\sigma j_n,
\tag{13}
\]

where $v_n^{sw}$ is a vector of switch voltages, in which each row is attached to a naturally switching device. The matrices $W_{swu}^\sigma$ and $W_{swj}^\sigma$ are expressed accordingly from inline node voltage equations obtained from Eq. 8. When the diode is connected in parallel to a forced commutation switch such as an IGBT, both devices account for a single switch device and the state machine of Fig. 2.c is used instead, where the gate signal is accounted for.

[Figure 2: state machines. (a) OFF to ON when vs > 0, ON to OFF when is ≤ 0; (b) OFF to ON when vs > 0, ON to OFF when vs ≤ 0; (c) OFF to ON when vs > 0 or gate is ON, ON to OFF when gate is OFF and vs ≤ 0]

Fig. 2. Diode status updating state machines.

The switch updating state machine is implemented using an iterative approach, which is done as follows. First, the simulation algorithm reads gate signals ($c_n$) and sets forced commutated switches accordingly. The algorithm then iterates a given number of times over Eq. 13 to update the status of naturally commutated switches. The number of iterations depends upon the topology of the converter and its operation mode. In offline simulations, the algorithm can iterate until convergence is reached. In the context of a real-time simulation, the computation time should not exceed the time-step of the simulation, hence the number of iterations is forced to a predefined maximal value.

B. NTT Simulation Algorithm

Fig. 3 presents the flowchart for the simulation algorithm using the NTT and the iterative switch updating method. By splitting the circuit into separate parts, this formulation of the algorithm treats subsets of switches independently, improves the computational parallelism and allows complex converters to be simulated in real-time. By computing sub-circuit equations after solving for interface voltages/currents ($w_n^{ind}$), the simulation results are as accurate as if network equations were computed for the whole circuit. Hence, the proposed NTT ensures correct behaviour of the whole power converter model, regardless of its operating mode or topology.

[Figure 3: flowchart. Boxes, in order: set initial history term $j_n$; read input sources $u_n$ and gate signals $c_n$; set controlled switch statuses according to $c_n$; compute the parameters of the N MPHEs for the current switch combination and solve for $w_n^{ind}$ (Eq. 12) (these two boxes are marked as not used in the relaxed version of the algorithm); for each sub-circuit k (1 ≤ k ≤ N), solve for $v_n^{sw,k}$ (Eq. 13) and update switch statuses for sub-circuit k; test "switch iterations ended?" (NO/YES); compute the parameters of the N MPHEs for the current switch combination; solve for $w_n^{ind}$ (Eq. 12); for each sub-circuit k (1 ≤ k ≤ N), solve for $j_{n+1}^k$ and $y_n^k$ (Eq. 9); increment simulation time point]

Fig. 3. Flowchart describing the simulation algorithm using the NTT and the iterative switch status updating method.

The algorithm of Fig. 3 requires Eq. 12 to be solved at each iteration of the switch updating loop. This constraint can be time-consuming, even if $G^{ind}$ is small. In certain situations, this requirement can be relaxed. This is the case when interface voltages/currents are tied to state variables of the circuit. In such circumstances, the algorithm of Fig. 3 is modified by removing the recurring solution to Eq. 12. Instead, $w_n^{ind}$ is computed only once, when correct switch statuses are obtained. That value is then used at the following simulation time point during the switch updating loop.

IV. TARGET CIRCUIT

A. Circuit Description

This section presents the target circuit shown in Fig. 4.a: it consists of an NPC converter feeding a three-phase RLE load. The NPC is a good example of a converter for which the NTT offers good performances on FPGA: the circuit contains 18 switches, thus the memory requirement to store all possible combinations of switch statuses is $2^{18}$ times Eq. 9 and Eq. 13, which is about 2 Gb of data in single precision arithmetic, whereas an FPGA contains at best tens of Mb of embedded memory. By tearing up each arm of the converter, the number of combinations reduces to $2^6$ per arm. This brings the memory requirement down to bearable levels, i.e. less than 10 % of the available memory resources found on a Virtex 6.

[Figure 4: (a) NPC converter schematic with DC sources Vdc, resistors Rdc, capacitors Cdc (voltages vc1, vc2), phase currents ia, ib, ic and three R-L-E load branches, torn into sub-circuits 1 to 5; (b) interconnection of the five MPHEs. Legend: current port, voltage port]

Fig. 4. NPC circuit example (a) original circuit with partitioning into 5 sub-circuits; (b) The 5 MPHEs interconnected with respect to the original topology.

The converter of Fig. 4.a is torn into 5 distinct sub-circuits identified by a circled number and a greyed out background. Each sub-circuit is converted to its MPHE and the MPHEs are then connected to each other, as shown in Fig. 4.b. The nature of the ports is also shown: dark grey circles stand for current ports and white circles for voltage ports. Out of the 26 port variables, the torn network is governed by the four independent variables: $v_{i1}^1$, $v_{i2}^1$, $i_{i3}^2$, and $i_{i3}^3$. Hence, Eq. 12 yields Eq. 14, where:

\[
w^{ind} = \left[\, v_{i1}^1,\ v_{i2}^1,\ i_{i3}^2,\ i_{i3}^3 \,\right]^T.
\tag{15}
\]
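The combinatorial part of the memory argument made earlier in this section can be checked with a few lines of arithmetic. The sketch below only counts switch-status combinations; it does not attempt to reproduce the 2 Gb figure, which depends on matrix dimensions that are not restated here. The split of the 18 switches into three arms of 6 is an assumption inferred from the "$2^6$ per arm" statement.

```python
N_SWITCHES = 18                         # switches in the NPC converter (from the text)
N_ARMS = 3                              # assumed: one torn sub-circuit per converter arm
SWITCHES_PER_ARM = N_SWITCHES // N_ARMS

# Untorn circuit: one set of pre-computed equations per global switch combination.
full_tables = 2 ** N_SWITCHES

# Torn circuit: one (smaller) set of equations per combination of each arm.
torn_tables = N_ARMS * 2 ** SWITCHES_PER_ARM

print(full_tables, torn_tables, round(full_tables / torn_tables))
```

The count understates the actual saving, since each per-arm equation set is also smaller than an equation set of the untorn circuit.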

TABLE I
NPC CIRCUIT PARAMETERS

Parameter                      Value
SPS Solver                     ode23tb
Simulation Sample Time         750 ns
SPWM Carrier Frequency         30 kHz
SPWM Modulation Frequency      400 Hz
AC Voltage Frequency           400 Hz
DC Voltage (Vdc)               100 V
Rdc, R                         1 Ω
Cdc                            1 µF
L                              1 mH

\[
\begin{bmatrix}
g_{N11}^1 + g_{N11}^2 + g_{N11}^3 + g_{N11}^4 &
g_{N12}^2 + g_{N12}^3 + g_{N12}^4 &
k_{I13}^2 - k_{I13}^4 &
k_{I13}^3 - k_{I13}^4 \\
g_{N21}^2 + g_{N21}^3 + g_{N21}^4 &
g_{N22}^1 + g_{N22}^2 + g_{N22}^3 + g_{N22}^4 &
k_{I23}^2 - k_{I23}^4 &
k_{I23}^3 - k_{I23}^4 \\
k_{V31}^2 - k_{V31}^4 &
k_{V32}^2 - k_{V32}^4 &
r_{Th33}^2 + r_{Th33}^4 + r_{Th11}^5 &
r_{Th33}^4 + r_{Th12}^5 \\
k_{V31}^3 - k_{V31}^4 &
k_{V32}^3 - k_{V32}^4 &
r_{Th33}^4 + r_{Th21}^5 &
r_{Th33}^3 + r_{Th33}^4 + r_{Th22}^5
\end{bmatrix}
\begin{bmatrix} v_{i1}^1 \\ v_{i2}^1 \\ i_{i3}^2 \\ i_{i3}^3 \end{bmatrix}
=
\begin{bmatrix} i_{eq1}^1 \\ i_{eq2}^1 \\ -v_{eq1}^5 \\ -v_{eq2}^5 \end{bmatrix}
\tag{14}
\]

The terms in $G^{ind}$ and $g^{ind}$ in Eq. 14 are clearly obtained from additions/subtractions of the MPHE parameters, as claimed. This will always be the case when the mutual independence of input and output currents (and, respectively, input and output voltages) at the current and voltage ports is guaranteed. It is worth mentioning that the interface currents/voltages identified in Eq. 15 are tied to state variables. This specificity will be used to test the relaxed NTT algorithm.
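The add-only structure of Eq. 14 lends itself to a direct software transcription. The sketch below is one possible rendering under stated assumptions: the nested-dictionary container `mphe`, the parameter key names and every numerical value are hypothetical, and the three arms are given identical parameters only to keep the example short.

```python
def assemble_eq14(mphe, i_eq1, v_eq5):
    """Assemble G_ind and g_ind of Eq. 14 using additions/subtractions only.

    mphe[k] maps a parameter name such as "gN11" or "rTh33" to its value for
    sub-circuit k (hypothetical container; missing entries count as zero).
    """
    def p(k, name):
        return mphe[k].get(name, 0.0)

    arms = (2, 3, 4)
    G = [
        [p(1, "gN11") + sum(p(k, "gN11") for k in arms),
         sum(p(k, "gN12") for k in arms),
         p(2, "kI13") - p(4, "kI13"),
         p(3, "kI13") - p(4, "kI13")],
        [sum(p(k, "gN21") for k in arms),
         p(1, "gN22") + sum(p(k, "gN22") for k in arms),
         p(2, "kI23") - p(4, "kI23"),
         p(3, "kI23") - p(4, "kI23")],
        [p(2, "kV31") - p(4, "kV31"),
         p(2, "kV32") - p(4, "kV32"),
         p(2, "rTh33") + p(4, "rTh33") + p(5, "rTh11"),
         p(4, "rTh33") + p(5, "rTh12")],
        [p(3, "kV31") - p(4, "kV31"),
         p(3, "kV32") - p(4, "kV32"),
         p(4, "rTh33") + p(5, "rTh21"),
         p(3, "rTh33") + p(4, "rTh33") + p(5, "rTh22")],
    ]
    g = [i_eq1[0], i_eq1[1], -v_eq5[0], -v_eq5[1]]
    return G, g

# Hypothetical parameter values: identical arms, DC link (1) and load (5).
arm = {"gN11": 2.0, "gN12": -1.0, "gN21": -1.0, "gN22": 2.0,
       "kI13": 0.5, "kI23": -0.5, "kV31": -0.5, "kV32": 0.5, "rTh33": 1.0}
mphe = {1: {"gN11": 4.0, "gN22": 4.0},
        2: dict(arm), 3: dict(arm), 4: dict(arm),
        5: {"rTh11": 3.0, "rTh12": 1.0, "rTh21": 1.0, "rTh22": 3.0}}

G_ind, g_ind = assemble_eq14(mphe, i_eq1=(1.0, -1.0), v_eq5=(2.0, 0.5))
for row in G_ind:
    print(row)
```

No multiplication appears in the assembly: once the MPHE parameters of the current switch combination are fetched, each entry is a short sum.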


B. Offline Simulation Results

A three-phase RLE load has been selected for testing purposes. The voltage sources (Ea, Eb, Ec) are useful to create natural rectification conditions. The frequency and amplitude of the AC sources are controlled independently from the circuit conditions. By varying the AC voltage profile as shown in Fig. 5, three operation modes of the NPC converter are induced. During the first phase (0 ≤ t ≤ 50 ms), AC voltages are kept at 0 V: the NPC converter acts as an inverter. During the second phase (50 ms ≤ t ≤ 100 ms), AC voltages are increased from 0 V to 150 V peak. At the beginning of this sequence, the NPC is operating in inverter mode; as AC voltage amplitudes increase, the NPC moves to rectifying mode. During the third phase (100 ms ≤ t ≤ 150 ms), the SPWM is stopped while AC voltages are kept at 150 V peak, thus yielding natural rectification mode. Fig. 5 presents the resulting current waveforms at the AC side.

[Figure 5: two plots over 0 to 150 ms. Top panel "AC SRC VOLTAGES", voltage (V), traces Ea, Eb, Ec; bottom panel "AC CURRENTS", current (A), traces ia, ib, ic; horizontal axis: Time (ms)]

Fig. 5. AC voltage profile and induced current signals from the simulation of the NPC converter SPS model.

To assess the performance of the NTT, a SimPowerSystems (SPS) reference model was designed using the parameters of Table I. Matlab scripts were written to execute the full circuit (FC) model as well as the NTT on torn circuit (TC) models. The FC model is used to evaluate the number of iterations required for the switch updating algorithm. The TC is simulated with and without the relaxed approach. Table II presents the average relative errors observed on circuit variables. FCi models are Full Circuit models (no NTT), where i refers to the number of iterations. Our numerical experiments show that any number of iterations above 2 has results close to FC2. Hence 2 iterations are sufficient to simulate the NPC converter in any of its modes with appropriate accuracy (≃ 1 % of rel. error).

TABLE II
AVERAGE RELATIVE ERROR FOR VARIOUS MODELS VS. SPS REFERENCE

Variable   FC1        FC2       FC3       TC2       RTC2
vc1        3.579%     0.185%    0.185%    0.185%    0.185%
vc2        3.578%     0.185%    0.185%    0.185%    0.185%
ia         15.576%    0.431%    0.430%    0.431%    0.220%
ib         14.356%    0.398%    0.406%    0.398%    1.058%
ic         22.386%    0.441%    0.443%    0.441%    1.163%

The TC2 and RTC2 models are Torn Circuit models using 2 iterations, with RTC referring to the relaxed version of the NTT algorithm. Table II shows a perfect match between TC2 and FC2. The relaxed NTT algorithm offers very good performances as well. Table II shows that the results remain close to those of the SPS reference. A slight degradation in comparison to TC2 is observed, yet the average relative error levels lie between 0.2% and 1.2%. In conclusion, the reported relative error levels are good, more so when taking into account the fact that real-time simulation is targeted [5].

V. FPGA IMPLEMENTATION

A. Hardware architecture Unlike other technologies (CPU, GPU), FPGA programming requires a complete design of a so-called Application Specific Processor (ASP) to handle all the computations. This section describes the ASP developed to simulate the NPC converter on FPGA using the relaxed version of the NTT. The architecture of the ASP implemented on the FPGA is presented in Fig. 6.a. It uses five processing units (PUs). The ASP proceeds as shown in Fig. 6.b: at the beginning of a time point of the simulation, it reads the inputs (un and cn ) ind ). The update switch and disposes of interface variables (wn−1 ind PU uses wn−1 and cn to determine switch statuses (swn ). ind The statuses are then used to compose Gind n . Meanwhile, gn ind ind ind is computed using wn−1 and un . Gn and gn are fed to the Gauss-Jordan PU that computes interface variables (wnind ), then wnind is used to compute the outputs (yn ), which consists in driving wnind to the output and summing the inverse of ia and ib to obtain ic . The latter does not influence the minimal time-step of the simulation as the ASP starts over as soon as wnind is known. B. Gauss-Jordan Processing units All the PUs use a parallel dot-product (DP) design approach [8], except for the Gauss-Jordan (GJ) whose implementation is new. The GJ algorithm is known to proceed in two steps. During Step I, the matrix is converted to echelon form with all diagonal entries equal to 1. During Step II, a back substitution process solves sequentially for each unknown. In order to reduce the latency associated with the algorithm, we propose to skip Step II and to run instead multiple copies of the same problem with permuted columns. However, to keep the memory requirements at bearable levels, the matrices are duplicated and have groups of their columns permuted at the first iteration and every time the rank of the reduced matrices are powers of 2 (2, 4, etc.). Fig. 
7 illustrates the proposed modification to the GJ algorithm for a 4 × 4 matrix. The unknowns are noted x1 , x2 , x3 and x4 , and the system of equations has a unique solution, that is [1, 2, 3, 4]. At the beginning of the algorithm, the matrix is duplicated and the group of columns 1-2 and 3-4 are permuted, then a partial GJ reduction is performed to eliminate one variable on both systems. For instance, the entry at the second row and second column (7) is replaced by 7 − 3 × 6/2 = −2, whereas the same position after the permutation of columns 1-2 and 3-4 (9) is replaced by 9 − 5 × 8/4 = −1. The resulting two



7

z-1 ind

ind

wn-1

gn

ind

Compute gn

ind

ind

Update switch statuses

cn

ind

Gauss-Jordan

ind

Compute Gn

swn

ind

Gn wn = gn

un

wn-1

ind

wn

Compute yn

ind

yn

Gn

(a) ASP Inputs

Minimal time-step duration

un cn

COMPUTE g SW UPDATE

ind

COMPUTE G

Gauss-Jordan

y

wn-1

State Variables

PU Outputs ind

ind

ind

Gn

gn swn

wn

yn

(b) Fig. 6.

NTT-based ASP (a) Architecture of the ASP used for the simulation of the NPC circuit; (b) Single time-step computing sequence decomposition.

x1

x2

x3

x4

2

3

4

5

40

x3

x4

x1

x2

4

5

2

3

TABLE III FPGA RESOURCE UTILIZATION , TARGETING THE V IRTEX 6 XC5VLX240T-1 [26] OF THE ML605 FPGA BOARD .

40

Duplicate, permute 6

7

8

9

80

8

9

6

7

80

1

3

1

5

30

1

5

1

3

30

5

1

3

1

20

3

1

5

1

20

x3

x4

x1

x2

reduce

reduce x2

reduce x3

x4

-4

-2

x3

-2

-4

-6

-40

-1

2.5

10

-6.5

-7 -11.5

-80

Duplicate, permute

x4

x3

-2

-4

x4 6

8

x4

1.5

-20

reduce

-20

x1

x2

8

6

20

8

-50

reduce

6

-50

-2

x1

x2

-1

2

1

-10

-4

20

-2.75 3.5 -1.25

-10

Duplicate, permute

x2

x1

6

8

20

-4

-2

-10

x1

-10

reduce

-2.5

Slices 1,671 2,687 987 1,332 781 7,277 37,680

BRAM 0 20 8 13 0 41 416

DSP 24 60 0 32 0 116 768

Min. period 2.469 ns 2.466 ns 2.468 ns 2.453 ns 2.460 ns 2.488 ns N/A

Latency 38 61 46 182 22 289 N/A

pipelined and can accept new operands while processing older ones, hence multiple copies of a problem can be executed simultaneously. This property allows the reduction of the total latency associated with the execution of the GJ algorithm.

x1

x2 30

0

3.75 0.5 2.25

reduce

reduce x3

20

x4

x2

x3

x4 5

Module Compute gind Switch update Compute Gind Gauss Jordan Compute y NPC ASP Available

3.33

-5

3.33

C. Implementation results divide x4

Fig. 7.

divide

divide x3

4

x2

3

divide x1

2

1

Illustration of the proposed modification to the GJ algorithm.

init gnind Gnind

RAM 1 0

RAM RAM RAM

Fig. 8.

X +

1/y

X

x/y

wnind

!

GJ processing unit.

3 × 3 matrices are then reduced to 2 × 2 matrices using partial GJ elimination. The new matrix size is a power of two, so duplications and permutations are performed. After reduction, we end up with four singletons: simple divisions suffice to obtain the aforementioned solution. Fig. 8 presents the architecture of the PU that executes such a variation of the GJ algorithm. The arithmetic operators are

All numerical values in the ASP are represented in floating-point to allow a large dynamic range. The floating-point format used is a non-standard one with intermediate precision (52-bit signed mantissa and 8-bit exponent), while all the additions are carried out using an internal floating-point format called the Self-Alignment Format (SAF) that uses an 80-bit extended mantissa. This methodology was introduced in [8], whereas the SAF is thoroughly discussed in [27]. Each multiplication consumes 6 DSP blocks and takes 5 clock cycles. The reciprocal and division operators are designed using the Newton-Raphson algorithm [28].

Table III presents the implementation results for the PUs and the ASP when the Virtex-6 LX240T-1 is targeted [26]. The design is performed using the System Generator Matlab/Simulink toolbox from Xilinx. The PUs were designed with a 400 MHz target frequency (2.5 ns minimal clock period), which is quite high with regard to industrial standards and the literature known to the authors. As Table III shows, the 2.5 ns constraint is met by all the PUs, as well as by the ASP. The ASP consumes about 19% of the reconfigurable logic (slices). We also observe that the memory consumption is around 9% to 10% of the available BRAMs, which meets our expectations. Memory resources are located at the Switch Update PU, the $G^{ind}_n$ computing PU and the GJ PU. Regarding the first two PUs, their BRAM consumption is explained by their dependence on the switch status. DSP block consumption is about 15% of the available resources, most of which are located at the Switch Update PU. The $G^{ind}_n$ and $y_n$ computing PUs do not use any DSP block at all, because those PUs do not perform any multiplication.

Fig. 9. Zoomed-in captures from the $i_c$ current, along with the relative errors produced by the ASP with respect to the SPS and RTC2 references. Panels (a)-(c): AC current $i_c$ (A) versus time (ms). Panels (d)-(f): FPGA vs. models, relative error (%) with respect to SPS and RTC2, versus time (ms).

Table III also presents the latency of each PU as well as the total latency of the ASP, which is 289 clock cycles. The latency of the ASP comprises those of the Switch Update, the $G^{ind}_n$ computing and the GJ PUs. At a clock frequency of 400 MHz, 289 clock cycles yield 722.5 ns, but the time-step of the ASP was rounded up to 750 ns. The Gauss-Jordan PU is the most time-consuming unit. This observation is consistent with the algorithm's complexity, which grows as $O(N^3)$ with the rank ($N$) of the matrix. Without our modified version of the GJ algorithm, the latency would have doubled and the simulation time-step would have been higher than 1 µs.
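As a rough illustration of the cubic cost mentioned above, the following Python sketch implements textbook Gauss-Jordan elimination with partial pivoting on a small dense system. It is not the modified, hardware-oriented variant executed by the GJ PU; the function name `gauss_jordan_solve` and the 3 × 3 test system are hypothetical and chosen only for illustration.

```python
def gauss_jordan_solve(a, b, tol=1e-12):
    """Solve a @ x = b with textbook Gauss-Jordan elimination.

    Illustrative sketch only: the name, the tolerance and the use of
    partial pivoting are assumptions, not the paper's hardware variant.
    """
    n = len(a)
    # Build the augmented matrix [a | b] without modifying the inputs.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot_row = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot_row][col]) < tol:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot_row] = m[pivot_row], m[col]

        # One reciprocal per pivot, then only multiplications and subtractions.
        inv_pivot = 1.0 / m[col][col]
        m[col] = [v * inv_pivot for v in m[col]]

        # Clear the column in every other row (above and below the pivot).
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]

    # The left block is now the identity; the last column holds the solution.
    return [row[n] for row in m]


if __name__ == "__main__":
    # Hypothetical 3 x 3 system built so that the solution is [1, 2, 3].
    a = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
    b = [12.0, 7.0, 17.0]
    print(gauss_jordan_solve(a, b))
```

Each of the N pivots updates N rows of N + 1 entries, which is where the $O(N^3)$ operation count comes from. This is also why shrinking the matrix handled at each stage has a strong effect on latency.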

D. FPGA-based simulation results

The ASP was implemented on an ML-605 board from Avnet. The design was hooked up to a host computer via an Ethernet TCP/IP link. The results were then compared in MATLAB with the SPS reference, the FC2 model and the RTC2 model. Table IV presents the average relative errors resulting from the ASP computations. Fig. 9 presents zoomed-in captures from the $i_c$ current waveform for each mode of the simulation, along with the evolution of the relative errors with respect to the SPS and RTC2 references. These captures confirm the results from Table IV and show that, for the SPS and FC2 references, the maximum relative error is observed when the current is close to zero. This observation is explained by the properties of the BE integration rule, which is known to exhibit low accuracy due to its low order. However, the errors remain at acceptable levels (≃ 1%).

TABLE IV
AVERAGE RELATIVE ERROR FOR FPGA MODEL

Variable   vs. SPS   vs. FC2      vs. RTC2
vc1        0.185%    0.331e-3%    0.277e-3%
vc2        0.185%    0.310e-3%    0.277e-3%
ia         0.206%    0.177%       0.047%
ib         1.061%    0.335%       0.023%
ic         1.142%    0.466%       0.093%

When the FPGA-based simulation is compared to the RTC2 model, we are assessing the computational accuracy of the ASP, which lies between $10^{-6}$ % and $10^{-2}$ %. Such levels of relative error are explained by the fact that the implemented floating-point operators are not fully IEEE double-precision compliant. However, this design tradeoff comes with area savings and offers very low latency computations. In conclusion, the results from Table IV and Fig. 9 clearly demonstrate the effectiveness of the NTT and that of the proposed hardware design methodology.

E. Voltage-Switching Behaviour

Certain switch modelling techniques, such as the fixed admittance switch model [11], [12], are known to introduce false transients during voltage switchings. Depending on converter load and switching frequency, those transients do not necessarily affect the overall shape of the current. In order to make sure that such phenomena are not observed in our model, the voltage from the converter to the neutral point of the load ($v_{an}$, see Fig. 4.a for correspondence) is shown in Fig. 10, along with results from the SPS reference and the absolute errors produced. The capture of Fig. 10.a is taken from the moment of the simulation where the errors on the current waveform were the most pronounced, which occurred around 80 ms of simulation time. The superimposition of the model and the SPS reference shows a perfect match between them, which is confirmed by the plot of Fig. 10.d.
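The relative and absolute error metrics used in these comparisons behave differently near zero crossings, which the following Python sketch illustrates. It compares a sinusoidal reference with a slightly delayed copy of itself, an assumed stand-in for the small discrepancies between two solvers. The function name `error_profiles`, the 30 A amplitude, the 60 Hz frequency and the 2 µs lag are hypothetical values chosen for illustration, not figures taken from the case study.

```python
import math


def error_profiles(amplitude=30.0, freq_hz=60.0, lag_s=2e-6, samples=2000):
    """Return (reference, abs_error, rel_error_percent) over one period.

    All default values are illustrative assumptions.
    """
    rows = []
    for k in range(samples):
        # Half-sample offset so that no sample falls exactly on a zero crossing.
        t = (k + 0.5) / (samples * freq_hz)
        ref = amplitude * math.sin(2.0 * math.pi * freq_hz * t)
        model = amplitude * math.sin(2.0 * math.pi * freq_hz * (t - lag_s))
        abs_err = abs(model - ref)
        rows.append((ref, abs_err, 100.0 * abs_err / abs(ref)))
    return rows


if __name__ == "__main__":
    rows = error_profiles()
    worst_abs = max(r[1] for r in rows)
    worst_rel = max(rows, key=lambda r: r[2])
    print(f"largest absolute error: {worst_abs:.4f} A")
    print(f"largest relative error: {worst_rel[2]:.1f} % "
          f"(reference value {worst_rel[0]:+.3f} A)")
```

In this sketch the absolute error stays a small fraction of the waveform amplitude everywhere, whereas the relative error is largest for the samples closest to a zero crossing, simply because the reference magnitude in the denominator becomes small there.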



Fig. 10. Voltage from the converter to the neutral point of the load on phase a, superimposed on the result from the SPS reference. Panels (a)-(c): voltage on phase a (V) versus time (ms), SPS and model. Panels (d)-(f): absolute error (V) versus time (ms).

Fig. 10.b shows a close-up of the voltage profile at the beginning of the sequence shown in Fig. 10.a. The perfect match between our model and the SPS reference appears clearly, more so when considering the absolute error associated with this capture, as shown in Fig. 10.e. We also observe that the error reaches its maxima during voltage switchings, but the error remains very acceptable and the model rapidly converges to the desired value, with absolute errors below 0.01 V. Fig. 10.c shows another close-up of the voltage profile, this time at the end of the sequence shown in Fig. 10.a, where a peak value of the error was observed. The zoomed view confirms that the difference occurs at a voltage switching point and only lasts for one time-step, as the model converges rapidly toward the correct value.

VI. CONCLUSION

As the HIL prototyping approach gains importance for the industry, FPGA-based real-time simulation becomes a key component that helps meet the timing constraint imposed by high switching frequency power converters. This article presented an original solution to that issue whose formulation results from the development of a Network Tearing Technique, specifically tailored to the FPGA context. A very high operating clock frequency is reported for the proposed ASP design, at an acceptable FPGA resource consumption. The article also presented a detailed design methodology that ensures high computational accuracy, high performance, and moderate reconfigurable footprint. It allows time-steps below 1 µs to be reached for the real-time simulation of power converters.

APPENDIX
PROOF FOR THE CLAIM REGARDING THE REDUCED NETWORK EQUATIONS

In Eq. 11, $\mathbf{M}_{io}$ and $\mathbf{M}_{ii}$ are connectivity matrices composed of 0s and ±1 entries. If current ports are connected to each other, and if so are the voltage ports (mutual exclusiveness of input and output variables criterion), then we have:

\[
\begin{bmatrix} \mathbf{M}_{io} & \mathbf{M}_{ii} \end{bmatrix}
=
\begin{bmatrix} \mathbf{0} & \mathbf{M}_{ii}^{(1)} \\ \mathbf{M}_{io}^{(2)} & \mathbf{0} \end{bmatrix}
\tag{16}
\]

Moreover, since a set of independent variables $\mathbf{w}^{ind}$ is present in $\mathbf{w}_i$, and supposing that those are placed at the bottom end of $\mathbf{w}_i$ (which can necessitate the permutation of certain columns of $\mathbf{M}$, as we did in the example of Section II-A), then $\mathbf{M}_{ii}^{(1)}$ can be built in a way that guarantees the following:

\[
\mathbf{M}_{ii}^{(1)} = \begin{bmatrix} \mathbf{I} & \mathbf{M}_{ii}^{(12)} \end{bmatrix}
\tag{17}
\]

where $\mathbf{I}$ is a square identity matrix with a number of columns equal to the number of dependent variables in $\mathbf{w}_i$ (hence, $\mathbf{M}_{io}^{(2)}$ has a number of rows equal to the size of $\mathbf{w}^{ind}$). From the above, it results that a partial Gauss-Jordan reduction of $\mathbf{M}$ to its last rows (those associated with $\mathbf{w}^{ind}$) only affects those rows and leaves the remaining part of the matrix unchanged, thus yielding a matrix $\mathbf{G}^{ind}$ composed of sums/subtractions of terms from $\mathbf{M}_{oi}$, as $\mathbf{M}_{io}^{(2)}$ and $\mathbf{M}_{ii}^{(12)}$ contain nothing but 0s and ±1 entries.

REFERENCES

[1] J. Mahseredjian, V. Dinavahi, and J. Martinez, “Simulation tools for electromagnetic transients in power systems: Overview and challenges,” IEEE Trans. Power Del., vol. 24, no. 3, pp. 1657–1669, July 2009.
[2] Y. Levron, H. Kim, and R. Erickson, “Design of EMI filters having low harmonic distortion in high-power-factor converters,” IEEE Trans. Power Electron., vol. 29, no. 7, pp. 3403–3413, July 2014.
[3] B. Zhang, K. Zhou, and D. Wang, “Multirate repetitive control for PWM DC/AC converters,” IEEE Trans. Ind. Electron., vol. 61, no. 6, pp. 2883–2890, June 2014.
[4] A. Kuperman, U. Levy, J. Goren, A. Zafransky, and A. Savernin, “Battery charger for electric vehicle traction battery switch station,” IEEE Trans. Ind. Electron., vol. 60, no. 12, pp. 5391–5399, Dec. 2013.
[5] H. F.-Blanchette, T. Ould-Bachir, and J.-P. David, “A state-space modeling approach for the FPGA-based real-time simulation of high switching frequency power converters,” IEEE Trans. Ind. Electron., vol. 59, no. 12, pp. 4555–4567, Dec. 2012.
[6] M. Matar and R. Iravani, “FPGA implementation of the power electronic converter model for real-time simulation of electromagnetic transients,” IEEE Trans. Power Del., vol. 25, no. 2, pp. 852–860, April 2010.
[7] D. Majstorovic, I. Celanovic, N. D. Teslic, N. Celanovic, and V. A. Katic, “Ultralow-latency hardware-in-the-loop platform for rapid validation of power electronics designs,” IEEE Trans. Ind. Electron., vol. 58, no. 10, pp. 4708–4716, Oct. 2011.
[8] T. Ould-Bachir, C. Dufour, J. Bélanger, J. Mahseredjian, and J.-P. David, “A fully automated reconfigurable calculation engine dedicated to the real-time simulation of high switching frequency power electronic circuits,” Math. Comput. Simulat., vol. 91, pp. 167–177, 2013.
[9] H. Jin, “Behavior-mode simulation of power electronic circuits,” IEEE Trans. Power Electron., vol. 12, no. 3, pp. 443–452, May 1997.
[10] Y. Inaba, S. Cense, T. Ould-Bachir, H. Yamashita, and C. Dufour, “A dual high-speed PMSM motor drive emulator with finite element analysis on FPGA chip with full fault testing capability,” in European Conf. Power Electron. App., Nottingham, UK, Sept. 2011, pp. 1–10.
[11] S. Hui and C. Christopoulos, “A discrete approach to the modeling of power electronic switching networks,” IEEE Trans. Power Electron., vol. 5, no. 4, pp. 398–403, Oct. 1990.
[12] P. Pejovic and D. Maksimovic, “A method for fast time-domain simulation of networks with switches,” IEEE Trans. Power Electron., vol. 9, no. 4, pp. 449–456, July 1994.
[13] P. Le-Huy, S. Guerette, L. A. Dessaint, and H. Le-Huy, “Real-time simulation of power electronics in power systems using an FPGA,” in Canadian Conf. Electr. Comput. Eng., May 2006, pp. 873–877.
[14] M. Dagbagi, L. Idkhajine, E. Monmasson, and I. Slama-Belkhodja, “FPGA implementation of power electronic converter real-time model,” in Int. Symp. Power Electro. Electr. Drives, Automation Motion, June 2012, pp. 658–663.
[15] L. Chua and L.-K. Chen, “Diakoptic and generalized hybrid analysis,” IEEE Trans. Circuits Syst., vol. 23, no. 12, pp. 694–705, 1976.
[16] J. Mahseredjian, S. Lefebvre, and D. Mukhedkar, “Power converter simulation module connected to the EMTP,” IEEE Trans. Power Syst., vol. 6, no. 2, pp. 501–510, 1991.
[17] J. Marti, L. Linares, J. Calvino, H. Dommel, and J. Lin, “OVNI: an object approach to real-time power system simulators,” in Proc. Int. Conf. Power Syst. Technol., Aug. 1998, pp. 977–981.
[18] K. Strunz and E. Carlson, “Nested fast and simultaneous solution for time-domain simulation of integrative power-electric and electronic systems,” IEEE Trans. Power Del., vol. 22, no. 1, pp. 277–287, 2007.
[19] C. Dufour, J. Mahseredjian, and J. Bélanger, “A combined state-space nodal method for the simulation of power system transients,” IEEE Trans. Power Del., vol. 26, no. 2, pp. 928–935, April 2011.
[20] J. Mahseredjian, S. Dennetière, L. Dubé, B. Khodabakhchian, and L. Gérin-Lajoie, “On a new approach for the simulation of transients in power systems,” Electric Power Systems Research, vol. 77, no. 11, pp. 1514–1520, 2007, Selected Topics in Power System Transients - Part II.
[21] G. Hachtel, R. Brayton, and F. Gustavson, “The sparse tableau approach to network analysis and design,” IEEE Transactions on Circuit Theory, vol. 18, no. 1, pp. 101–113, Jan. 1971.
[22] E. Hairer and G. Wanner, Solving ordinary differential equations II: Stiff and differential-algebraic problems, ser. Solving ordinary differential equations. Springer, 2010.
[23] L. Chua and P. Lin, Computer-aided analysis of electronic circuits: algorithms and computational techniques, ser. Prentice-Hall series in electrical and computer engineering. Prentice-Hall, 1975.
[24] F. E. Cellier and E. Kofman, Continuous System Simulation. Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2006.
[25] J. Hautier and J. Caron, Convertisseurs statiques: méthodologie causale de modélisation et de commande, ser. Méthodes et pratiques de l’ingénieur. Technip, 1999.
[26] Xilinx, DS150 v2.3: Virtex-6 Family overview, March 2011.
[27] T. Ould-Bachir and J.-P. David, “Self-alignment schemes for the implementation of addition-related floating-point operators,” ACM Trans. Reconfigurable Technol. Syst., vol. 6, no. 1, pp. 1–21, 2013.
[28] M. Flynn, “On division by functional iteration,” IEEE Trans. Comp., vol. C-19, no. 8, pp. 702–706, Aug. 1970.


Tarek Ould-Bachir (S’05–M’08) received the B.Eng., M.A.Sc. and Ph.D. degrees in electrical engineering from École Polytechnique de Montréal, Montreal, QC, Canada, in 2005, 2008 and 2013, respectively. From 2013 to 2014, he was a postdoctoral fellow at École de Technologie Supérieure, Montreal, where his research focused on high-performance FPGA-based computing engines for the real-time simulation of power converters. Since 2005, he has been lecturing undergraduate courses at École Polytechnique de Montréal. From 2007 to 2009, he worked as an FPGA Application Specialist with Opal-RT Technologies. Since September 2014, he has been a Senior Simulation Specialist with the same company. His research interests include real-time simulation, power electronics, FPGAs, and digital arithmetic.

Handy Fortin Blanchette (S’07–M’10) received the B.Eng., M.Eng. and Ph.D. degrees in electrical engineering from the École de Technologie Supérieure (ETS), Montreal, QC, Canada, in 2001, 2003 and 2010, respectively. From 1998 to 2000, he was with the Bombardier Transport-ETS Research Laboratory, where he worked on high-power traction systems. From 2001 to 2003, he was involved in the development of an electrical drive library for Simulink. From 2007 to 2010, he was with OPAL-RT Technologies, where he led power electronics real-time simulation projects. From 2010 to 2011, he was a Visiting Scholar with the Center for Power Electronics Systems, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA, where he was involved in the packaging of converters for aircraft applications. He is currently an Associate Professor of electrical engineering with ETS. His current research interests include EMI, FPGA-based real-time simulations, and high-density power converter packaging.

Kamal Al-Haddad (S’82–M’88–SM’92–F’07) received the B.Sc.A. and M.Sc.A. degrees from the University of Québec Trois-Rivières, Canada, in 1982 and 1984, respectively, and the Ph.D. degree from the Institut National Polytechnique, Toulouse, France, in 1988. Since June 1990, he has been a Professor with the Electrical Engineering Department, École de Technologie Supérieure (ETS), Montreal, QC, where he has been the holder of the Canada Research Chair in Electric Energy Conversion and Power Electronics since 2002. He is a Consultant and has established very solid links with many Canadian industries. His fields of interest are in highly efficient power converters, harmonics and reactive power control, switch mode and resonant converters including the modelling, control, and development of prototypes for industrial applications. Prof. Al-Haddad is a fellow member of the Canadian Academy of Engineering. He is IEEE IES President Elect, Associate Editor of the Transactions on Industrial Informatics and IES Distinguished Lecturer.

