Real-Time Particle Swarm Optimization on FPGA for the ... - MDPI

0 downloads 0 Views 4MB Size Report
Oct 24, 2018 - Real-Time Particle Swarm Optimization on FPGA for the Optimal Message-Chain Structure. Heoncheol Lee *, Kipyo Kim, Yongsung Kwon and ...
electronics Article

Real-Time Particle Swarm Optimization on FPGA for the Optimal Message-Chain Structure Heoncheol Lee *, Kipyo Kim, Yongsung Kwon and Eonpyo Hong Agency for Defense Development; Daejeon 34186, Korea; [email protected] (K.K.); [email protected] (Y.K.); [email protected] (E.H.) * Correspondence: [email protected] or [email protected]; Tel.: +82-42-821-2247 Received: 21 September 2018; Accepted: 22 October 2018; Published: 24 October 2018

 

Abstract: This paper addresses the real-time optimization problem of the message-chain structure to maximize the throughput in data communications based on half-duplex command-response protocols. This paper proposes a new variant of the particle swarm optimization (PSO) algorithm to resolve real-time optimization, which is implemented on field programmable gate arrays (FPGA) to be performed faster in parallel and to avoid the delays caused by other tasks on a central processing unit. The proposed method was verified by finding the optimal message-chain structure much faster than the original PSO, as well as reliably with different system and algorithm parameters. Keywords: real-time optimization; PSO; FPGA; message-chain structure

1. Introduction Data communications with half-duplex command-response protocols generally consist of a commander controlling the transmission line and response terminals responding to the commander. In the MIL-STD-1553B data communication system [1], which is a well-known military communication platform using a serial data bus, it consists of a bus controller (BC) as the commander and remote terminals (RTs) as the response terminals. In the physical layer, the properties of the electrical design inevitably limit the bit rate of the data bus to 1 Mbps. Besides, in the application layer, since the communication protocol is generally determined based on messages that consist of a set of words composed of multiple bits [2], it is hard to use the full bandwidth. Therefore, the structure of message-chains for the BC and the RTs should be carefully determined to maximize the data throughput with consideration of the limited bit rate and insufficient bandwidth. Data throughput can be improved by simply increasing the number of messages within a given transmission duration in the message-chain structure of the BC. However, the excessive increase in the number of messages with a fixed transmission duration may cause the loss of data in the RTs because the data in receiving buffers may be rewritten by the currently receiving data before the previously received data processing is finished. Therefore, to achieve high data throughput without data loss, the number of messages and the transmission period in the structure of message-chains should be carefully determined. Zhang et al. [3] improved the data throughput in MIL-STD-1553B communication systems with a heuristic algorithm rearranging the components of the command table based on the number of command or data words. However, they did not consider the processing time in the RT, which affects the reliability of the data communications. Kim et al. [4] presented and analyzed the whole timing diagram between the BC and the RT. They found that when multi-message chains and double buffers were used for the communication system, the data throughput can be improved. However, the message-chain structure in their approach was empirically determined. One can consider optimization algorithms such as the least square methods. Recently, sampling-based optimization algorithms, which are particle swarm Electronics 2018, 7, 274; doi:10.3390/electronics7110274

www.mdpi.com/journal/electronics

Electronics 2018, 7, 274

2 of 15

optimization (PSO) [5] and its variants [6–8], have been widely used because of their convenience and good performance in spite of nonlinear system constraints. However, optimization algorithms may be faced with much computation time because of a large number of samples, which means that the computation time should be reduced significantly to be applied to real-time systems. In this work, the optimization process needs to be conducted in real-time to find the optimal message-chain structure as fast as possible with different system parameters. Liu et al. [9] proposed real-time approaches for the PSO applied to identify and cancel the current harmonics in power systems. However, since their approach was based on the limited number of particles, it cannot be applied to more complex optimization problems, requiring more number of particles. To reduce computation time, the PSO was recently implemented on field-programmable gate arrays (FPGA) in various fields. Gao et al. [10] proposed an FPGA implementation method of the PSO to quickly select and update the coefficients of adaptive infinite impulse response (IIR) filters. In addition, Gupta and Mehra [11] implemented the modified PSO algorithm on an FPGA for fast unknown system identification, selecting and updating the coefficients of adaptive infinite impulse response (IIR) filters. Vasumathi and Moorthi [12] developed the hybrid adaptive neural network and the PSO and implemented it on a Spartan 3E FPGA to accelerate the computation speed for real-time processing. Morsi et al. [13] implemented the PSO algorithm on an FPGA to quickly compute the structural similarity (SSIM) index between the target image and the candidate region. They showed that the results of their implementation, which is a combination of software and hardware, were better than the results of the software-only implementation. Trimeche et al. [14] implemented the PSO on FPGA for a multi-input multi-output (MIMO) detection system. Their work was able to reduce the computational complexity of the maximum likelihood estimation in an MIMO detection system. This paper proposes a new variant of the PSO for real-time optimization, which is conducted on FPGA to find the optimal number of messages and the transmission period in the message-chain structure. The original PSO is modified to be properly implemented on FPGA by synthesizable hardware description language (HDL) codes. Consequently, the contributions of this paper are as follows. First, the proposed approach can find the optimal number of messages and the transmission period without data loss in the structure of message-chains for data communication systems with half-duplex command-response protocols. Second, the proposed approach can be conducted in real-time, despite the high computation load of the PSO. To the best of our knowledge, PSO to optimize the message-chain structure in data communication systems with half-duplex command-response protocols has never been implemented on FPGA. The main notations used in this paper are summarized in Table 1. Table 1. The main notations in this paper. Notation NBC TBC TRT TW TH t MG tP tI pm n,i vm n,i p gb pm n,pb

Description The number of messages in a message chain in the BC The transmission period of a message chain in the BC The whole required time in the RT Total writing time for responding messages in the RT Total processing time for the high-priority tasks in the RT Message gap time in the BC Processing time for the received message in the RT Processing time for an interrupt service routine in the RT The position of the n-th particle of the m-th group at the i iteration in FRPSO The velocity of the n-th particle of the m-th group at the i iteration in FRPSO The global best position throughout all particles in FRPSO The personal best position of the n-th particle of the m-th group in FRPSO

This remainder of this paper is organized as follows: Section 2 defines and formulates the optimization problem of message-chain structures in half-duplex command-response protocols.

Electronics 2018, 7, 274 Electronics 2018, 7, x FOR PEER REVIEW

3 of 15 3 of 15

This remainder of this paper is organized as follows: Section 2 defines and formulates the Section 3 describes the proposed approach, which is a real-time PSO algorithm on FPGA. Section 4 optimization problem of message-chain structures in half-duplex command-response protocols. gives the evaluation of the performance of the proposed approach with various algorithm parameters Section 3 describes the proposed approach, which is a real-time PSO algorithm on FPGA. Section 4 and system conditions. The real-time capability of the proposed approach implemented on FPGA is gives the evaluation of the performance of the proposed approach with various algorithm parameters then verified by successfully finding the optimal message-chain structure. Finally, Section 5 offers and system conditions. The real-time capability of the proposed approach implemented on FPGA is the conclusions. then verified by successfully finding the optimal message-chain structure. Finally, Section 5 offers the conclusions. 2. Optimization Problem of Message-Chain Structures The formulation of the problem depends on a specific data communication system. 2. Optimization Problem ofoptimization Message-Chain Structures In our previous work [15], for the offline optimization of the MC structure, we analyzed its timing The between formulation of the problem depends onevery a specific data communication diagram the BC andoptimization the RT, as shown in Figure 1. At transmission period (TBC ),system. the BC In our previous work [15], for the offline optimization of the MC structure, we analyzed timing transmits an MC, which consists of a certain number of messages, NBC . In each MC, the timeits assigned diagram betweenisthe BC and the RT, asgap shown Figure 1. At every transmission period (TBC(t), the), to each message called the message time in (tMG ), including the inter-message gap time IMG BC transmits an MC, which consists of a certain number of messages, N BC. In each MC, the time which separates messages. assigned to eachrequired messagetime is called theRT message time (twith MG), including the inter-message gap time The whole in the (TRT ) isgap analyzed the following factors: the processing (t IMG), which separates messages. time for an interrupt service routine (tI ) and the received data (tP ), the total writing time for responding The whole the RT (Ttime RT) is analyzed with the following factors: the processing to messages (TWrequired ), and thetime total in processing for high-priority tasks (TH ). The most important point time for an interrupt service routine (t I) and the received data (tP), the total writing time for in the RT is to design a structure for MC, which can finish the whole processing for the current MC responding to messages W), and the total processing time for high-priority tasks (TH). The most before receiving the next (T MC at the same buffer. Since TW is considered to be of two types—tW1 and important point in the RT is to design a structure can finish whole processing for tW2 , which are the average writing times before for andMC, afterwhich completing thethe data reception from the the current MC before receiving the next MC at the same buffer. Since TW is considered to be of two BC—it can be computed as follows: types—tW1 and tW2, which are the average writing times before and after completing the data reception from the BC—it can be computed TW =asNfollows: (1) W1 tW1 NRD + ( NBC − NW1 ) tW2 NRD

(1) 𝑇 = 𝑁 𝑡 𝑁 + (𝑁 − 𝑁 )𝑡 𝑁 where NRD is the number of response messages that should be written at the buffers in the RT for each where NRD is the messages at theisbuffers in the of RTmessages for each message from thenumber BC, andofNresponse ( NBC − 1that )/ (should t P + tW1beNwritten the number RD )e , which W1 = d t MG (𝑁 ⌈𝑡 message from the BC,RT and 𝑁 = 1)/(𝑡 𝑡 𝑁 )⌉, which theThen, number messages on on processing in the before completing − the data + reception from theisBC. theoftotal required processing before completing the data reception from the BC. Then, the total required time time in the in RT,the TRTRT , can be obtained as follows: in the RT, TRT, can be obtained as follows: TRT = NBC (2t I + t P ) + 2TH + TW (2) (2) 𝑇 = 𝑁 (2𝑡 + 𝑡 ) + 2𝑇 + 𝑇

Figure 1. The The timing timing diagram diagram of of data data processing processing in in the the RT RT with with double double buffers buffers for the received message-chains from the BC.

Electronics 2018, 7, 274

4 of 15

To avoid data loss caused by rewriting the buffer containing the current MC by the next MC transmitted for the same buffer, the relation between TRT and TBC is established as follows: 2TBC > TRT

(3)

Electronics 2018, 7, x FOR PEER REVIEW

4 of 15

Additionally, the transmission margin for unexpected situations in the BC, which is the time To avoid data caused by the bufferofcontaining the current MC byas the MC margin from the end of loss the current MCrewriting to the beginning the next MC, is considered annext optimization transmitted for the same buffer, the relation between T RT and TBC is established as follows: constraint because it has been heuristically determined. The additional constraint for the transmission (3) margin, α, in the BC can be formulated as follows: 2𝑇 𝑇 Additionally, the transmission margin for unexpected situations in the BC, which is the time NBC tto ≤ (beginning 1 − α) TBCof the next MC, is considered as an (4) MGthe margin from the end of the current MC optimization constraint because it has been heuristically determined. The additional constraint for where ≤ α < 1. The largerα,αin means transmission the0 transmission margin, the BCmore can be formulated asmargin. follows: Finally, the upper limits of TBC and

NBC are determined according to system requirements and applied to the PSO algorithm. 𝑁 𝑡

(1 − 𝛼) 𝑇

(4)

3. Real-Time FPGA where 0 PSO 𝛼 < on 1. The larger α means more transmission margin. Finally, the upper limits of TBC and NBC are determined according to system requirements and applied to the PSO algorithm.

To resolve the real-time optimization problem, we propose a new FPGA-based real-time PSO (FRPSO), which is a hardware-friendly variant of the original PSO using the hardware resources 3. Real-Time PSO on FPGA of FPGAs for parallel processing. The main differences between the original PSO and FRPSO are To resolve the real-time optimization problem, we propose a new FPGA-based real-time PSO summarized in Table 2. The main advantage of FRPSO is the parallelizability with look-up tables (FRPSO), which is a hardware-friendly variant of the original PSO using the hardware resources of (LUTs), which is more flexible and scalable than CPU cores. FPGAs for parallel processing. The main differences between the original PSO and FRPSO are summarized in Table 2. The main advantage of FRPSO is the parallelizability with look-up tables 3.1. Particle and Swarm Definition (LUTs), which is more flexible and scalable than CPU cores.

In FRPSO, the particle and swarm are newly defined for efficient FPGA implementation for 3.1. Particle and Swarm Definition multiplication, and division. In addition, the particle swarm with fixed-point addition, subtraction,

NP particles is rearranged with NGswarm groups. n-th particle the m-th group at the i-th iteration In FRPSO, the particle and are The newly defined for of efficient FPGA implementation for m m m m m fixed-point addition, division. the xparticle swarm with is defined as pn,i = [ xn,isubtraction, , yn,i ] wheremultiplication, n = 1, . . . , Nand m = 1,In. .addition, . , NG , and P and n,i and yn,i denote NBC 𝑁BCparticles is rearranged with 𝑁 consists groups. The n-th 15-bit particlefixed-point of the m-th group at the iteration is and T , respectively. Each of them of two vectors, as i-th shown in Figure 2. defined where n = of 1, …, 𝑁 and = 1, determined …, 𝑁 , and 𝑥by and search 𝑦 , denote 𝑁 for the , = 𝑥bit. , , 𝑦 The , , the The first bit as is a𝑝 sign number integer bitsm was space and 𝑇 , respectively. of them consists of two 15-bit fixed-point vectors, as 15, shown in Figure optimization problem. AsEach the maximum configuration of the search space was according to 2. system The first bit is a sign bit. The number of integer bits was determined by the search space for the requirements, we need at least four bits to represent it. The same number of dummy bits as the integer optimization problem. As the maximum configuration of the search space was 15, according to bits was required for fixed-point multiplication. The number of decimal bits was closely related to the system requirements, we need at least four bits to represent it. The same number of dummy bits as resolution of the optimization In thismultiplication. work, the minimum number of decimals to obtain the integer bits was requiredresults. for fixed-point The number of decimal bits wasbits closely appropriate results was six. related to the resolution of the optimization results. In this work, the minimum number of decimals bits to obtain appropriate results was six. Table 2. The main differences between the original PSO and FRPSO. Table 2. The main differences between the original PSO and FRPSO.

Type Type Implementation Implementation Time step Time step Num. of states Num.Operation of states Parallelizability Operation Parallelizability

PSO PSO On CPU with C/C++ On CPU with C/C++ Logical (OS-depend.) Logical (OS-depend.) Four Generally Four sequential Depends on CPU cores Generally sequential Depends on CPU cores

FRPSO FRPSO On FPGA with HDL On FPGA with HDL Real system clock Real system clock Eight Eight Sequential & Parallel Depends on LUTs Sequential & Parallel Depends on LUTs

Figure 2. 2.The particle definition in FRPSO. Figure Theconcept concept of of fixed-point fixed-point particle definition in FRPSO.

Electronics 2018, 7, 274 x FOR PEER REVIEW

of 15 55of

The concept of particle swarm definition and grouping for parallel processing on FPGAs in Theisconcept swarm definition and grouping for parallel on FPGAs in FRPSO FRPSO shown of in particle Figure 3. The particle swarm at the initial step is processing generally defined as shown in is shown in Figure 3. The particle swarm at the initial step is generally defined as shown in Figure 3a; Figure 3a; the positions of the particles have been sequentially updated as iteration goes, which the positions ofcomputation the particles have sequentially updated as iteration goes, which requires much time. been In FRPSO, the particle swarm at the initial step isrequires definedmuch with computation time. In FRPSO, the particle swarm at the initial step is defined with several groups, several groups, as shown in Figure 3b, and the position update of the particles can be conducted in a as shownparallel in Figure 3b, andasthe position update of the can be conducted in a partially partially manner iteration goes, which canparticles significantly reduce computation time. parallel FRPSO manner as iteration which can significantly PSO reduce computation time. FRPSO is different from is different from thegoes, existing multi-swarm-based variants [16]. The particles within a swarm in the existing multi-swarm-based PSO variants [16]. The particles within a swarm in their work cannot their work cannot move into the areas of other swarms while the particles that belong to different move into the areas ofmove other anywhere swarms while thegiven particles thatspace, belong different groups3c.inIn FRPSO can groups in FRPSO can in the search as to shown in Figure addition, move anywhere in the given search space, as shown in Figure 3c. In addition, FRPSO can be applied to FRPSO can be applied to multi-swarm-based approaches if they need parallelization to reduce multi-swarm-based approaches if they need parallelization to reduce computation time. computation time.

(a) General particle swarm definition at the initial step in PSO

(b) Proposed particle swarm definition at the initial step in FRPSO Figure 3. Cont.

Electronics 2018, 7, 274

6 of 15

Electronics 2018, 7, x FOR PEER REVIEW

6 of 15

(c) Particle swarm according to iterations in FRPSO Figure 3. The The concept concept of of the the particle particle swarm swarm definition definition and and grouping grouping for for parallel parallel processing processing on FPGAs FPGAs in FRPSO.

3.2. FPGA FPGA Structure Structure 3.2. The FPGA FPGA structure structure of of FRPSO FRPSO is is shown shown in in Figure Figure 4, 4, and and the the pseudo-code pseudo-code of of FRPSO FRPSO is is described described The in Table 3. At every rising edge of the clock produced by phase-locked loop (PLL), FRPSO is conducted in Table 3. At every rising edge of the clock produced by phase-locked loop (PLL), FRPSO is with two counters: clock counter to control processesprocesses and the particle conducted with two the counters: the clock(CC) counter (CC) toalgorithm control algorithm and the number particle counter (PNC) to differentiate particles in each particle set. In states with dependency among particles, number counter (PNC) to differentiate particles in each particle set. In states with dependency among as both CC are used, FRPSO performed partially partially in parallel. However, in states in without particles, as and bothPNC CC and PNC are used,isFRPSO is performed in parallel. However, states dependency among particles, as only the CC is used, FRPSO is performed entirely in parallel. The states without dependency among particles, as only the CC is used, FRPSO is performed entirely in parallel. before decision on termination of iterations are conducted with the N groups of particles in parallel. G The states before decision on termination of iterations are conducted with the NG groups of particles Theparallel. multiplication and division are conducted FixedP_Mul andbyFixedP_Div, respectively. The core in The multiplication and divisionbyare conducted FixedP_Mul and FixedP_Div, algorithm of FRPSO is algorithm conductedof inFRPSO PSO_CORE, which consists of more which logic blocks than original respectively. The core is conducted in PSO_CORE, consists ofthe more logic PSO. This is because FRPSO should be implemented with the consideration of avoiding the data loss blocks than the original PSO. This is because FRPSO should be implemented with the consideration caused by clock sharing variable updates among blocks with data dependency. Forwith example, of avoiding the data lossfor caused by clock sharing forlogic variable updates among logic blocks data if a logic block has variables, which may be rewritten for updates in the same clock, it should be dependency. For example, if a logic block has variables, which may be rewritten for updates in the divided by multiple logic blocks and processed independently. same clock, it should be divided by multiple logic blocks and processed independently. Table 3. The pseudo-code pseudo-code of of FRPSO. FRPSO. Table 3. The Algorithm:FPGA-based FPGA-based Real-Time Particle Particle Swarm Algorithm: Real-Time SwarmOptimization Optimization Input: Initial vector, clock (clk_i), reset signal (rst_i), PRNG start signal (start_i) Initial vector, clock (clk_i), reset signal (rst_i), PRNG start signal (start_i) Output: Result vector

Input: Output:

Result vector

1: Start the phase-locked loop block with clk_i and generate clk_p Start PRNG block with 1: 2:Start thethe phase-locked loopstart_i block with clk_i and generate clk_p 3: Initialize the FixedP_Mul and the Fixed_Div blocks with clk_p and rst_i 2: Start the PRNG block with start_i 4: Initialize the signals and variables in the PSO_CORE block with clk_p and rst_i Process the for the PSO_COREand block 3: 5:Initialize FixedP_Mul the Fixed_Div blocks with clk_p and rst_i 6: Initialize the PSO state, clock counter (CC), particle number counter (PNC) 4: Initialize the signals and variables in the PSO_CORE block with clk_p and rst_i 7: Iterate the loop controlled by CC and PNC 5: 8:Process forSample the PSO_CORE blockthe PRNG block from the given search space particles using 9: Truncate the sampled particles for the target format 6: Initialize the PSO state, clock counter (CC), particle number counter (PNC) 10: Calculate the constraints by Equations (5) and (6) 7: 11: IterateCalculate the loop the controlled CC and objectiveby function by PNC Equation (7)

8:

Sample particles using the PRNG block from the given search space

9:

Truncate the sampled particles for the target format

10:

Calculate the constraints by Equations (5) and (6)

11:

Calculate the objective function by Equation (7)

Electronics 2018, 7, 274

7 of 15

Electronics 2018, 7, x FOR PEER REVIEW

12:

7 of 15

Table 3. Cont. Calculate the personal best and the global best

12: Calculate thevelocities personaland bestpositions and the global best by Equations (8) and (9) 13: Update the of particles 13: Update the velocities and positions of particles by Equations (8) and (9) 14: Decide the termination of iterations 14: Decide the termination of iterations 15: Endloop loop 15: End 16: Acquire 16: Acquirethe theoptimal optimalresult resultvector vector 17: End process 17: End process

Figure TheFPGA FPGAstructure structure of method. Figure 4. 4. The of the theproposed proposed method.

3.3. Core Algorithm 3.3. Core Algorithm algorithm FRPSOis isconducted conducted as as follows. follows. First, are are initialized and and The The corecore algorithm of of FRPSO First,NPNparticles initialized P particles randomly sampled by the pseudo random number generator (PRNG) based on the linear feedback randomly sampled by the pseudo random number generator (PRNG) based on the linear feedback shift register (LFSR) within definedsearch search space. space. As an an 8-bit random vector at at shift register (LFSR) within thethe defined AsPRNG PRNGproduces produces 8-bit random vector every clock, the position of each particle is acquired by combining with multiple vectors and every clock, the position of each particle is acquired by combining with multiple vectors and truncated truncated for the target format. Next, the particle evaluation state is conducted by three separate for the target format. Next, the particle evaluation state is conducted by three separate states to avoid states to avoid wrong updates of errors and best particles. The two constraints, (3) and (4), for each wrong of errorsby:and best particles. The two constraints, (3) and (4), for each pn,i can be 𝑝 ,updates can be rewritten rewritten by: , Φm,time = 2𝑦 m + 3𝜎 ) + (𝑡 + 3𝜎 ) + 2𝑇 + 𝑇 (5) , −−𝑥[ x, m 2(𝑡 , Φ = 2yn,i (5) n,i (2( t I + 3σI ) + ( t P + 3σP )) + 2TH + TW ] n,i ,

(1 (−1 𝛼)𝑦 𝑥 ,𝑡 m ΦΦ, m,margin= = (6) − α), y−m (6) n,i − xn,i t MG n,i where 𝑡 and 𝜎 are the mean and standard deviation of 𝑡 , and 𝑡 and 𝜎 are the mean and where t I and σI are the mean and standard deviation of t I , and t P and σP are the mean and standard standard deviation of 𝑡 .

deviation of t P .

Electronics 2018, 7, 274

8 of 15

As the variations of t I and t P are not negligible, their mean and three-sigma standard deviations are used. Then, the objective function is calculated as follows: m,time Ψm ) n,i = ε ( Φn,i

−1

m,margin

+ (1 − ε)Φn,i

(7)

where ε is the weighting factor, which was set to 0.5 in this work. Next, pm n,pb and p gb which are respectively the personal best and the global best are selected to minimize Ψm n,i from the history of the n-th particle of the m-th group and throughout all particles, respectively. Next, the velocity and position of the n-th particle of the m-th group at the i + 1 iteration are respectively updated as follows: m m m m vm n,i +1 = κ [ vn,i + c1 υ ( p gb − pn,i ) + c2 υ ( pn,pb − pn,i )]

(8)

m m pm n,i +1 = pn,i + vn,i +1

(9)

where κ < 1 is a constriction factor which is set to 0.2, and c1 and c2 are control factors for relative attraction to p gb and pm n,pb , which are commonly set to 1.9, and υ is an unit random vector. After the last iteration, the optimal NBC and TBC are finally acquired by p gb at the last iteration. 4. Implementation, Simulations, and Evaluations 4.1. Implementation FRPSO was implemented on Xilinx FPGA Kintex-7 with synthesizable VHDL codes. The simulations and evaluations of the implemented FRPSO were conducted by Xilinx Vivado. The synthesis and implementation with 90 MHz PLL clocks was successfully completed. The utilization and timing results are summarized in Tables 4 and 5, respectively. Table 4. The utilization results of the post-implementation of FRPSO with 200 particles and 20 groups. Resource

Available

Utilization

Utilization (%)

LUT LUTRAM FF DSP IO BUFG PLL

101,400 35,000 2,028,000 600 400 32 8

57,667 3886 30,414 182 49 2 1

56.87 11.10 15.00 30.33 12.25 6.25 12.50

Table 5. The timing results of the post-implementation of FRPSO with 200 particles and 20 groups. Type

Value

Total number of endpoints The number of failing endpoints Worst negative slack (WNS) Total negative slack (TNS) Worst hold slack (WHS) Total hold slack (THS) Worst pulse width slack (WPWS) Total pulse width negative slack (TPWS)

93,504 0 0.33 ns 0 ns 0.023 ns 0 ns 4.788 ns 0 ns

4.2. Simulation Results For simulations, we chose 200 particles, because the minimum number of particles that can consistently show optimal results was 200. It is well known that the more the number of particles, the better will be the optimization performance of the original PSO and its variant, including FRPSO. Therefore, we can expect good results from FRPSO with more number of particles. The number of

Electronics 2018, 7, 274

Electronics 2018, 7, x FOR PEER REVIEW

9 of 15

9 of 15

groups indicates how FRPSO processes are parallelized. The more groups, the more parallelized processes In this this work, work, the the number number of of groups processes and and the the smaller smaller computation computation time. time. In groups was was empirically empirically chosen to properly show that the computation time of FRPSO on FPGAs is much smaller chosen to properly show that the computation time of FRPSO on FPGAs is much smaller than than the the original PSO on CPUs. Obviously, if FRPSO is implemented with groups over 20, the computation original PSO on CPUs. Obviously, if FRPSO is implemented with groups over 20, the computation time time decreases. decreases. The α The whole whole simulation simulation result resultof ofFRPSO FRPSOwith with200 200particles particlesinin2020groups groupswhen whenthe thedesign designmargin margin = 0 is as shown in Figure 5, P and P , respectively, indicate N and T and successfully converge α = 0 is as shown in Figure 5, g1 Pg1 and Pg2g2, respectively, indicate NBC BC and TBC BC and successfully converge the optimalconfiguration configurationofoffixed-point fixed-point binary vectors. px1_psetm px2_psetm represent the the optimal binary vectors. px1_psetm andand px2_psetm represent the twotwo-dimensional positions of the particles in the m-th group, respectively. The whole particle positions dimensional positions of the particles in the m-th group, respectively. The whole particle positions according and the according to to iterations iterations and the final final global global best best position position are are shown shown in in Figure Figure 6. 6. The The particles particles were were successfully converged to the final global best. successfully converged to the final global best.

Figure of FRPSO FRPSO with with 200 200 particles particles in in 20 20 groups groups when when αα == 0. 0. Figure 5. 5. The The simulation simulation result result of

The trajectories shown inin Figure 7. 7. The particles in trajectories of of the theparticle particlepositions positionsinindifferent differentgroups groupsare are shown Figure The particles each group were successfully converged to the final global best. The sampled particles represented in each group were successfully converged to the final global best. The sampled particles by green dots from the uniform distribution on the given search space have moved to the global best position and updated thethe positions of position represented representedby byred redcross crossasasthe theiteration iterationgoes. goes.FRPSO FRPSOprocessed processed and updated positions the particles in different groups in parallel. of the particles in different groups in parallel.

Electronics 2018, 7, 274 Electronics 2018, 7, x FOR PEER REVIEW Electronics 2018, 7, x FOR PEER REVIEW 15 15

10 15 of 15 10 of 10 of 15 sampled positions sampled positions intermediate positions intermediate final positionspositions final positions final global best final global best

10 10

5 5

0 00 0

2 2

4 4

6 6

NBC NBC

8 8

10 10

12 12

T BC T BC

Figure Wholeparticle particle positionsaccording according to iterations and Figure 6. 6. Whole andthe thefinal finalglobal globalbest bestposition. position. Figure 6. Whole particlepositions positions according to to iterations iterations and the final global best position.

T BC T BC

(a) The trajectories of the particles in the 1st group (a) The trajectories of the particles in the 1st group

(b) The trajectories of the particles in the 5th group (b) The trajectories of the particles in the 5th group Figure 7. Cont.

Electronics 2018, 7, 274 Electronics 2018, 7, x FOR PEER REVIEW

11 of 15

T BC

11 of 15

(c) The trajectories of the particles in the 20th group

4.3. Evaluations

Figure 7. The trajectories of the particles in different groups. Figure 7. The trajectories of the particles in different groups.

4.3.The Evaluations evaluations of FRPSO were also conducted by Xilinx Vivado with 90 MHz PLL clocks. The results accordingoftoFRPSO the number particles by used in Vivado FRPSOwith are 90 summarized in Table The evaluations were alsoofconducted Xilinx MHz PLL clocks. The 6, which indicates thattoFRPSO can find optimal NBC TBC are evensummarized though the in number results according the number of the particles used inand FRPSO Table of 6, particles which indicates that200 FRPSO can find the200 optimal NBC the andoptimization TBC even though theofnumber particles varies varies between and 440. Under particles, results FRPSOof were not consistent. between 200 andFRPSO 440. Under the optimization FRPSOdue were Over 440 particles, could200 notparticles, be implemented on Xilinxresults FPGAof Kintex-7 tonot the consistent. lack of LUTs. Over 440 particles, FRPSO be implemented on Xilinximplemented. FPGA Kintex-7 due to the lack of If FPGAs with more LUTs arecould used,not FRPSO will be successfully LUTs. If FPGAs with more LUTs are used, FRPSO will be successfully implemented. Table 6. The optimization results of FRPSO according to the number of particles. Table 6. The optimization results of FRPSO according to the number of particles. Number of Particles NBC (msg) TBC (ms) Optimal Throughput (msg/ms)

Number of Particles 200 200 240 280240 320280 360320 400360 440400 440

NBC (msg) 10 10 10 10 10 10 10 10 10 10 10 10 10 10

TBC (ms) 8 8 8 8 8 8 8 8 8 8 8 8 8 8

Optimal Throughput (msg/ms) 1.25 1.25 1.25 1.251.25 1.251.25 1.251.25 1.251.25 1.251.25 1.25

For a more detailed evaluation, the optimization results of FRPSO with 200 particles according to a moreas detailed optimization results of FRPSO with 200 particles according according to α were For analyzed, shownevaluation, in Figure 8.the FRPSO successfully found optimal configurations to α were analyzed, as shown in Figure 8. FRPSO successfully found optimal configurations α, which indicates that it can be applied to different systems with different requirements. Note that the according to α, which indicates that it can be applied to different with different requirements. optimal throughput with optimal configuration decreases as αsystems increases, which indicates that users Note that the optimal throughput with optimal configuration decreases as α increases, which should carefully design the BC by considering the trade-off relation between the maximum throughput indicates that users should carefully design the BC by considering the trade-off relation between the and transmission margin. maximum throughput and transmission margin.

Electronics 2018, 7, 274 Electronics 2018, 7, x FOR PEER REVIEW

12 of 15 12 of 15

Optimal number of messages (NBC ) 1.2

Optimal period (TBC ) Maximum throughput (NBC /T BC )

1

14 12

0.8 10 0.6

8 6

0.4

4 0.2 2 0

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0

Design margin ( )

Figure 8. 8. Optimization Optimizationresults results by by FRPSO FRPSO according according to to α. α. Figure

To show reduces computational time, time, the processing times oftimes FRPSOofwere compared To showhow howFRPSO FRPSO reduces computational the processing FRPSO were with those of the original PSO and its parallelized version with a GPU (Graphics Processing Unit) compared with those of the original PSO and its parallelized version with a GPU (Graphics according to NP , according as shown to in N Figure 9. The processing time of the original PSO and its parallelized Processing Unit) P, as shown in Figure 9. The processing time of the original PSO and version with the GPU were acquired by a desktop withcomputer 2.3 GHz with CPU 2.3 clocks NVIDIA its parallelized version with the GPU were acquiredcomputer by a desktop GHzand CPU clocks GeForce GTX 750. The parallelized PSO with the GPU was implemented with NVIDIA CUDA Toolkit and NVIDIA GeForce GTX 750. The parallelized PSO with the GPU was implemented with NVIDIA 10.0, which was10.0, mainly conducted on CPU. Obviously, the processing time FRPSO were smaller than CUDA Toolkit which was mainly conducted on CPU. Obviously, theofprocessing time of FRPSO those of both the original PSO and its parallelized version with the GPU with the same optimization were smaller than those of both the original PSO and its parallelized version with the GPU with the results. The increase amounts processing timeof forprocessing FRPSO according to NP were also smaller than same optimization results. The of increase amounts time for FRPSO according to NP were those of the original PSO and its parallelized version with the GPU. This is because FRPSO is designed also smaller than those of the original PSO and its parallelized version with the GPU. This is because to be conducted in parallel, using the resources of FPGAs. Moreover, if the particles isiffixed, FRPSO is designed to be conducted in parallel, using the resources ofnumber FPGAs.ofMoreover, the the computation time of FRPSO is consistenttime because the processing lines on FPGAs are physically number of particles is fixed, the computation of FRPSO is consistent because the processing lines allocated the original PSO is affected communication tasksby onother CPU. on FPGAsand areconnected, physicallywhile allocated and connected, while by theother original PSO is affected Therefore, FRPSO benefits task scheduling real-time systems because of fast computation and communication tasks on CPU. Therefore,inFRPSO benefits task scheduling in real-time time systems independency from other tasks. because of fast computation time and independency from other tasks. The utilization utilizationresults resultsofofFRPSO FRPSOaccording accordingtoto shown in Figure When FRPSO P are The NPNare shown in Figure 10. 10. When FRPSO usedused 440 440 particles, the utilization result of the LUTs, which are the critical hardware resource in FPGA, particles, the utilization result of the LUTs, which are the critical hardware resource in FPGA, was was 93.61%, as described in Figure 10,the and the computation of the proposed was0.6307 about 93.61%, as described in Figure 10, and computation time oftime the proposed methodmethod was about 0.6307 ms, as described in Figure 9. This means that FRPSO is fast enough to be conducted in real ms, as described in Figure 9. This means that FRPSO is fast enough to be conducted in real time, even time, even though critical resources hardware are resources are considerably utilized. The LUT was the main factor though critical hardware considerably utilized. The LUT was the main factor to affect to affect the implementation availability of FRPSO. The LUTRAM and the FF also increase as NP the implementation availability of FRPSO. The LUTRAM and the FF also increase as NP increases, increases, but not theysignificant were not significant affect the implementation of FRPSO. The DSP, but they were to affect thetoimplementation availability availability of FRPSO. The DSP, BUFG, IO, BUFG, IO, and PLL were consistent regardless of N . The utilization results of FPGA resources can be P and PLL were consistent regardless of NP. The utilization results of FPGA resources can be affected affected by definitions, variable definitions, logics. Note that the variable definitions, by variable algorithmalgorithm structures,structures, and logics.and Note that the variable definitions, structure, structure, and logic of FRPSO was designed to be parallelized and applied to real-time optimization by and logic of FRPSO was designed to be parallelized and applied to real-time optimization by reducing computation time with FPGA resources. Because particle positions in FRPSO were defined reducing computation time with FPGA resources. Because particle positions in FRPSO were defined and calculated fix-point vectors, quantization errorserrors occurred betweenbetween the particle and calculatedwith with15-bit 15-bit fix-point vectors, quantization occurred the positions particle in the original PSO and FRPSO; they were evaluated by mean squared errors according to epochs to as positions in the original PSO and FRPSO; they were evaluated by mean squared errors according

epochs as shown in Figure 11, which was not significant. Besides, in this work, since the final results were used as rounded values, small quantization errors did not affect FRPSO’s performance.

Electronics 2018, 7, 274

13 of 15

shown in Figure which not significant. Besides, in this work, since the final results were 13 used Electronics 2018, 7, x11, FOR PEER was REVIEW of 15 Electronics 2018, 7, x FOR PEER REVIEW 13 of 15 as rounded values, small quantization errors did not affect FRPSO’s performance.

Figure 9. The comparison results of the processing times between the original PSO, its parallelized Figure9.9.The Thecomparison comparisonresults resultsofofthe theprocessing processingtimes timesbetween betweenthe theoriginal originalPSO, PSO,its itsparallelized parallelized Figure version with a GPU, and FRPSO. versionwith withaaGPU, GPU,and andFRPSO. FRPSO. version

Figure Figure10. 10. The The comparison comparisonresults resultsof ofthe thecomputation computationtimes timesbetween betweenthe theoriginal originalPSO PSOand andFRPSO. FRPSO. Figure 10. The comparison results of the computation times between the original PSO and FRPSO.

Electronics 2018, x FOR PEER REVIEW Electronics 2018, 7, 7, 274

14ofof1515 14

Figure 11.11.Quantization between particle positions inin the original PSO and FRPSO. Figure Quantizationerrors errors between particle positions the original PSO and FRPSO.

InIn summary, thisthis workwork is focused on the extension of the original PSO to the real-time summary, is focused on the extension of the original PSO to optimization the real-time problem in dataproblem communications with half-duplexwith command-response protocols. The contributions of optimization in data communications half-duplex command-response protocols. The this work are as follows. First, the original PSO on CPU to find the optimal message-chain structure contributions of this work are as follows. First, the original PSO on CPU to find the optimal messagewas hardware-friendly and implemented onand FPGA. Second, since proposed newsince variant chain structure was modified hardware-friendly modified implemented onthe FPGA. Second, the ofproposed the PSO is parallelized with hardware resources on FPGA, its computation time is much smaller new variant of the PSO is parallelized with hardware resources on FPGA, its computation than original PSO. than Third, asoriginal the proposed variant conducted FPGA, it can avoidonconflicts timethe is much smaller the PSO. Third, as is the proposedon variant is conducted FPGA, it with on CPU. In future, further research tofurther generalize FRPSO for data can other avoidtasks conflicts with other taskswe on will CPU.conduct In future, we will conduct research to generalize throughput maximization in other maximization platforms such in as multi-to-multi serialsuch communication systems serial and FRPSO for data throughput other platforms as multi-to-multi wireless communication systems [17–21]. communication systems and wireless communication systems [17–21]. 5.5.Conclusions Conclusions This paper proposed FRPSO, which is a hardware-friendly variant variant of the original to optimize This paper proposed FRPSO, which is a hardware-friendly of thePSO, original PSO, to the message-chain structure in real-time. FRPSO was implemented and conducted on FPGA optimize the message-chain structure in real-time. FRPSO was implemented and conducted on FPGA with synthesizable HDL codes. The simulation and evaluation results showed that FRPSO with synthesizable HDL codes. The simulation and evaluation results showed that FRPSOcan can successfully successfullyoffer offeroptimal optimalconfiguration configurationwith withmuch muchsmaller smallercomputation computationtime timethan thanthe theoriginal originalPSO. PSO. InInaddition, addition,itsitsperformance performancewas wasconsistent consistentininspite spiteofofdifferent differentsystem systemand andalgorithm algorithmparameters. parameters. The implementation availability of FRPSO according to the number of particles The implementation availability of FRPSO according to the number of particleswas wasalso alsoevaluated evaluated and andanalyzed analyzedwith withFPGA FPGAresources. resources.Consequently, Consequently,FRPSO FRPSOisisgood goodfor forreal-time real-timeoptimization optimizationofofthe the message-chain structure in half-duplex command-response data communications. message-chain structure in half-duplex command-response data communications. Author Contributions: All authors contributed the present paper with the same effort in describing the problem, Author Contributions: All authors contributed the present paper with same effort the in describing problem, finding available literatures, and writing the paper. H.L. designed andthe implemented proposed the method to finding resolve theavailable problem.literatures, and writing the paper. H.L. designed and implemented the proposed method to resolve the problem. Funding: This research received no external funding. Funding: research no external funding. Conflicts ofThis Interest: The received authors declare no conflict of interest.

Conflicts of Interest: The authors declare no conflict of interest.

References References

1. MIL-STD-1553B NOTICE II; Department of Defense: Washington, DC, USA, 1986. 2. 1. MIL-STD-1553B Designer’s Data Device Corporation: Bohemia, NY,USA, USA,1986. 1998. MIL-STD-1553B NOTICEGuide; II; Department of Defense: Washington, DC, 2. MIL-STD-1553B Designer’s Guide; Data Device Corporation: Bohemia, NY, USA, 1998. 3. Zhang, J.; Liu, M.; Shi, G.; Pan, W. A MIL-STD-1553B bus command optimization algorithm based on load balance. Appl. Mech. Mater. 2012, 130–134, 3839–3842.

Electronics 2018, 7, 274

3. 4.

5. 6.

7. 8.

9. 10.

11. 12. 13.

14. 15.

16. 17.

18. 19. 20.

21.

15 of 15

Zhang, J.; Liu, M.; Shi, G.; Pan, W. A MIL-STD-1553B bus command optimization algorithm based on load balance. Appl. Mech. Mater. 2012, 130–134, 3839–3842. [CrossRef] Kim, K.P.; Ahn, K.-H.; Kwon, Y.-S.; Yun, S.-J.; Lee, S.-H. Analysis and implementation of high speed data processing technology using multi-message chain and double buffering method with MIL-STD-1553B. J. Korea Inst. Mil. Sci. Technol. 2013, 16, 422–429. [CrossRef] Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the 1995 IEEE International Conference on Neural Networks, Perth, Australia, 27 November–1 December 1995; pp. 1942–1948. Lee, H.C.; Park, S.K.; Choi, J.S.; Lee, B.H. PSO-FastSLAM: An improved FastSLAM framework using particle swarm optimization. In Proceedings of the IEEE International Conference Systems, Man, and Cybernetics, San Antonio, TX, USA, 11–14 October 2009; pp. 2763–2768. Zhang, J.; Ni, L.; Xie, C.; Tan, Y.; Tang, Z. AMT-PSO: An adaptive magnification transformation based particle swarm optimizer. IEICE Trans. Inf. Syst. 2011, E94-D, 786–797. [CrossRef] Ishikawa, K.; Hayashi, N.; Takai, S. Consensus-based distributed particle swarm optimization with event-triggered communication. IEICE Trans. Fundam. Commun. Comput. Sci. 2018, E101-A, 338–344. [CrossRef] Liu, W.; Chung, I.; Liu, L.; Leng, S.; Cartes, D.A. Real-time particle swarm optimization based current harmonic cancellation. Eng. Appl. Artif. Intell. 2011, 24, 132–141. [CrossRef] Gao, Z.; Zeng, X.; Wang, J.; Liu, J. FPGA implementation of adaptive IIR filters with particle swarm optimization algorithm. In Proceedings of the ICCS 2008. 11th IEEE Singapore International Conference on Communication Systems, Guangzhou, China, 19–21 November 2008; pp. 1364–1367. Gupta, L.; Mehra, R. Modified PSO based Adaptive IIR Filter Design for System Identification on FPGA. Int. J. Comput. Appl. 2011, 22, 1–7. [CrossRef] Vasumathi, B.; Moorthi, S. Implementation of hybrid ANN–PSO algorithm on FPGA for harmonic estimation. Eng. Appl. Artif. Intell. 2012, 25, 476–483. [CrossRef] Morsi, N.N.; Abdelhalim, M.B.; Shehata, K.A. FPGA Implementation of PSO-Based Object Tracking System Using SSIM. In Proceedings of the International Conference on Micro-electronics, Beirut, Lebanon, 15–18 December 2013; pp. 1–4. Trimeche, A.; Sakly, A.; Mtibaa, A. Implementation of PSO algorithm for MIMO detection system in FPGA. Int. J. Electron. 2018, 105, 42–57. [CrossRef] Lee, H.C.; Kim, K.P.; Kwon, Y.S. Efficient and reliable bus controller operation with particle swarm optimization in MIL-STD-1553B. In Proceedings of the Asia-Pacific International Sym. Aerospace Technology, Seoul, Korea, 16–18 October 2017; pp. 1433–1440. Blackwell, T.; Branke, J. Multi-swarm optimization in dynamic environments. In Proceedings of the Workshops on Applications of Evolutionary Computing, Coimbra, Portugal, 5–7 April 2004; pp. 489–500. Vamvakas, P.; Tsiropoulou, E.E.; Papavassiliou, S. A user-centric economic-driven paradigm for rate allocation in non-orthogonal multiple access wireless systems. EURASIP J. Wirel. Commun. Netw. 2018, 129, 1–14. [CrossRef] Tan, C.W.; Chiang, M.; Srikant, R. Fast algorithms and performance bounds for sum rate maximization in wireless networks. IEEE/ACM Trans. Netw. 2013, 21, 706–719. [CrossRef] Xu, Y.; Hu, Y.; Li, G. Robust rate maximization for heterogeneous wireless networks under channel uncertainties. Sensors 2018, 18, 639. [CrossRef] [PubMed] Tsiropoulou, E.E.; Vamvakas, P.; Papavassiliou, S. Joint utility-based uplink power and rate allocation in wireless networks: A non-cooperative game theoretic framework. Phys. Commun. 2013, 9, 299–307. [CrossRef] Bi, S.; Zhang, Y.J. Computation rate maximization for wireless powered mobile-edge computing with binary computation offloading. IEEE Trans. Wirel. Commun. 2018, 17, 4177–4190. [CrossRef] © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Suggest Documents