Clock Gated Low Power Memory Implementation on Virtex-6 FPGA

Bishwajeet Pandey, Deepa Singh, Deepak Baghel, Jyotsana Yadav, Manisha Pattanaik
Department of Information Technology, Indian Institute of Information Technology, Gwalior, India
[email protected]

Abstract- In this work, the target device is a 40nm Virtex-6 FPGA, the design tool is Xilinx ISE 14.1, and the target design is a RAM. Clock gating is a technique that decreases clock power but increases logic power because of the logic added to the design. Although clock gating increases the number of signals and IO buffers, there is a significant decrease in IO power and dynamic power because the effective operating frequency of the device decreases. The increase in logic power and signal power is small compared with the decrease in clock power, which translates to a decrease in overall dynamic power. The clock power consumption of the clock gated 65536x16-bit dual-port RAM is 38.89% (at 1 GHz) and 41.3% (at 10 GHz) less than the clock power consumption of the 65536x16-bit dual-port RAM without clock gating.

Keywords- Dual Port RAM, Clock Gating, Logic Power, Signal Power, Clock Power, Dynamic Current, Dynamic Power

I. INTRODUCTION

To design, synthesize and implement a low power memory on an FPGA, we take a 65536x16-bit RAM as shown in Figure 1. It stores 16-bit data at a 16-bit address and performs a read or a write depending on the value of write enable (we). The enable (en) signal is used to implement clock gating in the RAM.

Figure 1: Clock Gated Low Power RAM

Clock gating is a low power technique that is effective in reducing dynamic power. Dynamic power is the power consumed by a device when the device is in the on-state. Dynamic power consists of Clock Power, Leakage Power, Signal Power, BRAM Power and IOs Power. Clock gating is a power-saving technique that was used on the Pentium 4 processor [2]. Clock gating is not a new concept, but it has gained importance since the design of the Pentium 4 processor.

Figure 2: RTL Schematics of Clock Gated Low Power RAM

Clock gating decreases clock power but increases logic power because of the logic added to the design. Although clock gating increases the number of signals and IO buffers, there is a significant decrease in IO power and signal power because the effective operating frequency of the device decreases. The increase in logic and signal power is small compared with the decrease in clock power, which translates to a decrease in overall dynamic power.

II. LITERATURE REVIEW

Reference [1] describes clock gating and similar techniques that turn off inactive modules of a system and switch between active and inactive modules. The main ideas discussed in [1] are clock gating, clock enable, and blocking inputs. After implementing these techniques on different types of multiplier, [1] finds that clock enable is a power-hungry technique and clock gating is an energy-efficient technique. Therefore, we use the clock gating technique for our memory design and implementation on FPGA.

According to Reference [2], clock gating refers to disabling the clocks of inactive logic modules. Each and every unit of the chip has a power reduction scheme, and every Functional Unit Block (FUB) is clock gated. The work in [2] investigates the various clock gating techniques used to optimize power consumption in circuits at register transfer level (RTL) and deals with the different factors involved in applying this power optimization technique at RTL. In our low power memory design, we use the efficient one discussed in [2].

The results of [3] indicate that the increase in leakage power is proportional to the extra logic used for clock gating (CG). Therefore, CG is not expected to reduce leakage power.

In [4], a set of energy-efficient logical-to-physical RAM mapping algorithms is discussed. These algorithms convert user memory specifications to on-chip FPGA memory block resources such as BRAM and LUTRAM. The power-saving algorithms minimize the dynamic power consumption of the RAM by taking the most power-efficient choice. In this design, we use the techniques of [4] along with clock gating for memory block mapping to reduce the power consumption of our design.

In [5], data reuse, memory partitioning, loop pipelining, and memory merging are integrated into an automated memory optimization (AMO) flow for synthesis. According to [5], memory merging saves up to 44.32% of block RAM (BRAM). In this low power memory design, we use the intelligent memory merging that is automatically implemented by Xilinx Synthesis Technology (XST).

In [6], when any one of the 16 modules of the ALU executes, clock gating switches off the remaining 15 modules and reduces power by (15/16)*100 = 93.75%. In this design, we use clock gating for the same purpose in a low power RAM design and achieve 38.89% and 41.3% clock power reduction when the design operates at 1 GHz and 10 GHz respectively.

III. RESULTS WITHOUT CLOCK GATE

A. Top Level Schematic of 65536x16-bit RAM

Figure 3: 65536x16-bit RAM w/o Clock Gate

As shown in Figure 3, there are 4 inputs. The first input is a 16-bit wide address. The second input is 16-bit data. Clock (clk) and write enable (we) are the third and fourth inputs of the RAM. The total CPU time to XST completion for this target design is 10.66 sec. The total memory usage for this target design is 243032 kilobytes.

B. RTL Schematic of 65536x16-bit dual-port RAM

The RAM will be implemented as a BLOCK RAM, absorbing the following register(s) : . The aspect ratio of the RAM is 65536-word x 16-bit. The mode of the RAM is Write-First.

Figure 4: RTL Schematic of 65536x16-bit dual-port RAM

This RTL schematic is based on the native generic register (NGR) file of the RAM. It is generated by Xilinx ISE14.

C. Technology Schematic of 65536x16-bit dual-port RAM

Figure 5: Technology Schematic of 65536x16-bit RAM

The technology schematic is based on the native generic circuit (NGC) file. It is independent of clock gating: the slice logic distribution is the same with or without the clock gate. It has one ground (GND), one voltage supply (VCC), fifty bonded IOBs, 32 block RAMs and one global clock buffer. It also has 33 input buffers and 16 output buffers.

Slice Logic Distribution
Number of LUT                    0
Number of GND                    1
Number of VCC                    1
Number of bonded IOBs            50 out of 210    23%
Number of Block RAM/FIFO         32 out of 135    23%
Number of Global Clock Buffer    1 out of 32      3%
Number of Input Buffer           33
Number of Output Buffer          16

Figure 6: 36 KB Block RAM (RAMB36E1)

For cascadable block RAM using the RAMB36E1, the data width is one bit, and the address bus is 16 bits [15:0]. Address bit 15 is used only in cascadable block RAM.

D. Power Consumption of 65536x16-bit dual-port RAM

Table 1: Power Consumption on 1 GHz Device Operating Frequency
Device Operate On 1 GHz Clock Frequency or 1ns Clock Period
Clock     Signal    BRAM      IOs       Leakage
0.018W    0.006W    0.776W    0.059W    0.045W

Without clock gating and at 1 GHz device operating frequency, the clock, signal, BRAM, IOs and leakage power consumption are 18mW, 6mW, 776mW, 59mW and 45mW respectively, as shown in Table 1.

E. Current Consumption of 65536x16-bit dual-port RAM

Device Operate On 1 GHz Clock Frequency or 1ns Clock Period
           V      Total Current    Leakage Current    Dynamic Current
Vccint     1.0    0.773A           0.019A             0.754A
Vccaux     1.8    0.016A           0.013A             0.003A
Vcco18     1.8    0.028A           0.001A             0.027A
Vccbram    1.0    0.053A           0.001A             0.052A

Without clock gating and at 1 GHz device operating frequency, the currents through Vccint, Vccaux, Vcco18 and Vccbram are 773mA, 16mA, 28mA and 53mA respectively.

F. Power Consumption of 65536x16-bit dual-port RAM

Table 2: Power Consumption on 10 GHz Device Operating Frequency
Device Operate On 10 GHz Clock Frequency or 0.1ns Clock Period
Clock     Signal    BRAM      IOs       Leakage
0.184W    0.065W    7.758W    0.593W    0.085W

Without clock gating and at 10 GHz device operating frequency, the clock, signal, BRAM, IOs and leakage power consumption are 184mW, 65mW, 7758mW, 593mW and 85mW respectively, as shown in Table 2.

G. Current Consumption of 65536x16-bit dual-port RAM

Device Operate On 10 GHz Clock Frequency or 0.1ns Clock Period
           V      Total Current    Leakage Current    Dynamic Current
Vccint     1.0    7.495A           0.052A             7.443A
Vccaux     1.8    0.047A           0.016A             0.031A
Vcco18     1.8    0.271A           0.001A             0.270A
Vccbram    1.0    0.523A           0.002A             0.521A

Without clock gating and at 10 GHz device operating frequency, the currents through Vccint, Vccaux, Vcco18 and Vccbram are 7.495A, 0.047A, 0.271A and 0.523A respectively.

IV. RESULTS WITH CLOCK GATE

The technology schematic and RTL schematic of our RAM design are almost the same with or without clock gating. Power consumption and current flow change when clock gating is applied.

A. Power Consumption of 65536x16-bit dual-port RAM

Table 3: Power Consumption on 1 GHz Frequency
Device Operate On 1 GHz Clock Frequency or 1ns Clock Period
Clock     Signal    BRAM      IOs       Leakage
0.011W    0.006W    0.776W    0.058W    0.045W

With clock gating and at 1 GHz device operating frequency, the clock, signal, BRAM, IOs and leakage power consumption are 11mW, 6mW, 776mW, 58mW and 45mW respectively, as shown in Table 3.

B. Current Consumption of 65536x16-bit dual-port RAM

Device Operate On 1 GHz Clock Frequency or 1ns Clock Period
           V      Total Current    Leakage Current    Dynamic Current
Vccint     1.0    0.763A           0.019A             0.744A
Vccaux     1.8    0.016A           0.013A             0.003A
Vcco18     1.8    0.028A           0.001A             0.027A
Vccbram    1.0    0.053A           0.001A             0.052A

With clock gating and at 1 GHz device operating frequency, the currents through Vccint, Vccaux, Vcco18 and Vccbram are 763mA, 16mA, 28mA and 53mA respectively. VCCINT is the core supply input voltage. VCCAUX is 2.5V for JTAG port and configuration pin. VCCO is 1.8 V related to IO standard.

C. Power Consumption of 65536x16-bit dual-port RAM

Table 4: Power Consumption on 10 GHz Frequency
Device Operate On 10 GHz Clock Frequency or 0.1ns Clock Period
Clock     Signal    BRAM      IOs       Leakage
0.108W    0.058W    7.758W    0.581W    0.084W

With clock gating and at 10 GHz device operating frequency, the clock, signal, BRAM, IOs and leakage power consumption are 108mW, 58mW, 7758mW, 581mW and 84mW respectively, as shown in Table 4.

D. Current Consumption of 65536x16-bit dual-port RAM

Device Operate On 10 GHz Clock Frequency or 0.1ns Clock Period
           V      Total Current    Leakage Current    Dynamic Current
Vccint     1.0    7.590A           0.052A             7.538A
Vccaux     1.8    0.047A           0.016A             0.031A
Vcco18     1.8    0.271A           0.001A             0.270A
Vccbram    1.0    0.523A           0.002A             0.521A

With clock gating and at 10 GHz device operating frequency, the currents through Vccint, Vccaux, Vcco18 and Vccbram are 7.590A, 0.047A, 0.271A and 0.523A respectively.
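The clock power figures reported with and without clock gating can be compared directly. The following is a minimal sketch in Python; the helper name `percent_reduction` is ours and is not part of the Xilinx flow used in this work. It only restates the arithmetic on the reported clock power values (18mW and 11mW at 1 GHz, 184mW and 108mW at 10 GHz).

```python
def percent_reduction(without_gate, with_gate):
    """Relative saving of the gated design versus the ungated one, in percent."""
    return (without_gate - with_gate) / without_gate * 100.0


# Clock power in mW as reported in the text: (without clock gate, with clock gate).
clock_power_mw = {
    "1 GHz": (18, 11),
    "10 GHz": (184, 108),
}

for freq, (ungated, gated) in clock_power_mw.items():
    saving = percent_reduction(ungated, gated)
    print(f"{freq}: clock power is {saving:.2f}% lower with clock gating")
```

The same helper can be applied to any other column of the tables, for example IO power, to see which components clock gating affects.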

V. CONCLUSION

                   Clock     Signal    BRAM      IOs
W/o Clock Gate     0.018W    0.006W    0.776W    0.059W
With Clock Gate    0.011W    0.006W    0.776W    0.058W

The clock power consumption of the clock gated 65536x16-bit dual-port RAM is 38.89% (at 1 GHz) less than the clock power consumption of the 65536x16-bit dual-port RAM without clock gating.

                   Clock     Signal    BRAM       IOs
W/o Clock Gate     184mW     65mW      7758mW     593mW
With Clock Gate    108mW     58mW      7758mW     581mW

The clock power consumption of the clock gated 65536x16-bit dual-port RAM is 41.3% (at 10 GHz) less than the clock power consumption of the 65536x16-bit dual-port RAM without clock gating.

Device Operate On 1 GHz Clock Frequency or 1ns Clock Period
                   Total Current    Leakage Current    Dynamic Current
W/o Clock Gate     0.773A           0.019A             0.754A
With Clock Gate    0.763A           0.019A             0.744A

The current flow of the clock gated 65536x16-bit dual-port RAM is 1.3% (at 1 GHz) less than the current flow of the 65536x16-bit dual-port RAM without clock gating.

Device Operate On 10 GHz Clock Frequency or 0.1ns Clock Period
                   Total Current    Leakage Current    Dynamic Current
With Clock Gate    7.495A           0.052A             7.443A
W/o Clock Gate     7.590A           0.052A             7.538A

The current flow of the clock gated 65536x16-bit dual-port RAM is 1.25% (at 10 GHz) less than the current flow of the 65536x16-bit dual-port RAM without clock gating.

VI. FUTURE SCOPE

This memory implementation targets a 40-nm Virtex-6 FPGA. The design can be implemented on 28-nm Virtex-7, Kintex-7 and other recent FPGAs to verify its energy efficiency. The Virtex-6 FPGA provides a 66MHz system clock, which is lower than the frequencies considered for this design, so a newer FPGA that provides a higher frequency is required. By combining clock gating with other power saving techniques, we can improve on the current 38.89% (at 1 GHz) and 41.3% (at 10 GHz) clock power saving obtained when implementing this memory on FPGA.

Acknowledgment

This work is a result of the research environment of ABV-IIITM Gwalior under the guidance of our director Prof. S.G. Deshmukh, who is a source of continuous inspiration and motivation.

References

[1] J. P. Oliver, J. Curto, D. Bouvier, M. Ramos, and E. Boemo, "Clock gating and clock enable for FPGA power reduction", 8th Southern Conference on Programmable Logic (SPL), pp. 1-5, 2012.
[2] J. Shinde and S. S. Salankar, "Clock gating-A power optimizing technique for VLSI circuits", Annual IEEE India Conference (INDICON), pp. 1-4, 2011.
[3] J. Castro, P. Parra, and A. J. Acosta, "Optimization of clock-gating structures for low-leakage high-performance applications", Proceedings of IEEE International Symposium on Efficient Embedded Computing, pp. 3220-3223, 2010.
[4] R. Tessier, V. Betz, D. Neto, A. Egier, and T. Gopalsamy, "Power-Efficient RAM Mapping Algorithms for FPGA Embedded Memory Blocks", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 26, 2007.
[5] Y. Wang, P. Zhang, X. Cheng, and J. Cong, "An integrated and automated memory optimization flow for FPGA behavioral synthesis", 17th Asia and South Pacific Design Automation Conference (ASP-DAC), 2012.
[6] B. Pandey and M. Pattanaik, "Clock Gating Aware Low Power ALU Design and Implementation on FPGA", International Journal of Future Computer and Communication (IJFCC), Vol. 3, ISSN: 2010-3751, 2013. (In Press)
[7] D. Kumar, P. Kumar, and M. Pattanaik, "Performance analysis of 90nm Look Up Table (LUT) for Low Power Applications", 13th Euromicro Conference on Digital System Design Architectures, Methods and Tools, Lille, France, 1-3 September, 2010.
[8] S. Ortega-Cisneros, J. J. Raygoza-Panduro, J. Suardiaz Muro, and E. Boemo, "Rapid prototyping of a self-timed ALU with FPGAs", International Conference on Reconfigurable Computing and FPGAs, pp. 26-33, 2012.
[9] S. Birla, N. K. Shukla, K. Rathi, R. K. Singh, and M. Pattanaik, "Analysis of 8T SRAM Cell at Various Process Corners at 65nm Process Technology", Circuit & Systems, USA, Vol. 2, No. 4, pp. 326-329, Oct. 2011.
[10] Bishwajeet Pandey, Deepa Singh, and Manisha Pattanaik, "Low Power Design of Random Access Memory and Implementation on FPGA", Journal of Automation and Control Engineering, ISSN: 2301-3702.

Performance Evaluation Of Backoff Method – Effect Of Backoff Factor On Exponential Backoff Algorithm

Deepa Singh, Bishwajeet Pandey, B. K. Sarkar, G S Tomar
ABV-Indian Institute of Information Technology and Management, Gwalior
Department of CSE, Technocrats Institute of Technology, Bhopal
Machine Intelligence Research Labs, Gwalior
[email protected]

Abstract- In this paper, the effect of the backoff factor on the exponential backoff algorithm is analyzed, and the binary exponential backoff algorithm is implemented in Matlab. Binary exponential backoff is widely used as a network congestion avoidance or collision resolution protocol. A detailed analysis of saturation throughput is carried out in this work. This work also covers a packet's medium access delay for a given number of nodes N. Binary exponential backoff is the special case of exponential backoff with r = 2, where r is the backoff factor. We analyse the effect of the backoff factors r = 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9 and 2.0 on the exponential backoff algorithm. All results are obtained through simulation in Matlab (Matrix Laboratory).

Keywords- MAC Layer, Binary Exponential Algorithm, Back Off Method, Matlab

I. INTRODUCTION

The data link layer is divided into two sublayers: the Logical Link Control (LLC) layer and the Medium Access Control (MAC) layer. The protocol used to determine who goes next on a multi-access channel belongs to the MAC layer. The MAC sublayer is especially important in LANs, and the physical address of a computer is also called its MAC address. Technically, the MAC sublayer is the bottom part of the data link layer.

Binary exponential backoff is a widely used collision resolution protocol. We analyse the backoff factor r over the range 1 to 2 with a step size of 0.1, that is, r = 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9 and 2. If r = 2, exponential backoff is called the binary exponential backoff algorithm. Exponential Backoff (EB) has a maximum retry limit M, which means that a packet is dropped after M transmission attempts. In Binary Exponential Backoff (BEB), after i consecutive packet transmission failures a node selects a single random slot from the next 2^i slots (the contention window) with equal probability for the next transmission. In the modified BEB, on the other hand, after i collisions the probability of packet transmission in each slot is 2^-i until the transmission occurs, which can happen after 2^i slots.

Exponential Backoff Algorithm: The exponential backoff algorithm is a widely used collision resolution protocol that spaces out repeated retransmissions of the same block of data. In Ethernet, this algorithm is commonly used to schedule retransmission after a collision.

Back-off Factor: If the channel is congested, the node waits for a random period of time and then senses the channel again to see whether it is clear. The backoff factor r determines how the range of this waiting period grows after each failed attempt (with r = 2 the contention window doubles).

Retry Limit: The retry limit is the total number of retransmissions of a frame that are possible after a collision.

Collision: A collision occurs when two stations send packets at the same time on a common channel. The adapter senses the collision based on the potential difference.

II. LITERATURE SURVEY

According to Reference [1], existing work studies network performance as a function of offered load for different exponential backoff algorithms, including binary exponential backoff. The majority of work on the exponential backoff algorithm concerns the stability of EB, and there is a research gap in the performance analysis of EB. These studies have produced opposite results on the stability of EB instead of a common conclusion: one proves instability, while others prove stability under certain conditions. The discrepancies come from differences between the analytical models (simplified or modified models of the exponential backoff algorithm are used) and from the definition of stability used in the analysis. Simplifying or modifying the model of the exponential backoff algorithm is often needed to make the analysis tractable, but it creates differences in the results. For example, reference [3] proved the instability of Binary Exponential Backoff for an infinite-node model for any arrival rate greater than zero, while reference [4] proved the stability of Binary Exponential Backoff for smaller arrival rates using a modified finite-node model. When a model is modified to make it simpler, the analytical result may have limited importance, because it is not certain that the modified model exhibits the same behavior as the original algorithm.

The definitions of stability used in studies of backoff algorithms fall into two groups. The first uses a throughput definition and the second defines stability in terms of delay. Under the throughput definition, the algorithm is stable if the throughput does not tend to zero when the offered load goes to infinity [3], or if the throughput grows with the offered load [5]. Under the delay definition, the protocol is stable if and only if the waiting time is bounded. Systems that are stable under the delay definition can be characterized by either a bounded backlog of packets in the queue or the recurrent property of Markov chains [6].

Most analytical and simulation studies of the exponential backoff algorithm treat it in the context of a specific medium access control (MAC) protocol such as Ethernet [7]-[11] or WLAN [12]. The characteristics of the specific protocol have as much effect on the network performance results as the intrinsic characteristics of EB. Therefore, the results depend on which MAC protocol is used in the study. Some of the analytical works that focus on EB itself are summarized as follows.

Reference [13] observes that "for a general acknowledgment based random access scheme" there is a critical value Vc in the range [0,∞] with the property that the number of packets successfully transmitted is finite with probability 0 when V < Vc and with probability 1 otherwise. It is also proved that Vc = 0 for any scheme that is slower than exponential backoff, and Vc = log 2 for BEB. Both [13] and [3] use an infinite-node model with Poisson arrivals, assuming that no node has more than 1 packet arrive at it. This result shows that Binary Exponential Backoff is unstable for V > Vc, but stability is uncertain for V < Vc. The Aldous model is different from Kelly and McPhee's model.

Reference [4] shows that Binary Exponential Backoff is stable if the arrival rate is smaller than a bound that depends on a constant and on the number of nodes n. The authors assume that each of the finite number of nodes has a queue of infinite capacity. Reference [14] gives a greater upper bound on the arrival rate than that given in [4] for the stability of BEB under the delay definition of stability. The upper bound in [14] is improved in [15]; the main point of current work on BEB is that BEB is stable for an arrival rate that is the inverse of a sublinear polynomial in n. Finally, reference [6] uses the same analytical model as [4] and shows that BEB is unstable whenever λi ≥ λn for 1 ≤ i ≤ n and λ > 0.567 + 1/(4n-2), or when λ > 0.5 and n is sufficiently large, under the delay definition of stability, where λ is the system arrival rate and λi is the arrival rate at node i.

In summary, Binary Exponential Backoff is unstable for an infinite-node model. For a finite-node model, it is stable if the system arrival rate is small, but unstable if the arrival rate is too large. Therefore, the stability of BEB continues to be an open problem. As noted in [6] and [16], the infinite-node model used in [13] and [3] is a mathematical abstraction with limited practical application.

Byung-Jae Kwak and Nah-Oak Song show new analytical results for the performance of the EB algorithm. Most studies on EB focus on the stability of the algorithm, and little attention has been paid to the performance analysis of EB. In this paper, we analyze EB and obtain the saturation throughput and the medium access delay of a packet for n nodes. This Matlab analysis considers the general case of EB with backoff factor r. The binary exponential backoff (BEB) algorithm is the special case with r = 2. The validity of the analysis is verified against simulation results provided by Matlab.

III. RESULTS

The results are simulated in Matlab to verify the functionality of the binary exponential backoff algorithm.

A. The Relation Between Pc And Pt With Backoff Factor

Figure 1: Relation between Pc and Pt with Backoff Factor (plot of Pt as a function of Pc; x-axis pc, y-axis pt; one curve for each r from 1.0 to 2 in steps of 0.1)

Figure 1 plots Pt as a function of Pc. When Pc increases, Pt decreases. It is clear from the figure that Pt decreases exponentially as Pc increases; the different colors correspond to the different values of r (the backoff factor).

B. The Relation Between Pc And Pt With Timeout

Figure 2: Relation between Pc and Pt with Timeout (plot of Pt as a function of Timeout; x-axis Timeout, y-axis pt; one curve for each r from 1.0 to 2 in steps of 0.1)

Figure 2 plots Pt as a function of Timeout. When Timeout increases linearly, Pt decreases exponentially, and the curves are shifted along the Y-axis when Timeout is taken into account.

C. Relation Between Pc and Pt with Window Size 10

Figure 3: Relation between Pc and Pt with Window Size 10 (plot of Pt as a function of Pc with window size 10; x-axis pc, y-axis pt; one curve for each r from 1.0 to 2 in steps of 0.1)

Figure 3 plots Pt as a function of Pc for a window size of 10.

D. Relation Between Pc And Pt With Number Of Node

Figure 4: Relation between Pc and Pt with Number of Node (plot between number of node N and Pc; x-axis pc, y-axis pt; curves for N = 5, 10, 20, 30, 40, 50 and 60)

The graph shows the relation between the number of nodes and Pc. When the number of nodes increases, Pc increases and, after a small increase, becomes constant, while Pt decreases. At N=5, Pt increases very fast. If the number of nodes is smaller, Pt is larger. If two more nodes are added, Pt decreases slowly; if more nodes are added, i.e. at N=10, Pt decreases at a fast rate.

E. Relation Between Pc And Pt With Backoff Factor And Number Of Node

Figure 5: Pc and Pt with Backoff Factor and Number of Node (plots of probability of collision Pc and probability of transmission Pt; x-axis number of node N, y-axis pt and pc; one curve for each r from 1.0 to 2 in steps of 0.1)

Figure 5 shows the packet drop and throughput of the network as a function of the number of nodes. Even when the number of nodes increases, the packet transmission capability is almost constant. We have considered up to 50 nodes and have not considered the effect beyond this size.

F. Relation Between Pc And Pt With Backoff Factor And Number Of Node With Timeout

Figure 6: Pc and Pt with Backoff Factor and Number of Node with Timeout (plots of probability of collision Pc and probability of transmission Pt with Timeout; x-axis number of node N, y-axis pt and pc; one curve for each r from 1.0 to 2 in steps of 0.1)

The relation between Pc and Pt with the backoff factor depends on the number of nodes and on Timeout. With a fixed Timeout, the result is almost identical to the previous one but shows improved performance.

G. Probability of Success with Node Count and Retry Limit

Figure 7: Probability of Success with Number of Node and Retry Limit (the probability of successful transmission in a slot, i.e. the normalized saturation throughput; x-axis number of node N, y-axis p-success; one curve for each r from 1.0 to 2 in steps of 0.1)

Even when the retry limit is considered for various numbers of nodes, the performance in Figure 7 does not show any change, which justifies the work.

H. Probability of Success with Node Count and Retry Limit with Timeout

Figure 8: Probability of Success with Number of Node and Retry Limit with Timeout (the probability of successful transmission in a slot, i.e. the normalized saturation throughput, with Timeout; x-axis number of node N, y-axis p-success; one curve for each r from 1.0 to 2 in steps of 0.1)

For the probability of successful transmission with Timeout, the success rate falls until about 15 nodes and is constant after that as the number of nodes increases, as shown in Figure 8.

I. Medium Access Delay with Node Count and Backoff Factor

Figure 9: Medium Access Delay with Number of Node and Backoff Factor (medium access delay in the number of time slots, r=2; x-axis number of node N, y-axis delay in slots; one curve for each r from 1.0 to 2 in steps of 0.1)

If the number of nodes increases, the delay increases linearly, and at r=1.2 the delay increases at a fast rate. The increase in delay is shown in Figure 9 for different backoff factors.

J. Medium Access Delay with Node Count And Backoff Factor with Timeout

Figure 10: Medium Access Delay with Number of Node and Backoff Factor with Timeout (medium access delay in the number of time slots, r=2, with Timeout; x-axis number of node N, y-axis delay in slots; one curve for each r from 1.0 to 2 in steps of 0.1)

With Timeout taken into account, the delay is almost linear in the number of nodes, as shown in Figure 10, but this affects the throughput of the network.
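The quantities plotted in this section (Pt, Pc and the probability of success) can be explored with a small numerical model. The sketch below is in Python and is not the Matlab code used for the figures. It rests on assumptions that we state explicitly: every node always has a packet to send, all nodes transmit independently with the same per-slot probability Pt, the contention window after i failures is w0 * r^i slots, the backoff is uniform over the window, and a packet is dropped after `retry_limit` attempts. The function names, `w0` and the default `retry_limit=6` are hypothetical choices for illustration.

```python
def contention_window(i, r, w0=1.0):
    """Contention window (in slots) after i consecutive failures: w0 * r**i."""
    return w0 * r ** i


def transmit_probability(pc, r, retry_limit, w0=1.0):
    """Per-slot transmission probability pt for a given collision probability pc.

    Assumption: an attempt made after i earlier failures has weight pc**i,
    and the backoff before it is uniform over the window, so its mean is
    (window - 1) / 2 slots. Each attempt itself occupies one slot.
    """
    weights = [pc ** i for i in range(retry_limit)]
    weighted_backoff = sum(
        w * (contention_window(i, r, w0) - 1) / 2 for i, w in enumerate(weights)
    )
    mean_backoff = weighted_backoff / sum(weights)
    return 1.0 / (1.0 + mean_backoff)


def collision_probability(pt, n):
    """Probability that at least one of the other n - 1 nodes also transmits."""
    return 1.0 - (1.0 - pt) ** (n - 1)


def solve_saturation(n, r, retry_limit=6, w0=1.0, iterations=60):
    """Find the pt that is consistent with the pc it causes, by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        pc = collision_probability(mid, n)
        if mid < transmit_probability(pc, r, retry_limit, w0):
            lo = mid  # nodes would transmit more often than mid: move up
        else:
            hi = mid
    pt = (lo + hi) / 2
    return pt, collision_probability(pt, n)


def success_probability(pt, n):
    """Probability that exactly one of n nodes transmits in a slot."""
    return n * pt * (1.0 - pt) ** (n - 1)


for n in (5, 10, 20, 50):
    pt, pc = solve_saturation(n, r=2.0)
    p_ok = success_probability(pt, n)
    print(f"N={n:2d}  pt={pt:.3f}  pc={pc:.3f}  p_success={p_ok:.3f}")
```

With r = 2 this simplified model shows the same qualitative trend as the figures: Pc rises and Pt falls as N grows. Changing `r`, `w0` or `retry_limit` in the call to `solve_saturation` gives a quick way to compare backoff factors under the stated assumptions; absolute values should not be expected to match the Matlab results.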

IV. CONCLUSION

On the basis of the analytical study and the investigation of various factors associated with exponential backoff, the results achieved in this work are quite encouraging. The graphs relating the various performance parameters show that the proposed work has outperformed various existing works. It is also evident that the throughput depends on the parameters considered, so certain tradeoffs are sometimes required. We analyze the medium access delay for different values of the backoff factor. The results indicate that BEB-M also bounds the medium access delay. These benefits are accomplished by limiting the number of transmission retries for a packet, which gives the next waiting packet a chance of transmission. The present work is justified by the results achieved using MATLAB simulation, and it helps to avoid packet drops and unnecessary delay in the network.

V. FUTURE SCOPE

Some results in this work show that there is scope for further improvement. Varying the window size together with a variable backoff factor, and taking delay and latency into account, may give better results and throughput. The window status can also be changed, which may lead to new results. Performance may be further enhanced by introducing a pause so that collisions become fewer, and by using different backoff factors along with changed Timeout conditions.

Acknowledgment

Thanks and appreciation to the helpful people at ABV-IIITM and TIT Bhopal and my faculty members, who were a source of continuous inspiration and motivation for me. I also offer my gratitude to my family and well-wishers.

References

[1] Nah-Oak Song, Byung-Jae Kwak, and Leonard E. Miller, "On the Stability of Exponential Backoff," Journal of Research of the National Institute of Standards and Technology, vol. 108, no. 4, pp. 289-297, July-August 2003.
[2] K. Sakakibara, T. Seto, D. Yoshimura, and J. Yamakita, "On the stability of slotted ALOHA systems with exponential backoff and retransmission cutoff in slow-frequency-hopping channels," in Proc. 4th Int. Symp. Wireless Personal Multimedia Communications, Aalborg, Denmark, Sep. 2001.
[3] "An improved stability bound for binary exponential backoff," Theory Comput. Syst., vol. 30, pp. 229-244, 2001.
[4] W. Yue and Y. Matsumoto, "An Exact Analysis for CSMA/CA Protocol in Integrated Voice/Data Wireless LANs," in Proc. IEEE Globecom '00, December 2000.
[5] F. Cali, M. Conti, and E. Gregori, "IEEE 802.11 Protocol: Design and Performance Evaluation of an Adaptive Backoff Mechanism," IEEE Journal on Selected Areas in Communications, vol. 18, no. 9, pp. 1774-1786, September 2000.
[6] G. Bianchi, "Performance analysis of the IEEE 802.11 distributed coordination function," J. Select. Areas Commun., vol. 18, no. 3, pp. 535-547, Mar. 2000.
[7] H. Al-Ammal, L. A. Goldberg, and P. MacKenzie, "Binary exponential backoff is stable for high arrival rates," in Proc. 17th Int. Symp. Theoretical Aspects of Computer Science, Lille, France, Feb. 2000.
[8] K. Sakakibara, H. Muta, and Y. Yuba, "The effect of limiting the number of retransmission trials on the stability of slotted ALOHA systems," IEEE Trans. Veh. Technol., vol. 49, no. 4, pp. 1449-1453, Jul. 2000.
[9] IEEE, "IEEE Standard for Information Technology - Telecommunications and Information Exchange between Systems - Specific Requirements - Part 11: Wireless LAN MAC and PHY Specifications," IEEE Std 802.11-1999, IEEE, New York, 1999.
[10] P802.11, IEEE Standard for Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, Nov. 1997.
[11] T. S. Ho and K. C. Chen, "Performance Analysis of IEEE 802.11 CSMA/CA Medium Access Control Protocol," Proceedings of IEEE PIMRC '96, pp. 407-411, 1996.
[12] J. Håstad, T. Leighton, and B. Rogoff, "Analysis of backoff protocols for multiple access channels," SIAM J. Comput., vol. 25, no. 4, pp. 740-744, 1996.
[13] D. G. Jeong and W. S. Jeon, "Performance of an exponential backoff scheme for slotted-ALOHA protocol in local wireless environment," IEEE Trans. Veh. Technol., vol. 44, no. 3, pp. 470-479, Aug. 1995.
[14] K. K. Ramakrishnan and H. Yang, "The ethernet capture effect: analysis and solution," in Proc. 19th Conf. Local Computer Networks, 1994, pp. 228-240.
[15] J. Goodman, A. G. Greenberg, N. Madras, and P. March, "Stability of binary exponential backoff," J. ACM, vol. 35, no. 3, pp. 579-602, 1988.
[16] D. R. Boggs, J. C. Mogul, and C. A. Kent, "Measured capacity of an ethernet: myths and reality," in Proc. ACM Symp. Commun. Architecture and Protocols (SIGCOMM 88), 1988, pp. 222-234.

Clock Gating Aware Low Power Global Reset ALU and Implementation on 28nm FPGA

Bishwajeet Pandey, Jyotsana Yadav, Jagdish Kumar
Atal Bihari Vajpayee - Indian Institute of Information Technology and Management, Gwalior, India
[email protected], [email protected]

Ravikant Kumar
Department of Computer Science, University of Hyderabad, Hyderabad, India
[email protected]

Abstract: In this paper, we apply the clock gating technique to a Global Reset ALU design on a 28nm Artix-7 FPGA to save both dynamic power and clock power. The technique is simulated in the Xilinx 14.3 tool and implemented on a 28nm Artix-7 XC7A200T FFG1156 -1 FPGA. When clock gating is not applied, clock power contributes 3.22%, 4.24%, 3.06%, 3.09% and 3.09% of overall dynamic power at 100 MHz, 1 GHz, 10 GHz, 100 GHz and 1 THz device frequency respectively. When clock gating is applied, clock power contributes 0%, 1.02%, 1.06%, 1.06% and 1.06% of overall dynamic power at 100 MHz, 1 GHz, 10 GHz, 100 GHz and 1 THz respectively. With clock gating, there is a 100%, 76.92%, 66.30%, 66.55% and 66.58% reduction in clock power compared to the clock power consumption without a clock gate at 100 MHz, 1 GHz, 10 GHz, 100 GHz and 1 THz operating frequency respectively. Clock gating is more effective on 28nm technology than on 40nm and 90nm technology.

KEYWORDS- Clock Gating, Global Reset ALU, Dynamic Power Reduction, Device Operating Frequency, Low Power, FPGA

I. INTRODUCTION

A global reset based design is a design that has no user-defined reset. Reset is a signal that forces the design to roll back to its initial condition. Another aspect is the 28nm technology. The 28nm technology delivers double the gate density of the 40nm process and also features half the BRAM cell size. This translates to 50% more performance with the 28nm Artix-7 compared to the 40nm Virtex-6 FPGA. Artix-7 already has intelligent clock gating techniques that suppress unnecessary clock switching. We apply our clock gating technique along with the intelligent clock gating available in Artix-7 to the target ALU design. Together, these power saving techniques reduce the power dissipation of the ALU significantly. A clock gate is a logic gate that takes the clock as input and produces a gated clock as output [7].

Clock Gate = Clock + Gate (either AND or OR)

When Enable is 0, the gate blocks the clock supply, turns off the device and thereby reduces power dissipation. This is the default latch-free clock gate implementation. When a latch is used to hold the enable signal, the latch based clock gated module comes into the picture, as shown in Figure 1.

Figure 1: Latch Based Clock Gated Module

There are two types of latch based clock gated module. The first is used when the clock is low and the other is used when the clock is high. In the latch based module, an OR gate is used when the clock is positive edge triggered and an AND gate is used when the clock is negative edge triggered.

A. Statement of the Problem

The ALU is an integral part of the processor, and the processor is the brain of any computerized system. To develop a computerized target design with a higher energy star rating, we have to make every component of that design energy efficient. Clock gating plays an important role in designing energy efficient components, because it is effective in the ALU as well as in the other components of the target system.
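The difference between the latch-free and the latch based gate described in the introduction can be illustrated with a small behavioural model. The following is a minimal sketch in Python, not the HDL used in this work; the function names, the choice of the AND variant, the low-transparent latch and the sample waveforms are all assumptions made for illustration. Each clock period is modelled as four samples (two low, two high).

```python
def latch_free_and(clk, en):
    """Latch-free gate: gated clock is clk AND en, sample by sample."""
    return [c & e for c, e in zip(clk, en)]


def latch_based_and(clk, en):
    """Latch based gate: enable passes through a latch that is
    transparent only while clk is low, then is ANDed with clk."""
    held = 0
    out = []
    for c, e in zip(clk, en):
        if c == 0:          # latch transparent during the low phase
            held = e
        out.append(c & held)
    return out


def high_pulse_widths(wave):
    """Return the length (in samples) of every high pulse in a waveform."""
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# Four clock periods; the enable signal deliberately changes while clk is high.
clk = [0, 0, 1, 1] * 4
en = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]

free = latch_free_and(clk, en)
based = latch_based_and(clk, en)
print("latch-free pulse widths :", high_pulse_widths(free))
print("latch based pulse widths:", high_pulse_widths(based))
```

In this sketch the latch-free gate produces truncated (one-sample) pulses whenever the enable changes during the high phase, while the latch based gate only ever passes or suppresses whole clock pulses.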

II. RELATED WORK

Reference [1] proposed a theoretical 15/16 = 93.75% clock power reduction in an ALU using clock gating techniques. On the simulator, [1] achieved 88.23% clock power reduction using latch based clock gating and 70.58% clock power reduction using latch-free clock gating.

In [2], latch-free clock gating is applied to an ALU to reduce its clock power and dynamic power consumption. In the normal case, clock power is 50%, 41.46%, 51.30%, 55.15% and 55.78% of total dynamic power when the device operating frequency is 100 MHz, 1 GHz, 10 GHz, 100 GHz and 1 THz. After clock gating is implemented in the ALU, clock power reduces to 17.85%, 23.39%, 26.49% and 27.19% of total dynamic power at 1 GHz, 10 GHz, 100 GHz and 1 THz. At 1 THz, clock gating gives a 72.77% reduction in clock power, a 38.88% reduction in IOs power and a 44% reduction in dynamic power compared to the design without clock gating. The target device in [2] is a 90nm Spartan-3. There is a 14.57% reduction in junction temperature at 10 GHz compared to the temperature without clock gating. According to [2], clock gating saves power but increases overall area. Dynamic current is reduced by 32.35%, 37.84%, 43.31% and 44% at 1 GHz, 10 GHz, 100 GHz and 1 THz respectively when a clock gate is used. We extend the work in [2] to the 28nm Artix-7 FPGA.

All recent clock gating techniques (latch-free, latch based and flip-flop based) disable clocks only under valid clock gating conditions, such as idle states or observability don't cares (ODC), whose application does not change the circuit functionality. In [3], a power saving technique that shuts down clocks during invalid cycles, which creates erroneous results when applied, is explored.

According to [4], latch-free, latch based and flip-flop based designs are among the clock gating styles available to optimize power in VLSI circuits. Reference [4] also raises issues in the implementation of clock gating. The first issue is that the clock gate (i.e., AND or OR) must not alter the waveform of the clock other than switching the clock on or off. The other issue is that hold time and set-up time violations caused by clock gating must be fixed, like other violations, during the physical design phase. Reference [4] proposes techniques to fix hold violations, such as clock skewing and buffering in the data path near the endpoint. The main concern of [4] is glitches, i.e. transient faults that occur due to design errors.

In [6], clock gating is implemented on a smaller circuit (a D flip-flop) and on a larger circuit (a 16-bit register). The percentage reduction in dynamic power, especially clock power, is verified for different device operating frequencies. Reference [6] achieved 87.09%, 88.02%, 88.02% and 88.01% clock power reduction when the clock period is 1 ns, 0.1 ns, 0.01 ns and 0.001 ns respectively. The design and implementation results of [6] show a reduction in dynamic power and, in particular, a significant reduction in clock power. Reference [6] also achieved 15%, 14.22%, 14.58%, 14.57% and 14.57% dynamic power reduction when the clock period is 10 ns, 1 ns, 0.1 ns, 0.01 ns and 1 ps respectively.

In [10], the clock gating components are part of the clock tree distribution components during the clock tree synthesis (CTS) process. Therefore, the on-chip location of the clock gating module has a significant impact on the overall clock tree power consumption. According to [10], the timing convergence of the clock gate enable signal is also affected during performance verification (PV).

III. GLOBAL RESET ALU WITHOUT CLOCK GATE

A. Top Level Schematic of 8 Bit ALU without Clock Gate

There are four inputs in the global reset based ALU: A, B, Sel and clock. There are five outputs: out, c_flag, p_flag, s_flag and z_flag. Input Sel (3-0) determines which operation is performed. c_flag is the carry flag, z_flag is the zero flag, s_flag is the sign flag and p_flag is the parity flag. The carry flag is affected by the left shift function, addition and subtraction.
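The interface described above can be sketched behaviourally as follows. This is a minimal illustration in Python, not the authors' HDL: the opcode assignments (0 = add, 1 = subtract, 2 = shift left), the use of a 9-bit intermediate result for the carry, and the even-parity convention for p_flag are assumptions, since the text does not list the operation table.

```python
def alu8(a, b, sel):
    """Behavioural model of an 8-bit ALU with carry, zero, sign and parity flags.

    The opcode values below are hypothetical placeholders.
    """
    a &= 0xFF
    b &= 0xFF
    if sel == 0:                    # assumed opcode: addition
        wide = a + b
    elif sel == 1:                  # assumed opcode: subtraction (9-bit wraparound)
        wide = (a - b) & 0x1FF
    elif sel == 2:                  # assumed opcode: shift left by one
        wide = a << 1
    else:                           # remaining opcodes: pass A through
        wide = a
    out = wide & 0xFF
    flags = {
        "c_flag": (wide >> 8) & 1,                    # ninth bit of the result
        "z_flag": int(out == 0),                      # result is zero
        "s_flag": (out >> 7) & 1,                     # most significant bit
        "p_flag": int(bin(out).count("1") % 2 == 0),  # even number of ones
    }
    return out, flags


result, flags = alu8(200, 100, 0)
print(result, flags)
```

Only the three operations that the text names as affecting the carry flag are modelled; a full model would need the complete 16-entry operation table.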

Figure 2: Top Level Schematic of ALU

With a 4-bit opcode, we can map 16 (2^4) operations. The top level schematic is independent of technology; it is the same for 28nm, 40nm and 90nm technology.

B. RTL Schematic of 8 Bit ALU

Figure 3: RTL Schematic of ALU (a)

In the synthesis phase of Xilinx Synthesis Technology (XST), the RTL schematic is generated and stored in a native generic register (NGR) file. The NGR file depends on the technology used. It is smaller with 28nm technology than the corresponding RTL schematic for 90nm technology. Figures 3-4 show three 9-bit adders and one 9-bit subtractor.

Figure 4: RTL Schematic of ALU (b)

There are one 9-bit register, one 9-bit 16-to-1 multiplexer, one 1-bit xor8 and one 9-bit XOR in the RTL schematic when the implementation is made on the 28nm Artix-7 FPGA.

C. Technology Schematic of ALU

The Native Generic Circuit (NGC) file describes the technology schematic of the ALU. In Figure 5, out of 139 basic elements, one is ground (GND), eighteen are LUT2, six are LUT3, nineteen are LUT4, nine are LUT5, twenty nine are LUT6, sixteen are MUXCY, fifteen are MUXF7, seven are MUXF8, one is voltage supply (VCC) and seventeen are XORCY. There are also nine flip-flops, one clock buffer and thirty two IO buffers, of which twenty are input buffers and twelve are output buffers. The technology schematic depends on the FPGA used. It is different on the 90nm Spartan-3 FPGA and the 40nm Virtex-6 FPGA because the mapping in placement changes with the technology. Here, BELS are basic elements. LUTn is an n-input look-up table. MUX is a multiplexer available on the FPGA. XOR is an exclusive-OR logic gate. VCC is a supply voltage. F7 is a function generator of 7 look-up tables. CY is a carry chain.

Figure 5: Technology Schematic of ALU

D. Report Utilization

After HDL synthesis, a report file is generated that shows the resource utilization of the target design on the Artix-7 FPGA. In the 28nm Artix-7 FPGA, 269200 registers, 134600 LUTs, 32 global clock buffers and 500 input/output buffers are available.

Figure 6: Report Utilization on 28nm Artix-7 FPGA

E. Power Consumption of 8 Bit ALU

Table 1: Power Consumption on 28nm Artix-7 FPGA

Frequency   Clocks   Logic   Signals   IOs       Dynamic
100 MHz     0.001    0.000   0.001     0.028     0.031
1 GHz       0.013    0.003   0.008     0.282     0.306
10 GHz      0.092    0.016   0.064     2.825     2.997
100 GHz     0.921    0.053   0.554     28.249    29.778
1 THz       9.214    0.399   5.442     282.490   297.545

When the clock gating technique is not applied, clock power contributes 3.22%, 4.24%, 3.06%, 3.09% and 3.09% of overall dynamic power at 100 MHz, 1 GHz, 10 GHz, 100 GHz and 1 THz device frequency respectively, as shown in Table 1.
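The clock power share quoted above is the ratio of the Clocks column to the Dynamic column of Table 1. A minimal sketch of that arithmetic in Python is given below; the dictionary and function names are illustrative and not part of the original tool flow.

```python
# (clock power, total dynamic power) per operating frequency, taken from Table 1
TABLE1 = {
    "100 MHz": (0.001, 0.031),
    "1 GHz": (0.013, 0.306),
    "10 GHz": (0.092, 2.997),
    "100 GHz": (0.921, 29.778),
    "1 THz": (9.214, 297.545),
}


def clock_share(clock_power, dynamic_power):
    """Clock power as a percentage of total dynamic power."""
    return 100.0 * clock_power / dynamic_power


shares = {freq: clock_share(c, d) for freq, (c, d) in TABLE1.items()}
for freq, pct in shares.items():
    print(f"{freq:>8}: {pct:.2f}% of dynamic power is clock power")
```

The same function can be applied to any other power table with a Clocks and a Dynamic column.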

IV. GLOBAL RESET ALU WITH CLOCK GATE

A. Top Level Schematic of Clock Gated ALU

A, B, Sel, clock and en are the five inputs, as shown in Figure 7.

Figure 7: Clock Gated ALU Schematic

Out, c_flag, s_flag, z_flag and p_flag are the five outputs of this ALU, as shown in Figure 7. Input Sel (3-0) carries the operation code bits of the microprocessor. The Zero (Z), Carry (C), Sign (S) and Parity (P) flags are generated, and their status is saved in the flag register of the microprocessor.

B. RTL Schematic of Clock Gated ALU

The Native Generic Register (NGR) file is generated by XST. The NGR file stores the net-list of the register transfer level schematic of the clock gated ALU. Three 9-bit adders and one 9-bit subtractor are shown in Figures 8-9. There are also one 9-bit register, one 9-bit 16-to-1 multiplexer, one 1-bit xor8 (eight input xor) and one 9-bit xor2 (two input xor), as shown in Figures 8-9.

Figure 8: RTL Schematic of Clock Gated ALU (a)

Figure 9: RTL Schematic of Clock Gated ALU (b)

C. Technology Schematic of Clock Gated ALU

The Native Generic Circuit (NGC) file describes the technology schematic of the ALU. In Figure 10, there are 139 basic elements (BELS). The schematic depends on the technology of the FPGA used, because the mapping in placement depends on the feature size of the technology used in the target design. Here, the 28nm Artix-7 FPGA is used.

Out of 139 BELS, one is ground, 19 are 2-input LUTs, six are 3-input LUTs, 19 are 4-input LUTs, 9 are 5-input LUTs, 29 are 6-input LUTs, sixteen are multiplexers with carry (MUXCY), fifteen are multiplexers for creating a function of 7 look-up tables (MUXF7), seven are multiplexers for creating a function of 8 look-up tables (MUXF8), one is voltage supply and seventeen are XORs in the carry chain. There are also nine flip-flops and thirty four IO buffers, of which 22 are input buffers and 12 are output buffers, in the technology schematic of the clock gated ALU, as shown in Figure 10.

Figure 10: Technology Schematic of Clock Gated ALU

D. Report Utilization

The clock gated ALU uses 9 registers out of the 269200 registers available in Artix-7. It also uses 82 LUTs out of the 134600 LUTs available in this 28nm Artix-7 FPGA. During mapping, the clock gated ALU design uses 34 IO blocks out of the 500 on-chip IO blocks available, as shown in Figure 11. This on-chip resource utilization report is generated in Xilinx PlanAhead 14.4.

Figure 11: On-Chip Resource Utilization of Clock Gated ALU

E. Power Consumption of 8 Bit ALU

Table 2: Power Dissipation in 8 bit ALU

Frequency   Clocks   Logic   Signals   IOs       Dynamic
100 MHz     0.000    0.000   0.000     0.028     0.029
1 GHz       0.003    0.003   0.005     0.281     0.292
10 GHz      0.031    0.018   0.041     2.812     2.903
100 GHz     0.308    0.070   0.351     28.123    28.853
1 THz       3.079    0.567   3.446     281.233   288.325

When the clock gating technique is applied, clock power contributes 0%, 1.02%, 1.06%, 1.06% and 1.06% of overall dynamic power at 100 MHz, 1 GHz, 10 GHz, 100 GHz and 1 THz device frequency respectively.

V. CONCLUSION

A. Clock Power Analysis in ALU

Table 3: Clock Power Affected by Clock Gate

Operating Frequency   Clock power without clock gate   Clock power with Latch Free Clock Gate
100 MHz               0.001                            0.000
1 GHz                 0.013                            0.003
10 GHz                0.092                            0.031
100 GHz               0.921                            0.308
1 THz                 9.214                            3.079

With latch-free clock gating, there is a 100%, 76.92%, 66.30%, 66.55% and 66.58% reduction in clock power compared to the clock power consumption without a clock gate at 100 MHz, 1 GHz, 10 GHz, 100 GHz and 1 THz operating frequency respectively, as shown in Table 3.

Figure 12: Power (Y-axis) and Frequency (X-axis)

When the latch based clock gating technique is applied in place of the latch-free technique, we get a larger reduction than with latch-free clock gating.

Table 4: Clock Power Affected by Clock Gate

Operating Frequency   Clock power without clock gate   Clock power with Latch Based Clock Gate
100 MHz               0.001                            0.000
1 GHz                 0.013                            0.002
10 GHz                0.092                            0.025
100 GHz               0.921                            0.250
1 THz                 9.214                            2.504

With latch based clock gating, there is a 33.33%, 19.35%, 18.83% and 18.67% further reduction in clock power compared to the clock power consumption with the latch-free clock gate at 1 GHz, 10 GHz, 100 GHz and 1 THz operating frequency respectively, as shown in Tables 3 and 4.

VI. FUTURE SCOPE

The future generation of 16nm and 7nm technology based FPGAs is the next target for future work on further reduction in power with technology scaling. On the available technology, there is open scope to design a 64-bit or even larger ALU. The techniques of clock gating and global reset can be applied to any VLSI circuit for low power design.

ACKNOWLEDGEMENT

We are grateful to ABV-IIITM director Prof. S. G. Deshmukh for his motivation for research oriented works.

REFERENCES

[1] B. Pandey and M. Pattanaik, "Clock Gating Aware Low Power ALU Design and Implementation on FPGA", 2nd International Conference on Network and Computer Science (ICNCS), Singapore, April 1-2, 2013
[2] B. Pandey, J. Yadav, N. Rajoria, M. Pattanaik, "Clock Gating Based Energy Efficient ALU Design and Implementation on 90nm FPGA", International Conference on Energy Efficient Technologies for Sustainability (ICEETs), Nagercoil, Tamilnadu, April 10-12, 2013
[3] T. Lam, X. Yang, W. Tang, Y. Wu, "On applying erroneous clock gating conditions to further cut down power," 16th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 509-514, 25-28 Jan. 2011
[4] J. Shinde, S. Salankar, "Clock gating - A power optimizing technique for VLSI circuits," Annual IEEE India Conference (INDICON), pp. 1-4, 16-18 Dec. 2011
[5] E. Arbel, C. Eisner, O. Rokhlenko, "Resurrecting infeasible clock-gating functions," 46th ACM/IEEE Design Automation Conference, pp. 160-165, 26-31 July 2009
[6] M. Dev, D. Baghel, M. Pattanaik, A. Shukla, "Clock Gated Low Power Sequential Circuit Design", IEEE Conference on Information and Communication Technologies (ICT), 11-12 April, 2013
[7] J. Monteiro, J. Rinderknecht, S. Devadas and A. Ghosh, "Optimization of combinational and sequential logic circuits for low power using precomputation," Sixteenth Conference on Advanced Research in VLSI, pp. 430-444, 27-29 Mar 1995
[8] P. Babighian, L. Benini and E. Macii, "A scalable algorithm for RTL insertion of gated clocks based on ODCs computation," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 1, pp. 29-42, Jan. 2005
[9] S. K. Teng and N. Soin, "Regional clock gate splitting algorithm for clock tree synthesis," IEEE International Conference on Semiconductor Electronics (ICSE), pp. 131-134, 28-30 June 2010
[10] B. Pandey and M. Pattanaik, "Low Power VLSI Circuit Design with Efficient HDL Coding", International Conference on Communication Systems and Network Technologies (CSNT), Gwalior, India, April 5-8, 2013

An Efficient Approach to Setup High Performance Network Center in Academia

Jagdish Kumar, Mahua Bhattacharya, Bishwajeet Pandey
National Knowledge Network Lab, ABV-Indian Institute of Information Technology and Management, Gwalior, India
[email protected], [email protected]

Abstract: Performance is a critical issue in setting up a network center in any academic institution. In a research institute, every moment of time is precious for analysis, design, research and development. In this work, an efficient approach based on a firewall, a complaint handling system, an LDAP server and the spanning tree protocol is used to set up a network center in ABV-IIITM that provides network facilities to researchers, students and faculty in a cost-effective and fast way. Here, the Spanning-Tree Protocol is applied in switched networks to obtain a loop-free network, which opens a path to a fault tolerant network. This increases performance by 60-70 percent compared to the traditional approach used in setting up a network center. The complaint handling system is a unique characteristic of this network; it ensures transparency and makes the network more user friendly.

Keywords: High Performance, Network Center, Academia Network, Fibre Optical Cable Layout, VLAN, Spanning Tree Protocol

I. INTRODUCTION

The VLAN (Virtual Local Area Network) of ABV-IIITM covers three floors in all academic blocks (A, B, C, D, E) and hostels (BH1, BH2, BH3, GH).

Figure 1: VLAN of IIITM

For that, three fiber junction boxes are in use in the VLAN. In order to increase performance, the following measures have been taken:
• Implement a firewall to get better access and speed in the existing network.
• Implement a Complaint Handling System to automate complaint handling and management.
• Implement an LDAP server for authentication to use various network facilities, e.g. Internet access and network resource access.

In Section 1, the VLAN of IIITM is introduced. In Section 2, we study the related works. In Section 3, we describe our approach that results in a high performance network center. In Section 4, we conclude our work, and in the final Section 5, we explore the future scope of our current work.

II. RELATED WORKS

According to reference [1], the performance of the interconnection network is a key factor restricting the overall performance of high performance computers. Due to the inherent limitations of electrical interconnection networks, optical interconnection has become a feasible way to improve performance. Building on the BOIN network, a Three-Dimensional Optical Interconnection Network (TDOIN) is proposed in [1].

According to reference [2], virtualization technology is used in cloud computing to support heterogeneous and dynamic workloads. The role of the scheduler in a virtual machine monitor (VMM) for allocating resources is becoming important. However, the scheduler treats I/O-intensive and CPU-intensive applications alike, so the full advantage of high performance networks such as 10-Gigabit Ethernet cannot be taken. In [2], guest domains consist of I/O-intensive domains and CPU-intensive domains, and I/O-intensive domains are able to obtain extra credits that CPU-intensive domains are willing to share.

The work in [3] covers a packet's medium access delay for a given number of nodes N. Binary exponential backoff is the special case of exponential backoff with r = 2, where r is the backoff factor; [3] analyses the effect of backoff factors r = 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9 and 2.0 on the exponential backoff algorithm. All the results of [3] are obtained through simulation in MATLAB (MATrix LABoratory).

The VLAN in [4] has two leased lines. In IIITM, we achieve the same result with only one leased line, solely by shifting the paradigm in the implementation and setup of the network center. We also studied the VLANs of different IITs, NITs and IIITs, but due to copyright issues we do not reproduce their schematics in this paper. Nevertheless, the strategy behind their network center setups helped us to enhance the performance of our VLAN.

Figure 2: VLAN of NIT Jamshedpur [Ref4]

III. PROPOSED METHODOLOGY

A. Performance Enhancement Techniques

The Spanning-Tree Protocol is applied in switched networks to obtain a loop-free network, which opens a path to a fault tolerant network.

Figure 3: Spanning Tree Protocol

It provides a loop-free redundant network topology by placing certain ports in the blocking state. This results in lower propagation and access times and, eventually, a significant increase in performance.

B. Network Setup Inside IIITM

The Network Center requires the following 11 servers:
• Mail Server – Squirrel Mail, Zimbra
• Web Server – Apache
• Library Management Server – Alice
• DHCP Server
• Antivirus Server
• Proxy Server
• NPTEL Server
• Web-Content NPTEL Server
• MATLAB S/w installation Server
• Samba Server
• SM Server

These eleven servers and their interdependence are shown in the schematic of the Network Center in Figure 4.

Figure 4: Schematic of Network Center

C. Overview of IIITM VLAN

The academic block of IIITM has three floors. The boys' and girls' hostels also have three floors. The Network Center is connected to all three floors, and the wiring must reach each of them, as shown in Figure 5.

Figure 5: Floor Wise VLAN Setup in IIITM
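The loop-free topology that the Spanning-Tree Protocol produces (Section III-A) can be illustrated with a simplified model. The sketch below, in Python, is not the protocol itself: real STP exchanges BPDUs and compares bridge priorities and path costs, whereas this sketch simply elects the switch with the lowest name as root and keeps a breadth-first tree. The switch names and links are hypothetical.

```python
from collections import deque


def spanning_tree(links, root):
    """Split the links of an undirected switch graph into forwarding links
    (a breadth-first tree from the root) and blocked links (the rest)."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    visited = {root}
    forwarding = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbours[node]):
            if nxt not in visited:
                visited.add(nxt)
                forwarding.add(frozenset((node, nxt)))
                queue.append(nxt)

    blocked = {frozenset(link) for link in links} - forwarding
    return forwarding, blocked


# Hypothetical topology: SW1, SW2 and SW3 form a redundant triangle.
links = [("SW1", "SW2"), ("SW2", "SW3"), ("SW3", "SW1"), ("SW3", "SW4")]
root = min(switch for link in links for switch in link)   # lowest ID wins
forwarding, blocked = spanning_tree(links, root)
print("root:", root)
print("forwarding:", sorted(sorted(l) for l in forwarding))
print("blocked:", sorted(sorted(l) for l in blocked))
```

The blocked link stays physically connected, so it can be re-enabled if a forwarding link fails; this is what makes the loop-free topology redundant.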

D. Assigning Access Ports to a VLAN

Table 1: Assigning Access Ports to a VLAN

Command                                          Purpose
Switch(config)#interface gigabitethernet 1/1     Enters interface configuration mode
Switch(config-if)#switchport mode access         Configures the interface as an access port
Switch(config-if)#switchport access vlan 3       Assigns the access port to a VLAN

The commands shown in Table 1 are used to enter interface configuration mode, configure the interface as an access port and assign the access port to a VLAN.

E. Network Address Translation in IIITM

To avoid network conflicts, an algorithm is designed and implemented for network address translation in the VLAN of IIITM. This algorithm translates local IP addresses to global IP addresses and vice versa, as shown in Figure 6.

Figure 6: Network Address Translation in VLAN

F. ISDN Setup in IIITM

Integrated Services Digital Network (ISDN) is a set of communication standards for the simultaneous digital transmission of voice, video, data and other network services. The ISDN schematic of IIITM is shown in Figure 7.

Figure 7: ISDN Setup in IIITM

G. Verifying the VLAN Configuration

Switch# show vlan [id | name] [vlan_num | vlan_name]

H. Complaint Handling System

The Complaint Handling System (CHS) is developed by the Network Center for users to log in, post a complaint and check the status of a posted complaint. It is currently in the testing phase. The Network Center has allotted login IDs to the hostel representatives. A representative can log in to the Complaint Handling System and post complaints related to:
• Network
• Wi-Fi
• Machine related problems (H/w or S/w)
• AMCs (Annual Maintenance Contracts), etc.

CHS also analyses users' problems in detail. The Network Center lists Frequently Asked Questions (FAQ) to help users access the network without any problem:
Q. How to use Internet on computer or on Wi-Fi devices?
Q. What are the total Proxies?
Q. How to change Proxy in the Web Browser?

The CHS of the Network Center has the following forms:
• Registration Form for Mail Id creation.
• Form for Password Reset of Mail Id or LDAP Id.
• Form for Device or Equipment request from Network Center.
• Form to give Complaint in written format.
• Registration Form for LDAP Id & Password creation.
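The local-to-global address translation mentioned in Section III-E can be sketched as a pair of lookup tables. The text does not describe the algorithm actually deployed at IIITM, so the following Python sketch is a generic one-to-one NAT model under stated assumptions: the class name, the first-come allocation from a fixed pool and the example addresses (private and documentation ranges) are all hypothetical.

```python
class NatTable:
    """One-to-one NAT: each local address borrows one global address."""

    def __init__(self, global_pool):
        self.free = list(global_pool)     # global addresses not yet assigned
        self.local_to_global = {}
        self.global_to_local = {}

    def outbound(self, local_ip):
        """Translate a local source address to its global address,
        allocating a new mapping on first use."""
        if local_ip not in self.local_to_global:
            if not self.free:
                raise RuntimeError("global address pool exhausted")
            global_ip = self.free.pop(0)
            self.local_to_global[local_ip] = global_ip
            self.global_to_local[global_ip] = local_ip
        return self.local_to_global[local_ip]

    def inbound(self, global_ip):
        """Translate a global destination address back to the local host,
        or return None when no mapping exists."""
        return self.global_to_local.get(global_ip)


nat = NatTable(["203.0.113.10", "203.0.113.11"])
print(nat.outbound("192.168.1.5"))     # first local host gets the first global address
print(nat.inbound("203.0.113.10"))     # reverse lookup returns the local host
```

Keeping both directions in separate dictionaries makes each lookup a constant-time operation, at the cost of storing every mapping twice.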

IV. CONCLUSION

Our approach to designing an efficient, high performance network center is based on the spanning tree protocol, which provides a loop-free redundant network topology by placing certain ports in the blocking state. The Network Center has a total of 11 servers. Along with these servers, a Complaint Handling System (CHS) is developed by the Network Center for users to log in, post a complaint and check the status of a posted complaint. It makes the Network Center more user friendly and also ensures transparency. To avoid network conflicts, an algorithm is designed and implemented for network address translation in the VLAN of IIITM.

V. FUTURE SCOPE

Currently, our network center operates at Gbps speeds. There is wide scope to operate this network at Tbps speeds. The National Knowledge Network lab is doing research in this field to explore the possibility of designing a network center that will operate at Tbps speed. After exploring every option available in 3G, we are shifting our focus to 4G in order to provide high performance to our end users, i.e. faculty, researchers and staff.

REFERENCES

1. P. Chao, S. Jiahui, L. Baoliang, W. Junhui, D. Wenhua, "TDOIN: A Novel Three-Dimensional Optical Interconnection Network for High Performance Computer", IEEE 12th International Conference on Computer and Information Technology (CIT), pp. 869-876, 2012
2. Z. Chang, J. Li, R. Ma, Z. Huang, H. Guan, "Adjustable Credit Scheduling for High Performance Network Virtualization", IEEE International Conference on Cluster Computing (CLUSTER), pp. 337-345, 2012
3. Deepa Singh, Bishwajeet Pandey and Geetam S Tomar, "Performance Evaluation Of Backoff Method - Effect Of Backoff Factor On Exponential Backoff Algorithm", IEEE International Conference on Computational Intelligence and Communication Networks (CICN), Mathura, 27-29 September, 2013 (Communicated)
4. VLAN Design Specification of NIT Jamshedpur
5. Nah-Oak Song, Byung-Jae Kwak, and Leonard E. Miller, "On the Stability of Exponential Backoff," Journal of Research of the National Institute of Standards and Technology, vol. 108, no. 4, pp. 289-297, July-August 2003.
6. K. Sakakibara, T. Seto, D. Yoshimura, and J. Yamakita, "On the stability of slotted ALOHA systems with exponential backoff and retransmission cutoff in slow-frequency-hopping channels," in Proc. 4th Int. Symp. Wireless Personal Multimedia Communications, Aalborg, Denmark, Sep. 2001.
7. "An improved stability bound for binary exponential backoff," Theory Comput. Syst., vol. 30, pp. 229–244, 2001.
8. W. Yue and Y. Matsumoto, "An Exact Analysis for CSMA/CA Protocol in Integrated Voice/Data Wireless LANs," in Proc. IEEE Globecom '00, December 2000.
9. F. Cali, M. Conti, and E. Gregori, "IEEE 802.11 Protocol: Design and Performance Evaluation of an Adaptive Backoff Mechanism," IEEE Journal on Selected Areas in Communications, vol. 18, no. 9, pp. 1774-1786, September 2000.
10. G. Bianchi, "Performance analysis of the IEEE 802.11 distributed coordination function," J. Select. Areas Commun., vol. 18, no. 3, pp. 535–547, Mar. 2000.
11. H. Al-Ammal, L. A. Goldberg, and P. MacKenzie, "Binary exponential backoff is stable for high arrival rates," in Proc. 17th Int. Symp. Theoretical Aspects of Computer Science, Lille, France, Feb. 2000
12. K. Sakakibara, H. Muta, and Y. Yuba, "The effect of limiting the number of retransmission trials on the stability of slotted ALOHA systems," IEEE Trans. Veh. Technol., vol. 49, no. 4, pp. 1449–1453, Jul. 2000.

Efficient Data Structure Based Smart Card Implementation

Shalini Jain, Anupam Shukla
Indian Institute of Information Technology, Gwalior, Morena Link Road, Gwalior, India
[email protected]

Bishwajeet Pandey, Mayank Kumar
Embedded System Design R&D Lab, Centre for Development of Advanced Computing, Noida, India

Abstract: To make a smart card faster, we need an efficient data structure. The access time of on-chip data depends on how and where it is stored. Some data structures take more time and some take less, depending on their space and time complexity. In this work, we examine several data structures and find that the binary search tree (BST) is the most suitable data structure for performing smart card operations compared to the other candidates.

Keywords: Data Structure, Smart Card, RAM, Memory, File Management, Master File, Dedicated File, Elementary File, Personal Identification Number (PIN), ROM, Flash

I. INTRODUCTION

In real life, we deal with different types of smart card. The subscriber identification module (SIM) card is the smart card we use most of the time. In terms of usage, debit, ATM and credit cards come after the SIM card. Third in usage are commuter cards (like the smart card of the Delhi metro) and radio frequency identification (RFID) cards used in offices and institutes. Smart cards are small, so they are easy to carry. They have enough processing power and data storage capability to store user profiles, to carry out cryptographic functions, and to support electronic commerce or other types of applications.

Data is organized in a smart card in the form of files. The file organization of a smart card has two types of file: the dedicated file (DF) and the elementary file (EF). These files can be referenced by file identifier, by path, by short EF identifier (SFID) or by DF name. EFs are again classified into two types: the transparent elementary file and the record elementary file. Data referencing methods are also described.

After a detailed analysis of different data structures in terms of the time taken by the insertion, deletion and search operations, we find that the BST is best for the search operation and also performs well for insertion and deletion. In the next section, we discuss the related work done in the field of smart card data structures. In section III, we discuss the available data structures and the file organization of the smart card, and also focus on the referencing methods. In section IV, we search for the most efficient data structure for insertion, deletion and search operations. We conclude our work in section V.
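The three operations compared in this work (insertion, deletion and search) can be sketched for a binary search tree as follows. This is a minimal, unbalanced BST in Python keyed by a numeric file identifier; it is an illustration under stated assumptions, not the implementation evaluated in the paper. The identifiers and labels in the usage lines are hypothetical, apart from 0x3F00, which ISO/IEC 7816-4 reserves for the master file.

```python
class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None


def insert(root, key, value):
    """Insert or update a key; returns the (possibly new) subtree root."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        root.left = insert(root.left, key, value)
    elif key > root.key:
        root.right = insert(root.right, key, value)
    else:
        root.value = value
    return root


def search(root, key):
    """Return the value stored under key, or None if it is absent."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return None if root is None else root.value


def delete(root, key):
    """Remove key if present; returns the (possibly new) subtree root."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        succ = root.right                 # in-order successor
        while succ.left is not None:
            succ = succ.left
        root.key, root.value = succ.key, succ.value
        root.right = delete(root.right, succ.key)
    return root


root = None
for fid, label in [(0x3F00, "MF"), (0x2F00, "EF_1"), (0x7F10, "DF_A"), (0x6F3A, "EF_2")]:
    root = insert(root, fid, label)
print(search(root, 0x6F3A))
root = delete(root, 0x7F10)
print(search(root, 0x7F10))
```

All three operations take time proportional to the height of the tree, which is logarithmic in the number of files when the tree stays balanced and linear in the worst case.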

II. RELATED WORK

The state of the art of unified read-write of the smart card of different data formats is presented in [1]. Based on the framework designing and system organizational structures of “card type determining-> function calling ->Data Conversion >unified read-write”, a data reading, writing and reception data management of the smart card of different data formats through PC is implemented in [1]. The experimental results on smart cards of water meter, such as the RF card, TM card and IC card show that the general read-write system is feasible, and the read-write efficiency is not affected, possessing good operating conditions and scalability in [1].System like Smart card is a portable media which store sensible data. The information protection is possible with personal identification number (PIN), or through finger-print or retina based biometrics. Algorithms and datastructures are developed in [2] to solve security problem. There is possibility of data damage in absence of anti-IFD-switch off function, a dual-data memory structure with check sum is used in [3], which guarantees the correctness and availability of the data in the card when a data disaster possible. Reference[4]suggests an alternative to improve the speed of the execution by using the improved data store mechanism and memory structure. [4]Stored Java Objects stored area in EEPROM into RAM toachieve high performance Java Card and is similar with memory management in Java System of PC environment. According to [5] data or applications in the form of documents which in the smart card are stored storage medium such as FLASH or EEPROM, COS through the smart card file system management and organization to achieve data and application storage and management. The file system of Smart Card in [5] includes three types of files which is master file MF (Master File), dedicated file DF (Dedicated File), and basic file EF (Elementary File). MF and DF known as catalog files, EF as the data file. 
The MF is the entry point of the card's file system. Each card has one and only one MF, also known as the root; all other files are descendants of the MF. Apart from the MF, all files that contain subfolders are considered DFs. In the file tree, a file node that is itself a leaf node, with no child nodes, is known as an EF. The EF is the basic carrier of data in the card and serves applications that require the data it holds. Classified by data structure in [5], EFs include the transparent binary file (Transparent EF), the linear fixed-length record file (Linear fixed EF), the cyclic record file (Cyclic EF) and the linear variable-length record file (BER-TLV EF). According to [6], over the last decade growing smart card deployment has increased the pressure on vendors to provide security features and improvements in computing power that support encryption and decryption algorithms with bigger footprints in smart card chips. Typical applications described in [6] include subscriber identification module (SIM) cards (in telecommunications), micropayments (in financial transactions), commuter cards (in urban transportation systems), and identification (ID) cards. Although the share of cards used for identification applications (which we'll call smart ID cards) is relatively small within the overall smart card market, it is one of the fastest growing segments. Smart ID cards are used for physical access to secure facilities and logical access to IT systems (Web servers, database servers, and workstations), as discussed in [6].

III. SMART CARD DATA STRUCTURE AND REFERENCING METHODS

A data structure contains information on the logical structure of data as seen at the interface when processing interindustry commands for interchange.

1. File organization

The current organization of a smart card supports the following two types of files: dedicated files (DF) and elementary files (EF).

Dedicated file (DF) - In a smart card, data can be organized logically in dedicated files in a structural hierarchy. The root dedicated file is always known as the master file (MF). The MF is mandatory; dedicated files other than the MF are optional.

Elementary file (EF) - A smart card can have the following two types of EFs: internal EFs and working EFs. Internal EFs are designed for storing data interpreted by the card, i.e. data analyzed and used by the card for management and control purposes. Working EFs are intended for storing data not interpreted by the card, i.e. data to be used exclusively by the outside world.

2. File referencing methods

Files that are not implicitly selected can be selected by one of the following methods:

• Referenced by file identifier – Any file can be referenced by a 2-byte file identifier. To point to a file unambiguously, the EFs and DFs immediately below a given DF must have different identifiers. Some file identifiers are reserved: 3F00 (master file identifier), FFFF (reserved for future use) and 3FFF (see referencing by path).
• Referenced by path – The path of a file is the concatenation of file identifiers, starting from the MF or the current DF and ending with the identifier of the file itself; the identifiers in between are those of the consecutive parent DFs. 3FFF refers to the current DF and can be used when its identifier is unknown. A path uniquely selects a file from the MF or from the current DF.
• Referenced by short EF identifier – A short EF identifier uniquely identifies an EF. It is a 5-bit value in the range 1 to 30. The value '0' refers to the currently selected EF. A short EF identifier cannot be used as a file identifier or in a path.
• Referenced by DF name – Any DF may have a DF name of 1 to 16 bytes. To select a DF unambiguously by its name, the DF name must be unique within the card.

3. Elementary file structures

Two types of elementary file structures are defined. In the first, the EF is seen at the interface as a sequence of data units (transparent structure). In the second, the EF is seen at the interface as a sequence of individually identifiable records (record structure). The attributes of a record structure EF are the record size (fixed or variable) and the organization of records (linear sequence or cyclic ring). Smart cards should support at least one of the following four methods for structuring EFs: transparent EF, linear EF with fixed-size records, linear EF with variable-size records, and cyclic EF with fixed-size records.

4. Data referencing methods

Data in a smart card can be referenced as records, data objects or data units. Within a record structure EF, data is stored in a single continuous sequence of records, while within a transparent structure EF, data is stored in a single continuous sequence of data units. Referencing a record or a data unit that is not present in the EF, or that lies outside the EF, gives an error. The data referencing method, the record numbering method and the data unit size all depend on the EF, and different EFs may differ in these respects. Each smart card has an ATR (Answer-to-Reset), which provides information related to the card. To reference a record or a data unit, the relevant EF must first be selected, using any of the file referencing methods. The particular record or data unit can then be referenced either by record number or by setting some parameter bytes.

4.1 Record referencing

Each record of a selected EF can be referenced either by record identifier or by record number. Each is a 1-byte unsigned integer with values ranging from '01' to 'FE', so a file can have at most 254 records. '00' is reserved for special purposes and 'FF' is reserved for future use. Referencing by record identifier requires managing a record pointer. A card reset, the CREATE FILE command, the SELECT FILE command and any other command carrying a valid file identifier, short EF identifier or path to a specific file can affect the record pointer. Referencing by record number does not affect the record pointer.

Referencing by record identifier – The application provides a record identifier for each record within an EF. If the data field of a record is a SIMPLE-TLV data object, the record identifier is the first byte of the data object. Within a record structure EF, records may have the same record identifier, in which case the data contained in the records may be used to discriminate between them. A reference by record identifier also indicates the logical position of the target record, i.e. the first, last, next or previous occurrence with respect to the record pointer:

• In each linear structure EF, logical positions are assigned sequentially as records are written or appended: the first record created takes the first logical position, the next record the next logical position, and so on, so records are arranged in order of creation.
• In each cyclic structure EF, logical positions are assigned sequentially in the opposite order. A newly written record always takes the first logical position, so the oldest record is found at the last logical position and the most recently written record is always at the first.

Some additional rules are defined for linear structure and cyclic structure EFs:

• The first occurrence is the record with the specified identifier at the first logical position; the last occurrence is the record with the specified identifier at the last logical position.
• If there is no current record, the next occurrence is equivalent to the first occurrence, and the previous occurrence is equivalent to the last occurrence.
• If there is a current record, the next occurrence is the closest record with the specified identifier at a greater logical position than the current record; the previous occurrence is the closest record with the specified identifier at a smaller logical position than the current record.
• The value '00' refers to the first, last, next or previous record in the numbering sequence, independently of the record identifier.
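The occurrence rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_occurrence`, the `mode` strings and the list-based record store are hypothetical and are not taken from any card operating system.

```python
from typing import Optional, Sequence


def find_occurrence(identifiers: Sequence[int], wanted: int, mode: str,
                    current: Optional[int] = None) -> Optional[int]:
    """Return the 1-based logical position of the matching record, or None.

    identifiers[i] is the record identifier stored at logical position i + 1.
    wanted == 0x00 matches every record, so the search simply follows the
    numbering sequence. current is the record pointer (1-based), or None
    when there is no current record.
    """
    def matches(pos: int) -> bool:
        return wanted == 0x00 or identifiers[pos - 1] == wanted

    n = len(identifiers)
    if mode == "first" or (mode == "next" and current is None):
        candidates = range(1, n + 1)            # scan forward from position 1
    elif mode == "last" or (mode == "previous" and current is None):
        candidates = range(n, 0, -1)            # scan backward from the end
    elif mode == "next":
        candidates = range(current + 1, n + 1)  # strictly after the pointer
    elif mode == "previous":
        candidates = range(current - 1, 0, -1)  # strictly before the pointer
    else:
        raise ValueError(f"unknown mode: {mode}")
    return next((pos for pos in candidates if matches(pos)), None)


# Hypothetical EF with five records; identifier 0x0A occurs three times.
ids = [0x0A, 0x0B, 0x0A, 0x0C, 0x0A]
print(find_occurrence(ids, 0x0A, "next", current=1))
```

With no current record, "next" and "previous" fall back to "first" and "last", which mirrors the second rule in the list.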

Referencing by record number – Within each record structure EF, record numbers are unique and sequential:

• In each linear structure EF, record numbers are assigned sequentially in order of creation, so the first record created is record number 1.
• In each cyclic structure EF, record numbers are assigned sequentially in the opposite order, so the most recently created record is record number 1.
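The difference between the two numbering schemes can be made concrete with a small sketch. The class names `LinearEF` and `CyclicEF` are hypothetical, and the fixed `capacity` of the cyclic file (with the oldest record dropped once the ring is full) is an assumption made for illustration rather than something stated in the text.

```python
from collections import deque


class LinearEF:
    """Linear structure: record number 1 is the first record created."""

    def __init__(self) -> None:
        self._records: list[bytes] = []

    def append(self, data: bytes) -> None:
        self._records.append(data)

    def read(self, number: int) -> bytes:
        if not 1 <= number <= len(self._records):
            raise IndexError("record not found")
        return self._records[number - 1]


class CyclicEF:
    """Cyclic structure: record number 1 is the most recently created record."""

    def __init__(self, capacity: int) -> None:
        # Assumption: a full ring discards its oldest record on the next write.
        self._records: deque[bytes] = deque(maxlen=capacity)

    def append(self, data: bytes) -> None:
        # appendleft puts the newest record at position 1; with maxlen set,
        # the deque silently drops the record at the far (oldest) end.
        self._records.appendleft(data)

    def read(self, number: int) -> bytes:
        if not 1 <= number <= len(self._records):
            raise IndexError("record not found")
        return self._records[number - 1]


linear, cyclic = LinearEF(), CyclicEF(capacity=2)
for payload in (b"a", b"b", b"c"):
    linear.append(payload)
    cyclic.append(payload)
print(linear.read(1), cyclic.read(1))
```

After the same three writes, record number 1 is the oldest record in the linear file and the newest record in the cyclic file.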

4.2 Data unit referencing

In a transparent structure EF, data units can be referenced by an offset, as in the READ BINARY command. The offset is an unsigned integer limited to 8 or 15 bits, depending on the option in the command. The first data unit of the EF has offset 0, and the offset is incremented by 1 for every consecutive data unit. The default data unit size is one byte, unless the card indicates otherwise. A record structure EF may also support data unit referencing; in that case, the data units may contain structural information along with data, for example record numbers in a linear structure EF. Within a record structure EF, data unit referencing may not give the intended result, because the storage order of the records in the EF is not known.

IV. EFFICIENT DATA STRUCTURES

Different data structures take different amounts of time for insertion, deletion and search operations. To find an efficient data structure for storing data handled by the smart card operating system, we consider the array, linked list, doubly linked list, stack, queue, binary search tree (BST), hash and heap.

Table 1: Time Complexity of Data Structures for Insert Operation

| Data Structure | Time Complexity (Insert) |
|---|---|
| Array | O(1); O(n), insert after required element |
| Linked List | O(1), at the head; O(n), at position by traversing the linked list |
| Doubly Linked List | O(1), at the head; O(n), at position by traversing the linked list |
| Stack | O(1) |
| Queue | O(1) |
| BST | O(log n) |
| Hash | O(1) |
| Heap | O(log n) |

In the insertion operation, every data structure except the BST and heap takes almost the same time. The stack, hash and queue take less time to insert a record in the smart card, while the BST and heap take O(log n) time. Next, we analyze the time required to delete a record.

Table 2: Time Complexity of Data Structures for Delete Operation

| Data Structure | Time Complexity (Delete) |
|---|---|
| Array | O(1), deletion by position; O(n), delete element |
| Linked List | O(1), at the head; O(n), deletion at position by traversing the linked list |
| Doubly Linked List | O(1), at the head; O(n), at the position by traversing the linked list |
| Stack | O(1) |
| Queue | O(1) |
| BST | O(log n) |
| Hash | O(1) |
| Heap | O(log n) |

In the deletion operation, every data structure except the BST and heap takes almost the same time. The stack, hash and queue take less time to delete a record from the smart card, while the BST and heap take O(log n) time. Next, we analyze the time required to search for a record.

Table 3: Time Complexity of Data Structures for Search Operation

| Data Structure | Time Complexity (Search) |
|---|---|
| Array | O(n); O(log n), if binary search is performed and the array is already sorted |
| Linked List | O(n) |
| Doubly Linked List | O(n) |
| Stack | O(n) |
| Queue | O(n) |
| BST | O(log n) |
| Hash | O(n) |
| Heap | O(n) |

To access the data stored in a smart card, we perform a search operation. In the search operation, every data structure except the BST takes O(n) time, while the BST takes less time to search for a record in the smart card, i.e. O(log n).

The time a smart card takes to perform an operation depends on the time taken to search records, not on the time taken to insert and delete records.

Insertion and deletion of records are a secondary concern for any smart card user. Smart card usage time is directly proportional to the time taken to search the data stored in the card. We have seen that the BST takes less time to search records stored in the smart card and is also good for insertion and deletion.

V. CONCLUSION

Searching for records is the primary concern in the time analysis of any smart card, whereas insertion and deletion of records are a secondary concern for any smart card user. In traditional smart card applications, records are inserted and deleted at manufacturing time by the card developer, while the user mostly performs search operations. Here, we have seen that the BST is best for the search operation and also performs well for insertion and deletion.
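As an illustration of the search behaviour summarized above, the following is a minimal sketch of a binary search tree keyed by record number. The class name `RecordBST` and the comparison counter are hypothetical additions for demonstration. The sketch does not rebalance the tree, so the O(log n) bound only holds while the tree stays roughly balanced; keys inserted in sorted order would degrade it to O(n).

```python
from typing import Optional, Tuple


class _Node:
    __slots__ = ("key", "value", "left", "right")

    def __init__(self, key: int, value: str) -> None:
        self.key, self.value = key, value
        self.left: Optional["_Node"] = None
        self.right: Optional["_Node"] = None


class RecordBST:
    """Unbalanced binary search tree mapping record keys to record data."""

    def __init__(self) -> None:
        self._root: Optional[_Node] = None

    def insert(self, key: int, value: str) -> None:
        if self._root is None:
            self._root = _Node(key, value)
            return
        node = self._root
        while True:
            if key == node.key:
                node.value = value          # overwrite an existing record
                return
            branch = "left" if key < node.key else "right"
            child = getattr(node, branch)
            if child is None:
                setattr(node, branch, _Node(key, value))
                return
            node = child

    def search(self, key: int) -> Tuple[Optional[str], int]:
        """Return (value or None, number of nodes compared)."""
        node, comparisons = self._root, 0
        while node is not None:
            comparisons += 1
            if key == node.key:
                return node.value, comparisons
            node = node.left if key < node.key else node.right
        return None, comparisons


tree = RecordBST()
for k in (50, 30, 70, 20, 40, 60, 80):
    tree.insert(k, f"rec{k}")
print(tree.search(60))
```

With seven records in a balanced shape, a lookup touches at most three nodes, whereas a linear scan of an unsorted list may have to inspect all seven.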

REFERENCES

[1] Rang-ding Wang, Wei Wang, “Design and implementation of the general read-write system of the smart card”, 3rd IEEE International Conference on Ubi-media Computing (U-Media), 2010.
[2] R. Sanchez Reillo, “Securing information and operations in a smart card through biometrics”, Proceedings of IEEE 34th Annual International Carnahan Conference on Security Technology, 2000.
[3] Ming-Sheng Liu, Hui Liu, Yin-Hua Ma, Wen-Xiong Li, “Research on precautions against data disaster of logic security smart card”, Proceedings of International Conference on Machine Learning and Cybernetics, 2004.
[4] Yoon-Sim Yang, Won-Ho Choi, Min-Sik Jin, Cheul-Jun Hwang, Min-Soo Jung, “An Advanced Java Card System Architecture for Smart Card Based on Large RAM Memory”, International Conference on Hybrid Information Technology, 2006.
[5] Chen Yuqiang, Hu Xuanzi, Guo Jianlan, Liu Liang, “Design and implementation of Smart Card COS”, International Conference on Computer Application and System Modeling (ICCASM), 22-24 October 2010.
[6] R. Chandramouli, P. Lee, “Infrastructure Standards for Smart ID Card Deployment”, IEEE Security & Privacy, Volume 5, Issue 2, pp. 92-96, 2007.

Power Reduction of ITC’99-b01 Benchmark Circuit Using Clock Gating Technique

Veer Pratap Singh, Department of ECE, MANIT Bhopal, India
Vijayshri Chaurasia, Department of ECE, MANIT Bhopal, India
Jyotsana Yadav, Department of IT, ABV-IIITM, Gwalior, India
Bishwajeet Pandey, Department of IT, ABV-IIITM, Gwalior, India

Abstract- In VLSI technology, power dissipation is a limiting quantity and should be kept as low as possible. This paper presents the reduction of clock power and dynamic power consumption in the ITC’99 b01 benchmark circuit using the latch free clock gating technique. This technique also reduces IOs power, by a smaller amount, when synthesized on a 40-nm Virtex-6 FPGA. At an operating frequency of 1 THz, the proposed design achieves 97.08% reduction in clock power, 7.28% reduction in IOs power and 44% reduction in dynamic power compared to the ITC’99 b01 benchmark circuit without latch free clock gating.

Keywords- Benchmark b01, Clock Gate, Finite State Machine, Operating Frequency, Latch free clock gate, Clock Power, Dynamic Power.

I. INTRODUCTION

Clock gating is a technique in which the input clock is controlled by an enable signal. This technique is very effective in reducing the power dissipated by the input clock and by logic switching, i.e. dynamic power. Benchmark circuits are a set of circuits whose characteristics are typical of synthesized circuits [1]. Many benchmark circuits exist for structural and sequential circuits [6], which are used for testing analog and mixed signals. We use the b01 benchmark circuit, an ITC'99 benchmark developed in the Computer Aided Design (CAD) Group. The original functionality of the b01 benchmark circuit is a Finite State Machine (FSM) that compares serial flows [1]. This circuit has four single-bit inputs and two outputs. It is used for checking different analog and mixed signals, and it consumes little power because it contains few gates [1]. B. Pandey and M. Pattanaik in [2] gave three clock gating techniques: Latch Free Clock Gating, Latch based Clock Gating and Flip Flop based Clock Gating. These clock gating techniques have found application in sequential circuits and FPGAs [3 - 5]. In the proposed technique, latch free clock gating is used to reduce dynamic power consumption in the b01 benchmark circuit. In this clock gating technique, a new input signal "enable" is applied that controls the input clock signal to a sequential circuit. One extra AND gate is added to the RTL design of the benchmark circuit [1]. The rest of the paper explains the ITC’99-b01 benchmark without clock gate in section II. Implementation details and performance of the ITC’99-b01 benchmark with latch free clock gate are given in section III. In section IV, the effect of including the clock gating circuit is analyzed. Finally, conclusions are drawn in section V.

II. ITC’99-B01 BENCHMARK WITHOUT CLOCK GATE

This circuit is a standard circuit used for testing different hardware that works on analog and mixed signals. Its main function is to compare serial flows with the help of a finite state machine [1].

Figure1: Top Level Schematic of ITC’99-B01 Benchmark without clock gate

Figure 1 gives the top level schematic of the ITC’99-b01 benchmark, which has 4 inputs and 2 outputs. As benchmark b01 is a sequential circuit, a "clock" input is present to control the flip-flops in the circuit. "line1" and "line2" are two single-bit inputs whose serial flows are compared. "reset" is also a single-bit input; when it attains logic level 1, both outputs are set to logic level 0. Output "outp" is the XOR of inputs line1 and line2, and output "overflw" is a single bit whose logic level depends on the state of the finite state machine [1]. The RTL schematic of the ITC’99-B01 benchmark is given in Figure 2. At gate level, it has 110 lines, 49 gates, 2 primary inputs, 2 primary outputs and 5 flip-flops as per the ITC’99 standard. We get this RTL schematic after synthesizing the b01 benchmark model in Xilinx 14.2. This model consists of a finite state machine, so according to the advanced HDL synthesis report, the RTL schematic contains five registers, five flip-flops, one XOR2 gate and one FSM with eight different states.

Figure 2: RTL Schematic of ITC’99-B01 Benchmark without clock gate

Table 1: Power Consumption in Watt of ITC’99-B01 Benchmark without Clock Gate

| Frequency (GHz) | Clocks (Watt) | Logic (Watt) | Signals (Watt) | IOs (Watt) | Dynamic (Watt) |
|---|---|---|---|---|---|
| 0.1 | 0.003 | 0.000 | 0.000 | 0.005 | 0.008 |
| 1 | 0.032 | 0.001 | 0.001 | 0.051 | 0.085 |
| 10 | 0.038 | 0.002 | 0.011 | 0.510 | 0.851 |
| 100 | 3.307 | 0.009 | 0.113 | 5.105 | 8.533 |
| 1000 | 33.069 | 0.077 | 1.130 | 51.04 | 85.322 |

Dynamic power is the sum of clock power, logic power, signal power and IOs power, but clock power and IOs power contribute much more to dynamic power than the other components. As Table 1 shows, clock power contributes 37.50%, 37.64%, 4.46%, 38.75% and 38.76% of total dynamic power consumption when the device operates at 0.1 GHz, 1 GHz, 10 GHz, 100 GHz and 1000 GHz respectively. In the same way, the IOs power contribution to total dynamic power consumption is 62.50%, 60.00%, 59.92%, 59.80% and 59.82% when the device operates at 0.1 GHz, 1 GHz, 10 GHz, 100 GHz and 1000 GHz respectively.

III. ITC’99-B01 BENCHMARK WITH LATCH FREE CLOCK GATE

The schematic of the ITC’99-B01 benchmark with latch free clock gate is similar to the schematic without clock gate except for one additional input, "en", i.e. enable. It is a 1-bit input signal that controls the input clock to the device by switching between logic levels. As a result, this schematic has 5 inputs and 2 outputs; the functionality of the other inputs and outputs is the same as in the schematic shown in Figure 1.
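The effect of the enable input can be illustrated with a short behavioural sketch. It models latch free gating as a sample-by-sample AND of the clock and "en" and counts the rising edges that reach the flip-flops. The function names and the sample waveforms are hypothetical, and the sketch assumes "en" only changes while the clock is low; a real latch free gate can produce glitches if the enable toggles while the clock is high.

```python
def gated_clock(clock: list[int], enable: list[int]) -> list[int]:
    """Latch free clock gate: clkgat = en AND clock, one sample at a time."""
    return [c & e for c, e in zip(clock, enable)]


def rising_edges(signal: list[int]) -> int:
    """Count 0 -> 1 transitions, i.e. the edges that would clock a flip-flop."""
    return sum(1 for prev, cur in zip(signal, signal[1:]) if prev == 0 and cur == 1)


# Eight clock periods; the circuit is enabled for the first two and the
# last two periods and idle in between (illustrative waveform only).
clock = [0, 1] * 8
enable = [1] * 4 + [0] * 8 + [1] * 4
clkgat = gated_clock(clock, enable)

print(rising_edges(clock), rising_edges(clkgat))
```

Fewer edges on "clkgat" mean fewer transitions on the clock net and in the flip-flops it drives, which is the source of the clock power saving reported in the tables of this paper.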

Figure 3: Top Level Schematic of ITC’99-B01 Benchmark with latch free clock gate

The RTL schematic of the ITC’99-B01 benchmark with latch free clock gate is shown in Figure 4. It differs from the RTL schematic of the b01 benchmark circuit without clock gate in that it contains an extra AND gate, which is encircled in Figure 4. This AND gate has "en" and "clock" as inputs and the signal "clkgat" as output, so the clock is converted into a new clock for the circuit, "clkgat". The other components in this RTL schematic are the same as in Figure 2. We get this RTL schematic after synthesizing the b01 benchmark with latch free clock gate model in Xilinx 14.2. This model consists of a finite state machine, so according to the advanced HDL synthesis report, the RTL schematic contains five registers, five flip-flops, one XOR2 gate and one FSM with eight different states.

Figure 4: RTL Schematic of ITC’99-B01 Benchmark with latch free clock gate

Table 2: Power Consumption in Watt of ITC’99-B01 Benchmark with latch free Clock Gate

| Frequency (GHz) | Clocks (Watt) | Logic (Watt) | Signals (Watt) | IOs (Watt) | Dynamic (Watt) |
|---|---|---|---|---|---|
| 0.1 | 0.000 | 0.000 | 0.000 | 0.005 | 0.005 |
| 1 | 0.001 | 0.001 | 0.001 | 0.048 | 0.051 |
| 10 | 0.010 | 0.004 | 0.012 | 0.478 | 0.503 |
| 100 | 0.097 | 0.030 | 0.116 | 4.784 | 5.026 |
| 1000 | 0.965 | 0.290 | 1.164 | 47.836 | 50.254 |

Dynamic power is the sum of clock power, logic power, signal power and IOs power, but clock power and IOs power contribute much more to dynamic power than the other components. As Table 2 shows, clock power contributes 1.94%, 1.96%, 1.98%, 1.93% and 1.92% of total dynamic power consumption when the device operates at 0.1 GHz, 1 GHz, 10 GHz, 100 GHz and 1000 GHz respectively. In the same way, the IOs power contribution to total dynamic power consumption is 99.98%, 94.11%, 95.02%, 95.18% and 95.19% when the device operates at 0.1 GHz, 1 GHz, 10 GHz, 100 GHz and 1000 GHz respectively.

IV. EFFECT OF CLOCK GATING

Power consumption is directly proportional to the operating frequency of the device, so a higher frequency means more power consumption. In this section we compare the clock power and IO power values of the original benchmark b01 circuit without clock gating against the circuit with latch free clock gating at 10 GHz and 100 GHz device operating frequency, because the major part of dynamic power consists of clock power and IO power.

Table 3: Power with and w/o Clock Gating techniques on 10 GHz

| Design | Clock | IOs | Dynamic |
|---|---|---|---|
| Without Clock Gate | 38mW | 510mW | 851mW |
| With Latch Free Clock Gate | 10mW | 478mW | 503mW |

It is clear from Table 3 that when the device operates at 10 GHz, including latch free clock gating reduces clock power, IOs power and dynamic power by 73.68%, 6.27% and 40.89% respectively. The reduction is clearly visible in the bar chart of Figure 5.
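The percentages quoted for Table 3 follow from the usual relative-reduction formula, (without - with) / without x 100. The short sketch below recomputes them from the 10 GHz values in the table; the helper name `reduction_percent` and the dictionary layout are illustrative choices, not part of the original work.

```python
def reduction_percent(without_gate: float, with_gate: float) -> float:
    """Relative power reduction in percent, rounded to two decimals."""
    return round(100.0 * (without_gate - with_gate) / without_gate, 2)


# Power in mW at 10 GHz, copied from Table 3: (without gate, with gate).
table3_mw = {
    "Clock": (38, 10),
    "IOs": (510, 478),
    "Dynamic": (851, 503),
}

for component, (before, after) in table3_mw.items():
    print(f"{component}: {reduction_percent(before, after)}% reduction")
```

The same helper can be applied to the 100 GHz and 1000 GHz rows of Tables 1 and 2 to cross-check the other reductions reported in the paper.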

Figure 5: Power with and w/o Clock Gating techniques on 10 GHz

All types of power increase with operating frequency for both designs, but the rate of rise is reduced by adding the latch free clock gate. Table 4 gives the power dissipation of the ITC’99-b01 benchmark circuit with and without latch free clock gate at an operating frequency of 100 GHz.

Table 4: Power with and w/o Clock Gating techniques on 100 GHz

| Design | Clock (mW) | IOs (mW) | Dynamic (mW) |
|---|---|---|---|
| Without Clock Gate | 3307 | 5105 | 8533 |
| With Latch Free Clock Gate | 97 | 4784 | 5026 |

When the device operates at 100 GHz, latch free clock gating reduces the clock power by 97.06%, IOs power by 6.29% and dynamic power by 41.09%. The same analysis is shown in Figure 6 in the form of a bar chart.

Figure 6: Power with and w/o Clock Gating techniques on 100 GHz

V. CONCLUSION

This paper presents the application of the latch free clock gating technique to the b01 benchmark circuit, synthesized on a 40-nm Virtex-6 FPGA using Xilinx 14.1 software. Latch free clock gating reduced dynamic power by more than 40%. Without clock gating, the clock power contribution to total dynamic power was 4.46% and 38.75% at 10 GHz and 100 GHz device operating frequency respectively. After implementing clock gating, clock power was reduced by 73.68% and 97.06% at 10 GHz and 100 GHz respectively. So the latch free clock gating technique is very effective in reducing clock power. In future, the performance of the latch free clock gating technique may be investigated with other benchmark circuits.

References

[1] Dirk Stroobandt, Peter Verplaetse, Jan Van Campenhout, “Towards synthetic benchmark circuits for evaluating timing-driven CAD tools”.
[2] Bishwajeet Pandey and Manisha Pattanaik, “Clock Gating Aware Low Power ALU Design and Implementation on FPGA”, 2nd International Conference on Network and Computer Science (ICNCS), Singapore, April 1-2, 2013.
[3] Mahendra Pratap Dev, Deepak Baghel, Bishwajeet Pandey, Manisha Pattanaik, Anupam Shukla, “Clock Gated Low Power Sequential Circuit Design”, IEEE Conference on Information and Communication Technologies (ICT), 11-12 April, 2013.
[4] Bishwajeet Pandey, Jyotsana Yadav, Nitish Rajoria, Manisha Pattanaik, “Clock Gating Based Energy Efficient ALU Design and Implementation on FPGA”, International Conference on Energy Efficient Technologies for Sustainability (ICEETs), Nagercoil, Tamilnadu, April 10-12, 2013.
[5] Bishwajeet Pandey and Manisha Pattanaik, “Clock Gating Aware Low Power ALU Design and Implementation on FPGA”, 2nd International Conference on Network and Computer Science (ICNCS), Singapore, April 1-2, 2013.
[6] http://www.cad.polito.it/downloads/tools/itc99.html
[7] Jagrit Katuria, M. Ayobkhan, Arti Noor, “A review of clock gating techniques”, MIT International Journal of Electronics and Communication, 2 Aug, 2011.
[8] N. Subhramanyam, Ambavaram Poli Reddy, J. Rajpraveen, “Fault Detection for ISCAS 89’ S-27 Benchmark Circuit Using Low Power LtRTPG”, International Journal of Engineering Research & Technology (IJERT), Vol. 2, Issue 1, January 2013.

IO Standard Based Green Multiplexer Design and Implementation on 40nm FPGA

Bishwajeet Pandey, Rajendra Aaseri, Deepa Singh, Indian Institute of Information Technology, Gwalior, India
Sweety, Maharaja Surajmal Institute, Delhi, India

Abstract- In this work, we apply Stub Series Terminated Logic (SSTL) to the simplest VLSI circuit, the multiplexer, and analyze the power dissipation of its different classes. Using SSTL15 in place of SSTL2_II_DCI reduces power by 304mW, i.e. a 76.19% power reduction. Using HSTL_I_12 in place of HSTL_III_DCI_18 reduces power by 157mW, i.e. a 62.3% power reduction. HSTL and SSTL are the IO standards under consideration. The minimum power consumption of SSTL is almost the same as that of HSTL, but the maximum power dissipation of SSTL is 58.73% higher than that of HSTL. Virtex-6 is the FPGA on which we implement this low power design. Xilinx ISE 14.1 is the tool used to design and synthesize the multiplexer.

Keywords- HSTL, SSTL, FPGA, IOs Power, IO Standard, Leakage Power, RTL Schematic, Technology Schematic, Synthesis, Implementation, Netlist, Low Power

I. INTRODUCTION

The Virtex-6 FPGA supports different IO standards such as HSTL, SSTL, LVDCI and LVCMOS, and combinations of two or three of these IO standards. High-speed transceiver logic, or HSTL, is a technology independent JEDEC standard 16.3 for signaling between high performance integrated circuits. Stub Series Terminated Logic (SSTL) is an electrical standard for driving transmission lines, commonly used with DRAM based DDR memory ICs and memory modules.

In this work, we consider the effect of SSTL and HSTL on VLSI circuit design. A 4x1 mux is taken as the basic VLSI circuit for analyzing the effect of both SSTL and HSTL on IOs power consumption. An on-chip impedance matching network is used in HSTL controlled impedance I/O pads to take care of process, voltage, and temperature variations [1].

Figure 1: Elaborated Design of Low Power 4x1 Mux

Figure 2: Top Level Schematic of MUX

II. LITERATURE REVIEW

Transmission line reflections are one of the limiting factors in high-speed I/O performance [1]. The impedance of integrated circuit output pad drivers must equal the impedance of the transmission lines in order to control reflections and power dissipation [1]; in other words, reflections can be controlled by matching the driver output impedance to the impedance of the transmission line [1]. Off-chip components are used to match the impedance of termination networks [1]. Parallel termination eliminates transmission line reflections, but it increases power dissipation because a DC component is added to the power consumption [1]. Source series termination is an energy-efficient approach [1]. In a point-to-point environment, series termination is used to absorb incident waves and effectively damp reflections in the transmission line. In the symmetric parallel termination of [2], the termination resistor at the load is connected to half of the output buffer's supply voltage. In the double parallel termination of [2], parallel termination resistors are connected at the beginning and end of the transmission line. Reference [3] achieves 35.9% dynamic power reduction and 36.11% dynamic current reduction by shifting drive strength from 24mA to 2mA on LVCMOS25 when the output driver supply voltage is 2.5V and the input supply voltage is 1.0V. Reference [3] also achieves 30% dynamic power reduction and 21.7% dynamic current reduction by shifting drive strength from 24mA to 2mA on LVCMOS12 when the output driver supply voltage is 1.2V. In [4], a bus design scheme is discussed that achieves impedance matching and uniform distribution of power. In contrast to conventional schemes, the scheme in [4] lets the line impedance of each segment of the bus change, and the impedance-matching resistance values are determined accordingly for optimization. Formulas for optimal line impedance and matching resistances are derived in [4]. The voltage and power ratios of the master driver and branch receivers are also discussed in [4]; these ratios depend only on the master-to-branch impedance ratio and the branch count. Similar expressions are established in [4] for the backward direction. Reference [5] analyzes the power and performance of a Look-up Table (LUT) using a circuit technique. To achieve an optimal power-delay relationship for low power applications, the sleep transistors in the LUT are resized appropriately in [5]. By migrating from HSTL_II_DCI_18 to HSTL_I, [8] saves 16.34% dynamic power but increases PAR time by 3.9%, initial timing analysis time by 6.52%, mapping time by 3.96%, placer time by 20.48% and NGDBuild time by 20.68%. So [8] uses HSTL_III in place of HSTL_II_DCI_18 to reduce dynamic power as well as achieve high speed.

III. RESULTS

This design uses 7 IO ports and 7 nets at the RTL stage.

A. RTL Schematic of 4x1-MUX

Figure 3: RTL Schematic of Low Power MUX

The RTL schematic is stored in a native generic register (NGR) file. It shows what the logic is, not exactly how it will be implemented on the target device.

B. Technology Schematic of 4x1-MUX

Figure 4: Technology Schematic of Low Power MUX

The technology schematic is stored in a native generic circuit (NGC) file and shows how the design will be implemented on the target FPGA device. In this technology schematic, the number of LUT6 is 1.

Figure 5: Synthesized Schematic of Low Power MUX Design

This netlist has 8 instances, 7 I/O ports and 14 nets. Along with the LUT6, there are 6 input buffers and 1 output buffer.

C. Resource Estimation of 4x1 MUX

E. IOs Power Consumption of 4x1-MUX

Resource estimation varies across the stages of the design life cycle. At every stage (synthesis, netlist, and implementation), the utilization of resources increases.

[Resource estimation panels: Synthesis Estimation, Netlist Estimation, Implemented Estimation]

Power Consumption of Different SSTL

SSTL             IOs Power   Leakage Power
SSTL15           95mW        1.294W
SSTL18_I         96mW        1.294W
SSTL18_II        97mW        1.294W
SSTL2_I          98mW        1.295W
SSTL2_II         100mW       1.295W
SSTL15_DCI       163mW       1.295W
SSTL18_I_DCI     193mW       1.296W
SSTL18_II_DCI    216mW       1.297W
SSTL2_I_DCI      286mW       1.299W
SSTL2_II_DCI     399mW       1.300W
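The spread between the lowest and highest SSTL entries listed above can be checked with a short calculation. The following is a minimal sketch; the dictionary and function names are illustrative and not part of the original work, and the values are copied from the SSTL figures above.

```python
# IOs power in mW for each SSTL standard, as reported above.
SSTL_IO_POWER_MW = {
    "SSTL15": 95,
    "SSTL18_I": 96,
    "SSTL18_II": 97,
    "SSTL2_I": 98,
    "SSTL2_II": 100,
    "SSTL15_DCI": 163,
    "SSTL18_I_DCI": 193,
    "SSTL18_II_DCI": 216,
    "SSTL2_I_DCI": 286,
    "SSTL2_II_DCI": 399,
}


def io_power_saving(table):
    """Return (best, worst, saved_mW, percent_saved) for a name -> mW table."""
    best = min(table, key=table.get)
    worst = max(table, key=table.get)
    saved = table[worst] - table[best]
    percent = round(100.0 * saved / table[worst], 2)
    return best, worst, saved, percent


print(io_power_saving(SSTL_IO_POWER_MW))
```

The same function can be applied to any name-to-milliwatt table, for example the HSTL figures.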

Among the SSTL classes, SSTL15 consumes the minimum IOs power and SSTL2_II_DCI consumes the maximum. Using SSTL15 in place of SSTL2_II_DCI reduces IOs power by 304mW, i.e. a 76.19% reduction.

F. Uniform IOs Power Consumption in 4x1-MUX

At the synthesis stage, there is no LUT. During synthesis, the netlist is created; the netlist has a LUT but no slices. Slices are used only after the implementation phase of the hardware design life cycle. After implementation, there is a decrease in on-chip IO resources.

D. IOs Power Consumption of 4x1-MUX

Power Consumption of Different HSTL (>=100mW)

HSTL               IOs Power   Leakage Power
HSTL_II_18         101mW       1.294W
HSTL_I_DCI         164mW       1.295W
HSTL_III_DCI       171mW       1.296W
HSTL_II_DCI        179mW       1.296W
HSTL_I_DCI_18      195mW       1.296W
HSTL_II_DCI_18     219mW       1.297W
HSTL_III_DCI_18    252mW       1.298W

HSTL           SSTL         Power
HSTL_I_12      SSTL15       95mW
HSTL_I         SSTL18       96mW
HSTL_I_18      SSTL18_II    97mW
HSTL_III_18    SSTL2_II     100mW

HSTL_I_12 and SSTL15 consume almost the same IOs power, 95mW. HSTL_I and SSTL18 also dissipate the same power in this implemented design.

G. Comparison of HSTL and SSTL IOs Power Consumption

        Maximum   Minimum
HSTL    252mW     95mW
SSTL    399mW     95mW

Among the HSTL classes, the maximum IOs power consumption is 252mW (HSTL_III_DCI_18). Among the SSTL classes, the maximum is 399mW. For both HSTL and SSTL, the minimum IOs power consumption is 95mW.

H. Uniform Leakage Power Consumption in 4x1-MUX

Leakage Power   HSTL                                                                        SSTL
1.294W          HSTL_I_12, HSTL_I, HSTL_I_18, HSTL_II, HSTL_III, HSTL_III_18, HSTL_II_18    SSTL15, SSTL18_I, SSTL18_II
1.295W          HSTL_I_DCI                                                                  SSTL2_I, SSTL2_II, SSTL15_DCI
1.296W          HSTL_III_DCI, HSTL_II_DCI, HSTL_I_DCI_18                                    SSTL18_I_DCI
1.297W          HSTL_II_DCI_18                                                              SSTL18_II_DCI
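The grouping of IO standards by leakage power can be tallied programmatically. The sketch below is illustrative only: the dictionary name and helper function are hypothetical, the standard names are written with underscores throughout, and the values (in mW, to avoid floating-point keys) are transcribed from the leakage figures reported in this section.

```python
from collections import Counter

# Leakage power in mW per IO standard, transcribed from the figures above.
LEAKAGE_MW = {
    "HSTL_I_12": 1294, "HSTL_I": 1294, "HSTL_I_18": 1294, "HSTL_II": 1294,
    "HSTL_III": 1294, "HSTL_III_18": 1294, "HSTL_II_18": 1294,
    "SSTL15": 1294, "SSTL18_I": 1294, "SSTL18_II": 1294,
    "HSTL_I_DCI": 1295,
    "SSTL2_I": 1295, "SSTL2_II": 1295, "SSTL15_DCI": 1295,
    "HSTL_III_DCI": 1296, "HSTL_II_DCI": 1296, "HSTL_I_DCI_18": 1296,
    "SSTL18_I_DCI": 1296,
    "HSTL_II_DCI_18": 1297,
    "SSTL18_II_DCI": 1297,
}


def count_by_leakage(table):
    """Count standards per (leakage_mW, family), where family is HSTL or SSTL."""
    counts = Counter()
    for name, milliwatts in table.items():
        family = name[:4]  # every name starts with "HSTL" or "SSTL"
        counts[(milliwatts, family)] += 1
    return counts


for (milliwatts, family), n in sorted(count_by_leakage(LEAKAGE_MW).items()):
    print(f"{milliwatts / 1000:.3f}W {family}: {n}")
```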

There are 7 subtypes of HSTL and 3 subtypes of SSTL that have the same leakage power of 1.294W. One subtype of HSTL and three subtypes of SSTL share a leakage power of 1.295W. Three subtypes of HSTL and one subtype of SSTL share a leakage power of 1.296W. The II_DCI class of both SSTL and HSTL also has the same leakage power.

IV.

CONCLUSION

Thirteen HSTL and ten SSTL IO standards are evaluated in order to find the most energy-efficient IO standard for low power design. Using SSTL15 in place of SSTL2_II_DCI reduces IOs power by 304mW, i.e. a 76.19% reduction. Using HSTL_I_12 in place of HSTL_III_DCI_18 reduces IOs power by 157mW, i.e. a 62.3% reduction. The minimum power consumption of SSTL is almost the same as that of HSTL, i.e. 95mW in both cases. However, the maximum power dissipation of SSTL is higher than that of HSTL: the highest power consumption of HSTL is 252mW and the highest power consumption of SSTL is 399mW.

V.

FUTURE SCOPE

Here, a basic MUX circuit is taken for low power analysis, but there is scope to analyze more complex circuits. We have identified the most energy-efficient SSTL and HSTL IO standards, which can be applied to any circuit to achieve a low power design. This low power design is implemented on a 40-nm Virtex-6 FPGA. There is scope to re-implement this design on a 28-nm Virtex-7 or any other recent FPGA for a more energy-efficient design.

Acknowledgment

The author would like to thank Prof. S.G. Deshmukh, Director, ABV-IIITM, for his research motivation and support. Thanks and appreciation also go to the helpful people at ABV-IIITM for their support.

References

[1] Gerald L. Esch, Robert B. Manley, The Hewlett-Packard Journal, Article 5, August 1998.

[2] 7 Series FPGAs SelectIO Resources, www.xilinx.com/support/.../user_guides/ug471_7Series_SelectIO.pdf

[3] Bishwajeet Pandey, Mayank Kumar, Nirmal Robert, Manisha Pattanaik, “Drive Strength and LVCMOS Based Dynamic Power Reduction of ALU on FPGA”, International Conference on Information Technology and Science (ICITS 2013), Bali, Indonesia, March 16-17, 2013 (Accepted).

[4] Yohwan Yoon, Deog-Kyoon Jeong, “A Multidrop Bus Design Scheme With Resistor-Based Impedance Matching on Nonuniform Impedance Lines”, IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 58, Issue 6, 2011.

[5] D. Kumar, P. Kumar, M. Pattanaik, “Performance analysis of 90nm Look Up Table (LUT) for Low Power Applications”, 13th Euromicro Conference on Digital System Design Architectures, Methods and Tools, Lille, France, 1-3 September, 2010.

[6] S. Ortega-Cisneros, J.J. Raygoza-Panduro, J. Suardiaz Muro, E. Boemo, “Rapid prototyping of a self-timed ALU with FPGAs”, International Conference on Reconfigurable Computing and FPGAs, pp. 26-33, 2012.

[7] S. Birla, N. K. Shukla, K. Rathi, R. K. Singh, M. Pattanaik, “Analysis of 8T SRAM Cell at Various Process Corners at 65nm Process Technology”, Circuit & Systems, USA, Vol. 2, No. 4, pp. 326-329, Oct. 2011.

[8] Bishwajeet Pandey and Manisha Pattanaik, “High Speed Transistor Logic Based Dynamic Power Reduction of RAM on FPGA”, 2nd Student Conference on Engineering and Systems (SCES), MNIT Allahabad, April 12-14, 2013.
