
A Method to Protect Bloom Filters from Soft Errors

Pedro Reviriego, Salvatore Pontarelli, Juan Antonio Maestro and Marco Ottavi

This paper is part of a collaboration in the framework of COST ICT Action 1103 "Manufacturable and Dependable Multicore Architectures at Nanoscale". P. Reviriego and J. A. Maestro are with Universidad Antonio de Nebrija, C/ Pirineos 55, E-28040 Madrid, Spain (phone: +34-914521100; fax: +34-914521110; email: {previrie, jmaestro}@nebrija.es). S. Pontarelli is with Consorzio Nazionale Interuniversitario per le Telecomunicazioni (CNIT), Italy (phone: +39-0672597369; email: [email protected]). M. Ottavi is with University of Rome "Tor Vergata", Via del Politecnico 1, 00133 Rome, Italy (phone: +39-0672597344; email: [email protected]).

Abstract— Bloom filters are used in many computing and networking applications where they provide a simple method to test whether an element is present in a set. In some of those systems, reliability is a major concern and the Bloom filters must therefore be protected to ensure that errors do not affect the system behavior. One of the most common types of errors in electronic implementations of Bloom filters are radiation-induced soft errors. Soft errors can corrupt the contents of a Bloom filter, causing false positives and false negatives. Error Correction Codes (ECCs) can be used to protect the Bloom filter so that, for example, single bit errors are detected and corrected. However, the use of ECCs impacts the implementation area, power and delay. In this paper, a method to efficiently protect the contents of a Bloom filter is presented. The scheme exploits the different system-level effects of false positives and false negatives to achieve effective error protection at lower cost than a traditional ECC. To illustrate the benefits of the proposed method, a case study is presented in which the proposed implementation is compared with the use of a traditional Hamming ECC.

Index Terms— Bloom filters, Error correction, Soft errors.

I. INTRODUCTION

In many computing and networking applications, there is a need to check whether a given element belongs to a set. An efficient way to perform this check is to use Bloom filters (BFs) [1]. BFs are used, for example, in Google Bigtable [2] and have been proposed to improve server performance [3]; they are also used in many networking applications [4]. BFs are commonly implemented in electronic circuits to achieve high-speed performance [5],[6]. In those implementations, the contents of the BF are commonly stored in a high-speed memory.

Electronic systems suffer from a number of reliability issues, including manufacturing defects and soft errors [7]. Soft errors have a transient effect and cause no permanent damage to the circuit functionality; however, they can alter the contents of a register or memory bit, causing incorrect system behavior. Memories are typically among the circuit elements most vulnerable to soft errors. To mitigate the effects of soft errors, memories are commonly protected with Error Correction Codes (ECCs) [8]; in particular, a per-word parity bit or a per-word Single Error Correction (SEC) code is commonly used. The use of an ECC impacts memory area, power and delay, due to the additional parity check bits and to the encoder and decoder circuitry.

The speed requirements of networking applications are very stringent [9], the available embedded memory is limited and the expected soft error rate is not negligible. This makes the design of reliable and efficient Bloom filters a challenging task. Since the memory size is constrained, the memory increase due to the ECC effectively reduces the number of positions in the Bloom filter, which in turn increases the false positive rate. The use of ECCs also impacts the access time, as encoders and decoders are needed to write and read words. Since the embedded memories are only used to store the contents of the Bloom filter, specific protection techniques tailored to the nature of the Bloom filter can be used to reduce the protection cost.

The protection of Bloom filters against errors has been studied in [10], where the authors focus on faults in the computation of the hash functions; errors in the contents of the BF are not considered. Other studies, rather than protecting the BF, propose its use to identify defects in nano-memories [11] or in Content Addressable Memories [12]. The use of BFs to identify defects in nano-memories, considering also errors in the BF implementation, has been studied in [13]. Recently, Biff codes, which are based on BFs, have been proposed to correct errors in large data sets [14], and in [15] Bloom filters are used to correct errors in the data set associated with the filter. However, to the best of the authors' knowledge, no technique has been presented to optimize the protection of the contents of the BF.

In this paper, a scheme to efficiently protect the contents of the BF is presented. The proposed approach is based on the observation that bit errors that change a '0' to a '1' can cause false positives, while errors that change a '1' to a '0' can cause false negatives. A small percentage of false positives is inherent to the nature of the BF, whereas no false negative can occur in an error-free BF.



Therefore, soft errors that cause false positives have a much lower impact than those that cause false negatives. This can be exploited to optimize the BF protection. In particular, in this paper it is shown that, using a simple per-word parity bit, a protection level equivalent to that provided by a SEC code can be achieved. This reduces circuit area, power and delay and enables more efficient fault-tolerant implementations.

The rest of the paper is organized as follows. Section II provides a short overview of BFs. The proposed scheme is described in Section III. Section IV presents a case study that evaluates the benefits of the proposed approach in a practical scenario. Finally, the conclusions of this work are summarized in Section V.

Fig. 1. Diagram of a Bloom filter.

II. BLOOM FILTERS

A Bloom filter (BF) is a data structure in which zeros and ones are stored in an array of m bits that is accessed using k hash functions. Each of the hash functions h1, h2, ..., hk maps a given element x to one of the m bits. Figure 1 shows a diagram of a Bloom filter. The operations defined on a Bloom filter are:

- Insertion: an element x is inserted in the BF by setting to one the bits of the array in positions h1(x), h2(x), ..., hk(x).

- Query: to check whether an element x is in the BF, the bits of the array in positions h1(x), h2(x), ..., hk(x) are read; when all of them are one, the element is considered to be in the BF.

These operations ensure that once an element has been inserted into the BF, it will always be found when it is queried. Therefore, an error-free BF will not produce false negatives. However, a BF can produce false positives: when a query is made for an element that has not been added to the BF, the k positions checked can all be one due to the insertions of other elements. The probability of a false positive depends on the number n of entries added to the BF. For example, when the hash functions are uniformly distributed, after adding n elements to the BF a given bit in the array is zero with a probability p0(n) that can be approximated as:

\[ p_0(n) \approx \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m} \qquad (1) \]

As all k positions indexed by the k hash functions have to be one, the probability of a false positive can be approximated as:

\[ p_{fp}(n) \approx \left(1 - p_0(n)\right)^{k} \approx \left(1 - e^{-kn/m}\right)^{k} \qquad (2) \]
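As an illustration of the insertion and query operations and of the false positive estimate in (1)-(2), the following minimal C sketch models the BF as a plain bit array. The hash functions are simple multiplicative placeholders chosen for the example, not the hardware hash functions discussed later in the paper.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define M 65536   /* number of bits in the filter (m) */
#define K 6       /* number of hash functions (k)     */

static uint8_t bf[M / 8];   /* m-bit array packed into bytes */

/* Placeholder hash: one multiplicative hash per function index i. */
static uint32_t hash_i(uint64_t x, int i) {
    uint64_t h = (x + (uint64_t)(i + 1) * 0x9E3779B97F4A7C15ULL) * 0xC2B2AE3D27D4EB4FULL;
    return (uint32_t)(h >> 32) % M;
}

/* Insertion: set the bits at positions h1(x), ..., hk(x). */
static void bf_insert(uint64_t x) {
    for (int i = 0; i < K; i++) {
        uint32_t p = hash_i(x, i);
        bf[p / 8] |= (uint8_t)(1u << (p % 8));
    }
}

/* Query: the element is reported present only if all k bits are one. */
static int bf_query(uint64_t x) {
    for (int i = 0; i < K; i++) {
        uint32_t p = hash_i(x, i);
        if (!(bf[p / 8] & (1u << (p % 8))))
            return 0;                 /* a zero bit: definitely not inserted */
    }
    return 1;                         /* possibly a false positive */
}

int main(void) {
    int n = 7500;                     /* number of inserted elements */
    for (uint64_t x = 0; x < (uint64_t)n; x++)
        bf_insert(x);
    /* False positive estimate from (2): (1 - e^{-kn/m})^k, about 1.5% here. */
    printf("estimated false positive rate: %.4f\n",
           pow(1.0 - exp(-(double)K * n / M), K));
    printf("query of an inserted element: %d\n", bf_query(1));
    return 0;
}
```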

The m-bit array of a BF is commonly stored in a memory; when fast operation is required, a high-speed memory is used. The memory is typically organized in words of w bits, so that m/w words are needed to store the array. For a BF implementation, small word sizes (8-16 bits) are preferred as they reduce power consumption.

Fig. 2. Storage of a Bloom Filter in a memory with word-size w protected with a parity bit.
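As a sketch of the organization in Fig. 2, the m-bit array can be stored as m/w data words, each extended with one parity bit. The fragment below (a minimal sketch; the word size w = 16 is taken from the case study of Section IV) maps a BF bit position to a word index and bit offset and keeps the parity bit consistent on every update.

```c
#include <stdint.h>

#define M 65536                 /* BF bits (m)                    */
#define W 16                    /* data bits per memory word (w)  */
#define NWORDS (M / W)          /* number of memory words (m/w)   */

/* Each memory word: w data bits plus one even-parity bit. */
static uint16_t data[NWORDS];
static uint8_t  parity[NWORDS];

static uint8_t even_parity(uint16_t v) {
    uint8_t p = 0;
    while (v) { p ^= (uint8_t)(v & 1u); v >>= 1; }
    return p;
}

/* Set the BF bit at position pos (0 <= pos < M), updating the word parity. */
static void bf_set_bit(uint32_t pos) {
    uint32_t word = pos / W;            /* which memory word       */
    uint32_t off  = pos % W;            /* bit offset inside word  */
    data[word] |= (uint16_t)(1u << off);
    parity[word] = even_parity(data[word]);
}

/* Unprotected read; the parity check and correction are shown in Section III. */
static int bf_get_bit(uint32_t pos) {
    return (data[pos / W] >> (pos % W)) & 1u;
}
```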

III. PROPOSED METHOD

As discussed before, a small probability of false positives is in the nature of BFs. Therefore, errors that cause false positives can be tolerated as long as they do not increase the false positive probability significantly. For example, for a BF with m = 64K, k = 6 and in which half the bits are one, the false positive probability is approximately 1.6%. If a soft error flips a zero bit to one in the BF array, the probability of a bit being one increases only very slightly, from 0.5 to 0.500015, with a negligible impact on the false positive rate. On the other hand, errors that change a bit from one to zero can cause false negatives, changing the BF behavior qualitatively. For example, in a security application where a BF is used to identify possible threats that require further processing, a false negative can compromise the system.

The key idea explored in this paper is that the asymmetric impact of errors on the BF can be exploited to optimize its protection. In particular, two alternative protections are considered for the memory: a per-word parity bit and a per-word SEC code. The schemes are illustrated in Figures 2 and 3, which also show the mapping of the m-bit array of the BF to the memory words. Both parity and SEC are commonly used in memories, as mentioned before, the first for error detection and the second for error correction. Our goal is to exploit the properties of the BF to provide a protection equivalent to that of SEC when the memory is only protected with a parity bit.

Fig. 3. Storage of a Bloom Filter in a memory with word-size w protected with a SEC code.

When a parity bit is used to protect each memory word, errors that affect one bit will be detected when the parity is checked. However, there will be no indication of whether the error affected a bit that stored a one or a zero. Since the goal is to avoid false negatives, errors that change a one to a zero have to be corrected. This can be done by simply setting all the data bits in the word (the bits in one of the rows in Figure 2) to one. This ensures that if an error changed a one to a zero, it is corrected. However, this procedure introduces a number of errors in which zero bits are changed to one. For a BF in which the probability of a bit being zero is p0, on average w·p0 errors of that type will be introduced.

In the following, we argue that the proposed solution to avoid false negatives is well suited to terrestrial applications, where the soft error rate is small and the impact of the proposed correction in terms of false positives can be considered negligible. The soft error rate per bit that can be tolerated with a negligible impact on the BF false positive rate can be estimated as follows. Let us first assume that variations in the false positive rate smaller than 0.1% from the error-free case are negligible. Consider a BF to which a number of entries n have been added, so that the probability that a bit in the BF is one is p1. The impact of additional errors on the false positive rate can be estimated starting from the error-free rate, which as discussed before can be approximated as (p1)^k. Additional ones added as a result of the proposed protection method against soft errors will increase p1 to a value p1'. To have an effect smaller than 0.1%, p1' has to be equal to or smaller than

\[ p_1' \le \sqrt[k]{1.001} \cdot p_1 \qquad (3) \]

which allows an increase from p1 to p1' of $(\sqrt[k]{1.001} - 1) \cdot p_1$. Each soft error will on average result in $w \cdot (1 - p_1)$ bits being set to one and will therefore increase p1 by $w \cdot (1 - p_1)/m$. The maximum number of errors e that can be tolerated is thus

\[ e \le \frac{m \cdot (\sqrt[k]{1.001} - 1) \cdot p_1}{w \cdot (1 - p_1)} \qquad (4) \]

Dividing by the total number of bits in the BF (m), the maximum soft error bit rate that has an effect on the false positive rate smaller than 0.1% is obtained:

\[ ber \le \frac{(\sqrt[k]{1.001} - 1) \cdot p_1}{w \cdot (1 - p_1)} \qquad (5) \]
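Continuing the per-word layout sketched after Fig. 2, the read procedure below makes the correction rule described above concrete: the parity is checked on every access and, on a mismatch, the whole data word is rewritten as all ones, so a one flipped to zero can never cause a false negative (at the cost of, on average, w·p0 spurious ones in that word).

```c
/* Read one BF bit with the proposed parity-based protection.
 * A detected parity error is handled by setting all w data bits of the
 * word to one: a 1->0 flip is thereby always repaired, while a 0->1 flip
 * only adds a few more ones (a small false positive increase).          */
static int bf_get_bit_protected(uint32_t pos) {
    uint32_t word = pos / W;
    if (even_parity(data[word]) != parity[word]) {   /* single-bit error detected  */
        data[word]   = 0xFFFF;                       /* set the whole word to ones  */
        parity[word] = even_parity(data[word]);      /* parity of the all-ones word */
    }
    return (data[word] >> (pos % W)) & 1u;
}
```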

For example, if k = 8, p1 = 0.1 and w = 16, the acceptable bit error rate would be 8.7×10^-7. This is probably close to a worst case, as a low BF occupancy, a large number of hashes and a large word size have been selected. Typical bit error rates due to soft errors in memory for terrestrial applications are of the order of 1000 Failures In Time (FIT) per Mbit for SRAM at the 65nm technology node [16]. This is equivalent to a bit error rate of 8.8×10^-9 per year, two orders of magnitude below the acceptable error rate. This shows that the effect on the false positive rate of using the proposed protection scheme will be negligible in most terrestrial applications.
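The figures quoted in this example can be reproduced with a few lines of C; the conversion from FIT to a yearly rate assumes 10^6 bits per Mbit and 8760 hours per year, which matches the value given in the text.

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double k = 8.0, p1 = 0.1, w = 16.0;

    /* Bound (5): ber <= (1.001^(1/k) - 1) * p1 / (w * (1 - p1)). */
    double ber_max = (pow(1.001, 1.0 / k) - 1.0) * p1 / (w * (1.0 - p1));
    printf("acceptable bit error rate: %.1e\n", ber_max);             /* ~8.7e-07 */

    /* 1000 FIT/Mbit = 1000 failures per 10^9 device-hours per 10^6 bits. */
    double per_bit_year = (1000.0 / 1e9 / 1e6) * 8760.0;
    printf("soft error rate per bit and year: %.1e\n", per_bit_year); /* ~8.8e-09 */
    return 0;
}
```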

When the soft error rates are much larger, as for example in space applications, the impact on the false positive rate can be significant as errors accumulate over time. In those scenarios the technique cannot be used. However, BFs are mostly used in terrestrial applications.

One interesting observation, not analyzed further in this paper, is that the proposed approach, which leverages the asymmetric impact of errors in Bloom filters, could also be used to improve the strength of a Single Error Correction Double Error Detection (SEC-DED) code in the presence of double errors. When a double error is detected by the SEC-DED code, the proposed approach of setting all the bits in the word to one could be used to remove any possible false negative at the expense of a small increase in the false positive rate.

Finally, it must be noted that the analysis presented in this section assumes that the soft error probability is the same for a memory cell that stores a zero and for one that stores a one. In a given design, the soft error probability may differ depending on the value stored in the cell. This effect can be taken into account with a more complex analytical model of the effect of errors on the false positive rate, but the expected differences are small and do not affect the validity of the analysis presented.

When the impact on the false positive rate is negligible, the proposed scheme provides a protection equivalent to that of a SEC code. The benefit is the reduced implementation cost: the memory size, the access time and the power consumption are all reduced. The memory size is reduced because only a single parity bit per word is needed, compared to several check bits for a SEC code. The benefits for different word sizes are illustrated in Table I.

TABLE I
COMPARISON OF MEMORY SIZE

Word size (w) | SEC additional bits | Proposed additional bits | Memory size reduction
      8       |          4          |            1             |         25%
     16       |          5          |            1             |         19%
     32       |          6          |            1             |         13%

It can be observed that the memory size is reduced significantly in all cases, with larger reductions for smaller word sizes. This is interesting because small word sizes also provide reductions in power consumption, as fewer bits have to be read to access a given position in the BF array. The reductions in memory size enabled by the proposed technique will reduce the power consumption, and the simplified encoder and decoder circuitry will further reduce the power of each memory access. They also enable reductions in area and delay; the delay reduction has a direct impact on the access time to a BF array position. Since these reductions depend on the specific circuit implementation, they are illustrated in the next section, where a practical case study is presented.
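The SEC bit counts in Table I follow from the standard Hamming condition that r check bits can protect w data bits when 2^r >= w + r + 1; the short sketch below reproduces the table, taking the reduction as the check bits saved over the total SEC word size.

```c
#include <stdio.h>

/* Smallest number of Hamming SEC check bits r with 2^r >= w + r + 1. */
static int sec_check_bits(int w) {
    int r = 1;
    while ((1 << r) < w + r + 1)
        r++;
    return r;
}

int main(void) {
    const int word_sizes[] = {8, 16, 32};
    for (int i = 0; i < 3; i++) {
        int w = word_sizes[i];
        int r = sec_check_bits(w);
        /* Parity needs only 1 extra bit: memory saved = (r - 1) / (w + r). */
        printf("w = %2d: SEC bits = %d, proposed = 1, reduction = %.0f%%\n",
               w, r, 100.0 * (r - 1) / (w + r));
    }
    return 0;
}
```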

IV. CASE STUDY

To illustrate the benefits of the proposed scheme in a practical application, it has been implemented in a realistic case study: the use of BFs to perform per-flow traffic monitoring in a data network [17]. As the amount of traffic in high-speed links is huge, BFs are used to identify whether a packet belongs to a new or an existing flow. To this end, the IP addresses that need detailed monitoring are added to the BF, so that packets that contain those addresses are detected and analyzed. In particular, IP version 4 traffic is considered and the inputs to the BF are the pairs of 32-bit source and destination IP addresses. The elements added to the BF and the packets used to test it are taken from publicly available packet traces [18]. In each experiment, n IP address pairs are first randomly selected from the packets in the traces and inserted in the BF. Subsequently, 10^4 different pairs, also taken randomly from the traces, are used to evaluate the false positive ratio. The experiment is repeated 100 times and the averaged results are reported. The BF is configured with k = 4, m = 64K and different values of n. The values of n are selected such that p1 is approximately 0.1, 0.2 and 0.4. It is also assumed that the BF array is stored in a memory with word size w = 16. The results in terms of false positive rate and implementation overheads are described in the following subsections.

A. Impact on the false positive rate

To assess the impact of the proposed scheme on the false positive rate, a variable number of soft errors has been introduced in some of the experiments, and the proposed scheme has been used to set all the bits in the affected words to one. The false positive rate is then measured with 10^4 accesses. The process is repeated 100 times for each configuration and the averaged results are reported and compared with those of the error-free experiment. When a SEC code is used, all the errors are corrected; therefore, the false positive rate with SEC protection is the same as in the error-free case. The number of soft errors is varied from 1 to 10, which for a 64K memory and a typical soft error rate of 1000 FIT/Mbit would take 10^5 days and 10^6 days, respectively, to occur. The results are illustrated in Figure 4. It can be observed that the effect of the errors on the false positive rate is negligible; the rates are almost the same for the error-free case and for the cases in which 1 to 10 errors are inserted.

In fact, no consistent difference can be observed, as any difference is much smaller than the statistical variations observed in the experiments. Therefore, in a practical implementation, the proposed technique will have no impact on the performance of the BF. Given the amount of time needed to accumulate 10 soft errors in a terrestrial application, it can be expected that the system is re-initialized well before that and the errors are cleared. Therefore, for the case study, a negligible impact on the false positive rate is clearly achieved. However, as discussed before, the occurrence of even a single false negative can compromise the system performance and should be avoided by using either a SEC code or the proposed technique.

Fig. 4. False positive rate as a function of the number of soft errors for the case study for several values of p1.
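A self-contained sketch of this kind of experiment is given below. Random 64-bit keys stand in for the source/destination address pairs, simple multiplicative hashes replace the hardware hash functions, and an injected soft error followed by the proposed correction is modelled directly as overwriting the affected word with all ones; the parameters match the case study (k = 4, m = 64K, w = 16, n chosen so that p1 is roughly 0.4).

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define M 65536
#define K 4
#define W 16
#define NWORDS (M / W)

static uint16_t data[NWORDS];              /* BF array stored as 16-bit words */

static uint32_t hash_i(uint64_t x, int i) {
    uint64_t h = (x + (uint64_t)(i + 1) * 0x9E3779B97F4A7C15ULL) * 0xC2B2AE3D27D4EB4FULL;
    return (uint32_t)(h >> 32) % M;
}

static void insert(uint64_t x) {
    for (int i = 0; i < K; i++) {
        uint32_t p = hash_i(x, i);
        data[p / W] |= (uint16_t)(1u << (p % W));
    }
}

static int query(uint64_t x) {
    for (int i = 0; i < K; i++) {
        uint32_t p = hash_i(x, i);
        if (!((data[p / W] >> (p % W)) & 1u))
            return 0;
    }
    return 1;
}

int main(void) {
    int n = 8400;       /* inserted pairs: gives p1 close to 0.4 for K = 4 */
    int errors = 10;    /* injected soft errors                            */
    srand(1);

    for (int i = 0; i < n; i++)
        insert(((uint64_t)rand() << 32) | (uint32_t)rand());

    /* Error injection plus the proposed correction: the net effect on the
     * affected word is that all of its bits end up set to one.            */
    for (int e = 0; e < errors; e++)
        data[rand() % NWORDS] = 0xFFFF;

    /* False positive ratio over 10^4 queries for elements never inserted. */
    int fp = 0, trials = 10000;
    for (int t = 0; t < trials; t++)
        fp += query(0xABCD000000000000ULL + (uint64_t)t);
    printf("false positive ratio: %.2f%%\n", 100.0 * fp / trials);
    return 0;
}
```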

B. Implementation overheads

As discussed in the previous section, the reduction in memory size can be easily computed. In the case study, for a word size of 16 bits, a SEC code requires five check bits compared to one for the proposed scheme, so a reduction of 4/21 = 19% is achieved.

The proposed scheme also reduces the area, power and delay of the encoder and decoder circuitry. To evaluate the reductions, the parity and SEC encoder and decoder circuits have been implemented in HDL and synthesized for a 45-nm library [19]. For the SEC code, a simple Hamming code has been used. The results are presented in Tables II and III. It can be observed that the proposed scheme provides significant reductions in all cases except for the encoder delay. Since encoding is used only when adding entries to the BF, its delay is not critical for the BF performance. On the other hand, decoding is needed for each query operation, and its delay and power consumption are key for the BF performance. For the decoder, area and power are reduced by over 70% and delay by over 30%. Therefore, the proposed method enables a significant improvement in the overheads required to implement error protection in BFs. Finally, it is also important to note that the decoders are used in each of the k hashes required for a query operation; therefore, the numbers in the tables have to be multiplied by k to obtain the total values for the BF.

The reductions can be put in perspective by comparing the absolute numbers with those required to implement a commonly used hash function. More precisely, the H3 hash function described in [20] has been implemented, and its area, power and delay estimates are shown in Table IV. The SEC decoder represents an overhead over the hash function of approximately 55% in area and 36% in power; the delay overhead is even larger, with a value of 109%. The proposed method reduces those overheads to 12.4%, 9.4% and 83%, respectively, and therefore improves the performance of the BF in terms of power and delay significantly. In addition to these gains, the scheme also enables a reduction of the BF memory size, as discussed before.
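For reference, an H3 hash [20] computes each output as the XOR of the rows of a fixed random binary matrix selected by the input bits set to one; in hardware this reduces to one XOR tree per output bit. A minimal software sketch, with the input and output widths taken from the case study (64-bit address pair, 16-bit index into a 64K-bit array), is given below. Each of the k hash functions of the BF would use its own matrix Q.

```c
#include <stdint.h>
#include <stdlib.h>

#define IN_BITS 64              /* input width: source + destination address */

static uint16_t Q[IN_BITS];     /* one random 16-bit matrix row per input bit */

static void h3_init(unsigned seed) {
    srand(seed);
    for (int i = 0; i < IN_BITS; i++)
        Q[i] = (uint16_t)(rand() ^ (rand() << 1));   /* fixed random rows */
}

/* H3 hash: XOR of the rows selected by the bits of x that are one. */
static uint16_t h3_hash(uint64_t x) {
    uint16_t h = 0;
    for (int i = 0; i < IN_BITS; i++)
        if ((x >> i) & 1u)
            h ^= Q[i];
    return h;
}
```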

TABLE II
COMPARISON OF ENCODER AREA, POWER AND DELAY

          | Area (µm²) | Power (µW) | Delay (ns)
SEC       |    267     |    0.23    |    0.48
Proposed  |    139     |    0.12    |    0.55
Reduction |   47.9%    |   47.8%    |   -14.6%

TABLE III
COMPARISON OF DECODER AREA, POWER AND DELAY

          | Area (µm²) | Power (µW) | Delay (ns)
SEC       |    648     |    0.50    |    0.86
Proposed  |    147     |    0.13    |    0.58
Reduction |   77.3%    |   74.0%    |   32.6%

TABLE IV
HASH FUNCTION AREA, POWER AND DELAY

          | Area (µm²) | Power (µW) | Delay (ns)
H3 hash   |    1184    |    1.38    |    0.79

V. CONCLUSIONS AND FUTURE WORK

In this paper, a technique to protect Bloom filters has been presented. The proposed scheme exploits the limited impact of errors that change bits from zero to one in Bloom filters to reduce the implementation cost. In particular, a protection level similar to that provided by a Single Error Correction (SEC) code is achieved using only a simple parity bit. This reduces the size of the memory required to implement the BF and increases its speed; the power consumption required for BF operations is also reduced. The benefits of the proposed technique have been illustrated through a case study of a practical implementation.

Neither the proposed technique nor a SEC code is able to correct double errors. However, the proposed scheme could either be used with a SEC-DED code, as discussed before, or be efficiently combined with a SEC code to protect also against double errors. More precisely, this can be done by using a SEC code but only to detect errors. Since a SEC code has a minimum distance of three, it will always detect double errors; when an error is detected, all the bits in the word are set to one. The study of this extension of the scheme to protect against multiple errors is left for future work.
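A minimal sketch of this extension is given below, assuming a Hamming(21,16) code over the 16-bit words of the case study; the syndrome computation is written out so the fragment is self-contained, and any nonzero syndrome (single or double error) triggers the same all-ones correction as the parity scheme.

```c
#include <stdint.h>

/* Hamming(21,16): 16 data bits in the non-power-of-two positions of 1..21,
 * check bits in positions 1, 2, 4, 8 and 16.  The syndrome is the XOR of
 * the positions of all bits that read as one; it is zero for a valid
 * codeword and nonzero for any single or double error.                   */
static int hamming_syndrome(uint32_t cw) {
    int syn = 0;
    for (int pos = 1; pos <= 21; pos++)
        if ((cw >> pos) & 1u)
            syn ^= pos;
    return syn;
}

/* Proposed extension: use the SEC code for detection only.  On any error,
 * single or double, return the all-ones word instead of trying to correct,
 * so that no false negative can be produced.                              */
static uint16_t read_word_sec_detect(uint32_t cw) {
    if (hamming_syndrome(cw) != 0)
        return 0xFFFF;
    uint16_t d = 0;
    for (int pos = 1, j = 0; pos <= 21; pos++) {
        if (pos == 1 || pos == 2 || pos == 4 || pos == 8 || pos == 16)
            continue;                       /* skip the check-bit positions */
        d |= (uint16_t)(((cw >> pos) & 1u) << j);
        j++;
    }
    return d;
}
```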

REFERENCES

[1] B. Bloom, "Space/Time Trade-offs in Hash Coding with Allowable Errors," Communications of the ACM, vol. 13, no. 7, pp. 422-426, 1970.
[2] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, "Bigtable: A Distributed Storage System for Structured Data," ACM Transactions on Computer Systems (TOCS), vol. 26, no. 2, 2008.
[3] A. Moshovos, G. Memik, B. Falsafi, and A. Choudhary, "JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers," in Proc. of the International Symposium on High-Performance Computer Architecture, pp. 85-96, Feb. 2001.
[4] A. Broder and M. Mitzenmacher, "Network Applications of Bloom Filters: A Survey," in Proc. of the 40th Annual Allerton Conference, Oct. 2002.
[5] E. Safi, A. Moshovos, and A. Veneris, "L-CBF: A Low-Power, Fast Counting Bloom Filter Architecture," IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 6, pp. 628-638, 2008.
[6] T. Kocak and I. Kaya, "Low-Power Bloom Filter Architecture for Deep Packet Inspection," IEEE Communications Letters, vol. 10, no. 3, pp. 210-212, 2006.
[7] N. Kanekawa, E. H. Ibe, T. Suga and Y. Uematsu, Dependability in Electronic Systems: Mitigation of Hardware Failures, Soft Errors, and Electro-Magnetic Disturbances, Springer Verlag, 2010.
[8] C. L. Chen and M. Y. Hsiao, "Error-Correcting Codes for Semiconductor Memory Applications: A State-of-the-Art Review," IBM Journal of Research and Development, vol. 28, no. 2, pp. 124-134, 1984.
[9] C. Hermsmeyer et al., "Towards 100G Packet Processing: Challenges and Technologies," Bell Labs Technical Journal, vol. 14, no. 2, pp. 57-79, 2009.
[10] M.-H. Lee and Y.-H. Choi, "A Fault-Tolerant Bloom Filter for Deep Packet Inspection," in Proc. of the 13th Pacific Rim International Symposium on Dependable Computing, pp. 389-396, Dec. 2007.
[11] G. Wang, W. Gong, and R. Kastner, "On the Use of Bloom Filters for Defect Maps in Nanocomputing," in Proc. of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 743-746, 2006.
[12] S. Pontarelli and M. Ottavi, "Error Detection and Correction in Content Addressable Memories by Using Bloom Filters," IEEE Trans. on Computers, vol. 62, no. 6, pp. 1111-1126, June 2013.
[13] J. Y. Choi and Y.-H. Choi, "Fault Detection of Bloom Filters for Defect Maps," in Proc. of the IEEE International Symposium on Defect and Fault Tolerance of VLSI Systems, pp. 229-235, Oct. 2008.
[14] M. Mitzenmacher and G. Varghese, "Biff (Bloom Filter) Codes: Fast Error Correction for Large Data Sets," in Proc. of the IEEE International Symposium on Information Theory (ISIT), 2012.
[15] P. Reviriego, S. Pontarelli, J. A. Maestro and M. Ottavi, "A Synergetic Use of Bloom Filters for Error Detection and Correction," IEEE Trans. on VLSI Systems (in press).
[16] J. L. Autran et al., "Soft-Errors Induced by Terrestrial Neutrons and Natural Alpha-Particle Emitters in Advanced Memory Circuits at Ground Level," Microelectronics Reliability, vol. 50, no. 9, pp. 1822-1831, 2010.
[17] G. Bianchi, E. Boschi, S. Teofili, and B. Trammell, "Measurement Data Reduction Through Variation Rate Metering," in Proc. of IEEE INFOCOM, 2010.
[18] CAIDA Anonymized Internet Traces 2012 Dataset. Available at http://www.caida.org/data/passive/passive_2012_dataset.xml.
[19] J. E. Stine et al., "FreePDK: An Open-Source Variation-Aware Design Kit," in Proc. of the IEEE International Conference on Microelectronic Systems Education (MSE'07), pp. 173-174, Jun. 2007.
[20] M. V. Ramakrishna, E. Fu, and E. Bahcekapili, "Efficient Hardware Hashing Functions for High Performance Computers," IEEE Trans. on Computers, vol. 46, no. 12, pp. 1378-1381, 1997.