IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 64, NO. 1, JANUARY 2018


Composition Check Codes

Kees A. Schouhamer Immink, Fellow, IEEE, and Kui Cai

Abstract—We present composition check codes for noisy storage and transmission channels with unknown gain and/or offset. In the proposed composition check code, as in systematic error correcting codes, the encoding of the main data into a constant composition code is completely avoided. To the main data, a coded label is appended that carries information regarding the composition vector of the main data. Slepian's optimal detection technique for codewords that are taken from a constant composition code is applied for detection. A first Slepian detector detects the label and subsequently restores the composition vector of the main data. The composition vector, in turn, is used by a second Slepian detector to optimally detect the main data. We compute the redundancy and error performance of the new method, and results of computer simulations are presented.

Index Terms—Constant composition code, permutation code, flash memory, optical recording.

I. INTRODUCTION

The receiver of a transmission or storage system is often ignorant of the exact value of the amplitude (gain) and/or offset (translation) of the received signal, which depend on the actual, time-varying, conditions of the channel. In wireless communications, for example, the amplitude of the received signal may vary rapidly due to multi-path propagation or due to obstacles affecting the wave propagation. In optical disc recording, both the gain and offset depend on the reflectivity of the disc surface and the dimensions of the written features. Fingerprints on optical discs may result in rapid gain and offset variations of the retrieved signal. Assume the $q$-level pulse amplitude modulated (PAM) signal, $x_i$, $i = 1, 2, \ldots$, is sent and received as $r_i$, where $r_i = a(x_i + \nu_i) + b$. The reals $a > 0$ and $b$ are called the gain and offset of the received signal, respectively, and we assume that the receiver is ignorant of the actual values of $a$ and $b$. The stochastic component, denoted by $\nu_i$, is called 'noise'. We further assume that the parameters $a$ and $b$ vary slowly over time or position, so that for a plurality of $n$, $n > 1$, symbol time slots the parameters $a$ and $b$ can be considered fixed, but unknown to the receiver. The receiver's ignorance of the exact values of $a$ and $b$ may seriously degrade the error performance of the transmission or storage channel, as has been shown in [1].
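The effect of such channel mismatch can be illustrated with a short simulation of the above model; the helper names and parameter values below are illustrative, not from the paper. A symbol-by-symbol detector that ignores the offset misreads symbols even when the noise is tiny.

```python
import numpy as np

def channel(x, a, b, sigma, rng):
    """Linear channel of the text: r_i = a * (x_i + nu_i) + b,
    with i.i.d. noise nu_i ~ N(0, sigma^2)."""
    x = np.asarray(x, dtype=float)
    return a * (x + rng.normal(0.0, sigma, size=x.shape)) + b

def threshold_detect(r, q):
    """Symbol-by-symbol detector that (wrongly) assumes a = 1 and b = 0."""
    return np.clip(np.round(r), 0, q - 1).astype(int)

x = [0, 2, 1, 1, 2]                                # q = 3 data symbols
rng = np.random.default_rng(7)
r = channel(x, a=1.0, b=0.6, sigma=0.05, rng=rng)  # offset mismatch only
print(threshold_detect(r, q=3))  # symbol errors dominated by the offset b
```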

Manuscript received July 20, 2016; revised March 4, 2017; accepted June 18, 2017. Date of publication June 29, 2017; date of current version December 20, 2017. This work was supported by the Singapore Agency for Science, Technology and Research (A*STAR) under a Public Sector Research Funding (PSF) Grant. K. A. Schouhamer Immink is with Turing Machines Inc., 3016 DK Rotterdam, The Netherlands (e-mail: [email protected]). K. Cai is with the Singapore University of Technology and Design, Singapore 487372 (e-mail: [email protected]). Communicated by Y. Mao, Associate Editor for Coding Techniques. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIT.2017.2721423

There are a myriad of proposals to handle the problem of the channel's unknown gain and offset. Automatic gain control (AGC) has been applied in many practical transmission systems, but AGC is close to useless if the gain and offset vary very rapidly. Redundant training sequences or reference memory cells with prescribed levels may be placed between 'user' data for estimating the unknown parameters. The parameter estimation will, by necessity, be based on an average over a limited time interval, and the estimated values may be inaccurate as they lag behind the actual values. A more frequent insertion of reference cells may improve the parameter estimation, which, however, comes at the cost of higher redundancy and thus decreased payload. Slepian showed in his seminal paper [2] that the error performance of optimal detection of codewords drawn from a single constant composition code is immune to gain and offset mismatch. He also presented an implementation of optimal detection whose complexity grows with $n \log n$. A constant composition code of length-$n$ codewords over the $q$-ary alphabet has the property that the numbers of occurrences of the symbols within a codeword are the same for each codeword [3]. In practice, however, Slepian's detection method has limited applicability as it depends heavily on the efficient and simple encoding and decoding of arbitrary user data into a constant composition code. Encoding and decoding of constant composition codes is a field of active research; see, for example, [4]–[6]. For the binary case, Weber and Immink [7] and Skachek and Immink [8] presented methods that translate arbitrary data into a codeword having a prescribed number of ones and zeros. Enumerative methods for generating codewords have been presented in [9]–[11]. A serious drawback of enumeration schemes is error propagation, a phenomenon illustrated in Section VII.
The lack of simple and efficient encoding and decoding schemes has been a major barrier for the application of Slepian's optimal detection method. Thus, an efficient technique to eliminate, or at least significantly alleviate, the drawbacks and deficiencies of Slepian's prior art system has been a desideratum. The scheme proposed and analyzed here, coined composition check code, meets this desideratum: it retains the virtues of Slepian's optimal detection method, while its drawback, the encoding of the main data, or payload symbols, into a constant composition code, is removed. In the proposed scheme, the main data are sent to the receiver without modification. Attached to the main data word is a relatively short, fixed-length label that informs the receiver of the constant composition code to which the sent main data word belongs. The information conveyed by the label is used by the receiver to optimally recover the main data using Slepian's optimal detection method. The proposed method is reminiscent of a systematic error correcting code,



where unmodified main data is sent, and a parity check word is appended to make error correction or detection possible. A system using a variant of the proposed scheme was discussed recently by Li et al. [12]. In Li's method, however, the label is not encoded into a constant composition code, and hence this portion is not immune to the unknown gain and offset. Also, Li's system needs two detectors, one for the payload portion and one for the label portion (i.e., the conventional threshold detector and the Slepian detector), which all but doubles the detector complexity. The proposed method is also reminiscent of Knuth's method [13] for generating codewords having equal numbers of ones and zeros, where an appended prefix carries information regarding the specific segment of the codeword that has been modified. It should be noted that the proposed technique has the principal virtues of Slepian's prior art method, such as enabling simple optimal detection of the noisy codewords and immunity to the channel's gain and offset mismatch. However, the generated codewords do not belong to a prescribed constant composition code, and therefore they do not possess the spectral properties, specifically reduced power at the low-frequency end, of codewords that are drawn from a constant composition code. A second advantage of the new scheme was noted in [12]. Since the payload is "systematic", it can be protected by a conventional error-correcting code, and much stronger error-correcting codes (ECCs) are known for conventional channels than for channels restricted to subsets of constant composition codes.

The paper is organized as follows. In Section II, we set the scene, introduce preliminaries, and discuss the state of the art. In Section III, we present our approach. In Section IV, we compute the redundancy of the proposed method. Complexity issues are dealt with in Section V. In Sections VI and VII, we analyze and compute the error performance. In Section VIII, we present our conclusions.

II. PRELIMINARIES

We assume that user data is recorded in groups of $n$ $q$-level symbols, called a codeword. We consider a codebook, $S_w$, of chosen codewords $x = (x_1, x_2, \ldots, x_n)$ over the $q$-ary alphabet $Q = \{0, \ldots, q-1\}$, where $n$, the length of $x$, is a positive integer. In line with the adopted linear channel model, we assume that the codeword, $x$, is retrieved as
$$r = a(x + \nu) + b\mathbf{1}, \quad (1)$$
where $r = (r_1, \ldots, r_n)$, $r_i \in \mathbb{R}$, and $\mathbf{1} = (1, \ldots, 1)$. The basic premises are that $x$ is retrieved with an unknown positive gain $a > 0$, is offset by an unknown uniform offset, $b\mathbf{1}$, where $a, b \in \mathbb{R}$, and is corrupted by additive Gaussian noise $\nu = (\nu_1, \ldots, \nu_n)$, whose samples $\nu_i \in \mathbb{R}$ are drawn from the distribution $N(0, \sigma^2)$, where $\sigma^2$ denotes the variance of the additive noise.

A. Constant Composition Codes

Define the composition vector $w(x) = (w_0, \ldots, w_{q-1})$ of $x$, where the $q$ entries $w_j$, $j \in Q$, of $w(x)$ indicate the number of occurrences of the symbol $j$ in $x$. That is, for a $q$-ary sequence $x$, we denote the number of appearances of the symbol $j$ by
$$w_j = |\{i : x_i = j\}|, \quad j = 0, 1, \ldots, q-1. \quad (2)$$
Clearly, $\sum_j w_j = n$ and $w_j \in \{0, \ldots, n\}$. A constant composition code comprising all possible $n$-vectors with the same composition vector $w(x)$ is denoted by $S_w$. Evidently, every codeword has $w_j$ occurrences of symbol $j \in Q$. The code $S_w$ consists of all permutations of the symbols defined by the composition vector $w$, so that the size of $S_w$ equals the multinomial coefficient
$$|S_w| = \frac{n!}{\prod_{i \in Q} w_i!}. \quad (3)$$
A constant composition code is also known as a permutation modulation code (Variant I), which was introduced by Slepian [2] in 1965. Slepian showed that a constant composition code allows optimal detection using a simple algorithm.

B. Slepian's Algorithm

The well-known (squared) Euclidean distance, $\delta_e(r, \hat{x})$, between the received signal vector $r$ and the codeword $\hat{x} \in S_w$ is defined by
$$\delta_e(r, \hat{x}) = \sum_{i=1}^{n} (r_i - \hat{x}_i)^2. \quad (4)$$
A minimum Euclidean distance detector outputs the codeword $x_o$ defined by
$$x_o = \arg \min_{\hat{x} \in S_w} \delta_e(r, \hat{x}). \quad (5)$$
Working out (4) gives
$$\delta_e(r, \hat{x}) = \sum_{i=1}^{n} (x'_i + b)^2 - 2 \sum_{i=1}^{n} x'_i \hat{x}_i - 2b \sum_{i=1}^{n} \hat{x}_i + \sum_{i=1}^{n} \hat{x}_i^2, \quad (6)$$
where $x'_i = a(x_i + \nu_i)$. Evidently, the Euclidean distance $\delta_e(r, \hat{x})$ depends on the quantities $a$ and $b$, which may lead to a serious degradation of the error performance [1]. The first term of (6), $\sum_{i=1}^{n} (x'_i + b)^2$, is independent of $\hat{x}$, and clearly dropping this constant term does not affect the outcome of (5). In a similar fashion, we can drop the quantities $2b \sum_{i=1}^{n} \hat{x}_i$ and $\sum_{i=1}^{n} \hat{x}_i^2$, since the vector $\hat{x}$ is drawn from a constant composition code, so that both quantities are constant for all $\hat{x} \in S_w$. Then we find
$$\delta_e(r, \hat{x}) \equiv -\sum_{i=1}^{n} r_i \hat{x}_i, \quad (7)$$
where the sign $\equiv$ denotes equivalence between (4) and (7), since the outcome of (5) is the same when (7) is used instead of (4). Thus the channel's unknown gain, $a$, and offset, $b$, do not affect the outcome of (5) when codewords are drawn from a constant composition code $S_w$. We now address the efficient evaluation of the inner product (7) using Slepian's algorithm.
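The gain and offset invariance of the correlation rule (7) can be checked numerically. The brute-force maximization over $S_w$ below is for illustration only (Slepian's sorting algorithm avoids it); the function names are ours, not the paper's.

```python
import numpy as np
from itertools import permutations

# Toy constant composition code S_w for w = (1, 2, 2), i.e. n = 5, q = 3.
S_w = sorted(set(permutations((0, 1, 1, 2, 2))))

def detect(r):
    """Maximize the inner product (7) over S_w by exhaustive search."""
    return max(S_w, key=lambda xh: float(np.dot(r, xh)))

rng = np.random.default_rng(1)
x = np.array([0, 2, 1, 1, 2], dtype=float)
r = x + rng.normal(0.0, 0.1, size=5)        # a = 1, b = 0, mild noise
# Because sum(xh) is constant over S_w and a > 0, the argmax of
# dot(a*r + b, xh) = a*dot(r, xh) + b*sum(xh) never changes:
print(detect(r) == detect(2.5 * r + 7.0))   # -> True
```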


Slepian [2] showed that the minimization (5) can be replaced by a simple sorting of the symbols of the received signal vector $r$. He proved that, for two given vectors $(\hat{x}_1, \ldots, \hat{x}_n)$ and $(r_1, \ldots, r_n)$, the inner product (7),
$$r_1 \hat{x}_{i_1} + r_2 \hat{x}_{i_2} + \cdots + r_n \hat{x}_{i_n}, \quad (8)$$
is maximized over all permutations $i_1, i_2, \ldots, i_n$ of the integers $1, 2, \ldots, n$ by pairing the largest $\hat{x}_i$ with the largest $r_i$, the second largest $\hat{x}_i$ with the second largest $r_i$, and so on. To that end, the $n$ elements of the received vector, $r$, are sorted from largest to smallest. From the composition vector, $w$, of the codeword, $x$, at hand, we deduce the reference vector $x_r = (q-1, \ldots, q-1, q-2, \ldots, q-2, \ldots, 0, \ldots, 0)$, where the symbols are sorted from largest to smallest, and the numbers of $(q-1)$'s, $(q-2)$'s, and so on in $x_r$ equal $w_{q-1}, w_{q-2}, \ldots, w_0$. Slepian's algorithm is attractive since the complexity of sorting $n$ symbols grows with $n \log n$, which is far less complex than the evaluation of (5), whose complexity grows exponentially with $n$. A small example may clarify Slepian's algorithm.

Example 1: Let $n = 5$, $q = 3$, and let the composition vector be $w = (1, 2, 2)$. Thus each sent codeword is a permutation of the reference vector $x_r = (2, 2, 1, 1, 0)$, where the symbols of the reference vector have been sorted from largest to smallest. Let the received vector be $r = (0.2, 1.4, 0.9, 1.2, 1.6)$. We sort the symbols of $r$ in decreasing order and obtain $(1.6, 1.4, 1.2, 0.9, 0.2)$. The detector then pairs the symbols of $x_r$ with the sorted $r$: the symbol valued 1.6 is assigned to a '2', then 1.4 to a '2', 1.2 to a '1', 0.9 to a '1', and finally the symbol valued 0.2 to a '0'. The detector decides that the codeword $(0, 2, 1, 1, 2)$ was sent. □

III. COMPOSITION CHECK CODES

A drawback of the usage of a constant composition code in Slepian's prior art is the complexity of the encoding and decoding operations in case the payload is large. Encoding algorithms, such as enumerative encoding [14], [15], require much smaller look-up tables than direct translation, but they often come at the cost of complex computations and algorithms.
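The sorting detector of Example 1 above can be sketched in a few lines; this is an illustrative rendering, not the paper's implementation.

```python
import numpy as np

def slepian_detect(r, w):
    """Slepian's sorting detector: pair the largest received sample with the
    largest reference symbol, the second largest with the second largest, etc.
    w[j] is the number of occurrences of symbol j in every codeword of S_w."""
    q = len(w)
    # Reference vector x_r: symbols listed from largest to smallest.
    x_ref = [s for s in range(q - 1, -1, -1) for _ in range(w[s])]
    order = np.argsort(-np.asarray(r, dtype=float))  # positions, largest first
    x_hat = np.empty(len(r), dtype=int)
    x_hat[order] = x_ref                             # undo the sort
    return x_hat.tolist()

# Example 1: n = 5, q = 3, w = (1, 2, 2).
print(slepian_detect([0.2, 1.4, 0.9, 1.2, 1.6], (1, 2, 2)))  # -> [0, 2, 1, 1, 2]
```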
In the proposed composition check code, the encoding of arbitrary data into a constant composition code is avoided. The $n$-symbol main data word, $x$, is sent without any modification, and a separate $p$-symbol label, denoted by $z = (z_1, \ldots, z_p)$, $z_i \in Q$, is appended to the main data word. The appended $p$-symbol label, $z$, informs the Slepian detector to which constant composition code the main data word, $x$, belongs. To that end, we define a one-to-one correspondence between the set of all possible composition vectors of the $n$-symbol payload and the set of $p$-symbol labels. The number of possible distinct composition vectors, denoted by $N(q, n)$, of a $q$-ary $n$-vector equals [16, p. 38]
$$N(q, n) = \binom{n + q - 1}{q - 1}. \quad (9)$$
The length of the label, $p$, must be chosen sufficiently large so that the label can uniquely convey the identity of the constant composition code. In the binary case, the encoded label represents the number of ones in the main data word.

The procedure for encoding and decoding is succinctly described as follows.

A. Encoding/Decoding

The main (user) data, denoted by $x$, which consists of $n$ $q$-ary symbols, is transferred to the encoder. The encoder first forms the composition vector $w = (w_0, \ldots, w_{q-1})$ of $x$ using (2), and translates the vector $w$ into the $p$-symbol $q$-ary label, $z$, using a predefined one-to-one correspondence, $z = \phi(w)$. The label, $z$, is appended to the main data, and the main data plus the label are sent. The one-to-one correspondence $z = \phi(w)$ can simply be embodied by a look-up table for small values of $q$ and $n$. In practice, for larger values of $n$ and $q$, the function $z = \phi(w)$ is a two-step process, where $z = \phi(w)$ is partitioned into a cascade of two functions, $I = \phi_1(w)$ and $z = \phi_2(I)$, where $I$ is a nonnegative integer. In the first step, the (compression) function, $I = \phi_1(w)$, translates the composition vector $w$ into an integer in the range 0 to at most $(n+1)^q - 1$. The vector $w$ is redundant since we have the constraint $\sum_{i=0}^{q-1} w_i = n$. In case the composition vector $w$ is ideally compressed, the integer $I$ ranges from 0 to $N(q, n) - 1$. In the second step, the function, $z = \phi_2(I)$, translates the integer $I$ into the $p$-symbol $q$-ary label. Practical issues regarding the implementation of the functions $I = \phi_1(w)$ and $z = \phi_2(I)$ for larger values of $n$ and $q$ are discussed in Section V. Note that the sent concatenation of $x$ and $z$ does not have special spectral characteristics; it is not 'balanced' or 'dc-free'.

The label, $z$, is detected, preferably using Slepian's optimal method, and decoded by a look-up table, so that the composition vector $w = \phi^{-1}(z)$ is retrieved. Following Slepian's method, see Subsection II-B, the received main data symbols are sorted and assigned to symbols in accordance with the retrieved composition vector $w$.
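For small $n$ and $q$, the cascade $z = \phi_2(\phi_1(w))$ can be sketched with a look-up table for $\phi_1$ and a base-$q$ conversion for $\phi_2$. All function names below are illustrative, and the lexicographic ordering of the table is one possible choice.

```python
from math import comb, ceil, log

def compositions(n, q):
    """All composition vectors (w_0, ..., w_{q-1}) summing to n, in
    lexicographic order (practical for small n and q only)."""
    if q == 1:
        yield (n,)
        return
    for w0 in range(n + 1):
        for rest in compositions(n - w0, q - 1):
            yield (w0,) + rest

n, q = 5, 3
N = comb(n + q - 1, q - 1)            # number of composition vectors, eq. (9)
p = ceil(log(N, q))                    # uncoded label length, eq. (10)
table = {w: i for i, w in enumerate(compositions(n, q))}  # I = phi_1(w)
inverse = {i: w for w, i in table.items()}                # w = phi_1^{-1}(I)

def encode(x):
    """Append the label z = phi_2(phi_1(w(x))) to the main data word x."""
    w = tuple(x.count(j) for j in range(q))   # composition vector, eq. (2)
    I = table[w]                              # phi_1 via look-up table
    z = [(I // q ** k) % q for k in reversed(range(p))]  # phi_2: base-q digits
    return x + z

print(N, p)                    # -> 21 3
print(encode([0, 2, 1, 1, 2])) # -> [0, 2, 1, 1, 2, 0, 2, 2]
```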
It is sufficient to uniquely encode the $N(q, n)$ different composition vectors into $p = \lceil \log_q N(q, n) \rceil$ label symbols, but in a preferred embodiment, the label is a codeword taken from a predefined $p$-symbol constant composition code. The preferred embodiment has the advantage that, firstly, Slepian's optimal method is used for both the main data word and the label, giving them both a high resilience to additive noise, and, secondly, both the main data word and the label are immune to channel mismatch. These attractive virtues come at a price, and in the next section, we compute the redundancy of composition check codes.

IV. REDUNDANCY ANALYSIS

We discuss two label formatting options, where a) the label is uncoded as in [12], and b) the label is encoded using a constant composition code. The $p$-symbol label must be able to uniquely represent all $N(q, n)$ distinct composition vectors of the $n$-symbol payload. Thus, for an uncoded label, we find the condition
$$p \ge \lceil \log_q N(q, n) \rceil. \quad (10)$$


For asymptotically large $n$ and limited $q$, using Stirling's approximation for a binomial coefficient, we obtain
$$N(q, n) = \binom{n + q - 1}{q - 1} \approx \frac{n^{q-1}}{(q-1)!}, \quad n \gg 1, \quad (11)$$
so that the code redundancy, $p$, equals
$$p \approx (q-1) \log_q n - \log_q (q-1)!. \quad (12)$$

In case the $p$-symbol label is encoded into a $q$-ary constant composition code, we have the condition
$$\frac{p!}{\prod_{i \in Q} \hat{w}_i!} \ge N(q, n),$$
where $\hat{w}$ denotes the composition vector of the $p$-symbol label. The number of labels is maximized if we choose $p = aq$ and $\hat{w}_i = a$, $\forall i \in Q$. Then the label length, $p$, must be sufficiently large to satisfy
$$\frac{p!}{((p/q)!)^q} \ge N(q, n). \quad (13)$$
Since, using Stirling's approximation,
$$\frac{p!}{((p/q)!)^q} \approx \alpha_q \frac{q^p}{p^{(q-1)/2}}, \quad (14)$$
where
$$\alpha_q = \frac{q^{q/2}}{(2\pi)^{(q-1)/2}},$$
we have
$$\alpha_q \frac{q^p}{p^{(q-1)/2}} \ge \frac{n^{q-1}}{(q-1)!}, \quad p, n \gg 1, \quad (15)$$
or
$$\log_q \alpha_q - \frac{q-1}{2} \log_q p + p \ge (q-1) \log_q n - \log_q (q-1)!. \quad (16)$$
For asymptotically large $n$, we have the estimate of the redundancy
$$p > (q-1) \log_q n. \quad (17)$$
For $q = 2$ we simply find
$$\binom{p}{p/2} \ge n + 1, \quad (18)$$
which is about the required redundancy of Knuth's code for balancing binary sequences. The redundancy, $r_s$, of Slepian's prior art method, where the payload is translated into a constant composition code in which all symbols appear with frequency $n/q$, is
$$r_s = \log_q \left( \frac{q^n}{n!/((n/q)!)^q} \right) \approx \frac{q-1}{2} \log_q n - \log_q \alpha_q. \quad (19)$$
A comparison with (17) reveals that, for large $n$, the redundancy of the proposed scheme is approximately a factor of two larger than can be obtained by the conventional method using a fixed constant composition code. Apparently, this is the price to pay for a simple implementation. A variable-length label that takes into account the probability of occurrence of the label, instead of the fixed-length label studied here, would reduce the required redundancy of the method [7].

V. COMPLEXITY ISSUES

For relatively small $n$ and $q$, the composition vector $w$ can be straightforwardly translated into a $p$-symbol $q$-ary label $z$ by using a look-up table that embodies the one-to-one correspondence $z = \phi(w)$. We infer from (11) that, although $N(q, n)$ grows polynomially with the codeword length, $n$, for larger alphabet size $q$ the number of entries of a look-up table can be prohibitively large. For a practical application, we must try to find an algorithmic routine in lieu of look-up tables. We present two alternative scenarios. We either encode (compress) the composition vector $w$ using an algorithmic (enumeration) approach, or, alternatively, we do not compress the composition vector $w$, and we compute the redundancy loss. We commence, in the next subsection, with the compression of the composition vector, $w$, using Cover's enumerative coding techniques [17].

A. Compressed Composition Vector: Enumerative Encoding of the Composition Vector

The translation function, $I = \phi_1(w)$, of the composition vector, $w$, into an integer $I$, $0 \le I \le N(q, n) - 1$, can be accomplished using enumerative encoding. In an enumerative coding scheme, the codewords are ranked in lexicographical order [17]. The lexicographical index, or rank, $I$, of a codeword, $x$, in the ordered list equals the number of codewords preceding $x$ in the ordered list. Using the findings of [17], we write down the next theorem.

Theorem 1:
$$I = \sum_{i=1}^{q-1} \sum_{j=0}^{w_{i-1}-1} \binom{n' - j + q - i - 1}{q - i - 1}, \quad (20)$$
where
$$n' = n - \sum_{i'=1}^{i-1} w_{i'-1},$$

and $I \in \{0, \ldots, N(q, n) - 1\}$.

Proof: We follow Cover's approach [17]. Let $n_s(w_0, w_1, \ldots, w_{k-1})$ denote the number of composition vectors for which the first $k$ coordinates are given by $(w_0, w_1, \ldots, w_{k-1})$. According to Cover, the lexicographic index, $I$, is given by
$$I = \sum_{i=1}^{q-1} \sum_{j=0}^{w_{i-1}-1} n_s(w_0, w_1, \ldots, w_{i-2}, j). \quad (21)$$
We have
$$n_s(w_0, w_1, \ldots, w_{i-1}) = \binom{n' + q - i - 1}{q - i - 1},$$
where
$$n' = n - \sum_{i'=1}^{i} w_{i'-1}.$$
Substitution yields (20), which concludes the proof. □

The inverse function, $w = \phi_1^{-1}(I)$, is also calculated using an algorithmic approach, and we refer to [17] for details. The binomial coefficients can be computed on the fly, and look-up tables are not required.
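Theorem 1 and its inverse can be sketched as follows; the greedy inverse follows Cover's general approach, and the function names are ours.

```python
from math import comb

def rank(w, n, q):
    """Lexicographic index I of composition vector w, eq. (20) of Theorem 1."""
    I, n_rem = 0, n
    for i in range(1, q):              # coordinates w_0, ..., w_{q-2}
        for j in range(w[i - 1]):
            I += comb(n_rem - j + q - i - 1, q - i - 1)
        n_rem -= w[i - 1]              # n' = n - (w_0 + ... + w_{i-1})
    return I

def unrank(I, n, q):
    """Inverse w = phi_1^{-1}(I), found greedily coordinate by coordinate."""
    w, n_rem = [], n
    for i in range(1, q):
        j = 0
        while I >= comb(n_rem - j + q - i - 1, q - i - 1):
            I -= comb(n_rem - j + q - i - 1, q - i - 1)
            j += 1
        w.append(j)
        n_rem -= j
    w.append(n_rem)                    # last coordinate is forced by the sum n
    return tuple(w)

print(rank((1, 2, 2), 5, 3))   # -> 8
print(unrank(8, 5, 3))         # -> (1, 2, 2)
```

The binomial coefficients are computed on the fly, as the text notes, so no tables are needed.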


B. Uncompressed Composition Vector

Alternatively, we investigate the case that the vector, $w$, is not compressed. The $q$ entries $w_i$ of the composition vector $w$ are in the alphabet $\{0, \ldots, n\}$, so that the composition vector $w$ can be seen as a nonnegative integer of $q$ $(n+1)$-ary digits. We may slightly compress the vector $w$ by noting that the observation of $q-1$ entries uniquely identifies $w$, since $\sum_{i=0}^{q-1} w_i = n$. We study the increase in redundancy, as the label must be able to accommodate the $(n+1)^q$ different integers that are associated with the uncompressed $w$. To that end, let $p'$ denote the length of the label. In case the label is uncoded, the vector $w$ is translated into the $q$-ary $p'$-symbol label using a well-known base conversion algorithm [18]. We have
$$p' \ge \lceil q \log_q (n+1) \rceil. \quad (22)$$
The relative increase in redundancy with respect to the compressed vector, $w$, is defined by
$$\eta = \frac{p' - p}{p}. \quad (23)$$
Then
$$\eta = \frac{q \log_q (n+1) - \left[(q-1) \log_q n - \log_q (q-1)!\right]}{(q-1) \log_q n - \log_q (q-1)!}. \quad (24)$$
For asymptotically large $n$, we find
$$\eta \approx \frac{1}{q}, \quad n \gg 1, \quad (25)$$
and we conclude that the relative increase in redundancy incurred by using the uncompressed composition vector, $w$, is inversely proportional to $q$.

We proceed and take a look at the redundancy of the coded label. The algorithmic encoding and decoding of an integer in any base into a codeword of a constant composition code using enumerative encoding has been published extensively in the literature; see, for example, [5]. The coded label length, $p''$, must be sufficiently large to satisfy (see (13) and (14))
$$\frac{p''!}{((p''/q)!)^q} \ge (n+1)^q,$$
which, for large $n$, can be approximated by
$$\alpha_q \frac{q^{p''}}{(p'')^{(q-1)/2}} \ge (n+1)^q, \quad n \gg 1,$$
or
$$\log_q \alpha_q - \frac{q-1}{2} \log_q p'' + p'' \ge q \log_q (n+1).$$
For asymptotically large $n$, we find
$$p'' > q \log_q (n+1). \quad (26)$$
The relative extra redundancy required by the unconstrained algorithmic encoding of the composition vector $w$ equals
$$\eta \approx \frac{q \log_q (n+1) - (q-1) \log_q n}{q \log_q (n+1)} \approx \frac{1}{q}, \quad n \gg 1. \quad (27)$$
We infer that the relative extra redundancy for the method that employs traditional enumerative algorithmic encoding equals $1/q$. For small values of $q$ we may, depending on the codeword length $n$, apply look-up tables for encoding the label, while for larger $q$ we may employ algorithmic encoding without significant loss in redundancy. The next example shows numerical results.

Example 2: Let $q = 3$ and $n = 64$. From (9), we find that the number of distinct composition vectors equals $N(q, n) = 2145$. These 2145 vectors can be encoded into a ternary label taken from a constant composition code of length 10. In case the label is not a member of a specified constant composition code, the label length can be slightly shorter, namely $\lceil \log_3 2145 \rceil = 7$. In case we do not compress the composition vector, we require a look-up table of $(n+1)^2 = 65 \times 65 = 4225$ entries. The 4225 entries can be encoded into a specified constant composition code of length 11 or, alternatively, into an uncoded label of length $\lceil \log_3 4225 \rceil = 8$. □
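The numbers in Example 2 can be reproduced with a short computation. The search over the most balanced composition is our reading of how the constant composition label sizes are obtained; the function names are illustrative.

```python
from math import comb, factorial, ceil, log

def n_compositions(q, n):
    return comb(n + q - 1, q - 1)          # eq. (9)

def cc_label_len(q, target):
    """Smallest length p such that a q-ary constant composition code with the
    most balanced composition contains at least `target` codewords."""
    p = 1
    while True:
        base, extra = divmod(p, q)
        w = [base + 1] * extra + [base] * (q - extra)
        size = factorial(p)
        for wi in w:
            size //= factorial(wi)         # multinomial coefficient, eq. (3)
        if size >= target:
            return p
        p += 1

q, n = 3, 64
N = n_compositions(q, n)
print(N)                              # -> 2145 composition vectors
print(ceil(log(N, q)))                # -> 7  (uncoded label)
print(cc_label_len(q, N))             # -> 10 (constant composition label)
print(cc_label_len(q, (n + 1) ** 2))  # -> 11 (uncompressed w, coded label)
print(ceil(log((n + 1) ** 2, q)))     # -> 8  (uncompressed w, uncoded label)
```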

VI. ERROR PERFORMANCE ANALYSIS

Decoding is a two-step process: first the label is detected, and subsequently the payload is retrieved using the data conveyed by the label. Clearly, the payload is received in error if the $p$-symbol label is received in error, or, in case the label is correctly received, if the payload itself is received in error by the Slepian detector. We concentrate here on the block error rate of the outputted payload. The $p$-symbol label is drawn from a fixed constant composition code, while the $n$-symbol payload is a member of a constant composition code (not necessarily the same code as that of the label), which may be different for each source word. We start by computing the error performance of a given constant composition code. To that end, let $x$ be a codeword taken from the constant composition code $S_w$. The word error rate (WER) averaged over all words $x \in S_w$ is upper bounded (union bound) by
$$\mathrm{WER} < \frac{1}{|S_w|} \sum_{x \in S_w} \sum_{\hat{x} \neq x} Q\left( \frac{\delta_e(x, \hat{x})}{2\sigma} \right), \quad (28)$$
where the $Q$-function is defined by
$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-u^2/2} \, du. \quad (29)$$
Note that the error performance of the proposed method is invariant to the unknown gain, $a$, and offset, $b$, see (7), and that, obviously, these parameters are not present in the word error rate (28). For asymptotically large signal-to-noise ratios (SNR), i.e., for $\sigma \ll 1$, [...]. Note that 68 is the smallest even integer $m$ for which $\binom{m}{m/2} > 2^{64}$. Figure 2 shows the bit error rate (BER) of a) the prior art using Schalkwijk's enumeration scheme, and b) the new method [...].
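The bound (28) and the $Q$-function (29) can be evaluated numerically. In the sketch below we take $\delta_e(x, \hat{x})$ in (28) to mean the (unsquared) Euclidean distance, as in the standard pairwise error probability; this reading, the toy code, and the function names are our assumptions.

```python
from math import erfc, sqrt
from itertools import permutations

def Q(x):
    """Gaussian Q-function of (29): Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * erfc(x / sqrt(2))

def wer_union_bound(codewords, sigma):
    """Union bound (28) on the word error rate, averaged over all codewords,
    with delta_e taken as the Euclidean distance between codeword pairs."""
    def dist(u, v):
        return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    total = 0.0
    for x in codewords:
        for xh in codewords:
            if xh != x:
                total += Q(dist(x, xh) / (2 * sigma))
    return total / len(codewords)

# Toy code: S_w for w = (1, 2, 2), i.e., all permutations of (0, 1, 1, 2, 2).
S_w = sorted(set(permutations((0, 1, 1, 2, 2))))
print(len(S_w))                    # -> 30, i.e., 5!/(1! 2! 2!)
print(wer_union_bound(S_w, 0.1))   # a small upper bound on the WER
```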

REFERENCES

[1] K. A. S. Immink and J. H. Weber, "Minimum Pearson distance detection for multilevel channels with gain and/or offset mismatch," IEEE Trans. Inf. Theory, vol. 60, no. 10, pp. 5966–5974, Oct. 2014.
[2] D. Slepian, "Permutation modulation," Proc. IEEE, vol. 53, no. 3, pp. 228–236, Mar. 1965.
[3] W. Chu, C. J. Colbourn, and P. Dukes, "On constant composition codes," Discrete Appl. Math., vol. 154, no. 6, pp. 912–929, Apr. 2006.
[4] W. Ryan and S. Lin, Channel Codes: Classical and Modern. Cambridge, U.K.: Cambridge Univ. Press, 2009.
[5] S. Datta and S. W. McLaughlin, "An enumerative method for runlength-limited codes: Permutation codes," IEEE Trans. Inf. Theory, vol. 45, no. 6, pp. 2199–2204, Sep. 1999.
[6] D. Pelusi, S. Elmougy, L. G. Tallini, and B. Bose, "m-ary balanced codes with parallel decoding," IEEE Trans. Inf. Theory, vol. 61, no. 6, pp. 3251–3264, Jun. 2015.
[7] J. H. Weber and K. A. S. Immink, "Knuth's balanced codes revisited," IEEE Trans. Inf. Theory, vol. 56, no. 4, pp. 1673–1679, Apr. 2010.
[8] V. Skachek and K. A. S. Immink, "Constant weight codes: An approach based on Knuth's balancing method," IEEE J. Sel. Areas Commun., vol. 32, no. 5, pp. 908–918, May 2014.
[9] R. M. Capocelli, "Efficient q-ary immutable codes," Discrete Appl. Math., vol. 33, pp. 25–41, Nov. 1991.
[10] L. G. Tallini and U. Vaccaro, "Efficient m-ary balanced codes," Discrete Appl. Math., vol. 92, no. 1, pp. 17–56, 1999.
[11] T. G. Swart and J. H. Weber, "Efficient balancing of q-ary sequences with parallel decoding," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Seoul, South Korea, Jun./Jul. 2009, pp. 1564–1568.
[12] Y. Li, E. E. Gad, A. A. Jiang, and J. Bruck, "Data archiving in 1x-nm NAND flash memories: Enabling long-term storage using rank modulation and scrubbing," in Proc. IEEE Int. Rel. Phys. Symp., Apr. 2016, pp. 6C-6-1–6C-6-10.
[13] D. E. Knuth, "Efficient balanced codes," IEEE Trans. Inf. Theory, vol. IT-32, no. 1, pp. 51–53, Jan. 1986.
[14] O. Milenkovic and B. Vasic, "Permutation (d, k) codes: Efficient enumerative coding and phrase length distribution shaping," IEEE Trans. Inf. Theory, vol. 46, no. 7, pp. 2671–2675, Nov. 2000.
[15] J. Schalkwijk, "An algorithm for source coding," IEEE Trans. Inf. Theory, vol. IT-18, no. 3, pp. 395–399, May 1972.
[16] W. Feller, An Introduction to Probability Theory and Its Applications, vol. 1. New York, NY, USA: Wiley, 1950.


[17] T. M. Cover, "Enumerative source encoding," IEEE Trans. Inf. Theory, vol. IT-19, no. 1, pp. 73–77, Jan. 1973.
[18] D. E. Knuth, "Positional number systems," in The Art of Computer Programming: Seminumerical Algorithms, vol. 2, 3rd ed. Reading, MA, USA: Addison-Wesley, 1998, pp. 195–213.
[19] G. D. Forney, Jr., "Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference," IEEE Trans. Inf. Theory, vol. IT-18, no. 3, pp. 363–378, May 1972.
[20] J. Riordan, Introduction to Combinatorial Analysis. Princeton, NJ, USA: Princeton Univ. Press, 1980.
[21] K. A. S. Immink and A. J. E. M. Janssen, "Error propagation assessment of enumerative coding schemes," IEEE Trans. Inf. Theory, vol. 45, no. 7, pp. 2591–2594, Nov. 1999.

Kees A. Schouhamer Immink (M'81–SM'86–F'90) received his Ph.D. degree from the Eindhoven University of Technology. From 1994 until 2014, he was an adjunct professor at the Institute for Experimental Mathematics, Essen, Germany. In 1998, he founded Turing Machines Inc., an innovative start-up focused on novel signal processing for hard disk drives and solid-state (Flash) memories. He received a Knighthood in 2000, a personal Emmy award in 2004, the 2017 IEEE Medal of Honor, the 1999 AES Gold Medal, the 2004 SMPTE Progress Medal, and the 2015 IET Faraday Medal. He received the Golden Jubilee Award for Technological Innovation from the IEEE Information Theory Society in 1998. He was elected into the (US) National Academy of Engineering. He received an honorary doctorate from the University of Johannesburg in 2014.

Kui Cai received the B.E. degree in information and control engineering from Shanghai Jiao Tong University, Shanghai, China, the M.Eng. degree in electrical engineering from the National University of Singapore, and a joint Ph.D. degree in electrical engineering from the Technical University of Eindhoven, The Netherlands, and the National University of Singapore. She is currently an Associate Professor with the Singapore University of Technology and Design (SUTD). She received the 2008 IEEE Communications Society Best Paper Award in Coding and Signal Processing for Data Storage. She served as the Vice-Chair (Academia) of the IEEE Communications Society Data Storage Technical Committee (DSTC) during 2015 and 2016. Her main research interests are in the areas of coding theory, information theory, and signal processing for various data storage systems and digital communications.