Integer Sequences Related to Chemistry
N. J. A. Sloane (a) and Parthasarathy Nambi (b)
(a) AT&T Shannon Lab, Florham Park, NJ 07932 USA ([email protected])
(b) Bellevue Community College, Bellevue, WA 98004 USA ([email protected])
ABSTRACT
The aim of this poster is to inform ACS members about the On-Line Encyclopedia of Integer Sequences (or OEIS) [1], a freely accessible database containing information about 120,000 number sequences. This poster describes just a few of the many entries that are of interest to chemists.
1. The OEIS
Entries in the On-Line Encyclopedia of Integer Sequences (or OEIS) give the first 100 (or sometimes 10,000) terms of the sequences, their definitions, formulas, computer programs to generate them, references to the literature and to the Internet, graphs and other illustrations. Each sequence has a unique identification number: e.g. A000055 gives the number of trees on n nodes. If you come across a sequence in your work (1, 3, 17, 40, 102, . . . , say) and wish to identify it, the OEIS is the place to look. The OEIS is maintained by N.J.A.S. and has been in existence in various forms for over 40 years. The web site is used by thousands of people each day. New sequences are added at a rate of over 12,000 each year. Only a selection of some sequences of interest to chemists will be mentioned here. Our convention is that a(n) usually denotes the nth term of the sequence under discussion, and we will display the first ten or so terms a(1), a(2), a(3), . . . , although of course the OEIS gives many more terms.
2. Three Basic Sequences
A000055: the number of trees on n nodes (see Fig. 1): 1, 1, 1, 2, 3, 6, 11, 23, 47, 106, 235, 551, 1301, 3159, . . . . A000088: the number of graphs on n nodes (see Fig. 2): 1, 2, 4, 11, 34, 156, 1044, 12346, 274668, 12005168, . . . . (The OEIS contains literally hundreds of sequences related to those two.)
A000108: the Catalan numbers, $C(n) = \frac{1}{n+1}\binom{2n}{n}$: 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, . . . . This is the single most frequently looked-up sequence in the OEIS, with over a hundred different interpretations. For example, C(n) is the number of ways to insert n pairs of parentheses in a string of n + 1 letters. E.g. for n = 3 there are five ways: ((AB)(CD)), (((AB)C)D), ((A(BC))D), (A((BC)D)), (A(B(CD))).
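The parenthesization interpretation can be checked by brute force. The following is a minimal sketch, not part of the poster: the function names `catalan` and `parenthesizations` are illustrative, and the closed form used is the standard binomial expression for the Catalan numbers.

```python
from math import comb


def catalan(n: int) -> int:
    """Closed form C(n) = binom(2n, n) / (n + 1); the division is always exact."""
    return comb(2 * n, n) // (n + 1)


def parenthesizations(letters: str) -> list[str]:
    """All complete bracketings of `letters`; a single letter is left bare."""
    if len(letters) == 1:
        return [letters]
    results = []
    # Split the string at every position and bracket both halves recursively.
    for split in range(1, len(letters)):
        for left in parenthesizations(letters[:split]):
            for right in parenthesizations(letters[split:]):
                results.append(f"({left}{right})")
    return results


print([catalan(n) for n in range(1, 12)])
print(parenthesizations("ABCD"))
```

For the four letters ABCD (n = 3) the brute-force list has C(3) = 5 entries, the same five bracketings quoted above.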
Figure 1: The three trees with 5 nodes: illustrating a(5) = 3 in A000055
Figure 2: The four graphs with 3 nodes: illustrating a(3) = 4 in A000088
3. Sequences from the Beginning of the Graphical Notation
The graphical notation for chemical structures began in the middle of the nineteenth century with the work of Crum Brown. This led Arthur Cayley to initiate graph theory as a part of mathematics. The following four sequences are typical of those studied by Cayley around 1874 [2], [3]. In [2](b), Cayley’s aim was to determine the number of alkanes with structure $C_nH_{2n+2}$, ignoring stereoisomers. If the hydrogen atoms are ignored we get an n-node unlabeled tree in which every node has degree at most 4. Ignoring stereoisomers means that the children of a node are unordered. Cayley divided these trees into two classes, those with a unique central node (“centered” trees) and those with two equally central nodes (“bicentered” trees). The corresponding sequences are A000022, A000200 and A000602 (see Fig. 3):

n              1    2    3    4    5    6    7    8    9   10   11   12   13   14   15  ...
centered       1    0    1    1    2    2    6    9   20   37   86  181  422  943 2223  ...
bicentered     0    1    0    1    1    3    3    9   15   38   73  174  380  915 2124  ...
alkanes        1    1    1    2    3    5    9   18   35   75  159  355  802 1858 4347  ...

In fact Cayley’s calculations were incorrect above n = 11, and the above values are taken from [3], where Pólya theory was used to determine exact formulas for these sequences.
Figure 3: Illustration of initial terms of A000022, A000200 and A000602, from Henry Bottomley
In [2](a), Cayley attempted to determine the number of alkyl radicals with structure $C_nH_{2n+1}$, again ignoring stereoisomers. This is sequence A000598: 1, 1, 2, 4, 8, 17, 39, 89, 211, 507, 1238, 3057, 7639, 19241, . . . (again, Cayley’s results were slightly incorrect).
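The alkyl radical counts can be reproduced with a short computation. The sketch below is not taken from the poster. It assumes the functional equation $A(x) = 1 + x\,[A(x)^3 + 3A(x)A(x^2) + 2A(x^3)]/6$, which counts rooted trees whose nodes have at most three unordered children; the function name `alkyl_radical_counts` is illustrative.

```python
def alkyl_radical_counts(n_max: int) -> list[int]:
    """Coefficients a[0..n_max] of A(x) = 1 + x*(A(x)^3 + 3*A(x)*A(x^2) + 2*A(x^3))/6.

    This functional equation is an assumption supplied for illustration;
    a[0] = 1 stands for the empty branch (a hydrogen atom).
    """
    a = [1]
    for n in range(1, n_max + 1):
        m = n - 1  # a[n] is the coefficient of x^m in the bracket, which needs only a[0..m]
        # Coefficient of x^m in A(x)^3: ordered triples of branch sizes summing to m.
        cube = sum(a[i] * a[j] * a[m - i - j]
                   for i in range(m + 1) for j in range(m + 1 - i))
        # Coefficient of x^m in A(x) * A(x^2): one free branch plus two identical ones.
        mixed = sum(a[m - 2 * k] * a[k] for k in range(m // 2 + 1))
        # Coefficient of x^m in A(x^3): three identical branches.
        triple = a[m // 3] if m % 3 == 0 else 0
        a.append((cube + 3 * mixed + 2 * triple) // 6)
    return a


print(alkyl_radical_counts(14)[1:])
```

The weights 1, 3 and 2 count the permutations of three branches by cycle type, so the bracket is an average over the six ways of rearranging the branches. The printed terms for n = 1, 2, 3, . . . can be compared with the values of A000598 listed above.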
4. The 1930’s: Henze and Blair
In the 1930’s, Henry Henze and Charles Blair, chemists at the Univ. of Texas, wrote a long series of papers in J. Amer. Chem. Soc. [4], [5], [6] in which they found recurrence relations for the numbers of isomers in various families of chemical compounds. For example, they gave recurrence relations for Cayley’s sequences A000602 and A000598 mentioned above, as well as for the analogous sequences when stereoisomers are taken into account. Thus we have A000628 (alkanes $C_nH_{2n+2}$, taking stereoisomers into account): 1, 1, 1, 2, 3, 5, 11, 24, 55, 136, 345, 900, 2412, 6563, . . . , and A000625 (alkyl radicals $C_nH_{2n+1}$, taking stereoisomers into account): 1, 1, 2, 5, 11, 28, 74, 199, 551, 1553, 4436, 12832, . . . . Henze and Blair’s recurrences are usually extremely complicated, and it has recently been discovered that there are occasional errors in their tables. Of course their calculations were carried out by hand. Pólya’s approach, described in the next section, proved to be a much simpler and more powerful method once the necessary mathematical machinery had been developed.
5. Pólya’s Enumeration Theory
A major breakthrough occurred around 1936 when the mathematician George Pólya published two papers [7] in which he developed a general theory of enumeration, applicable to a wide range of problems in chemistry and combinatorics. His work had been anticipated in part by Redfield [9], but nevertheless this method is known today as “Pólya’s Enumeration Theory”. The idea is to use the representation theory of permutation groups and their “cycle indices” to obtain a far-reaching generalization of “Burnside’s Lemma”. This lemma states that if a set S of objects is permuted by a group G of permutations, then the number of equivalence classes (“orbits”) of objects in S is equal to the average number of objects that are fixed under each permutation in G. For an excellent introduction to Pólya theory, see the article by de Bruijn [10]. In his two papers, Pólya gives a large number of examples from chemistry. In particular, he gives generating functions for several of the sequences mentioned above. To illustrate, here is the beginning of Table I from [7](a), with the names translated from German (and of course with the sequence numbers added). It is interesting to note that he thanks his nephew, the chemical engineer J. Pólya, for collaborating in producing the tables.
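Burnside’s Lemma is easy to try out on a small case. The sketch below is an illustration added here, not an example from Pólya’s papers: it counts n-bead necklaces in two colors when only rotations are allowed, once by averaging fixed colorings and once by listing orbits directly. The function names are illustrative, and the identification with A000031 is stated from memory and should be checked against the OEIS.

```python
from itertools import product
from math import gcd


def necklaces_by_burnside(n: int, colors: int = 2) -> int:
    """Average, over the n rotations, of the number of colorings each rotation fixes.

    Rotation by k positions fixes exactly colors**gcd(k, n) colorings
    (gcd(0, n) = n covers the identity rotation).
    """
    fixed_total = sum(colors ** gcd(k, n) for k in range(n))
    return fixed_total // n


def necklaces_by_orbits(n: int, colors: int = 2) -> int:
    """Brute-force check: represent each orbit by its smallest rotation."""
    representatives = set()
    for coloring in product(range(colors), repeat=n):
        representatives.add(min(coloring[i:] + coloring[:i] for i in range(n)))
    return len(representatives)


print([necklaces_by_burnside(n) for n in range(1, 9)])
print([necklaces_by_orbits(n) for n in range(1, 9)])
```

Both lists agree, which is the content of the lemma for the cyclic group of rotations.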
Table 1: Number of structural isomers for homology series and alkyl-derivatives, from Pólya, Zeit. f. Kristall., 93 (1936), 415–443

Formula            n = 1    2    3    4    5    6   Name                        Sequence
C_nH_{2n+2}            1    1    1    2    3    5   Paraffins                   A000602
C_nH_{2n+1}X           1    1    2    4    8   17   Alkyls                      A000598
C_nH_{2n}XY            1    2    5   12   31   80   Di-substituted paraffins    A000635
C_nH_{2n}X_2           1    2    4    9   21   52   Di-substituted paraffins    A000636
C_nH_{2n-1}XYZ         1    4   13   42  131  402   Tri-substituted paraffins   A000640
C_nH_{2n-1}X_2Y        1    3    9   27   81  240   Tri-substituted paraffins   A022014
C_nH_{2n-1}X_3         1    2    5   14   39  109   Tri-substituted paraffins   A000641
...
6. Magic Numbers I: Polyhedral Clusters
In cluster science it is well known that clusters containing certain special numbers of atoms occur more frequently than others. These special numbers are often called magic numbers, and a search on the Internet for “magic number” and “chemistry” will produce over 100,000 references. Many of these sequences of magic numbers are not well-defined mathematically. Probably the most famous example comes from physics: this is the sequence A018226: 2, 8, 20, 28, 50, 82, 126. Atoms with one of these numbers of protons or neutrons in their nuclei are considered to be stable. In the 1980’s Boon Teo and N.J.A.S. carried out a systematic study of magic numbers from a geometrical point of view [11], [12], [13]. In [11](a) we analyze clusters that have the shape of one of the Platonic or Archimedean solids (extending earlier work by Buckminster Fuller, Coxeter and others).
Figure 4: Cluster in shape of truncated octahedron with each edge subdivided into two equal segments. This cluster contains $G_2 = 201$ atoms, of which $S_2 = 122$ lie on the surface (or skin). From Teo and Sloane, Inorg. Chem., 24 (1985), 4545–4558
We provide explicit formulas for the analogous magic numbers when the edges of the polygons or polyhedra are subdivided into n equal parts. For the truncated octahedron the formulas for $S_n$, the number of surface points, and $G_n$, the total number of points (the magic numbers), are: $S_n = 30n^2 + 2$ ($n \ge 1$), $G_n = 16n^3 + 15n^2 + 6n + 1$. Reference [11](a) contains a large number of such cluster sequences. The following table shows a small sample.
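The two truncated-octahedron formulas can be evaluated directly. This is a small sketch with illustrative function names; it uses only the two polynomials stated above, and the surface formula is applied only for n ≥ 1 as stated.

```python
def surface_points(n: int) -> int:
    """S_n = 30 n^2 + 2, the number of surface atoms (stated for n >= 1)."""
    if n < 1:
        raise ValueError("the surface formula is stated for n >= 1")
    return 30 * n * n + 2


def total_points(n: int) -> int:
    """G_n = 16 n^3 + 15 n^2 + 6 n + 1, the total number of atoms."""
    return 16 * n ** 3 + 15 * n ** 2 + 6 * n + 1


for n in range(1, 6):
    print(n, surface_points(n), total_points(n))
```

For n = 2 this gives 122 surface atoms and 201 atoms in total, the values quoted in the caption of Figure 4.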
Table 2: Number of Surface Points $S_n$ and Total Number of Points $G_n$ for Various Archimedean and Other Figures (from Teo and Sloane, Inorg. Chem., 24 (1985), 4545–4558)

Polyhedron             n =      0     1     2     3     4     5     6     7     8     9    10   Sequence
Truncated tetrahedron  S_n      1    16    58   128   226   352   506   688   898  1136  1402   A005905
Truncated tetrahedron  G_n      1    16    68   180   375   676  1106  1688  2445  3400  4576   A005906
Cuboctahedron          S_n      1    12    42    92   162   252   362   492   642   812  1002   A005901
Cuboctahedron          G_n      1    13    55   147   309   561   923  1415  2057  2869  3871   A005902
Truncated octahedron   S_n      1    32   122   272   482   752  1082  1472  1922  2432  3002   A005903
Truncated octahedron   G_n      1    38   201   586  1289  2406  4033  6266  9201 12934 17561   A005910
...
7. Magic Numbers II: Close-Packed Spherical Clusters
In [11](b), [12], [13] Teo and N.J.A.S. give a similar analysis for the sizes of spherical clusters that can be found in the hexagonal lattice in the plane, the simple cubic, face-centered cubic (fcc), and body-centered cubic (bcc) lattices in three dimensions, as well as the hexagonal close-packing (hcp) and diamond structures. Generating functions for these cluster series are simply the appropriate “theta series” of the lattice with respect to the particular point used as the center of the cluster. Again some of these results were already known. For example, consider clusters centered at a lattice point in the fcc lattice. Let $G_n$ (the magic numbers) denote the total number of atoms in the cluster of radius $\sqrt{2n}$, and let $S_n$ denote the number of atoms on the surface of the cluster. Then we obtain the cluster series (A004015, A119869):
$S_n$: 1, 12, 6, 24, 12, 24, 8, 48, 6, 36, 24, 24, 24, 72, 0, 48, . . . ,
$G_n$: 1, 13, 19, 43, 55, 79, 87, 135, 141, 177, 201, 225, 249, . . . .
On the other hand, if the clusters are centered at an octahedral-shaped hole in the fcc lattice, the analogous cluster series are (A005887, A119874):
$S_n$: 6, 8, 24, 0, 30, 24, 24, 0, 48, 24, 48, 0, 30, 32, 72, 0, 48, . . . ,
$G_n$: 6, 14, 38, 38, 68, 92, 116, 116, 164, 188, 236, 236, 266, . . . .
The three references contain a large number of cluster series of this type.
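These cluster series can be reproduced by direct enumeration rather than from the theta series. The sketch below rests on assumptions not spelled out in the poster: the fcc lattice is modelled as the integer points (x, y, z) with x + y + z even, so that nearest neighbors lie at squared distance 2, and an octahedral hole is taken at (1, 0, 0), so that seen from the hole the lattice points are the integer points with x + y + z odd. The function name `fcc_shell_counts` is illustrative.

```python
from itertools import accumulate, product
from math import isqrt


def fcc_shell_counts(shells: int, parity: int = 0) -> list[int]:
    """Count integer points with x + y + z = parity (mod 2), grouped by squared distance.

    parity=0: cluster centered at a lattice point (squared radii 0, 2, 4, ...).
    parity=1: cluster centered at an octahedral hole (squared radii 1, 3, 5, ...).
    The coordinate model is an assumption made for this sketch.
    """
    max_norm = 2 * (shells - 1) + parity
    reach = isqrt(max_norm) + 1  # no coordinate can exceed sqrt(max_norm)
    counts = [0] * shells
    for x, y, z in product(range(-reach, reach + 1), repeat=3):
        norm = x * x + y * y + z * z
        # norm has the same parity as x + y + z, so (norm - parity) is even.
        if (x + y + z) % 2 == parity and norm <= max_norm:
            counts[(norm - parity) // 2] += 1
    return counts


surface = fcc_shell_counts(16)
print(surface)                    # surface counts S_n
print(list(accumulate(surface)))  # running totals G_n
print(fcc_shell_counts(17, parity=1))
```

The running totals are the magic numbers $G_n$; the zero entries in the surface counts correspond to squared radii that are not sums of three squares.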
8. Coordination Sequences
The cluster series described in the previous section give the numbers of atoms in balls of successive radii $0, \sqrt{2}, \sqrt{4}, \sqrt{6}, \sqrt{8}, \ldots$ (say) around the central point.
Alternatively, one may classify atoms according to the number of bonds in the shortest path to the central atom. The zeroth shell consists of a single atom, the number in the next shell is the conventional coordination number, and in general the nth shell consists of those atoms that are bonded to atoms in shell n−1 (and which have not already been counted). The sequence that gives the numbers of atoms in the successive shells is the coordination sequence of the structure, a term introduced in 1971 by Brunner and Laves [14]. Coordination sequences were originally used to investigate the topology of frameworks and to help specify the positions of atoms. They are now routinely used to characterize crystallographic structures. In [15], Ralf Grosse-Kunstleve, G. O. Brunner and N.J.A.S. computed the coordination sequences for all the zeolites in the Meier-Olson Atlas of Zeolite Structure Types. There are almost 400 sequences, all of which are now in the OEIS, often with recurrences and explicit formulas.
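The shell-by-shell definition is a breadth-first search on the bond graph. The following sketch is an illustration only and is not the program used in [15]. It takes any neighbor function and, as a toy example, applies it to the planar square net, where each atom is bonded to its four nearest neighbors; the names `coordination_sequence` and `square_net_neighbors` are hypothetical.

```python
from collections import deque
from typing import Callable, Hashable, Iterable


def coordination_sequence(
    neighbors: Callable[[Hashable], Iterable[Hashable]],
    start: Hashable,
    shells: int,
) -> list[int]:
    """Return the sizes of shells 0 .. shells-1 around `start`.

    Shell n holds the atoms whose shortest bond path to `start` has length n.
    """
    counts = [0] * shells
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        d = distance[node]
        counts[d] += 1
        if d + 1 == shells:
            continue  # do not expand beyond the last requested shell
        for nxt in neighbors(node):
            if nxt not in distance:  # atoms already counted are skipped
                distance[nxt] = d + 1
                queue.append(nxt)
    return counts


def square_net_neighbors(node: tuple[int, int]) -> list[tuple[int, int]]:
    """Toy example: the four bonded neighbors in a planar square net."""
    x, y = node
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


print(coordination_sequence(square_net_neighbors, (0, 0), 6))  # [1, 4, 8, 12, 16, 20]
```

For a real framework the neighbor function would come from the crystal structure; only the search itself is shown here.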
The following is a more typical, three-dimensional example. This coordination sequence arises in the zeolite structures AFG, CAN, LIO and LOS (sequence A008013): 1, 4, 10, 20, 34, 54, 78, 104, 134, 168, 210, 256, 302, 352, . . . . The nth term, a(n), is given by five different expressions:
$a(5m) = 52m^2 + 2$,
$a(5m+1) = 52m^2 + 22m + 4$,
$a(5m+2) = 52m^2 + 42m + 10$,
$a(5m+3) = 52m^2 + 62m + 20$,
$a(5m+4) = 52m^2 + 82m + 34$.
This kind of formula is quite typical for coordination sequences of zeolites, although many of the formulas are much more complicated than this one. (Technically, formulas like this, where a(n) depends on the residue of n modulo some number, are called PORC functions, which stands for “Polynomial On Residue Classes”!)
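A PORC function is naturally stored as a table of polynomial coefficients indexed by residue. The sketch below encodes the five expressions for A008013 in that way; the names `PORC_COEFFS` and `a008013` are illustrative. One detail is an observation made here rather than a statement from the poster: the listed sequence begins with a(0) = 1, while the residue-0 polynomial gives 2 at m = 0, so the code treats n = 0 as a special case.

```python
# Residue r = n mod 5  ->  coefficients (c2, c1, c0) of c2*m^2 + c1*m + c0, with n = 5m + r.
PORC_COEFFS = {
    0: (52, 0, 2),
    1: (52, 22, 4),
    2: (52, 42, 10),
    3: (52, 62, 20),
    4: (52, 82, 34),
}


def a008013(n: int) -> int:
    """Evaluate the coordination sequence from its polynomial-on-residue-classes form."""
    if n == 0:
        return 1  # the central atom; handled separately (see the note above)
    m, r = divmod(n, 5)
    c2, c1, c0 = PORC_COEFFS[r]
    return c2 * m * m + c1 * m + c0


print([a008013(n) for n in range(14)])
```

The printed values can be compared term by term with the fourteen terms quoted above.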
Figure 5: Example: Coordination sequence of a two-dimensional net, with successive shells indicated by different colors. From R. W. Grosse-Kunstleve et al., Poster, Sixteenth European Crystallographic Meeting, Lund, Sweden, August 6–11, 1995
9. Number of Periodic Close Packings; Stacking Sequences
Maximally dense sphere packings can be constructed by stacking layers, where each layer consists of an infinite sheet of balls in the hexagonal lattice arrangement. Once one layer is in position (call it a type A layer), there are three choices (call them A, B, C) for each of the remaining layers. Such packings are generally called Barlow packings. If the layers alternate . . .A, B, C, A, B, C, A,. . . with period three, we obtain the fcc lattice; whereas if they alternate . . .A, B, A, B, A, B, A,. . . with period two, we obtain the hcp structure. Many authors have studied enumeration problems arising from such packings. For example, T. J. McLarnan [16] used Pólya theory to find the number of Barlow packings in which the layers alternate with period exactly n (A011768): 0, 1, 1, 1, 2, 3, 6, 7, 16, 21, 43, 63, 129, 203, 404, 685, 1343, . . . . The entries a(2) = a(3) = 1 correspond to the hcp and fcc, respectively. In the same paper McLarnan enumerates many related structures. For example, the number of ZnS polytypes with period n (A011957) is: 0, 1, 1, 1, 1, 2, 3, 6, 10, 18, 31, 59, 105, 198, 365, 688, 1285, . . . . The entry a(3) = 1 corresponds to the cubic sphalerite structure.
For other recent work on enumerating stacking sequences see Thompson and Downs [17], Lord et al. [18], Estevez-Rams et al. [19]. Stacking sequences are also studied extensively in metallurgy.
10. Benzenoids, Polyhexes, Catafusenes
All the sequences mentioned so far have known generating functions, which make it possible to compute as many terms as one wishes. In contrast, the sequences here are typical of really hard combinatorial problems, where one cannot do much better than a brute force enumeration. A typical example of this type of problem is the enumeration of organic compounds built up from benzene rings. In mathematical terms we are asking for the number of different planar figures that can be built up from n hexagons. These are called hexagonal animals or polyhexes. With 1, 2 or 3 hexagons the numbers are respectively 1, 1 and 3, as shown in Fig. 6. The sequence of numbers of polyhexes (A000228) is: 1, 1, 3, 7, 22, 82, 333, 1448, 6572, 30490, 143552, 683101, . . . . The initial enumeration to n = 8 was made by David Klarner. This sequence has been extended by several authors, most recently by Joseph Myers [20], who has computed the first 20 terms. The analogous sequence with squares instead of hexagons is much older (and just as hard). This is the question of counting square animals or polyominoes, the obvious generalization of dominoes. The sequence A000105 begins 1, 1, 2, 5, 12, 35, 108, 369, 1285, 4655, 17073, 63600, . . . . It has been computed to 28 terms by Tomás Oliveira e Silva [21].
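A brute force enumeration of this kind fits in a few lines for small n. The sketch below is a naive illustration and is not the method used in [20] or [21]: it grows polyominoes one square at a time and keeps one representative of each shape under the eight rotations and reflections of the square grid. All function names are illustrative.

```python
def normalize(cells):
    """Translate a collection of (x, y) cells so the smallest x and y are both 0."""
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    return frozenset((x - min_x, y - min_y) for x, y in cells)


def canonical(cells):
    """Choose one representative among the 8 rotations and reflections of a shape."""
    variants = []
    current = list(cells)
    for _ in range(4):
        current = [(y, -x) for x, y in current]                      # rotate by 90 degrees
        variants.append(normalize(current))
        variants.append(normalize([(-x, y) for x, y in current]))   # mirror image
    return min(variants, key=sorted)


def free_polyomino_counts(n_max: int) -> list[int]:
    """Numbers of polyominoes with 1 .. n_max squares, shapes counted up to symmetry."""
    shapes = {canonical([(0, 0)])}
    counts = [len(shapes)]
    for _ in range(n_max - 1):
        grown = set()
        for shape in shapes:
            for x, y in shape:
                for cell in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if cell not in shape:
                        grown.add(canonical(shape | {cell}))
        shapes = grown
        counts.append(len(shapes))
    return counts


print(free_polyomino_counts(7))  # [1, 1, 2, 5, 12, 35, 108]
```

The number of shapes held in memory grows roughly fourfold with each added square, which is why the long computations cited above need far more refined algorithms.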
Figure 6: Polyhexes with 1, 2 or 3 hexagons, corresponding to the benzene, naphthalene, anthracene, phenalene and phenanthrene structures, respectively.
Figure 7: Polyominoes with 1 through 5 squares, illustrating the terms a(1) = a(2) = 1, a(3) = 2, a(4) = 5, a(5) = 12 of A000105 (from Eric Weisstein’s World of Mathematics: Polyomino)
11. Other Chemical Sequences
The above examples are just a few of the 120,000 sequences currently in the OEIS. We apologize to all those people (Balaban, Harary, Lederberg, Losanitsch, ...) whose names we did not mention. If we had had more space we would also have included the following topics.
R. W. Robinson’s work on enumerating graphs using extensions of Pólya’s theory (86 entries in MathSciNet).
Sven Cyvin, Børg Cyvin, Jon Brunvoll and others: A very long series of papers (e.g. [22]) on the enumeration of benzenoid hydrocarbons, catacondensed hydrocarbons, and other topologies of molecular graphs.
Fullerenes: Enumerated by P. W. Fowler and D. E. Manolopoulos [23], G. Brinkmann and A. W. M. Dress [24] and others.
Self-Avoiding Walks on Lattices: A major area of research in physics, with an enormous number of papers and with many sequences in the OEIS. See B. D. Hughes [25] for an overview.
Dissections and Tilings: Typical questions are: how many ways are there to dissect an n-sided polygon into triangles (A000207), or how many ways are there to tile an n × n square with dominoes (A004003)?
Necklaces: How many n-bead necklaces can be made using beads of two colors? Sequences A000013, A000029, A000031 give typical answers (depending on what symmetries are allowed).
12. Conclusion
The goal of the OEIS is to include information about all interesting number sequences. At the present time over 1,000 of the 120,000 entries have their origin in chemistry, and chemists will find much material of interest. To find the OEIS, simply Google “sequences”. If you come across a number sequence that is not in the OEIS, especially from the chemical literature, please send it in to the database. There is a special web page for contributing a new sequence or a comment. The reasons for doing this are that the next person who comes across the sequence will be grateful for the reference, and your name will be preserved in the OEIS as the person who contributed the sequence. (You need not be the author of the sequence to send it in.)
13. Postscript: Puzzles
Another use for the OEIS is to help people do well on quizzes. Can you find the next term in the following? You know where to find the answers!
(1): 61,21,82,43,3,64,24,? (A087409)
(2): 1,3,7,12,18,26,35,45,56,69,83,98,114,131,150,? (A005228)
(3): 4,6,7,9,10,11,12,14,15,16,17,18,19,20,22,23,24,25,? (A001690)
(4): 1,2,4,8,16,22,26,38,62,74,102,104,108,116,122,126,? (A063108)
(5): 679,378,168,48,? (A121105)
(6): 2,4,6,30,32,34,36,40,42,44,46,50,52,54,56,60,62,64,66,2000,? (A006933, the eban numbers)
(7): 2,12,1112,3112,132112,1113122112,311311222112,? (A006751)
(8): 2,3,3,5,10,13,39,43,172,177,885,? (A019460)
(9): 1,2,2,3,3,4,4,4,5,5,5,6,6,6,6,7,7,7,7,8,8,8,8,9,? (A001462)
References
[1] N. J. A. Sloane, The On-Line Encyclopedia of Integer Sequences, published electronically at www.research.att.com/~njas/sequences/, 1996–2006.
[2] A. Cayley, (a) Phil. Mag., 67 (1874), 444–447; (b) Chem. Ber., 8 (1875), 1056–1059; (c) Rep. Brit. Assoc. Adv. Sci., 45 (1875), 257–305.
[3] E. M. Rains and N. J. A. Sloane, J. Integer Sequences, 2 (1999), #99.1.1.
[4] C. M. Blair and H. R. Henze, J. Amer. Chem. Soc., 54 (1932), 1098–1105, 1538–1545.
[5] D. D. Coffman, C. M. Blair and H. R. Henze, J. Amer. Chem. Soc., 55 (1933), 252–253.
[6] H. R. Henze and C. M. Blair, J. Amer. Chem. Soc., 53 (1931), 3042–3046, 3077–3085; 55 (1933), 680–686; 56 (1934), 157.
[7] G. Pólya, (a) Zeit. f. Kristall., 93 (1936), 415–443; (b) Acta Math., 68 (1937), 145–254 [English translation in [8]].
[8] G. Pólya and R. C. Read, Combinatorial Enumeration of Groups, Graphs and Chemical Compounds, Springer, 1987.
[9] J. H. Redfield, Amer. J. Math., 49 (1927), 433–455.
[10] N. G. de Bruijn, in Applied Combinatorial Mathematics, ed. E. F. Beckenbach, Wiley, 1964.
[11] B. K. Teo and N. J. A. Sloane, (a) Inorg. Chem., 24 (1985), 4545–4558; (b) 25 (1986), 2315–2322.
[12] N. J. A. Sloane and B. K. Teo, J. Chem. Phys., 83 (1985), 6520–6534.
[13] N. J. A. Sloane, J. Math. Phys., 28 (1987), 1653–1657.
[14] G. O. Brunner and F. Laves, Wiss. Zeitschr. Techn. Univ. Dresden, 20 (1971), 387–390.
[15] R. W. Grosse-Kunstleve, G. O. Brunner and N. J. A. Sloane, Acta Cryst., A52 (1996), 879–889.
[16] T. J. McLarnan, Zeitschr. f. Kristall., 155 (1981), 269–291.
[17] R. M. Thompson and R. T. Downs, Acta Cryst., B57 (2001), 761–771; B58 (2002), 153.
[18] E. A. Lord et al., Phil. Mag., A82 (2002), 255–268.
[19] E. Estevez-Rams et al., Acta Cryst., A61 (2005), 201–208.
[20] Joseph Myers, Polyomino, Polyhex and Polyiamond Tiling, published electronically at www.srcf.ucam.org/~jsm28/tiling/, 2005 (the entries in [1] are more up-to-date).
[21] Tomás Oliveira e Silva, Animal Enumerations on the {4,4} Euclidean Tiling, published electronically at http://www.ieeta.pt/~tos/animals/a44.html.
[22] J. Brunvoll, S. J. Cyvin and B. N. Cyvin, J. Math. Chem., 21 (1997), 193–196.
[23] P. W. Fowler and D. E. Manolopoulos, An Atlas of Fullerenes, Cambridge Univ. Press, 1995.
[24] G. Brinkmann and A. W. M. Dress, J. Algorithms, 23 (1997), 345–358.
[25] B. D. Hughes, Random Walks and Random Environments, Oxford, 2 vols., 1995–1996.