IJC International Journal of Cancer
The Swedish Family-Cancer Database 2009: prospects for histology-specific and immigrant studies
Kari Hemminki1,2,3, Jianguang Ji2, Andreas Brandt1, Seyed Mohsen Mousavi1 and Jan Sundquist2,4
1 Division of Molecular Genetic Epidemiology, German Cancer Research Centre (DKFZ), Heidelberg, Germany
2 Center for Primary Health Care Research, Lund University, Malmö, Sweden
3 Center for Family and Community Medicine, Karolinska Institute, 14183 Huddinge, Sweden
4 Stanford Prevention Research Center, Stanford University School of Medicine, Stanford, CA
The Swedish Family-Cancer Database comprises a total of 11.8 million individuals covering the Swedish population of the past 100 years. Version VIII of the Database is described in the present article. Cancer cases were retrieved from the Swedish Cancer Registry for the period 1958–2006, including more than 1 million first primary cancers. The number of familial cancers in offspring is 14,000 when a parent was diagnosed with a concordant (same) cancer, and the number of concordant siblings is 6,000. From the year 1993 onwards, histopathological data according to the SNOMED classification were used, which entails advantages for certain cancers, such as breast cancer. Even though the specific morphological classification only covers a limited number of years, it does cover most familial cancers in the offspring generation. The Database records the country of birth for each subject. A total of 1.79 million individuals were foreign born, Finns and other Scandinavians being the largest immigrant groups. The cancer incidence in first-generation immigrants was compared to that in native Swedes using standardised incidence ratios (SIRs) to measure relative risk. The SIRs ranged widely between the immigrant groups, from 1.9-fold for myeloma to 25-fold for melanoma. The differences in SIRs were smaller in second-generation immigrants. The usefulness and the possible applications of the Family-Cancer Database have increased with increasing numbers of cases, and the numerous applications have been described in some 300 publications. Familial cancer studies are at the stimulating interface of the flourishing disciplines of genetics and epidemiology.
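The SIR mentioned above is, in its usual form, the number of observed cases in a study group divided by the number expected if that group had experienced the reference population's stratum-specific rates. The following is a minimal sketch under that standard definition, not the authors' actual analysis code. The `Stratum` class, the age bands and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """One stratum (here an age band) of a hypothetical study group."""
    label: str             # stratum name, e.g. an age band (hypothetical)
    person_years: float    # follow-up time of the study group in this stratum
    reference_rate: float  # cases per person-year in the reference population
    observed: int          # cases observed in the study group


def standardised_incidence_ratio(strata):
    """Return observed cases divided by expected cases.

    Expected cases are the sum over strata of the study group's
    person-years multiplied by the reference population's rate.
    """
    observed = sum(s.observed for s in strata)
    expected = sum(s.person_years * s.reference_rate for s in strata)
    if expected <= 0:
        raise ValueError("expected number of cases must be positive")
    return observed / expected


# Hypothetical data: an immigrant group compared with reference rates.
strata = [
    Stratum("40-49", 10_000.0, 0.001, 15),
    Stratum("50-59", 8_000.0, 0.002, 20),
    Stratum("60-69", 5_000.0, 0.004, 25),
]

sir = standardised_incidence_ratio(strata)
print(round(sir, 2))  # 60 observed / 46 expected, i.e. 1.3
```

An SIR above 1 indicates more cases than expected from the reference rates, and an SIR below 1 indicates fewer. Real analyses typically stratify by further variables (such as sex and calendar period) and report confidence intervals, which this sketch omits.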
Key words: familial cancers, histology, heritable cancer, immigrants, genes
Grant sponsors: Deutsche Krebshilfe, Swedish Cancer Society, Swedish Council for Working Life and Social Research
DOI: 10.1002/ijc.24795
History: Received 19 May 2009; Accepted 17 Jul 2009; Online 29 Jul 2009
Correspondence to: Kari Hemminki, Division of Molecular Genetic Epidemiology, German Cancer Research Centre (DKFZ), 69120 Heidelberg, Germany, E-mail: [email protected]
© 2009 UICC. Int. J. Cancer: 126, 2259–2267 (2010)
Studies on familial cancer clustering have been fundamental for the understanding of heritable determinants of cancer and for discovering cancer-related genes.1 Improved understanding of cancer genetics has been helpful in clinical genetic counselling, targeted screening activities and genetic testing.2,3 In the clinical setting, familial cancer clustering has been studied through identification of probands and multiply affected individuals in their multigeneration pedigrees. Many forms of cancer in which a single gene poses a high risk have been identified, including at least 440 single-gene traits in which cancer is a complication.4–6 Case ascertainment in multigeneration families and availability of biological specimens are usually feasible in clinical settings. The disadvantages include difficulties in obtaining large numbers of cases and in securing unbiased risk estimates. It has been estimated that clinical observation probably works for dominant diseases for which the relative risks are over 10. However, for recessive conditions, clinical observation is less sensitive, and most results for recessive conditions have come from isolated populations with high rates of consanguineous marriage.
Another approach for studying familial cancer has been to analyse cancer risks of the relatives of the index cases in analytical epidemiological studies. Although the number of cases may be large in such studies, the reliability of the information, usually based on recall, tends to be less certain. However, when registered data on family relationships are available and when they can be coupled to cancer register data, an unbiased retrieval of familial cases is possible, as has been shown for the Swedish family data by Leu, Reilly and Czene.7,8 Such population-based datasets have been constructed in some countries and geographical areas, including the Utah Mormon population database and the Icelandic genealogical database.9–11 Denmark, Norway, Finland and Sweden can establish family relationships from multigenerational registers, and these have been used in cancer and other disease studies.12–15 The Swedish Family-Cancer Database was first assembled from the national databases in 1996 and since then it has been updated periodically.16 It has been used by us and others, who have used various aliases for the assembled databases, in probably some 300 cancer studies. These
publications have been the main global source of unbiased data on familial risks. The Database is the largest in the world for familial cancer, and here we describe the year 2009 update, which is version VIII in sequence. In addition to the present contents of the Database, we will show some unique possibilities that the current data offer in terms of histological specification and immigrant data. The focus in the present article is on the structure and applications of a population-based family register. Space does not allow a survey of other types of family or pedigree datasets, which are usually based in a clinical setting and which have been the major approach to identifying Mendelian diseases. The advantages of such clinic-based studies are diagnostic accuracy and access to biological material, which enables genetic studies. With the exception of the Icelandic population records used by DeCode, the population-based family databases have contributed only indirectly to genetic studies on human diseases. The ethical framework of many databases has hampered, if not prevented, contact with the registered individuals or access to biological material. Thus, the databases have allowed excellent studies on the genetic epidemiology of diseases but have not allowed access to samples. The success of DeCode in disease genetics demonstrates the lost opportunities for the other population databases and may call into question the ethical motivations of limiting the scientific use of population data.
Structure of the Database version VIII
Statistics Sweden created a family database, the ‘Second Generation Register’, in 1995, which was later renamed the ‘Multigeneration Register’ because it contained more than 2 generations. We have linked this register to the Swedish Cancer Registry (started in 1958) to create the Family-Cancer Database, in which the families are composed of offspring (second generation) born in 1932 and later with their parents. The personal identifiers have been changed to unique numbers by Statistics Sweden, whereby no individual can be identified in the Database. However, Statistics Sweden maintains the codes, and on special focused requests it has been possible to identify individuals, for example, to collect data within the health care system; no contact with the individuals has been allowed. Some of the previous updates of the Database have been reported and these give more details on types and organization of the data.16–19 The Family-Cancer Database is supplied with longitudinal demographic and socio-economic data from each national census for 1960, 1970, 1980 and 1990; causes of death are obtained from the Causes of Death Register. In spite of the aging offspring generation (