A Metagenomic Approach to the Airways Microbiome of Chronic Obstructive Pulmonary Disease (COPD)

Mitch Fernandez, Melita Jaric, Lisa Schneper, Jonathan Segal, Eugenia Silva-Herzog, Michael Campos, Joel Fishman, Mathias Salathe, Adam Wanner, Juan Infante, Kalai Mathee, Giri Narasimhan

Abstract— Current research has shown that different sites of the human body house different bacterial communities. There is a strong correlation between an individual’s microbial community profile at a given site and the onset of disease. Chronic Obstructive Pulmonary Disease (COPD) is a progressive lung disease resulting in narrowing of the airways and restricted airflow. Despite being the third leading cause of death in the United States, little is known about the differences in the lung microbial community profiles of healthy individuals vs. COPD patients. Metagenomics is the culture-independent study of genetic material obtained directly from samples. A metagenomic analysis of 56 individuals was conducted. Bronchoalveolar lavage (BAL) samples were collected from COPD patients, active or ex-smokers, and never smokers. 454 pyrosequencing of 16S rRNA was performed and analyzed using a newly designed, modular bioinformatic workflow. Substantial colonization of the lungs was found in all subjects and differentially abundant genera in each group were identified (including Tropheryma in COPD and Sneathia in smokers). These discoveries are promising and may further our understanding of how the structure of the lung microbiome is modified as COPD progresses. It is also anticipated that the results will eventually lead to improved treatments for COPD.

M. Fernandez, M. Jaric and G. Narasimhan are with Bioinformatics Research Group (BioRG), School of Computing and Information Sciences, Florida International University, Miami, FL 33140. L. Schneper, J. Segal, E. Silva-Herzog, and K. Mathee are with Department of Molecular Microbiology and Infectious Diseases, Florida International University, Miami, FL 33140. M. Campos, J. Fishman, M. Salathe, A. Wanner, and J. Infante are with Division of Pulmonary and Critical Care Medicine, Miller School of Medicine, University of Miami, FL.

I. INTRODUCTION

A major obstacle in understanding the microbial makeup of a given environment is that most bacteria cannot be cultured in a laboratory, so traditional culture-based studies fall short of accurately characterizing an environment’s microbiome [1]. In the last decade, metagenomics has emerged as a promising solution to this problem. A highly interdisciplinary approach combining ecology, microbiology, computer science, and statistics, metagenomics is the culture-independent study of samples taken directly from an environment of interest. The DNA present in these samples is isolated, sequenced, and used to reconstruct the microbial community structure of the source [14]. Although early metagenomic studies can be traced back to the late 1990s [5], interest in the approach increased appreciably after Venter’s Sargasso Sea Expedition [12].

Metagenomics has since been used in many biomedical applications. Several studies have shown that different sites of the human body house different bacterial communities and that there is a strong correlation between an individual’s microbial profile at a given site and disease [11]. Determining the differences between healthy and diseased individuals and identifying possible pathogens could lead to improved diagnoses and treatments.

Although the broad concept of metagenomics is fairly intuitive, in practice it remains a challenging and data-intensive approach. A major limitation of metagenomics is the fragmented nature of the samples collected: there is no guarantee of obtaining a complete genome from any single organism in a sample. In addition, depending on the source from which it was collected, a sample could contain sequences from tens of thousands of individual organisms [13]. Therefore, a great deal of care and thought needs to be put into how samples are collected, processed, and analyzed.

Here we describe the application of an analytical workflow to a study of Chronic Obstructive Pulmonary Disease (COPD), a progressive lung disease that afflicts over 64 million people worldwide and is the third leading cause of death in the United States [7]. COPD is an umbrella term for lung diseases (such as emphysema and chronic bronchitis) that are characterized by tightening of the airways and reduced airflow to the lungs. COPD affects the overall structure of the airways and is typified by a loss of elasticity in the air sacs and airways, deterioration in the walls of the air sacs as a result of thickening and inflammation, overproduction of mucus that further clogs the airways, and frequent flare-ups which may provide opportunities for bacterial growth [6].

The long-held belief that healthy lungs are sterile while diseased lungs are colonized [4] has been disproved, but the differences between healthy and diseased lungs are still not well understood. A study from the University of Michigan [3] found that healthy lungs are in fact colonized, but that the lungs of subjects with decreased function had fewer species present.
Although that finding is interesting, the more extensive study described here is intended to better inform diagnosis and the development of future treatments. We hypothesize that healthy and diseased lungs are differentially colonized and, further, that the microbial composition of a human lung can help categorize that lung as healthy or diseased. To test these hypotheses, the study described here compared individuals with COPD, smokers, and never smokers.

II. MATERIALS AND METHODS

A workflow was modeled on the Costello stool analysis performed by Schloss [9]. An oligos file was manually written for each sequencing pool to group reads by subject. Pools contained a maximum of twelve samples, each with a unique barcode. One error was allowed per barcode. Primers were not removed from sequences. Cutoffs were set at 25 for the quality score and 50 bp for minimum length. Sequences were aligned using the Silva Bacteria Alignment Database as a reference and the Needleman-Wunsch alignment algorithm with a kmer size of 8. The reverse complement was used for the alignment if 50% or more of reads were removed during the original alignment. Chimeras were handled with chimera.uchime using reference = names. Classification was performed with a kmer size of 7 and a confidence cutoff of 60. Reads below 99% confidence at the kingdom level (Bacteria or Archaea) were considered contamination and removed, as were all Eukaryota, mitochondria, and chloroplasts. Phylotyping was used to cluster reads based on taxonomy. Richness was estimated using the Chao method [2] and diversity using the Inverse-Simpson index [10]. Differentially abundant taxa were identified using Metastats [8], with comparisons performed between all subject groups. Steps in the workflow were automated using custom shell scripts. Data were extracted for analysis in Excel using custom Python and R scripts.

All samples were collected by bronchoalveolar lavage (BAL) at the University of Miami’s Division of Pulmonary and Critical Care Medicine. Study participants were recruited on a volunteer basis, grouped into three broad categories (COPD, Smoker, Never Smoker), and further subdivided into five specific categories (COPD – Active Smoker, COPD – Former Smoker, Smoker – Active Smoker, Smoker – Former Smoker, and Never Smoker). BAL samples were taken to the lab of Dr. Kalai Mathee at Florida International University, Herbert Wertheim College of Medicine, for DNA isolation, amplification, and purification. Standard protocols for the FastDNA® Spin Kit were used to isolate DNA. Newly designed primers for amplification of the v6-8 region of 16S rRNA were used and compared to HMP v3-5 primers. PCR was performed using Qiagen HotStarTaq followed by the Wizard® SV Gel and PCR Clean-Up System. 454 pyrosequencing was performed by the University of Florida’s ICBR Genomics Division using a Roche GS FLX+ Titanium XL sequencer.
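The richness and diversity estimators named in the workflow above can be illustrated with a minimal Python sketch. It assumes the bias-corrected form of the Chao estimator and the finite-sample form of the Simpson index; the text does not state which variants were used, so both choices are assumptions. The function names and the read counts are hypothetical and are not data from the study.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-taxon read counts.

    Assumed form: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2
    are the numbers of taxa seen exactly once and exactly twice.
    """
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)
    f1 = sum(1 for c in observed if c == 1)  # singletons
    f2 = sum(1 for c in observed if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def inverse_simpson(counts):
    """Inverse Simpson diversity, 1 / D.

    Assumed form: D = sum(n_i * (n_i - 1)) / (N * (N - 1)), the
    finite-sample version of the Simpson index.
    """
    observed = [c for c in counts if c > 0]
    total = sum(observed)
    if total < 2:
        raise ValueError("need at least two reads to compute the index")
    d = sum(c * (c - 1) for c in observed) / (total * (total - 1))
    if d == 0:
        # Every taxon is a singleton, so the index is unbounded.
        return float("inf")
    return 1 / d


# Hypothetical per-genus read counts for one sample (illustrative only).
example_counts = [10, 5, 2, 2, 1, 1, 1]
richness = chao1(example_counts)
diversity = inverse_simpson(example_counts)
print(f"Chao1 richness: {richness:.2f}")
print(f"Inverse Simpson diversity: {diversity:.2f}")
```

Computing both values per sample and then comparing their distributions across the COPD, Smoker, and Never Smoker groups is one way to carry out the group comparison that the workflow describes.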

III. RESULTS

Table 1 lists a summary of genera that were differentially abundant to a statistically significant extent when comparing groups of subjects. Genera that did not appear in at least 20% of the members of each group were not considered. Listed are only genera that were validated by both sets of primers used.

TABLE 1. DIFFERENTIALLY SIGNIFICANT GENERA BETWEEN GROUPS.

Genus: Modicisalibacter, Phocaeicola, Gemella, Parvimonas, Streptococcus, Fusobacterium
Genus: unclassified (Family Propionibacteriaceae), Tropheryma
Genus: Sneathia, Tropheryma
Group columns with X marks (row alignment not recoverable): COPD X; COPD X X; Smoker X X; Smoker X X X X X; Never Smoker; Never Smoker

IV. DISCUSSION

Measures of richness and diversity proved inconclusive for distinguishing among the subject groups. This is consistent with the results of a smaller study by Erb-Downward et al. [3]. However, there are clear differences in the abundance of specific genera. Most notably, Tropheryma is not present in significant numbers in never smokers but is significant in the other two groups. A member of this genus is associated with Whipple’s disease, a treatable condition. The other genera found in COPD patients are also suggestive of potential avenues for future treatments. Additional analyses are being conducted and are beginning to show correlations between combinations of genera present in the various subject groups. These combinations should prove more effective for automatically classifying subjects into groups based on lung colonization.

V. CONCLUSION

An efficient workflow for metagenomic analysis has found significant differences in the colonization of the lungs between subjects with COPD, smokers, and never smokers. Although additional work is still needed, the results obtained are promising and bode well for the development of future treatments for COPD.

REFERENCES

[1] Amann R, Ludwig W, Schleifer K. 1995. Phylogenetic identification and in-situ detection of individual microbial-cells without cultivation. Microbiological Reviews 59: 143-169
[2] Chao A. 1984. Nonparametric-estimation of the number of classes in a population. Scandinavian Journal of Statistics 11: 265-270
[3] Erb-Downward JR, Thompson DL, Han MK, Freeman CM, McCloskey L, et al. 2011. Analysis of the lung microbiome in the "healthy" smoker and in COPD. Plos One 6: e16384
[4] Hankinson J, Odencrantz J, Fedan K. 1999. Spirometric reference values from a sample of the general US population. American Journal of Respiratory and Critical Care Medicine 159: 179-187
[5] Hugenholtz P, Goebel B, Pace N. Impact of culture-independent studies on the emerging phylogenetic view of bacterial diversity (vol 180, pg 4765, 1998). J Bacteriol. 1998 DEC;180(24):6793-.
[6] Martinez F, Foster G, Curtis J, Criner G, Weinmann G, et al. 2006. Predictors of mortality in patients with emphysema and severe airflow obstruction. American Journal of Respiratory and Critical Care Medicine 173: 1326-1334
[7] Miniño A, Heron M, Smith B, Kochanek K. 2011. Deaths: final data for 2008: National Vital Statistics Reports. Hyattsville, Md: National Center for Health Statistics.
[8] Paulson JN, Pop M, Bravo HC. Metastats: An improved statistical method for analysis of metagenomic data. Genome Biol. 2011;12:12-.
[9] Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol. 2009 DEC 1;75(23):7537-41.
[10] Simpson EH. 1949. Measurement of diversity. Nature 163: 688
[11] Turnbaugh PJ, Ley RE, Hamady M, Fraser-Liggett CM, Knight R, Gordon JI. The human microbiome project. Nature. 2007 OCT 18;449(7164):804-10.
[12] Venter J, Remington K, Heidelberg J, Halpern A, Rusch D, Eisen J, et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004 APR 2;304(5667):66-74.
[13] Warnecke F, Hugenholtz P. Building on basic metagenomics with complementary technologies. Genome Biol. 2007;8(12):231.
[14] Wooley JC, Godzik A, Friedberg I. A primer on metagenomics. Plos Computational Biology. 2010 FEB;6(2):e1000667.
