Journal of Fish Biology (2010) 77, 2093–2122 doi:10.1111/j.1095-8649.2010.02821.x, available online at wileyonlinelibrary.com
Comparing the performance of multiple mitochondrial genes in the analysis of Australian freshwater fishes

T. J. Page* and J. M. Hughes

Australian Rivers Institute, Griffith University, Nathan, Queensland 4111, Australia

(Received 23 April 2010, Accepted 25 September 2010)

In this study, four mitochondrial genes (cytochrome oxidase I, ATPase, cytochrome b and control region) were amplified from most of the fish species found in the fresh waters of south-eastern Queensland, Australia. The performance of these different gene regions was compared in terms of their ability to cluster fish families together in a neighbour-joining tree, both individually by gene and in all combinations. The relative divergence rates of each of these genes were also calculated. The three coding genes (cytochrome oxidase I, ATPase and cytochrome b) recovered a similar number of families and had broadly similar divergence rates. ATPase diverged a little more quickly than cytochrome oxidase I, and cytochrome b slightly more slowly than cytochrome oxidase I. All two-gene combinations recovered the same number of families. Results from the control region were much more variable, and, although generally possessing more diversity than the other regions, were sometimes less variable.

© 2010 The Authors
Journal of Fish Biology © 2010 The Fisheries Society of the British Isles
Key words: ATP; COI; control region; cytochrome b; DNA barcoding; Queensland.
*Author to whom correspondence should be addressed. Tel.: +61 7 37357418; email: [email protected]

INTRODUCTION

There is a long tradition of using genetic information in fisheries science and management dating back to the 1950s, not only in the investigation of stock structures (Kochzius, 2009) but even in the detection of cheating in fishing competitions (Primmer et al., 2000). The use of molecular data has continued unabated as it has become cheaper and highly automated, and these data are now widely used as a part of research in fish ecology, systematics and conservation (Hauser & Seeb, 2008). Although long employed in this role (Hamilton & Wheeler, 2008), a great deal of attention has recently been focused on using molecular data for the identification of fish species (Ward et al., 2009). While this may not seem a major issue for most adult specimens, the accurate identification of larvae, eggs, fillets, fin clips and unfamiliar exotic invasive species (Mather & Arthington, 1991) can indeed be challenging, even for an experienced ichthyologist (Kochzius, 2009; Teletchea, 2009). Yet, the ability to assign an individual to a species is vital for ecological research (Turner, 1999),
biosecurity (Ferri et al., 2009) and conservation, given that environmental protection is rarely conferred upon taxonomic units other than species (Turner, 1999). Therefore, if a study was planned to document the fish biodiversity of a geographic area, what kind of molecular data should be used, given that genetic methods, like fishes themselves, are highly diverse (Hauser & Seeb, 2008)? Microsatellites have been widely used in the definition of stock structures, but usually need to be developed separately for each species and so are not widely transferable between all fish species in an area (Kochzius, 2009). On the other hand, mitochondrial gene sequencing has proven very popular as these sequences are easily obtained across many species (Galtier et al., 2009) and possess enough information to differentiate between species and divergent populations within a species (evolutionarily significant units, ESU) (Vrijenhoek, 1998). There are numerous mitochondrial (mt) genes and regions, so which should be chosen? Cytochrome b (cytb) has traditionally been the most commonly used mitochondrial gene in fish studies, particularly for phylogenetics and phylogeography (Meyer, 1994; Teletchea, 2009), and thus it (and the nuclear gene rhodopsin) is the target region for the large-scale European fish identification project FishTrace (www.fishtrace.org) (Sevilla et al., 2007). More recently, the 5′-end of the cytochrome c oxidase subunit 1 (COI or COX1) has come to the fore (Galtier et al., 2009), riding the wave of enthusiasm for ‘DNA barcoding’ (Hebert et al., 2003; Ward et al., 2009). The basic tenet of barcoding is the selection of a single gene fragment to identify all described animal species and to aid in the discovery of new species (Hebert et al., 2003).
There has been much debate, some of it rancorous, over the strengths and weaknesses of barcoding in particular and of mitochondrial genes in general (Rubinoff & Holland, 2005; Frezal & Leblois, 2008; Galtier et al., 2009). Despite this, many regional fish barcoding projects are progressing, producing large amounts of potentially useful COI data (Ward et al., 2005; Hubert et al., 2008; Ariagna et al., 2010), especially given that fishes are an integral part of the Barcode of Life Data Systems (BOLD; www.boldsystems.org; Ratnasingham & Hebert, 2007) in the Fish Barcode of Life Initiative (FISH-BOL; www.fish-bol.org; Ward et al., 2009). In addition to the above two mitochondrial genes, there are numerous others. The question remains, which one to use? Meyer (1994) suggested keeping an open mind in considering potential markers. The choice may well be influenced by the extent of existing data for the taxa of interest, as online databases (e.g. GenBank, BOLD) hold a great number of publicly available DNA sequences. There is no point in reinventing the wheel if someone has already sequenced the same species from the same area. Existing and new sequence data, however, can only be integrated if the same gene fragment has been used. The large number of fish phylogeography and phylogeny projects over the years has resulted in a great deal of potentially usable data (Hauser & Seeb, 2008). In the case of rare and endangered species, it may not be possible to obtain permission to resample and resequence many more individuals, and hence a researcher may be forced to use whatever gene was used in previous studies. For example, if the IUCN red listed Oxleyan pygmy perch Nannoperca oxleyana Whitley was of interest, the mitochondrial control region would need to be sequenced, as Knight et al. (2009) did so as to align with existing data (Hughes et al., 1999), whether or not the control region was judged to be the best fragment for this purpose.
Alternatively, if interest was in some Australian eleotrid species, gene choice may need to be tailored based on the species of interest, e.g. for Mogurnda spp.,
either ATPase (Hurwood & Hughes, 1998) or NADH4 (M. Adams, pers. comm.); for most Hypseleotris species, cytb (Thacker et al., 2007) but for the empire gudgeon Hypseleotris compressa (Krefft), ATPase (McGlashan & Hughes, 2001). This sort of confusion might argue for the use of a single mitochondrial fragment for fishes, as suggested by DNA barcoding (Ward et al., 2009). Nevertheless, the question remains as to whether it really matters, as all mitochondrial genes share the same history (Ballard & Whitlock, 2004). Would the various fragments do the same jobs equally well? Some fragments, such as the control region and ATPase, are thought to diverge more quickly than COI or cytb (Meyer, 1994). Might some genes be better for deeper relationships than others (Miya & Nishida, 2000)? Could the divergence between species from one gene be used to predict the possible divergence for another gene from a different species, thus making the comparison of published data sets from sympatric species more relevant? In an attempt to answer some of these questions, this study considered the genetic diversity of the freshwater fish fauna of south-eastern Queensland, Australia (Fig. 1) by comparing the performance of a number of different mitochondrial genes in terms of phylogenetic information content and relative divergence. A better understanding of the differing divergence rates of various markers will mean that studies on different species and genes can more easily be integrated for a total evidence approach. There have been fewer barcoding studies on freshwater fishes than on marine species (Hubert et al., 2008; Ward et al., 2009), and yet freshwater fishes show a higher level of differentiation between populations because of the constraints of their landscape (Ward et al., 1994).
This isolation can eventually lead to speciation, and thus the potential for unappreciated levels of biodiversity in the form of cryptic species is high for freshwater fishes (Ward et al., 2009), as has been shown to be the case in Australia (Page et al., 2004; Hammer et al., 2007). Freshwater fishes are currently under threat from large-scale dam projects, invasive species, pollution and human population growth (Vrijenhoek, 1998; Leveque et al., 2008), all of which are very evident in south-eastern Queensland (Page et al., 2004; Olden et al., 2008), and thus a thorough knowledge of the region’s existing biodiversity in the form of a genetic database is important (Ferri et al., 2009; Ward et al., 2009).
Fig. 1. Map of south-east Queensland, Australia, showing boundaries of river basins. [Map labels: Tin Can Bay, Mary, Noosa, Maroochy, Mooloolah, Glasshouse Mountains, Bribie Island, Caboolture, Brisbane, Moreton Island, Pine, Stradbroke Island, Logan-Albert, Gold Coast; scale bar 0–50 km.]

MATERIALS AND METHODS

ONLINE DATABASE SEARCHES

Online databases were searched to aid in the selection of appropriate mitochondrial gene regions to be targeted. ISI Web of Knowledge Current Contents Connect (www.isiwebofknowledge.com) was searched for Journal of Fish Biology papers from 2000 to 2010 inclusive on 19 March 2010 with the topic search terms ‘fish’ and ‘phylogeograph*’ or ‘fish’ and ‘phylogen*’. Only papers including mitochondrial sequence data were retained [e.g. no nuclear-only data and no restriction fragment length polymorphism (RFLP) data].

The European Molecular Biology Laboratory (EMBL) online sequence database was searched on 24 March 2010 using the SRS query facility (http://srs.ebi.ac.uk) by selecting all nucleotide sequence databases and sub-sections (which includes GenBank and the DNA Data Bank of Japan) for ray-finned fishes (search term ‘Actinopterygii’) and with the following four separate groups of search terms (| = or): (1) ‘COI | CO1 | COX1 | cytochrome oxidase 1 | cytochrome oxidase I | cytochrome c oxidase subunit I | cytochrome c oxidase subunit 1’, (2) ‘CytB | cytochrome B’, (3) ‘ATP 6 | ATP6 | ATPase 6 | ATPase subunit 6 | ATP synthase 6’ (and the same searches with ‘8’ in place of ‘6’) and (4) ‘D-loop | control region’. The Barcode of Life Data Systems (BOLD) online database (Ratnasingham & Hebert, 2007) was searched on the same day for all COI records for Actinopterygii.
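The four groups of search terms can be written down compactly, which makes the ‘8 in place of 6’ variant explicit. The following is only an illustrative sketch: the dictionary keys and the function names (`build_query`, `with_subunit_8`, `all_queries`) are hypothetical, and the only query syntax assumed is the ‘|’ (or) separator stated in the text, not the full SRS query language.

```python
# Synonym groups as listed in the text; the keys are labels chosen here for illustration.
SEARCH_GROUPS = {
    "COI": [
        "COI", "CO1", "COX1",
        "cytochrome oxidase 1", "cytochrome oxidase I",
        "cytochrome c oxidase subunit I", "cytochrome c oxidase subunit 1",
    ],
    "cytb": ["CytB", "cytochrome B"],
    "ATP": ["ATP 6", "ATP6", "ATPase 6", "ATPase subunit 6", "ATP synthase 6"],
    "CR": ["D-loop", "control region"],
}


def build_query(terms):
    """Join synonyms with the '|' (or) operator used in the text."""
    return " | ".join(terms)


def with_subunit_8(terms):
    """Repeat the ATPase search with '8' in place of '6'."""
    return [term.replace("6", "8") for term in terms]


def all_queries():
    """Return every query string: COI, cytb, ATP (6), ATP (8) and CR."""
    return [
        build_query(SEARCH_GROUPS["COI"]),
        build_query(SEARCH_GROUPS["cytb"]),
        build_query(SEARCH_GROUPS["ATP"]),
        build_query(with_subunit_8(SEARCH_GROUPS["ATP"])),
        build_query(SEARCH_GROUPS["CR"]),
    ]


if __name__ == "__main__":
    for query in all_queries():
        print(query)
```

Each query would still need to be combined with the taxon restriction (‘Actinopterygii’) in whatever way the database interface requires.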
SAMPLING STRATEGY

The aim was to include sequences from as many species of fishes as possible, likely to be encountered in the fresh waters of south-eastern Queensland, Australia (Fig. 1), either from specimens sampled for this project or from the many fish mitochondrial genomes now freely available (e.g. http://mitofish.ori.u-tokyo.ac.jp; Table I). This would allow direct comparisons between gene regions from the same individual specimens. Included were 22 of the 26 freshwater species native to south-eastern Queensland (Pusey et al., 2004), nine amphidromous or catadromous species native to the area that are often sampled in fresh waters (Pusey et al., 2004; FishBase: www.fishbase.org), one Australian species translocated into the area and eight non-Australian exotic species introduced into the area (Pusey et al., 2004). Fishes were captured with a seine, dip-net or baited box trap, identified in the field (fin-clipped individuals) or in the laboratory (whole fishes), and preserved when necessary in 95% ethanol or liquid nitrogen. Multiple representatives of the Australian smelt Retropinna semoni (Weber) (Hammer et al., 2007) and ornate rainbowfish Rhadinocentrus ornatus Regan (Page et al., 2004) were included, as these are likely to harbour cryptic species. Species not found
Ceratodontidae Neoceratodus forsteri *
Hypoatherina tsurugae Centropomidae Lates calcarifer*
Craterocephalus stercusmuscarum* Craterocephalus stramineus†
Atherinidae Craterocephalus marjorae*
Anguilla reinhardtii *
Anguilla australis*
Anguillidae Anguilla anguilla
Taxon
GenBank – Queensland, Australia
GenBank – Singapore Aquaculture
Upper Canungra Creek, Gold Coast, Queensland, Australia Caboolture Creek, Queensland, Australia Nicholson River at Adel’s Grove, Queensland, Australia GenBank – north-west Pacific
GenBank – Burnett River, Queensland, Australia Brisbane River at Fernvale, Queensland, Australia
GenBank, France
Collection site
HM006997
HM006955
AF302933
DQ010541
AF302933
DQ010541
AP004420
HM006996
HM006954
AP004420
HM007034
AP007234
AP007233
cytb
AP007234
AP007233
CR Minegishi et al. (2005) Minegishi et al. (2005) This study
Reference
AF302933
DQ010541
AP004420
HM007037
HM007036
AF302933
DQ010541
AP004420
Brinkmann et al. (2004)
Lin et al. (2006)
Miya et al. (2003)
HM006921 This study
HM006920 This study
HM0069952 HM0070356 HM006919 This study
HM006994
HM0069521
HM006953
AP007234
AP007233
ATP
AP007234
AP007233
COI
Table I. Fish specimens, locations and sequence information
N/A
N/A
N/A
GU-KR140
GU-KR063
GU-AW452
GU-AW325
N/A
N/A
Specimen number
Paratilapia polleni
Oreochromis ‘sp. TP’‡C
Oreochromis mossambicus‡
Hypselecara temporalis Oreochromis ‘sp. KM’B
Cichlidae Geophagus sp.‡
Ambassis ‘sp. North-west’†A
Ambassis marianus*
Chandidae Ambassis agassizi *
Taxon
Lake Samsonvale (Pine), Queensland, Australia. (introduction) Brisbane River, Queensland, Australia. (introduction) GenBank – origin Madagascar
Blackrock Creek (Brisbane River), Queensland, Australia (introduction) GenBank – origin South America GenBank – origin Africa
Keyhole Lagoon, Stradbroke Island, Queensland, Australia Albert River, Queensland, Australia Cooper Creek at Merken Waterhole, Queensland, Australia
Collection site
HM006993
HM006951
AP009508
HM006976
AP009508
HM007017
AP009126
AP009126
HM006977
AP009506
AP009506
HM006999
HM006992
HM0069501
HM006957
HM0069912
ATP
HM0069491
COI
Table I. Continued
AP009508
HM006939
HM0070586
AP009508
HM0069408
AP009126
AP009506
HM006923
HM007059
AP009126
AP009506
HM007039
Azuma et al. (2008)
This study
Azuma et al. (2008) Mabuchi et al. (2007) This study
This study
This study
HM0070336
This study
Reference
This study HM006918
HM006917
CR
HM007032
HM007031
cytb
N/A
GU-AW421
GU-F.21
N/A
N/A
GU-AW303
GU-2269
GU-2330
GU-2252
Specimen number
GenBank – north-east Asia
GenBank – origin Eurasia GenBank – Eurasia GenBank – Asia
Carassius cuvieri
Cyprinus carpio‡ Xenocypris argentea Xenocypris davidi
GenBank – origin Asia
Barambah Creek (Burnett River), Queensland, Australia
Nematalosa erebi *
Cyprinidae Carassius auratus‡
GenBank – western Pacific
GenBank – north-eastern North America GenBank – North and Central America GenBank – West Africa
Collection site
Nematalosa japonica
Ethmalosa fimbriata
Clupeidae Dorosoma cepedianum Dorosoma petenense
Taxon
X61010 AP009059 NC013072
AB045144
AB111951
HM006973
AP009142
X61010 AP009059 NC013072
AB045144
AB111951
HM007014
AP009142
AP009138
AP009136
AP009136 AP009138
DQ536426
ATP
DQ536426
COI
Table I. Continued
X61010 AP009059 NC013072
AB045144
AB111951
HM007055
AP009142
AP009138
AP009136
DQ536426
cytb
X61010 AP009059 NC013072
AB045144
AB111951
AP009142
AP009138
AP009136
CR
M. Murakami, Y. Takase & H. Fujtani (unpubl. data) M. Murakami (unpubl. data) Chang et al. (1994) Saitoh et al. (2006) S. Liu, C. You & Y. Chen (unpubl. data)
Broughton & Reneau (2006) Lavoue et al. (2007) Lavoue et al. (2007) Lavoue et al. (2007) This study
Reference
N/A N/A N/A
N/A
N/A
GU-AW304
N/A
N/A
N/A
N/A
Specimen number
Mogurnda mogurnda†
Hypseleotris klunzingeri * Hypseleotris ‘sp. Midgley’s’*D Mogurnda adspersa*
Hypseleotris galii *
Hypseleotris compressa*
Gobiomorphus australis*
Eleotridae Eleotris acanthopoma† Gobiomorphus coxii *
Taxon
HM0070004 HM0070025
HM007003
HM0069501 HM0069601 HM0069611
HM007010
HM0070095
HM0069681
HM006969
HM007045
HM0070045
HM0069631
HM007051
HM007050
HM007044
HM0069621
HM007043
HM007042
HM007040
HM007041
HM0070013
HM006959
Allyn River (Hunter River), New South Wales, Australia Blue Lake Creek, Stradbroke Island, Queensland, Australia Alligator Creek, Fraser Island, Queensland, Australia Blue Lake Creek, Stradbroke Island, Queensland, Australia Tinana Creek, Mary, Queensland, Australia Manilla River, Namoi, New South Wales, Australia 18 Mile Swamp, Stradbroke Island, Queensland, Australia Copperfield Creek (Daly River), North Territory Australia
cytb AP004455
ATP AP004455
AP004455
COI
GenBank – western Pacific
Collection site
Table I. Continued
HM006934
HM006933
HM006929
HM006928
HM006927
HM006926
HM006924
HM006925
AP004455
CR
This study
This study
This study
This study
This study
This study
This study
Miya et al. (2003) This study
Reference
GU-Mog94
GU-2095
GU-2110B
GU-2068
GU-2033
GU-2050
GU-2209
GU-F.01
N/A
Specimen number
Galaxiella nigrostriata†
Galaxiidae Galaxias maculatus*
Philypnodon macrostomus*
Philypnodon grandiceps*
Taxon
GenBank – Australasia, South America GenBank – south-western Australia
Coomera River, Gold Coast, Queensland, Australia Stoney Creek (Brisbane River), Queensland, Australia
Collection site
AP006853
AP004104 AP006853
AP004104 AP006853
AP004104
HM007061
HM0070193
HM006979
HM007060
HM0070182
HM0069781
cytb
ATP
COI
Table I. Continued
AP006853
AP004104
HM006941
CR
Ishiguro et al. (2003) M. Miya, T. P. Satoh, Y. Yamanoue, K. Mabuchi, S. M. Shirai, N. Yagashita, K. Nakayama, H. Takeshima, N. J. Suzuki, J. G. Inoue, N. B. Ishiguro, Y. Azuma, A. Kawaguchi, T. Mukai, H. Sakurai, H. Endo & M. Nishida (unpubl. data)
This study
This study
Reference
N/A
N/A
GU-KR144
GU-2427
Specimen number
Mugil cephalus*
Mugilidae Mugil cephalus*
R. ornatus ‘SER’*E
Rhadinocentrus ornatus ‘CEQ’*E R. ornatus ‘SEQ’*E
Megalopidae Megalops atlanticus Megalops cyprinoides* Melanotaeniidae Melanotaenia duboulayi * Melanotaenia lacustris Melanotaenia fluviatilis† Melanotaenia splendida†
Taxon
This study This study
HM0070265 HM007068
HM0069871 HM0070272 HM007069
North Pine River, Queensland, Australia GenBank – Indo-Pacific
HM006932 This study
HM0070084 HM007049
HM006967
NC003170
NC003170
NC003170
HM007052
HM0070285 HM0070706
HM0069701 HM007011
HM006988
HM006986
HM006931 This study
HM007048
HM007007
HM006966
AP004419
AP004419
NC003170
AP004419
GU-AW418
GU-2194
GU-NR3
GU-2285
GU-KR026
GU-CSIRO2
N/A
GU-R.087
N/A N/A
Specimen number
Miya et al. (2001)§ N/A
This study
This study
Miya et al. (2003)
HM006930 This study
AP004419
Inoue et al. (2004) Inoue et al. (2004)
Reference
HM0070062 HM007047
AB051110
CR
HM006965
AP004808 AB051110
cytb
Coondoo Creek, Mary, Queensland, Australia GenBank – Papua New Guinea Mildura Weir Pool, Murray, Victoria, Australia Catfish Creek (Calliope River), Queensland, Australia Rocky Creek, Fraser Island, Queensland, Australia Little Canalpin Creek, Stradbroke Island, Queensland, Australia Searys Creek, Tin Can Bay, Queensland, Australia
AP004808 AB051110
ATP
AP004808 AB051110
COI
GenBank – Atlantic GenBank – Indo-Pacific
Collection site
Table I. Continued
Tandanus tandanus*
Porochilus rendalhi *
Plotosidae Neosilurus hyrtlii *
Percichthyidae Nannoperca australis† Nannoperca oxleyana*
Scleropages leichardti
Osteoglossum bicirrhosum Scleropages formosus
Osteoglossidae Arapaima gigas
Taxon
Obi Obi Creek, Mary, Queensland, Australia Keyhole Lagoon, Stradbroke Island, Queensland, Australia Blue Lake Creek, Stradbroke Island, Queensland, Australia
Goulburn River, Victoria, Australia 18 Mile Swamp, Stradbroke Island, Queensland, Australia
GenBank – origin South America GenBank – Singapore fish farm Lake Atkinson (Brisbane River), Queensland, Australia. (introduction)
GenBank – Brazil
Collection site
DQ023143
AB043025
EF523611
CR
HM007072 HM006948
HM0070204 HM0070303
HM0069801
HM006990
HM007062 HM006942
HM007015
HM007056 HM006937
HM007054 HM006936
HM007053 HM006935
HM007071
DQ023143
AB043025
EF523611
cytb
HM006974
HM007013
HM006972
HM0070292
HM0069891
HM007012
DQ023143
DQ023143
HM006971
AB043025
EF523611
ATP
AB043025
EF523611
COI
Table I. Continued Specimen number
This study
This study
This study
This study
This study
This study
Yue et al. (2006)
GU-2114
GU-2120
GU-AW562
GU-2145
GU-PP51
GU-AW401
N/A
Hrbek & Farias N/A (2008) Inoue et al. (2001) N/A
Reference
Retropinna semoni ‘SEQ’*F
Retropinna semoni ‘SEC’†F Retropinna semoni ‘SEQ’*F
Pseudomugil signifer* Retropinnidae Retropinna retropinna
Xiphophorus maculatus‡ Pseudomugilidae Pseudomugil mellis*
Xiphophorus hellerii ‡
Gambusia holbrooki ‡
Poeciliidae Gambusia affinis
Taxon
Macleay River, New South Wales, Australia Stoney Creek (Brisbane River), Queensland, Australia Barambah Creek (Burnett River), Queensland, Australia
GenBank – New Zealand
Noosa River, Queensland, Australia Tweed River, New South Wales, Australia
GenBank – origin North and Central America Apple Creek (Burrum River), Queensland, Australia (introduction) GenBank – origin North and Central America GenBank – origin North and Central America
Collection site AP004422
cytb
Setiamarga et al. (2008)
Bai et al. (2009)
AP004108
HM007025
HM007024
HM006984
GU-AW281
HM0070667 HM006946 This study
GU-KR145
N/A
GU-KR001
Ishiguro et al. (2003) HM006945 This study
AP004108
GU-2111
GU-2340
N/A
N/A
GU-2312
N/A
Specimen number
HM0070677 HM006947 This study
HM0070233 HM0070657
AP004108
HM006985
HM006983
AP004108
HM006944 This study
AP005982
NC013089
HM0069821 HM0070225 HM007064
AP005982
NC013089
Miya et al. (2003)
Reference
HM006922 This study
AP004422
CR
HM006943 This study
AP005982
NC013089
HM0069984 HM007038
AP004422
ATP
HM0069811 HM0070214 HM007063
AP005982
NC013089
HM006956
AP004422
COI
Table I. Continued
Barambah Creek (Burnett River), Queensland, Australia GenBank – western Pacific
North Maroochy River, Queensland, Australia
Collection site
AP011064
HM006964
HM006975
COI
AP011064
HM007005
HM007016
ATP
AP011064
HM007046
HM007057
cytb
AP011064
HM0069388
CR
Yagishita et al. (2009)
This study
This study
Reference
N/A
GU-AW302
GU-AW263
Specimen number
A, after Allen et al. (2002); B, after Mabuchi et al. (2007); C, informal name. Possible hybrid, see Mather & Arthington (1991); D, after Allen et al. (2002); E, possible cryptic species, see Page et al. (2004); F, possible cryptic species, see Hammer et al. (2007); SEQ, south-east Queensland. Cytochrome c oxidase subunit 1 (COI) primer combinations for this study’s sequences: 1, FishF1–FishR2; otherwise FishF2–FishR1. ATPase (ATP) primer combinations for this study’s sequences: 2, ATP82L8331–COIII2H9236; 3, Lys.31F–CO3.62R; 4, Lys.22F–CO3.62R; 5, ATP82L8331– HCH; otherwise Lys.22F–CO3.23R. Cytochrome b (cytb) primer combinations for this study’s sequences: 6, HYPSLA–RF.Thr48; 7, HYPSLA–Ret.Thr31; otherwise HYPSLA–HYPSHD. Control region (CR) primer combinations for this study’s sequences: 8, Pro-L–CRMT16498H; otherwise CRL19–CRMT16498H. *Species native to south-east Queensland. †Native to Australia but not in south-east Queensland. ‡Exotic species introduced to south-east Queensland. §Specimen originally identified as Crenimugil crenilabis and revised to Mugil cephalus in Setiamarga et al. (2008). Native to Australia and translocated to south-east Queensland; otherwise exotic and not in south-east Queensland (Pusey et al., 2004).
Rhynchopelates oxyrhynchus
Terapontidae Leiopotherapon unicolor*
Scorpaenidae Notesthes robusta*
Taxon
Table I. Continued
in south-eastern Queensland, but hailing from the same genera or families as those that are, were also included for the sequence divergence analyses, making a total of 74 specimens with sequences (42 from this project and 32 downloaded mitochondrial genomes).

MTDNA TECHNIQUES
Genomic DNA was extracted using a modified version of a standard cetyltrimethylammonium bromide (CTAB)-phenol–chloroform extraction (Doyle & Doyle, 1987). On the basis of the online searches, four mitochondrial gene regions were selected: (1) cytochrome c oxidase subunit 1 (COI), because of its adoption as the DNA barcode fragment (Ward et al., 2009) and the taxonomic breadth of sequences available, (2) cytb because of the large number of fish sequences available and fish papers that have used it, (3) ATPase subunit 6 (ATP), because, although not as widely used worldwide, it has been extensively used for Australian freshwater fish studies (McGlashan & Hughes, 2001; Wong et al., 2004) and (4) control region (CR; also known as D-loop), for similar reasons as cytb. Only sequence fragments of the three coding regions >500 base pairs (bp) were targeted (Ratnasingham & Hebert, 2007). All primer sequences are presented in Table II and the primer combinations which were used for each species are given in Table I.

Polymerase chain reaction (PCR) cycling conditions were 3 min at 94 °C, followed by 40 cycles of 30 s at 94 °C, 30 s at 48 °C (50 °C for CR primers and HYPSLA–HYPSHD; 52 °C for COI combinations), 45 s at 72 °C (60 s for COI; 90 s for all cytb and ATP combinations except HYPSLA–HYPSHD and any combination using ATP82L8331) and then a final extension of 7 min at 72 °C. PCR products were purified using exonuclease I and shrimp alkaline phosphatase. Sequencing reactions were done with BigDye v.3.1 Terminator mix (Applied Biosystems Inc.; www.appliedbiosystems.com) and the relevant forward primer and were cleaned up with ethanol precipitation as per the manufacturer’s instructions. Sequences were produced on an Applied Biosystems 3130xl Genetic Analyser at the DNA Sequencing Facility at Griffith University. Sequences were edited and aligned using Sequencher 4.1.2 (Gene Codes Corp.; www.genecodes.com).
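Because the cycling conditions are stated as a default with nested exceptions, it may help to see them resolved per primer combination. The sketch below is one reading of the sentence above, not the authors' protocol file: the names `PcrProgram`, `pcr_program` and `hold_seconds` are hypothetical, CR is assumed to keep the default 45 s extension, and the time total ignores ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrProgram:
    """Cycling programme; temperatures in degrees C, times in seconds."""
    anneal_c: int
    extend_s: int
    cycles: int = 40
    initial_denature_s: int = 180  # 3 min at 94 C
    denature_s: int = 30           # 30 s at 94 C in each cycle
    anneal_s: int = 30             # 30 s at the annealing temperature
    final_extend_s: int = 420      # 7 min at 72 C

    def hold_seconds(self) -> int:
        """Total programmed hold time, ignoring ramp times between steps."""
        per_cycle = self.denature_s + self.anneal_s + self.extend_s
        return self.initial_denature_s + self.cycles * per_cycle + self.final_extend_s


def pcr_program(region: str, forward: str, reverse: str) -> PcrProgram:
    """Resolve annealing temperature and extension time for a primer pair."""
    pair = (forward, reverse)
    if region == "COI":
        return PcrProgram(anneal_c=52, extend_s=60)
    if region == "CR" or pair == ("HYPSLA", "HYPSHD"):
        # Assumption: CR keeps the default 45 s extension.
        return PcrProgram(anneal_c=50, extend_s=45)
    if "ATP82L8331" in pair:
        return PcrProgram(anneal_c=48, extend_s=45)
    if region in ("cytb", "ATP"):
        return PcrProgram(anneal_c=48, extend_s=90)
    raise ValueError(f"unknown region: {region!r}")


if __name__ == "__main__":
    for args in [("COI", "FishF1", "FishR2"), ("cytb", "HYPSLA", "HYPSHD"),
                 ("ATP", "Lys.22F", "CO3.23R"), ("CR", "CRL19", "CRMT16498H")]:
        prog = pcr_program(*args)
        print(args, prog.anneal_c, prog.extend_s, prog.hold_seconds())
```

Writing the rules as an ordered chain of conditions makes the precedence explicit: the COI rule is checked first, then the 50 °C group, then the ATP82L8331 exception, and only then the 90 s default for the remaining cytb and ATP combinations.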
Table II. Mitochondrial primer sequences and sources (see Table I for species details)

| Region | Primer name | Direction | Primer sequence | Primer reference |
|---|---|---|---|---|
| ATP | ATP82L8331 | Forward | AAAGCRTYRGCCTTTTAAGC | S. McCafferty (unpubl. data) |
| ATP | CO3.23R | Reverse | GGCTTGGGTCAACTATGTGGT | P. Unmack (unpubl. data) |
| ATP | CO3.62R | Reverse | TTATTAGAAGGGCGGCAACTG | P. Unmack (unpubl. data) |
| ATP | COIII2H9236 | Reverse | GTTAGTGGTCAKGGGCTTGGRTC | S. McCafferty (unpubl. data) |
| ATP | HCH | Reverse | TACTATGTGAAATGCGTGTG | McGlashan & Hughes (2001) |
| ATP | Lys.22F | Forward | AAAGCGTTAGCCTTTTAAGC | P. Unmack (unpubl. data) |
| ATP | Lys.31F | Forward | GCCTTTTAAGCTAAAGATTGG | P. Unmack (unpubl. data) |
| COI | FishF1 | Forward | TCAACCAACCACAAAGACATTGGCAC | Ward et al. (2005) |
| COI | FishF2 | Forward | TCGACTAATCATAAAGATATCGGCAC | Ward et al. (2005) |
| COI | FishR1 | Reverse | TAGACTTCTGGGTGGCCAAAGAATCA | Ward et al. (2005) |
| COI | FishR2 | Reverse | ACTTCAGGGTGACCGAAGAATCAGAA | Ward et al. (2005) |
| CR | CRL19 | Forward | ACCACTAGCACCCAAAGCTA | Bernatchez & Danzmann (1993) |
| CR | CRMT16498H | Reverse | CCTGAAGTAGGAACCAGATG | Meyer et al. (1990) |
| CR | Pro-L | Forward | CTACCTCCAACTCCCAAAGC | Palumbi et al. (1991) |
| cytb | HYPSHD | Reverse | GGGTTGTTGGAGCCAGTTTCGT | Thacker et al. (2007) |
| cytb | HYPSLA | Forward | GTGGCTTGAAAAACCACCGTT | Thacker et al. (2007) |
| cytb | Ret.Thr31 | Reverse | CTCCAACCTCCGACTTACAAG | P. Unmack (unpubl. data) |
| cytb | RF.Thr48 | Reverse | GCAGTAGGAGGGAATTTAACCTTCG | Unmack & Dowling (2010) |

TREE ANALYSES

Because of the very different natures of the protein-coding fragments (COI, cytb and ATP), which are easily amplified and aligned, and the non-coding CR, COI-cytb-ATP were all analysed together, and CR used separately only for the divergence analyses. Using the general methods of DNA barcoding (Ward, 2009), neighbour-joining trees were assembled in MEGA version 4 (Tamura et al., 2007) using the Kimura two parameter model (K2P) and bootstrapped 1000 times for all specimens that produced sequences for all three protein-coding genes. Bootstraps are only displayed at the family level and below because this simple tree-building methodology is not phylogenetic but rather a phenetic clustering technique (Hamilton & Wheeler, 2008), and thus sequence saturation at deep systematic levels is highly probable (Hajibabaei et al., 2007). Although simplistic, a neighbour-joining and bootstrapping combination has proven effective (Munch et al., 2008; Ross et al., 2008), at least for species-level assignments. DNA barcoding analyses on COI of Australian fishes have shown that species cluster ‘invariably’ within a genus and ‘generally’ within a family (Ward et al., 2005).

DIVERGENCE ANALYSES

Sequence divergences were calculated using K2P distances in MEGA4 as used by Ward (2009) and BOLD. A Mantel’s test was performed in Primer version 5.2.8 (Primer-E Ltd; www.primer-e.com) to test the correlation of each of the three coding gene distance matrices with each other (1000 permutations of the Spearman rank correlation method in the Relate option). For the CR data set, sequences could only be aligned within a genus and so Mantel’s tests were performed within a genus between CR distances and relevant distances from the three coding genes. Predictive analytics software (PASW) Statistics 18 (SPSS Inc.; www.spss.com) was used to generate descriptive statistics (minimum, maximum and s.e.) between all species within families (including all genera), between different genera within families and within genera for each gene region (within genera only for CR). Linear regressions were performed in PASW to assess the differing relative levels of divergence of the three coding genes between species within families. To avoid non-independence of comparisons in the linear regressions, each specimen was only used in a single pair-wise comparison within a family and was chosen randomly without replacement using the PopTools version 3.1 [Commonwealth Scientific and Industrial Research Organisation (CSIRO)] corrected random function in Excel 3.1 (Microsoft Corporation; www.microsoft.com).
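The K2P distance that underlies both the tree and the divergence analyses can be illustrated with a short sketch. This is not the MEGA4 implementation; it is a minimal Python version of the standard Kimura two-parameter formula, d = -0.5 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions between two aligned sequences. The function name `k2p_distance` and the pairwise skipping of gaps and ambiguity codes are assumptions made for illustration only.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Illustrative sketch only: sites where either sequence has a gap or an
    ambiguity code are skipped (an assumption, not necessarily what MEGA4 does).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions (P)
    q = transversions / compared  # proportion of transversions (Q)
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        # The formula is undefined once the sequences are saturated.
        raise ValueError("K2P distance undefined for these sequences")
    return -0.5 * math.log(term1 * math.sqrt(term2))
```

Calling `k2p_distance` on every pair of aligned sequences for one gene would give the kind of distance matrix that is then clustered or compared between genes. The guard for undefined values reflects the saturation issue noted above for deep systematic levels.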
RESULTS

ONLINE DATABASE SEARCHES
By far, the two most commonly used mitochondrial gene regions for Journal of Fish Biology phylogeography or phylogeny papers published from 2000 to 2010 (through to volume 76 issue 2) have been cytb (49 papers) and CR (42), followed distantly by ATP (seven), with COI in seventh place with four papers (see Fig. 2 for all results). This does not include papers specifically written as DNA barcoding-only papers that do not include phylogeography or phylogeny in their topics. Another analysis of fish papers also found cytb the most commonly used and COI seventh (Teletchea, 2009). The EMBL search found that cytb and COI had similar numbers of sequences publicly available for ray-finned fishes (57 513 and 57 155, respectively), while there were 27 082 CR and 5463 ATP sequences. This is only a raw sequence count and does not deal with the relative taxonomic coverage of these sequences nor will they all be overlapping fragments (as the barcode sequences will be).
[Fig. 2 (bar chart), title: Phylogeography or phylogeny papers in JFB 2000–2010; vertical axis scaled 0–50; horizontal axis: Mitochondrial gene regions used (Cytb, CR, ATP6–8, ND1, 16S, 12S, COI, Genome, ND2, ND5–6, ND3, ND4–4L, COII).]
Fig. 2. Mitochondrial gene regions used in fish phylogeography or phylogeny papers published between 2000 and 2010 in the Journal of Fish Biology.
The BOLD search found 65 502 COI sequences (14 552 publicly available) from 10 802 species.

SEQUENCE PRODUCTION
All 42 of the specimens from this study were sequenced for COI and cytb and all but two for ATP (see Table I for details and GenBank accession numbers). CR proved more challenging with 32 of 42 specimens sequenced. All protein-coding sequences were translated to amino acids in MEGA4 (vertebrate mitochondrial genetic code) with no stop codons nor indels present. COI sequences (602 bp; codon start position 2) correspond to positions 5568 to 6169 of the Lake Kutubu rainbowfish Melanotaenia lacustris Munro, mtDNA genome (accession number AP004419; Miya et al., 2003) and positions 102 to 703 of the DNA barcode for fishes (Ward & Holmes, 2007). ATP sequences (539 bp; codon start position 1) are positions 8101 to 8639 and cytb (504 bp; codon start position 1) positions 14 445 to 14 948 of the M. lacustris genome. CR sequences were of varied lengths (327 to 409 bp) and roughly correspond to positions 15 677 to 16 021 of M. lacustris. All morphological identifications were double checked by comparing all resulting sequences to the BOLD online database and GenBank (BLASTn search at: blast.ncbi.nlm.nih.gov), as well as to unpublished sequences from the present and other laboratories. All identifications were confirmed, except an Oreochromis specimen (specimen number GU-AW421), originally identified as the Mozambique tilapia Oreochromis mossambicus (Peters), but which may be a hybrid with another species such as the blue tilapia Oreochromis aureus (Steindachner) or the Nile tilapia Oreochromis niloticus (L.) (Mather & Arthington, 1991), and a Geophagus specimen (number GU-AW303) which could not be assigned to any species [perhaps the pearl cichlid Geophagus brasiliensis (Quoy & Gaimard)].

TREE RESULTS
A data set of the protein-coding genes of 72 taxa was assembled in MEGA4 from all specimens with COI, ATP and cytb sequences (40 from this study and 32 from downloaded mitochondrial genomes). Neighbour-joining trees were produced for all three genes together (Fig. 3), all genes separately and all two-gene combinations (see Appendix SI for all trees). Although DNA barcoding sensu stricto is primarily concerned with the identification at the species level (Hubert et al., 2008), mitochondrial sequences clustered using K2P distances can betray a systematic signal at the generic, and even family, levels (Frezal & Leblois, 2008), which can be useful in preliminary identifications, particularly for partial specimens such as fin clips, before assigning to species. The performance of each gene on its own (and all two-gene combinations) to cluster together family units (per Pusey et al., 2004) was compared to the three gene data set using monophyly and bootstrap support for each family (Table III). It is akin to imagining that any one species per family was an unidentified specimen and then determining where it would end up on a tree. Of the 18 families present in this three gene data set, nearly all (16) were recovered with strong support, one with moderate support (Chandidae, but strongly supported
[Fig. 3 (neighbour-joining tree; scale bar 0·02). Tip labels, grouped by the family labels shown on the tree:
Cyprinidae: Carassius auratus, Carassius cuvieri, Cyprinus carpio, Xenocypris argentea, Xenocypris davidi
Anguillidae: Anguilla reinhardtii, Anguilla anguilla, Anguilla australis
Ceratodontidae: Neoceratodus forsteri
Osteoglossidae: Arapaima gigas, Osteoglossum bicirrhosum, Scleropages formosus, Scleropages leichardti
Megalopidae: Megalops atlanticus, Megalops cyprinoides
Plotosidae: Neosilurus hyrtlii, Tandanus tandanus, Porochilus rendahli
Eleotridae: Mogurnda adspersa, Mogurnda mogurnda, Philypnodon grandiceps, Philypnodon macrostomus, Gobiomorphus australis, Gobiomorphus coxii, Eleotris acanthopoma, Hypseleotris sp. Midgley's, Hypseleotris compressa, Hypseleotris galii
Clupeidae: Dorosoma cepedianum, Dorosoma petenense, Ethmalosa fimbriata, Nematalosa erebi, Nematalosa japonica
Galaxiidae: Galaxias maculatus, Galaxiella nigrostriata
Retropinnidae: Retropinna retropinna, Retropinna semoni SEC1, Retropinna semoni SEQ2, Retropinna semoni SEQ1
Centropomidae: Lates calcarifer
Percichthyidae: Nannoperca australis, Nannoperca oxleyana
Terapontidae: Leiopotherapon unicolor, Rhynchopelates oxyrhynchus
Poeciliidae: Gambusia affinis, Gambusia holbrooki, Xiphophorus hellerii, Xiphophorus maculatus
Chandidae: Ambassis marianus, Ambassis agassizi, Ambassis sp. NW2
Mugilidae: Mugil cephalus, Mugil cephalus Australia
Scorpaenidae: Notesthes robusta
Cichlidae: Geophagus sp., Hypselecara temporalis, Paratilapia polleni, Oreochromis sp TP3, Oreochromis sp KM4
Atherinidae: Craterocephalus stercusmuscarum, Craterocephalus stramineus, Craterocephalus marjorae, Hypoatherina tsurugae
Pseudomugilidae: Pseudomugil mellis, Pseudomugil signifer
Melanotaeniidae: Rhadinocentrus ornatus CEQ5, Rhadinocentrus ornatus SER5, Rhadinocentrus ornatus SEQ5, Melanotaenia lacustris, Melanotaenia splendida, Melanotaenia duboulayi, Melanotaenia fluviatilis]
Fig. 3. Neighbour-joining tree produced from combined data set of COI, ATP and cytb genes (see Table I), showing families and bootstrap support values. All drawings are reproduced with permission from Pusey et al. (2004), except Galaxiidae (with permission from McDowall, 1990), Cichlidae, Cyprinidae and Poeciliidae adapted from www.FishBase.org by R. Cada. 1, possible cryptic species, see Hammer et al. (2007); 2, after Allen et al. (2002); 3, informal name. Possible hybrid, see Mather & Arthington (1991); 4, after Mabuchi et al. (2007); 5, possible cryptic species, see Page et al. (2004).
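The family-recovery criterion used to compare genes (whether all members of a family cluster together on a tree) can be sketched as a simple monophyly check. The example below is only an illustration under stated assumptions: the tree is represented as rooted nested tuples of tip labels, whereas a neighbour-joining tree is strictly unrooted, and bootstrap support is not considered. The function names `clade_tip_sets` and `recovered_families`, and the toy labels in the usage note, are hypothetical and do not come from the study.

```python
def clade_tip_sets(tree):
    """Return the set of tip labels under every clade of a nested-tuple tree.

    A tip is a string; an internal node is a tuple of child nodes.
    """
    clades = []

    def walk(node):
        if isinstance(node, str):
            tips = frozenset([node])
        else:
            tips = frozenset().union(*(walk(child) for child in node))
        clades.append(tips)
        return tips

    walk(tree)
    return clades


def recovered_families(tree, family_of):
    """Map each family to True if its tips form exactly one clade of the tree.

    `family_of` maps tip label -> family name (hypothetical input format).
    """
    clades = set(clade_tip_sets(tree))
    members = {}
    for tip, family in family_of.items():
        members.setdefault(family, set()).add(tip)
    return {family: frozenset(tips) in clades
            for family, tips in members.items()}
```

For a toy tree `((("A1", "A2"), "A3"), (("B1", "C1"), "B2"))` with tips assigned to hypothetical families A, B and C by their first letter, family A would count as recovered, while family B would not, because `B1` groups with `C1` before joining `B2`. A single-species family is trivially recovered under this check, which may be why such families are uninformative in comparisons of this kind.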
Anguillidae Atherinidae Chandidae Cichlidae Clupeidae Cyprinidae Eleotridae Galaxiidae Megalopidae Melanotaeniidae Mugilidae Osteoglossidae Percichthyidae Plotosidae Poeciliidae Pseudomugilidae Retropinnidae Terapontidae Total families supported S = strong (80–100%) M = moderate (50–79%) W = weak (