Computational clustering integration of metabolomics, transcriptomics and agronomical data for germplasm selection in a highly diverse tomato landrace collection
Cernadas RA1, Conte M1, Pividori M2, Insani M1, López MG1, Asís R3, D'Angelo M4, Zanor M4, Boggio S4, Valle E4, Asprelli P5, Peralta I6, Milone D2, Stegmayer G2, Carrari F1
ABSTRACT
Tomato (S. lycopersicum) is a major vegetable crop consumed worldwide that provides a valuable source of vitamins and antioxidants for the human diet. Because of the variability constraints associated with breeding programs, the phenotypic and genetic diversity of heirloom cultivars (landraces) emerges as a valuable resource from which to rescue desired agronomic traits for crop improvement. Here, we surveyed a germplasm collection of 68 Andean tomato landraces maintained and cultivated by family farmers. Distinct sets of these accessions were cultivated in the Cuyo region (Mendoza) during several seasons (i.e. 2005-06, 2006-07, 2008-09, 2009-10, 2010-11 and 2011-12) and characterized by morpho-agronomic traits as well as by biochemical characters of the mature fruits. Our analyses combined i) GC-MS, NMR and HPLC to identify soluble and volatile fruit metabolites, ii) transcriptomics, and iii) computational biology to integrate the whole dataset. Preliminary results allow us to define genotypic clusters according to agronomical traits, including metabolite profiles, antioxidant properties and vitamin accumulation. We also explored the organoleptic properties of the different accessions to establish inter-cluster correlations between volatile content and fruit taste. Finally, a multi-focus clustering analysis based on accession diversity and environmental variation across the experimental seasons provides a method to infer the traits most likely to be stably inherited.
METHODS
Sensory analyses of mature fruits were conducted as described previously (Baldwin et al., 1998) with at least 10 experienced panelists.
Antioxidant metabolites were measured by HPLC–DAD–MS/MS in mature fruits, and in vitro antioxidant capacities were determined by the TEAC and FRAP methods as described by Di Paola Naranjo et al. (2016). Volatile organic compounds (VOCs) of mature fruits from Andean tomato landraces were profiled by gas chromatography-mass spectrometry (GC-MS) (Cortina et al., 2016), and metabolite profiles of soluble compounds were obtained by GC-TOF-MS as described before (Lisec et al., 2006). A combination of year/location data is presented from field trials performed in the Cuyo region (Mendoza, Argentina, 33°50'S, 68°52'W, 900 masl). Morpho-agronomic traits were recorded in the same experiments.
RESULTS
Figure 1. Geo-references of tomato heirloom varieties used in this study
A total of one hundred tomato accessions, including heirloom varieties collected in the Argentinean Andean valleys (Figure 1) and commercial varieties of distinct origins, were phenotyped in detail. In particular, the heirloom varieties have been subjected to only minimal selection by local farmers. Therefore, this germplasm constitutes a very valuable genetic resource.
All plants were cultivated under open field conditions of the Cuyo region in Mendoza, Argentina during the spring-summer season (October-March).
Fresh fruits of 16, 18 and 22 accessions, harvested at the end of the 2010, 2011 and 2012 summers, respectively, were assayed for their organoleptic properties by sensory panels of partially trained personnel using trait descriptors.
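One way to relate sensory panel scores to volatile content is a rank correlation, which makes no assumption about the scale of either measurement. The following is a minimal sketch of a Spearman correlation in plain Python; it is not the procedure used in this study, and the function names and the example values are hypothetical, chosen only for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for six accessions (NOT data from this study):
# relative level of one volatile and the mean panel score for one descriptor.
volatile_level = [0.8, 1.9, 1.1, 2.7, 2.2, 0.5]
panel_score = [3.0, 5.5, 4.0, 7.0, 6.5, 2.0]
rho = spearman(volatile_level, panel_score)
print(f"Spearman rho = {rho:.3f}")
```

With only 16 to 22 accessions per season, a coefficient computed this way would be an exploratory indication rather than a firm estimate.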
Figure 1. Geo-references and phenotype of some tomato varieties used in this study.
Data sources: HPLC and GC-MS data; morpho-agronomic data; sensory panel data; gene expression data (2010)
Metabolite classes: volatiles, antioxidants, vitamins, amino acids, other
Accession/year tables list the years 2006, 2007, 2009, 2010, 2011 and 2012
100 tomato accessions analyzed
Computational biology
Clustermatch (BioDataFusion, Version 1.0)
Clustering variables (≠ data sources/harvest seasons)
Genotype ensemble (k=10)
Genotype selection
Accession groups (heirloom and commercial varieties):
552, 553, 558, 559, 561, 564, 567, 568, 569, 715
551, 554, 555, 556, 557, 560, 563, 565, 3809, 3810, 3811
550, 570, 571, 572, 573, 574, 3812, 3815, 3823
575, 3814, 3816, 3820, 3832, 3840, 3841, 3842, 3843, 3844
548, 549, 566, 3806, 3808, 3813, 4735, 4736, 4739, 4171
562, 3805, 4618, 4740, 4741, 4742, 4743, 4748, 4749, 4751
2523, 2535, 2637, 2677, 2724, 2767, 2777, 2790, 4745
3817, 3819, 3821, 3822, 3827, 3828, 3829, 3835, 3838, 3839
3818, 3824, 3825, 3826, 3830, 3831, 3833, 3834, 3836, 3837
Commercial varieties: Franco, 4623, CheAmRed, 4750, Elpida, Franco*, Biguá, CZB, CheRedRoj
Figure 2. Flow chart of data collection, integration and analysis.
1- Instituto de Biotecnología – INTA, Castelar. Buenos Aires, Argentina
2- Laboratorio de Inv. en Señales e Inteligencia Computacional. Facultad de Ingeniería y Ciencias Hídricas. Universidad Nacional del Litoral. Santa Fe, Argentina
3- CIBICI-CONICET. Universidad de Córdoba. Córdoba, Argentina
4- Instituto de Biología Molecular y Celular de Rosario. Rosario, Argentina
5- Estación Experimental Agropecuaria La Consulta – INTA. San Carlos, Mendoza
6- Facultad de Ciencias Agrárias Universidad Nacional de Cuyo. Mendoza, Argentina
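The integration step in the flow chart (one clustering per data source and harvest season, combined into a genotype ensemble of k groups) can be illustrated with a generic co-association consensus. This is a minimal sketch under stated assumptions, not the Clustermatch implementation or its API: the function names, the average-linkage merging rule and the toy accession labels are all hypothetical. Accessions that were not measured in a given source or season are simply left out of that partition.

```python
from itertools import combinations


def co_association(partitions, items):
    """Fraction of partitions in which each pair of items shares a cluster.

    Each partition maps item -> cluster label and may omit items that were
    not measured in that data source or season.
    """
    agree = {}
    for a, b in combinations(items, 2):
        together = seen = 0
        for part in partitions:
            if a in part and b in part:
                seen += 1
                together += part[a] == part[b]
        agree[frozenset((a, b))] = together / seen if seen else 0.0
    return agree


def consensus_clusters(partitions, items, k):
    """Merge the most co-associated groups (average linkage) until k remain."""
    agree = co_association(partitions, items)
    clusters = [[item] for item in items]

    def similarity(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(agree[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > k:
        # combinations() yields i < j, so deleting index j keeps index i valid.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: similarity(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical partitions (NOT data from this study), one per data source
# or season; "acc_c" is missing from the last one.
partitions = [
    {"acc_a": 0, "acc_b": 0, "acc_c": 1, "acc_d": 1},
    {"acc_a": 0, "acc_b": 0, "acc_c": 0, "acc_d": 1},
    {"acc_a": 1, "acc_b": 1, "acc_c": 0, "acc_d": 0},
    {"acc_a": 2, "acc_b": 2, "acc_d": 5},
]
items = ["acc_a", "acc_b", "acc_c", "acc_d"]
ensemble = consensus_clusters(partitions, items, k=2)
print(ensemble)
```

Working on co-association frequencies rather than on the raw measurements lets sources with different units and different accession coverage contribute to the same ensemble, which is the motivation for this kind of design.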
Gene expression analyses showed that nearly 2,600 probesets (FDR