GENOME DUPLICATION AND GENE ANNOTATION: AN EXAMPLE FOR A REFERENCE PLANT SPECIES
Alessandra Vigilante, Mara Sangiovanni, Chiara Colantuono, Luigi Frusciante and Maria Luisa Chiusano
Dept. of Soil, Plant, Environmental and Animal Production Sciences, CAB (Computer Aided Biosciences) group
Web: http://cab.unina.it Contact: vigilantealessandra@gmail.com, chiusano@unina.it
BACKGROUND: Arabidopsis thaliana as a reference genome
Arabidopsis thaliana WAS THE FIRST PLANT GENOME TO BE COMPLETELY SEQUENCED
BACKGROUND: Arabidopsis thaliana as a reference genome
A REFERENCE GENOME SHOULD BE:
- FULLY RELIABLE
- SAFELY ANNOTATED
- WELL UNDERSTOOD IN TERMS OF EVOLUTIONARY HISTORY
THE Arabidopsis thaliana GENOME IS:
- GENE DENSE
- COMPLEX BECAUSE HIGHLY DUPLICATED AND CLAIMED TO BE ARCHEOPOLYPLOID
- STILL NOT EXHAUSTIVELY ANNOTATED
BACKGROUND: Arabidopsis thaliana as a reference genome
Whole genome duplication events (V. vinifera, A. thaliana)
Jaillon O, et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature (2007).
STRATEGY: Unraveling the Arabidopsis thaliana genome
IT COULD BE USEFUL TO REVIEW THE GENOME IN TERMS OF RELATIONSHIPS BETWEEN DUPLICATED GENES/PARALOGS:
NETWORKS OF PARALOG GENES
Gene duplication analysis: pipeline
- Arabidopsis thaliana proteome (TAIR9 release)
- All-against-all BLASTp versus protein-coding genes (E < 10^-10, Rost’s formula)
- Singleton genes / Duplicated genes
- Network extraction (on the duplicated genes)
- Networks of duplicated genes
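The filtering step of the pipeline can be sketched as follows. This is a minimal sketch, not the authors' code. It assumes that "Rost’s formula" refers to the length-dependent sequence identity threshold curve published by Rost (1999), that BLASTp hits are available in the standard 12-column tabular format, and that the offset n is 0. The function names and the toy gene identifiers are hypothetical.

```python
import math

E_VALUE_CUTOFF = 1e-10  # threshold stated on the slide (E < 10^-10)


def rost_identity_threshold(length, n=0.0):
    """Minimum percent identity for an alignment of the given length.

    Assumption: this is the identity curve from Rost (1999); n is an
    offset whose value is not given in the slides (0 is a placeholder).
    """
    if length <= 11:
        return n + 100.0
    if length <= 450:
        exponent = -0.32 * (1.0 + math.exp(-length / 1000.0))
        return n + 480.0 * length ** exponent
    return n + 19.5


def is_paralogy(evalue, pident, length, n=0.0):
    """A hit counts as a paralogy if it passes both filters."""
    return evalue < E_VALUE_CUTOFF and pident >= rost_identity_threshold(length, n)


def paralog_pairs(blast_lines, n=0.0):
    """Collect unordered gene pairs from BLAST tabular lines.

    Columns used: 0 query, 1 subject, 2 percent identity,
    3 alignment length, 10 E-value.
    """
    pairs = set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        if query == subject:
            continue  # skip self hits
        pident, length, evalue = float(fields[2]), int(fields[3]), float(fields[10])
        if is_paralogy(evalue, pident, length, n):
            pairs.add(tuple(sorted((query, subject))))
    return pairs


# Toy input with hypothetical gene identifiers.
toy_hits = [
    "g1\tg1\t100.0\t200\t0\t0\t1\t200\t1\t200\t1e-120\t400",
    "g1\tg2\t45.0\t200\t110\t0\t1\t200\t1\t200\t1e-50\t300",
    "g2\tg1\t45.0\t200\t110\t0\t1\t200\t1\t200\t1e-50\t300",
    "g1\tg3\t30.0\t200\t140\t0\t1\t200\t1\t200\t1e-5\t50",
]
pairs = paralog_pairs(toy_hits)
```

Storing each pair as a sorted tuple makes the relationship symmetric, so reciprocal hits collapse into one undirected paralogy.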
Gene duplication analysis: pipeline
A. thaliana GENES AND PARALOGIES ARE REPRESENTED AS AN (UNDIRECTED) GRAPH G(V,E) WHERE:
- V = {v_1, ..., v_N} = genes
- E = {e_1, ..., e_M} = paralogies
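One possible in-memory form of the graph G(V,E) is an adjacency map from each gene to the set of its paralogs. The sketch below is illustrative only: the function name and the toy gene identifiers are hypothetical, and the slides do not say how the graph was actually stored.

```python
def build_paralogy_graph(genes, paralogies):
    """Return an undirected graph as {gene: set of paralogous genes}.

    genes      : iterable of gene identifiers (the vertex set V)
    paralogies : iterable of (gene_a, gene_b) pairs (the edge set E)
    """
    adjacency = {gene: set() for gene in genes}
    for gene_a, gene_b in paralogies:
        if gene_a == gene_b:
            continue  # ignore self relationships
        # Undirected: record the edge in both directions.
        adjacency.setdefault(gene_a, set()).add(gene_b)
        adjacency.setdefault(gene_b, set()).add(gene_a)
    return adjacency


# Toy example with hypothetical gene identifiers.
genes = ["g1", "g2", "g3", "g4"]
paralogies = [("g1", "g2"), ("g2", "g3")]
graph = build_paralogy_graph(genes, paralogies)

# Genes without any paralogy edge are the singleton genes.
singletons = sorted(gene for gene, paralogs in graph.items() if not paralogs)
```

In this representation a singleton gene is a vertex of degree zero, and a duplicated gene is any vertex with at least one edge.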
RESULTS: Gene duplication analysis
- Arabidopsis thaliana proteome (TAIR9 release): 27169
- All-against-all BLASTp versus protein-coding genes (E < 10^-10, Rost’s formula)
- Singleton genes: 5326
- Duplicated genes: 21843
- Networks of duplicated genes: 3017
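The counts on this slide can be cross-checked with a few lines of arithmetic. The assignment of 5326 to singleton genes and 21843 to duplicated genes is an assumption: it is the reading that agrees with the roughly 20% singleton share stated later in the slides. The variable names are illustrative.

```python
proteome = 27169     # protein-coding genes (TAIR9 release)
duplicated = 21843   # genes with at least one paralogy (assumed assignment)
singletons = 5326    # genes with no paralogy (assumed assignment)
networks = 3017      # networks of duplicated genes

# The two classes should partition the proteome.
partition_ok = duplicated + singletons == proteome

singleton_pct = round(100 * singletons / proteome, 1)    # percent of proteome
duplicated_pct = round(100 * duplicated / proteome, 1)   # percent of proteome
mean_network_size = round(duplicated / networks, 2)      # genes per network

print(partition_ok, singleton_pct, duplicated_pct, mean_network_size)
```

The mean network size is only a summary figure. The size classes reported in the slides range from 2 to 5168 genes, so the distribution is far from uniform.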
RESULTS: Gene duplication analysis
[Figure: distribution of network sizes, in genes per network; size classes 2, 3-9, 10-30, 31-207, 208-5168]
A network contains all and only the genes that share at least one paralogy relationship. Each gene belongs to one and only one network.
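The definition above matches the connected components of the paralogy graph, which is why each duplicated gene falls into exactly one network. The sketch below extracts the components with a breadth-first search and groups them into the size classes shown in the figure. It assumes the graph is an adjacency map. The function names and toy data are hypothetical, and the slides do not state which algorithm the authors used.

```python
from collections import Counter, deque

# Size classes taken from the figure axis (genes per network).
SIZE_CLASSES = [(2, 2), (3, 9), (10, 30), (31, 207), (208, 5168)]


def extract_networks(adjacency):
    """Return the connected components that contain at least two genes."""
    seen = set()
    networks = []
    for start in adjacency:
        if start in seen or not adjacency[start]:
            continue  # already assigned, or a singleton gene
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            gene = queue.popleft()
            for paralog in adjacency[gene]:
                if paralog not in seen:
                    seen.add(paralog)
                    component.add(paralog)
                    queue.append(paralog)
        networks.append(component)
    return networks


def size_class(size):
    """Map a network size to the label of its size class."""
    for low, high in SIZE_CLASSES:
        if low <= size <= high:
            return str(low) if low == high else f"{low}-{high}"
    return "other"


# Toy graph: one 2-gene network, one 3-gene network, one singleton.
toy_graph = {
    "g1": {"g2"}, "g2": {"g1"},
    "g3": {"g4"}, "g4": {"g3", "g5"}, "g5": {"g4"},
    "g6": set(),
}
toy_networks = extract_networks(toy_graph)
class_counts = Counter(size_class(len(network)) for network in toy_networks)
```

Because every gene is marked as seen the first time it is reached, no gene can be placed in two networks.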
RESULTS: Networks applications
- NETWORKS ARE A USEFUL TOOL TO DEEPLY INVESTIGATE RELATIONSHIPS BETWEEN SUBSETS OF GENES SHARING DUPLICATION RELATIONSHIPS
- NETWORKS CAN BE A USEFUL TOOL TO REFINE GENE ANNOTATION
RESULTS: Networks applications for the annotation of unknown information
19% of the proteome is still annotated as “unknown protein”.
NETWORKS CAN HELP IN REFINING GENE ANNOTATION
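One way a paralogy network could support annotation is by listing, for each gene labelled “unknown protein”, the annotations carried by its direct paralogs. The slides do not describe the actual procedure, so the sketch below is purely hypothetical: the function name, the gene identifiers and the annotation labels are invented for illustration, and any suggestion produced this way would still need manual review.

```python
from collections import Counter

UNKNOWN = "unknown protein"


def suggest_annotations(adjacency, annotations):
    """For each unknown gene, count the known annotations of its paralogs.

    adjacency   : {gene: set of paralogous genes}
    annotations : {gene: annotation string}
    Returns {unknown_gene: [(annotation, count), ...]}, most frequent first.
    """
    suggestions = {}
    for gene, label in annotations.items():
        if label != UNKNOWN:
            continue
        neighbor_labels = Counter(
            annotations[paralog]
            for paralog in adjacency.get(gene, ())
            if annotations.get(paralog, UNKNOWN) != UNKNOWN
        )
        if neighbor_labels:
            suggestions[gene] = neighbor_labels.most_common()
    return suggestions


# Toy data with hypothetical genes and labels.
toy_adjacency = {
    "geneX": {"geneA", "geneB", "geneC"},
    "geneA": {"geneX"},
    "geneB": {"geneX"},
    "geneC": {"geneX"},
}
toy_annotations = {
    "geneX": UNKNOWN,
    "geneA": "protein kinase",
    "geneB": "protein kinase",
    "geneC": UNKNOWN,
}
toy_suggestions = suggest_annotations(toy_adjacency, toy_annotations)
```

Unknown genes whose paralogs are all unknown are left out of the result, since the network offers no annotation to transfer in that case.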
RESULTS: Networks applications for the study of relationships between gene families
Networks can be a useful tool for highlighting evolutionary relationships between different gene families.
RESULTS: Networks of duplicated genes
RESULTS: 2-gene networks
[Figure panels: Arabidopsis thaliana chromosomes; all protein-coding genes; all genes involved in two-gene networks]
RESULTS: 2-gene networks and singleton genes
- AT LEAST 5% OF THE ENTIRE PROTEOME HAS A SINGLE PARALOGY RELATIONSHIP
- ABOUT 20% OF THE ENTIRE PROTEOME CONSISTS OF SINGLETON GENES
- ABOUT ONE QUARTER OF THE ENTIRE PROTEOME HAS ZERO OR AT MOST ONE PARALOGY RELATIONSHIP
RESULTS: Singleton genes analysis
WHAT ABOUT SINGLETON GENES?
RESULTS: Singleton genes analysis
- Singleton genes (not having protein-coding paralogs)
- Blastn of mRNA sequences against ESTs
- Singleton genes validated or not validated by ESTs
- 3588