ZFIN ID: ZDB-PUB-121120-2
Phenotypic overlap in the contribution of individual genes to CNV pathogenicity revealed by cross-species computational analysis of single-gene mutations in humans, mice and zebrafish
Doelken, S.C., Köhler, S., Mungall, C.J., Gkoutos, G.V., Ruef, B.J., Smith, C., Smedley, D., Bauer, S., Klopocki, E., Schofield, P.N., Westerfield, M., Robinson, P.N., and Lewis, S.E.
ABSTRACT
Numerous disease syndromes are associated with regions of copy number variation (CNV) in the human genome, and in most cases
the pathogenicity of the CNV is thought to be related to altered dosage of the genes contained within the affected segment.
However, establishing the contribution of individual genes to the overall pathogenicity of CNV syndromes is difficult and
often relies on identification of potential candidates through manual searches of the literature and on-line resources. We
describe here the development of a computational framework to comprehensively search phenotypic information from model organisms
and single-gene human hereditary disorders and thus speed the interpretation of the complex phenotypes of CNV disorders. There
are currently more than 5000 human genes about which nothing is known phenotypically, but for which detailed phenotypic information
for their mouse and/or zebrafish orthologs is available. Here we present an ontology-based approach that identifies similarities
between human disease manifestations and the mutational phenotypes of characterised model organism genes, and that can thus be
used even in cases where there is little or no information about the function of the human genes. We applied this algorithm
to detect candidate genes for 27 recurrent CNV disorders and identified 802 gene/phenotype associations, approximately half
of which were previously reported and half of which were novel candidates. A total of 431 associations were made solely on the basis of model
organism phenotype data. Additionally, we observed a striking, statistically significant tendency for individual disease phenotypes
to be associated with multiple genes located within a single CNV region, a phenomenon that we denote as pheno-clustering.
Many of the clusters also display statistically significant protein functional similarity or vicinity within the protein-protein
interaction network. Our results provide a basis for understanding previously uninterpretable genotype-phenotype correlations
in pathogenic CNVs and for mobilizing the large amounts of model organism phenotype data to provide insights into human genetic
disorders.
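
The ontology-based matching described above can be illustrated with a minimal sketch. The abstract does not specify the similarity measure used, so this example substitutes a simple Jaccard index over ancestor closures as a stand-in. The toy term hierarchy, the term names, and the functions `ancestors`, `closure` and `jaccard_similarity` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Set

# Toy is-a hierarchy (child -> parents). Term names are invented for
# illustration and do not come from any real phenotype ontology.
PARENTS: Dict[str, Set[str]] = {
    "phenotype": set(),
    "abnormal cardiovascular system": {"phenotype"},
    "abnormal heart": {"abnormal cardiovascular system"},
    "ventricular septal defect": {"abnormal heart"},
    "atrial septal defect": {"abnormal heart"},
    "abnormal eye": {"phenotype"},
    "microphthalmia": {"abnormal eye"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term itself plus every term reachable via is-a links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def closure(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Union of the ancestor sets of all annotated terms."""
    result: Set[str] = set()
    for term in terms:
        result |= ancestors(term, parents)
    return result


def jaccard_similarity(a: Iterable[str], b: Iterable[str],
                       parents: Dict[str, Set[str]]) -> float:
    """Jaccard index of two annotation sets after ancestor expansion.

    This is an assumed stand-in measure, not the one used in the paper.
    """
    closure_a, closure_b = closure(a, parents), closure(b, parents)
    union = closure_a | closure_b
    return len(closure_a & closure_b) / len(union) if union else 0.0


# Hypothetical annotations: one disease feature and two model organism genes.
disease = {"ventricular septal defect"}
gene_a = {"atrial septal defect"}   # shares the heart branch with the disease
gene_b = {"microphthalmia"}         # shares only the root term

score_a = jaccard_similarity(disease, gene_a, PARENTS)
score_b = jaccard_similarity(disease, gene_b, PARENTS)
```

Because both annotation sets are expanded to their ancestors before comparison, a model organism phenotype that is not identical to the human feature can still score well when the two terms share a close common ancestor. This is the property that lets phenotype data from mouse or zebrafish orthologs be compared with human disease manifestations.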