- Title
- The genetic factors of bilaterian evolution
- Authors
- Heger, P., Zheng, W., Rottmann, A., Panfilio, K.A., Wiehe, T.
- Source
- Full text @ Elife
(A) Comparison of three major orthology databases with the BigWenDB. The relative contribution of four metazoan clades (Deuterostomia, Ecdysozoa, Lophotrochozoa, and the paraphyletic group "non-Bilateria") is shown as a stacked bar graph. The count of metazoans in our database (175 species) is set to 100%. In comparison to other databases, the BigWenDB has a larger repertoire of critical lophotrochozoans and non-bilaterian Metazoa. (B) Consensus phylogeny describing the relationships of 21 metazoan phyla covered in our database, after Laumer et al., 2015; Telford et al., 2015; Torruella et al., 2015; Cannon et al., 2016. Bold labels to the left of or above a branch indicate branch ancestors (A: Arthropoda, B: Bilateria, D: Deuterostomia, E: Ecdysozoa, Eu: Eumetazoa, L: Lophotrochozoa, M: Metazoa, O: Opisthokonta, P: Protostomia). Numbers in parentheses (after the phylum name) indicate the number of species present from this phylum. Horizontal bars visualise the number of database sequences that belong to a given phylum (logarithmic scale; transcriptomic, ORF, and NCBI sequences summed up). Species silhouettes were downloaded from www.phylopic.org. Morphological innovations of Bilateria according to Baguñà et al., 2008 are highlighted in a shaded box.
The amount of sequence data populating the BigWenDB is shown together with its phylogenetic distribution. The coloured bars at the perimeter (red, green, blue) document the contributions of three different sequence sources (bar height proportional to the number of sequences, see ruler at top left): (1) Sequences from 204 opisthokonts (animals, choanoflagellates, and fungi) with >8000 entries in the NCBI database (downloaded on May 25, 2015; coloured in red). (2) Sequences derived from the transcriptomes of 64 species under-represented at NCBI (non-bilaterian animals, lophotrochozoans, and representatives of additional phyla; green). (3) ORFs derived from the genome sequences of 25 representative metazoans (blue), including 8 non-bilaterian species. In total, 124,031,501 sequences from 273 species cover the eukaryotic tree of life in the most comprehensive way so far (see text for details). Phylogenetic relationships after NCBI taxonomy.
Boxplots show the size distribution of genomic ORFs (ORF), transcriptomic ORFs (TRS), and NCBI sequences (GI) in comparison to the average size of protein domains collected in the PFAM database V31.0 (March 2017; 16,712 entries). Data points outside 1.5 × the interquartile range are omitted for clarity.
Outliers (above whiskers) are omitted for clarity. Whiskers extend to 1.5 × the interquartile range (default in R). Box width is proportional to the square root of the sequence number. nB = non-bilaterian Metazoa, D = Deuterostomia, E = Ecdysozoa, L = Lophotrochozoa.
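The whisker rule and box-width scaling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `boxplot_stats` and the example values are hypothetical, and the quartiles are computed with Python's "inclusive" method, which may differ slightly from the hinges R uses by default.

```python
import math
from statistics import quantiles


def boxplot_stats(values):
    """Summarise one boxplot group (hypothetical helper, not from the paper).

    Whiskers end at the most extreme data points that still lie within
    1.5 x IQR of the box; everything beyond counts as an outlier.
    """
    data = sorted(values)
    # Quartiles via the "inclusive" method (an assumption; R uses hinges).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "n_outliers": len(data) - len(inside),
        # Box width proportional to the square root of the group size.
        "relative_width": math.sqrt(len(data)),
    }


if __name__ == "__main__":
    # Made-up sequence lengths; the value 100 falls outside the upper fence.
    print(boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

Scaling the width by the square root of the group size keeps very large groups from visually overwhelming small ones while still signalling which boxes rest on more data.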
Inventory of protein domains and associated GO terms for three animal lineages
The detailed view of positions 396–576 of OG_28197 (top; 648 AA alignment with 34 sequences from 22 deuterostomes and 8 lophotrochozoans) and of positions 174–354 of OG_33174 (bottom; 430 AA alignment with 29 sequences from 24 deuterostomes and 4 lophotrochozoans) illustrates the existence of domain-like conservation patterns despite the absence of known protein domains. Coloured blocks indicate sequences of lophotrochozoan (L) and deuterostome (D) origin. The two displayed alignments lack ecdysozoan sequences; they were chosen for their small size and convenient presentation. Short stretches of unaligned sequences were removed for clarity. Dashes indicate sequence gaps. Colouring of amino acids reflects chemical similarity (UGENE standard colour scheme; |
View of a 189 AA alignment of OG_13336 (top; 74 sequences from 40 deuterostomes, 11 ecdysozoans, and 5 lophotrochozoans) and of a 75 AA alignment of OG_31055 (bottom; 30 sequences from 8 deuterostomes, 6 ecdysozoans, and 7 lophotrochozoans), illustrating the existence of domain-like conservation patterns despite the absence of known protein domains. Short stretches of unaligned sequences were removed for clarity. Colouring of amino acids reflects chemical similarity (UGENE standard colour scheme; |
View of a 234 AA alignment with 135 sequences from 22 deuterostomes, 8 ecdysozoans, and 9 lophotrochozoans, illustrating the existence of domain-like conservation patterns despite the absence of known protein domains. Short stretches of unaligned sequences were removed for clarity. Colouring of amino acids reflects chemical similarity (UGENE standard colour scheme; |
Starting from Bilateria (left), a protostome lineage leading to dipterans (upper) and a deuterostome lineage leading to mammals (lower) are shown as a schematic phylogenetic tree. Sister clades to the selected taxa are denoted on short branches in the center. Each barplot displays the number of lineage-specific orthogroups (y axis) as a function of orthogroup size (x axis) for the selected taxonomic group (Protostomia, Ecdysozoa, Arthropoda etc.). The total species count (within BigWenDB) for each of the eleven taxonomic groups is indicated on top of the corresponding barplots (# Species). The count of lineage-specific genes decreases with growing orthogroup size. A red line denotes the number of orthogroups in which at least 50% of the species of a selected lineage are present. The corresponding number of lineage-specific orthogroups is highlighted in red next to the line.
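The 50% criterion marked by the red line can be illustrated with a short Python sketch. The function name, the orthogroup identifiers, and the species labels are hypothetical placeholders, and the data layout (orthogroup ID mapped to the species contributing a sequence) is an assumption, not the format used for the BigWenDB.

```python
def count_broadly_represented(orthogroups, lineage_species, min_fraction=0.5):
    """Count orthogroups containing at least `min_fraction` of a lineage's species.

    `orthogroups` maps an orthogroup ID to the species that contribute at
    least one sequence to it (assumed layout, for illustration only).
    """
    lineage = set(lineage_species)
    count = 0
    for species_in_group in orthogroups.values():
        # A species counts once, however many sequences it contributes.
        present = lineage & set(species_in_group)
        if len(present) >= min_fraction * len(lineage):
            count += 1
    return count


if __name__ == "__main__":
    lineage = ["sp_a", "sp_b", "sp_c", "sp_d"]  # hypothetical species labels
    groups = {
        "OG_x": ["sp_a", "sp_b"],                  # 2 of 4 species: passes
        "OG_y": ["sp_a"],                          # 1 of 4 species: fails
        "OG_z": ["sp_a", "sp_b", "sp_c", "sp_a"],  # 3 of 4 species: passes
    }
    print(count_broadly_represented(groups, lineage))
```

Counting distinct species rather than sequences keeps an orthogroup from passing the threshold merely because one species carries many paralogues.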
Top: View of a 194 AA alignment of OG_26631 with 36 sequences from 2 chelicerate, 1 myriapod, 2 crustacean, and 23 hexapod species. Center: View of a 165 AA alignment of OG_34551 with 28 sequences from 4 chelicerate, 1 myriapod, 2 crustacean, and 19 hexapod species. Bottom: View of a 155 AA alignment of OG_35928 with 27 sequences from 2 chelicerate, 1 myriapod, and 21 hexapod species. The alignments illustrate the existence of domain-like conservation patterns despite the absence of known protein domains. Short stretches of unaligned sequences were removed for clarity. Colouring of amino acids reflects chemical similarity (UGENE standard colour scheme; |
Two consensus phylogenetic trees showing the relationship of major metazoan lineages. The five factors of the Nodal signalling pathway (Nodal, Lefty, EGF-CFC, FoxH1, and Eomes) are displayed as coloured boxes. Their phylogenetic distribution and inferred evolutionary origin are mapped onto the tree. Gene births are indicated as coloured boxes above the respective branch. Inferred losses are represented by crosses. Bold labels to the left of a branch indicate branch ancestors: B = Bilateria, Eu = Eumetazoa, M = Metazoa. (A) Previous results regarding the evolution of Nodal pathway genes, as known from the literature. (B) Revised evolutionary history of the Nodal pathway genes according to our results. Note that none of the five factors has been found in arthropods and nematodes. The ecdysozoan boxes for Eomes and FoxH1 are derived from the presence of the genes in a single priapulid species. Grey shading: Hypothetical emergence of a putative kernel for mesoderm specification and neural patterning.
Maximum likelihood phylogeny of selected bilaterian Lefty and Nodal proteins. The corresponding multiple sequence alignment consists of 24 sequences with 446 columns and 29.01% gaps and undetermined characters. The sequences correspond to OG_11821 (Lefty) and OG_12210 (Nodal) of the original clustering plus several additional candidate sequences from public repositories (red dots). Blue dots highlight whether a sequence is derived from transcriptomic (light blue) or genomic ORF data (dark blue). All other sequences can be accessed at NCBI with the gene identifiers given as branch labels. Blue triangles identify previously described Lefty and Nodal reference sequences. Bootstrap values below 50% are removed for clarity. There are three Nodal-related genes in teleosts, cyclops, squint, and southpaw, as a result of lineage-specific duplications ( |
Maximum likelihood phylogeny of selected metazoan Fox genes. The multiple sequence alignment consists of 52 sequences aligned over 315 positions (proportion of gaps and undetermined characters: 25.07%). It is generated from OG_36001 (FoxH1), OG_63374 (RBH with OG_36001; orthogroup ID labeled in red), and representative sequences of OG_3972 (FoxD4 as outgroup; third-best hit of OG_36001 in HMM-HMM searches, see |
Top (ex): Predicted structure of the extracellular domain plus transmembrane region of seven selected Robo proteins. Bottom (cp): Predicted structure of the transmembrane region plus cytoplasmic part of seven selected Robo proteins. Robo1 orthologues of two deuterostomes (Hsap = Homo sapiens; Spur = Strongylocentrotus purpuratus), one lophotrochozoan (Lana = Lingula anatina), two ecdysozoans (Dmel = Drosophila melanogaster; Tpse = Trichinella pseudospiralis), and two cnidarians (Hvul = Hydra vulgaris; Spis = Stylophora pistillata) were analysed. "% conf" indicates the percentage of residues modelled at >90% confidence. "% dis" indicates the predicted percentage of disordered regions. Bottom right: Schematic outline of the Robo domain structure with five immunoglobulin domains (IG1–IG5) and three fibronectin type III domains (FN3) in the extracellular part and four conserved cytoplasmic motifs (CC0–CC3) in the intracellular part. Like their bilaterian counterparts, cnidarian Robo candidates display a disorganised protein structure in the cytoplasmic part despite differences in structural features (Figure 6—figure supplement 1, Figure 6—figure supplement 2). The extracellular part (top row), on the other hand, is similarly organised across metazoans.
Multiple sequence alignment of 41 bilaterian and 10 cnidarian (bottom) Robo proteins. A fragment of the full alignment is shown (AA 1667–1697), centering on the conserved cytoplasmic motif CC1 (corresponding to sequence "TPYATTQLI" of human Robo1). Colouring of amino acids reflects chemical similarity (UGENE standard colour scheme; |
Multiple sequence alignment of 41 bilaterian and 10 cnidarian (bottom) Robo proteins. A fragment of the full alignment is shown (AA 1271–1617), starting with the transmembrane region (blue part on the left). Colouring of amino acids reflects chemical similarity (UGENE standard colour scheme; |
Two maximum likelihood phylogenies of representative bilaterian Slit sequences. Sequences were downloaded from NCBI or extracted from the corresponding Slit orthogroup OG_5717. In subfigure ( |
The NTRK receptor and 14 major neurotrophic factors are displayed as coloured boxes. Their phylogenetic distribution and inferred evolutionary origin are mapped onto the tree (see Supplementary file 1–Supplementary Table 22 and Supplementary file 1–Supplementary Table 23). Gene births are indicated as coloured boxes above the respective branch of the tree (left). Inferred losses are shown as coloured crosses in the matrix. Bold labels to the left of a branch indicate branch ancestors: Ac = Actinopterygii, B = Bilateria, Ch = Chordata, Eu = Eumetazoa, Gn = Gnathostomata, M = Metazoa, Sa = Sarcopterygii. The neurotrophic factors of Cladistia, the sister group of Actinopteri, are inferred and distinguished by a question mark as the dataset lacks species from this lineage.
Maximum likelihood phylogenetic analysis of 53 metazoan NTRK and ROR1 sequences (outgroup), aligned over 602 AA. Proportion of gaps and completely undetermined characters in the corresponding alignment: 16.84%. Sequences were collected from different sources: NTRK receptor sequences from protostomes are derived from OG_8965–1.4 of the 1.4 clustering, an orthogroup containing RTKs only ( |