ASM News

Phylogeny of Bacteria: Are We Now Close to Understanding It?

Conserved signature sequences in proteins provide a means for assigning bacteria to phylogenetic groups and determining branching order

Radhey S. Gupta

Understanding evolutionary relationships among prokaryotes is one of the most fundamental challenges in biology. Because prokaryotes were the sole inhabitants of the earth for nearly the first 2 billion years of life, they are key to answering fundamental questions about the nature and origin of the first cell, metabolism, photosynthesis, information transfer processes, and eukaryotic cells. Earlier efforts based on morphological, biochemical, and physiological characteristics had limited success in describing how the various groups of prokaryotes are evolutionarily related. However, using genomic sequences to deduce evolutionary relationships offers new hope for discerning true genealogical relationships among prokaryotes.

During the past 25 years, considerable attention has focused on the nature of the highest taxa within prokaryotes. For instance, Carl Woese and coworkers at the University of Illinois, Urbana, extensively analyzed 16S rRNA sequences and proposed that prokaryotes are divided into two primary taxa (domains), Archaea and Bacteria, which originated from a universal ancestor. These taxa are distinguished on the basis of several biochemical and sequence features, as well as many gene and protein phylogenies.

Figure 1

Nonetheless, Archaea and gram-positive bacteria are closely related in cell structure and genomic organization (Fig. 1), as well as in many other gene and protein phylogenies. Whether the Archaea are truly distinct from other Bacteria, or are instead relatives of gram-positive bacteria that arose from them in response to prolonged antibiotic selection pressure, is an important question to debate, and one with important implications for microbiology. However, the relationship between Archaea and Bacteria is only a small part of prokaryotic phylogeny; many other important evolutionary relationships remain to be discerned among the members of Bacteria, which make up the vast majority of prokaryotes.

Bacterial Phylogeny: Current Understanding and Key Unresolved Questions

According to 16S rRNA analyses, bacteria are divided into 23 main phyla: Aquificae, Thermotogae, Thermodesulfobacteria, Deinococcus-Thermus, Chrysiogenetes, Chloroflexi, Thermomicrobia, Nitrospirae, Deferribacteres, Cyanobacteria, Chlorobi, Firmicutes (low-G+C gram positive), Actinobacteria (high-G+C gram positive), Bacteroidetes, Planctomycetes, Verrucomicrobia, Chlamydiae, Spirochaetes, Fibrobacteres, Acidobacteriae, Fusobacteria, Dictyoglomi, and Proteobacteria. The criteria used to identify members of these phyla are ill-defined, and the overall picture is unsatisfactory. For instance, several of these phyla consist of only one or a few species (e.g., Thermodesulfobacteria, Thermomicrobia, Chrysiogenetes, Fibrobacteres, and Deferribacteres), whereas others (e.g., Proteobacteria, Cyanobacteria, and the low- and high-G+C gram-positive groups) contain several hundred to thousands of species and together account for 90-95% of known bacteria.

All main phyla within Bacteria are recognized solely on the basis of their branching in rRNA trees; there are no other markers that distinguish species belonging to different phyla. When many of the major bacterial phyla were originally described, the long branches separating them made them clearly distinguishable from each other in 16S rRNA trees. However, according to Wolfgang Ludwig of the University of Munich in Germany and Hans-Peter Klenk of Epidauros Biotechnology, Inc., Germany, this apparent clarity resulted mainly from limited sequence data and has since vanished as sequence data have rapidly expanded. Hence, assigning species to groups based solely on their branching positions in rRNA trees is imprecise.

A critical issue in bacterial phylogeny is the relative branching orders of the main phyla within Bacteria. Some scientists assume that the failure of 16S rRNA-based (or other, protein-based) trees to resolve these relationships means that such relationships can never be deduced. Thus, increasing numbers of scientists accept that all main groups within Bacteria branched off directly from a common ancestor, precluding an understanding of the relationship between different bacterial phyla.

The two major challenges to understanding evolutionary relationships within Bacteria that we currently face are (i) developing molecular criteria by which all of the main phyla within Bacteria can be clearly defined and (ii) understanding how the different phyla branched from a common ancestor and how they are related. Understanding these issues will require new approaches to analyzing molecular sequence data. One approach that shows great promise involves use of conserved inserts and deletions—signature sequences—found in various proteins.

Signature Sequences Help in Deducing Relationships between Bacterial Species

Figure 2

We define signature sequences in terms of "conserved indels": inserts or deletions of defined length and sequence, at the same position in a particular protein (or its gene), that are found in all members of one or more bacterial phyla but not in others. A signature sequence is considered useful only if it is flanked on both sides by conserved regions, which ensures that the change is not due to sequence misalignment or other errors (Fig. 2). Because indels are less likely than single base or amino acid substitutions to arise through independent mutational events, they are more reliable phylogenetic markers.
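The flanking-region criterion can be made concrete. The following Python sketch is an illustration only, not the procedure actually used for Fig. 2: it assumes a pre-computed gapped alignment held as a dictionary of equal-length strings, and the function names, the 5-residue flank, and the 0.8 identity cutoff are hypothetical choices. It reports whether a block of alignment columns qualifies as a candidate signature: residues in every in-group sequence, gaps in every other sequence, and well-conserved columns on both sides.

```python
from typing import Dict, List, Set

GAP = "-"


def column_identity(column: List[str]) -> float:
    """Fraction of all sequences sharing the most common residue in a column."""
    residues = [c for c in column if c != GAP]
    if not residues:
        return 0.0
    top = max(set(residues), key=residues.count)
    return residues.count(top) / len(column)


def is_conserved_indel(
    alignment: Dict[str, str],
    start: int,
    end: int,
    in_group: Set[str],
    flank: int = 5,             # hypothetical flank length
    min_identity: float = 0.8,  # hypothetical conservation cutoff
) -> bool:
    """Check whether alignment columns [start, end) form a signature indel.

    Requires residues across the block in every in-group sequence, gaps across
    the block in every other sequence, and `flank` well-conserved columns on
    each side (the guard against misalignment).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if start - flank < 0 or end + flank > length:
        return False  # not enough flanking sequence to anchor the indel
    flank_cols = list(range(start - flank, start)) + list(range(end, end + flank))
    for i in flank_cols:
        if column_identity([alignment[n][i] for n in names]) < min_identity:
            return False  # poorly conserved flank: the "indel" may be an alignment artifact
    for n in names:
        block = alignment[n][start:end]
        if n in in_group:
            if GAP in block:
                return False  # an in-group member lacks part of the insert
        elif set(block) != {GAP}:
            return False  # an out-group member carries residues in the indel region
    return True
```

With toy sequences, `is_conserved_indel(aln, 5, 8, {"A1", "A2"})` checks a three-residue insert present only in sequences A1 and A2. Requiring an all-or-nothing split across the block, rather than a majority, mirrors the definition above: the signature must be present in all members of a group and absent from the others.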

The simplest explanation for a shared indel or signature is that it was introduced only once in a common ancestor of the species or phyla—an assumption that is implicit in most other evolutionary analyses. Well-defined indels in genes and proteins also provide useful milestones for evolutionary events, since species emerging from ancestral cells in which the indel was introduced are expected to contain the signature, whereas other independently arising species would not. Thus, by identifying well-defined signatures that were introduced at various stages during evolution, we can deduce the branching order of different groups.
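The milestone logic can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the method used to build Fig. 3: each signature is represented by a (hypothetical) set of groups that carry it, and all signatures are assumed to have arisen along a single lineage, so that groups branch off one after another. Under single origin and vertical descent, any two such sets must then be nested; a group carrying more signatures branched off later, and groups carrying the same signatures cannot be separated.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def nested(a: FrozenSet[str], b: FrozenSet[str]) -> bool:
    """True if one signature's group set contains the other's."""
    return a <= b or b <= a


def branching_order(
    signatures: Dict[str, FrozenSet[str]], groups: List[str]
) -> List[List[str]]:
    """Return groups from earliest- to latest-branching as a list of tiers.

    Assumes every signature arose once, on the single lineage leading to the
    latest-branching groups, and was then inherited vertically. Groups in the
    same tier carry the same signatures and are not resolved from each other.
    """
    # Under these assumptions, every pair of signatures must be nested.
    for (n1, s1), (n2, s2) in combinations(signatures.items(), 2):
        if not nested(s1, s2):
            raise ValueError(f"signatures {n1} and {n2} are not nested")
    # The more signatures a group carries, the later it branched off.
    carried = {g: sum(g in s for s in signatures.values()) for g in groups}
    return [[g for g in groups if carried[g] == c] for c in sorted(set(carried.values()))]
```

For example, signatures carried by {G2, G3, G4}, {G3, G4}, and {G4} place G1 first and G4 last, whereas a single signature shared by G3 and G4 leaves G1 and G2 unresolved in one tier. A pair of overlapping but non-nested signatures raises an error, which is exactly the kind of conflict (independent origin or LGT) discussed in the next section.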

Figure 3

For instance, we recently identified a large number of conserved indels in different proteins that appear to have been introduced at specific stages of bacterial evolution (Fig. 3). By analyzing various bacterial species in which these indels either occur or are missing, we can distinguish all major groups within Bacteria and also can deduce where each of them branched from a common ancestor.

Testing the Indel Model for Bacterial Evolution Using Genomic Sequences

How can one objectively determine whether an evolutionary model based on indels is accurate? One potential pitfall is that indels, rather than being derived from a common ancestor, may have been introduced independently and on multiple occasions into species in separate phyla. Moreover, an indel that arose in one species might spread to others by lateral gene transfer (LGT). Recent analyses of genomic sequences suggest that LGT could be very common among prokaryotes, complicating efforts to determine evolutionary relationships.

Sequence data describing bacterial genomes provide a means for evaluating the indel model. According to this model, indels were introduced into particular proteins at specific stages during evolution, and once an indel was introduced, all species derived from that ancestral species should also contain it, whereas species in other groups should not. However, if the indels were introduced independently, or if the genes containing them were frequently transferred laterally, their presence would not follow any particular pattern; instead, an indel would be found scattered among different groups of species, or even in individual species from different groups. Thus, by comparing the observed distributions of indels among species with the distributions predicted by the model, we can assess the model's reliability.
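The comparison between predicted and observed distributions amounts to simple bookkeeping. The Python sketch below assumes a hypothetical encoding, not the format of Table 1: the model is an ordered list of groups from earliest to latest branching, each indel is assigned the position at which it was introduced, and each species records, per indel, whether it shows the derived (post-introduction) state. A value of None marks a protein missing from a genome, as with the gene losses in parasitic species discussed below, and is skipped rather than scored.

```python
from typing import Dict, List, Optional, Tuple

# species name -> (group, {indel name: derived state observed? None = protein absent})
Observations = Dict[str, Tuple[str, Dict[str, Optional[bool]]]]


def predicted_state(group: str, order: List[str], stage: int) -> bool:
    """True if `group` descends from the ancestor in which the indel arose,
    i.e. it branched off at or after position `stage` in `order`."""
    return order.index(group) >= stage


def compare_with_model(
    order: List[str], indel_stages: Dict[str, int], species: Observations
) -> Tuple[int, List[Tuple[str, str]]]:
    """Score observed indel states against the model's predictions.

    Returns the number of comparisons made and the (species, indel) pairs
    that contradict the model.
    """
    checked = 0
    exceptions: List[Tuple[str, str]] = []
    for name, (group, observed) in species.items():
        for indel, stage in indel_stages.items():
            state = observed.get(indel)
            if state is None:  # protein missing from this genome: not scored
                continue
            checked += 1
            if state != predicted_state(group, order, stage):
                exceptions.append((name, indel))
    return checked, exceptions
```

In these terms, the Table 1 analysis corresponds to 936 scored cases with a single exception (the Rho indel in Thermotoga maritima), i.e., agreement in 935 of 936 cases, or about 99.9%. Independent origins or frequent LGT would instead show up as many scattered exceptions.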

McMaster Biochemistry

Table 1

Researchers have determined the genomic sequences of at least 60 bacterial species. We examined these species for the signature indels in 18 bacterial proteins (Fig. 3) using BLAST analysis and sequence alignments (Table 1). The signature sequence alignments for most of these proteins, covering species that contain the indels as well as species that lack them, can be found at my faculty website.

Most of the bacteria whose genomes have been sequenced contain the proteins bearing these signatures, but a few species lack them, most commonly the mycoplasmas and chlamydiae. Presumably the corresponding genes were lost from these parasitic species once host cells began providing the missing functions. For all 18 signatures (Table 1), the distribution of indels among species matches that predicted by the model, with only one exception out of 936 cases: the presence of the indel in the Rho protein of Thermotoga maritima (Table 1), which according to the model should be lacking.

For all 18 of these bacterial proteins, sequence information is also available for many other species whose genomes have not been fully sequenced. In almost all cases, the distributions of the indels in these additional species also follow the pattern predicted by the indel model. These results provide strong evidence that the phylogenetic placements and branching orders of groups deduced by indel analysis are highly reliable and internally consistent.

Potentially Confusing Lateral Gene Transfers Less of an Analytical Problem than Expected

These results also provide strong evidence that the genes containing these indels, most of which encode highly conserved housekeeping functions, have not been significantly affected by LGT. Indeed, for LGT to have significantly influenced these results, we would need to postulate highly specific LGT events, different for each indel and involving all species belonging to a number of different bacterial groups. Such a scenario seems highly improbable.

Likewise, the probability that the observed indels in various proteins could have occurred independently in different species can also be shown to be astronomically low. In order to understand and characterize LGT events, it is necessary to have a well-defined model indicating the proposed relationships between different groups. Without such a model, it is very difficult to identify and quantify LGT events. However, the indels in the protein sequences described here, which represent highly conserved molecular features, have enabled us to develop a detailed model describing how different major groups within Bacteria are related. This model should prove useful in interpreting signature sequences in other proteins as well as in identifying LGT events among bacterial groups.
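Why independent origins become implausible can be illustrated with a back-of-the-envelope calculation. The Python sketch below is not the calculation behind the statement above; it assumes, purely for illustration, that each additional independent origin of a given indel (same length, sequence, and position) is a separate event with a hypothetical probability p, so that k extra origins for each of n signatures has probability p^(k·n). It works in log10 to avoid floating-point underflow.

```python
import math


def log10_prob_independent(p_per_origin: float, extra_origins: int, signatures: int) -> float:
    """log10 of the chance that each of `signatures` indels arose independently
    `extra_origins` additional times, assuming every origin is an independent
    event with the hypothetical probability `p_per_origin`."""
    if not 0 < p_per_origin <= 1:
        raise ValueError("p_per_origin must be in (0, 1]")
    # Independent events multiply, so their log-probabilities add.
    return signatures * extra_origins * math.log10(p_per_origin)
```

For example, with a hypothetical p of 10^-3 and two extra origins for each of 18 signatures, the result is about -108, i.e., a probability near 10^-108. The point is the compounding: even generous per-event probabilities shrink geometrically as more signatures must be explained by coincidence.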

Signature sequence analysis gives rise to a model of bacterial evolution that is satisfying on several counts:

  • The model is consistent with and accounts for major ultrastructural differences seen among Bacteria (see box, p. 285, and Fig. 1). For instance, gram-positive or monoderm bacteria that are surrounded by a single membrane are phylogenetically distinct from all true gram-negative or diderm bacteria that are surrounded by inner and outer membranes, which are themselves separated by a periplasmic compartment. Of these two distinct groups, the monoderm bacteria appear to be ancestral.

  • The model places Deinococcus-Thermus in an intermediate position between monoderm and diderm bacteria (Fig. 3), which is consistent with Deinococcus containing a thick peptidoglycan layer characteristic of gram-positive bacteria, staining gram positive, but also containing both inner and outer cell membranes.

  • According to the model, several bacterial lineages emerged in an anoxic atmosphere before the Cyanobacteria, whose appearance led to a large-scale release of oxygen into the atmosphere. This view is consistent with geological evidence indicating that the earliest prokaryotic organisms emerged 3.5-3.8 billion years ago, well before the atmosphere became oxygenic 2.0-2.5 billion years ago.

Indel Data-Based Inferences Overcome Weaknesses of 16S rRNA Phylogenies

Phylogenetic inferences from these indel-based analyses correlate remarkably well with those based on rRNA trees (Fig. 3). Thus, most of the major groups within Bacteria that were identified from rRNA trees can now be defined in molecular terms. The main groups identified so far on the basis of signature sequences should be regarded as a minimum, and identifying additional signatures may lead to subdividing some of them. For example, we expect signature sequence-based analysis to reveal new subdivisions among the gram-positive bacteria and the Chlamydia-CFB-Green sulfur bacteria groups.

Of the 60 bacterial species whose genomes have been sequenced, almost all were assigned to the same groups by both methods. Signature sequences also clearly identify distinct groups corresponding to the various proteobacterial subdivisions and indicate their relative branching order. We suggest that each of these proteobacterial groups be accorded the same phylogenetic rank as the other main phyla within Bacteria, whose branching order can now be delineated (Fig. 3).

An important difference between the rRNA trees and the results obtained with the signature approach concerns the phylogenetic placement of Aquifex aeolicus. The signature analysis places this species between the Proteobacteria (δ,ε) and the Chlamydia-CFB groups, whereas in rRNA trees it is the deepest-branching species within Bacteria. The signature sequences also indicate that the Clostridium group branches separately from the other low-G+C gram-positive bacteria, and that Fusobacterium nucleatum is related to the Clostridium group.

Lastly, the signature sequence approach has enabled us to determine the relative branching orders of the main groups within Bacteria, a vexing task that is central to understanding bacterial evolution but which has proven difficult to resolve in the past.

SUGGESTED READING

Doolittle, W. F. 1999. Phylogenetic classification and the universal tree. Science 284:2124-2128.

Gupta, R. S. 1998. Protein phylogenies and signature sequences: a reappraisal of evolutionary relationships among archaebacteria, eubacteria, and eukaryotes. Microbiol. Mol. Biol. Rev. 62:1435-1491.

Gupta, R. S. 2000. The natural evolutionary relationships among prokaryotes. CRC Crit. Rev. Microbiol. 26:111-131.

Gupta, R. S. 2000. The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes. FEMS Microbiol. Rev. 24:367-402.

Gupta, R. S. 2001. The branching order and phylogenetic placements of species from completed bacterial genomes based on conserved indels found in various proteins. International Microbiol. 4:187-202.

Gupta, R. S., and E. Griffiths. 2002. Critical issues in bacterial phylogenies. Theoret. Population Biol., in press.

Ludwig, W., and H.-P. Klenk. 2001. Overview: a phylogenetic backbone and taxonomic framework for prokaryotic systematics, p. 49-65. In D. R. Boone and R. W. Castenholz (ed.), Bergey's manual of systematic bacteriology, vol. 1, second edition. Springer-Verlag, Berlin.

Ludwig, W., and K. H. Schleifer. 1999. Phylogeny of Bacteria beyond the 16S rRNA standard. ASM News 65:752-757.

Lyons, S. 2002. Thomas Kuhn is alive and well: the evolutionary relationships of simple life forms, a paradigm under siege. Perspect. Biol. Med., in press.

Stanier, R. Y. 1976. The microbial world. Prentice Hall, Inc., Englewood Cliffs, N.J.

Woese, C. R. 1987. Bacterial evolution. Microbiol. Rev. 51:221-271.

Woese, C. R. 1998. The universal ancestor. Proc. Natl. Acad. Sci. USA 95:6854-6859.

Last Modified: June 18, 2002
Copyright © 2002 American Society for Microbiology All rights reserved ASM