Phylogeny of Bacteria: Are We Now
Close to Understanding It?
Conserved signature sequences in proteins provide a
means for assigning bacteria to phylogenetic groups and determining
branching order
Radhey S. Gupta
Understanding evolutionary relationships among
prokaryotes constitutes one of the most fundamental challenges in
biology. Because prokaryotes were the sole inhabitants for nearly the
first 2 billion years of life on earth, they are key to understanding
fundamental questions about the nature and origin of the first cell,
metabolism, photosynthesis, information transfer processes, and
eukaryotic cells. Earlier efforts based on morphological, biochemical,
and physiological characteristics met with limited success in describing
how various groups of prokaryotes are evolutionarily related. However,
using genomic sequences to deduce evolutionary relationships provides
new hope for discerning true genealogical relationships among
prokaryotes.
During the past 25 years, considerable attention has focused
on the nature of the highest taxa within prokaryotes. For instance, Carl
Woese and coworkers at the University of Illinois, Urbana, extensively
analyzed 16S rRNA sequences and suggested that prokaryotes divided into
two primary taxa (domains), Archaea and Bacteria,
originating from a universal ancestor. These taxa are distinguished on
the basis of several biochemical and sequence features, as well as many
gene and protein phylogenies.
Figure 1
Nonetheless, Archaea and gram-positive bacteria
are closely related in terms of cell structure, genomic organization
(Fig. 1), and many gene and protein phylogenies. Whether the Archaea
are truly distinct from other Bacteria, or are relatives of
gram-positive bacteria that arose from them in response to prolonged
antibiotic selection pressure, is an important question to debate, and
one with important implications for microbiology. However, because the
relationship between Archaea and Bacteria constitutes only
a small part of prokaryotic phylogeny, there are many other important
evolutionary relationships to discern among the members of Bacteria,
which comprise the vast majority of the prokaryotes.
Bacterial Phylogeny: Current Understanding and Key
Unresolved Questions
According to 16S rRNA analyses, bacteria are divided
into 23 main phyla, which include: Aquificae, Thermotogae,
Thermodesulfobacteria, Deinococcus-Thermus, Chrysiogenetes,
Chloroflexi, Thermomicrobia, Nitrospirae, Deferribacteres,
Cyanobacteria, Chlorobi, Firmicutes (low G+C gram
positive), Actinobacteria (high G+C gram positive), Bacteroidetes,
Planctomycetes, Verrucomicrobia, Chlamydiae, Spirochaetes,
Fibrobacteres, Acidobacteria, Fusobacteria, Dictyoglomi,
and Proteobacteria. The criteria used to identify members of
these phyla are ill-defined, and the overall picture is unsatisfactory.
For instance, several of these phyla consist of only one or a few
species (e.g., Thermodesulfobacteria, Thermomicrobia, Chrysiogenetes,
Fibrobacteres, and Deferribacteres), whereas others (e.g.,
Proteobacteria, Cyanobacteria, and the low- and high-G+C
gram-positive groups) contain several hundred to thousands of species
and together account for 90-95% or more of the known bacteria.
All main phyla within Bacteria are recognized on
the basis of their branching in the rRNA trees; no other markers are
available to distinguish species belonging to separate phyla. When many
of the major bacterial phyla were originally described, they were
clearly distinguishable from each other in the 16S rRNA trees based on
the long branches that separated them. However, this apparent clarity
resulted mainly from limited sequence data and has since faded as
sequence data expanded rapidly, according to Wolfgang
Ludwig of the University of Munich in Germany and Hans-Peter Klenk of
Epidauros Biotechnology, Inc., in Germany. Hence, assigning species to
various groups based solely on branching positions in rRNA trees is
imprecise.
A critical issue in bacterial phylogeny is the relative
branching orders of the main phyla within Bacteria. Some
scientists assume that the failure of 16S rRNA-based (or other,
protein-based) trees to resolve these relationships means that such
relationships can never be deduced. Thus, increasing numbers of
scientists accept that all main groups within Bacteria branched
off directly from a common ancestor, precluding an understanding of the
relationship between different bacterial phyla.
The two major challenges to understanding evolutionary
relationships within Bacteria that we currently face are (i)
developing molecular criteria by which all of the main phyla within Bacteria
can be clearly defined and (ii) understanding how the different
phyla branched from a common ancestor and how they are related.
Understanding these issues will require new approaches to analyzing
molecular sequence data. One approach that shows great promise involves
use of conserved inserts and deletions (signature sequences) found in
various proteins.
Signature Sequences Help in Deducing Relationships
between Bacterial Species
Figure 2
We define signature sequences in terms of
"conserved indels," representing either inserts or deletions
of defined length and sequence at the same position in a particular
protein (or its gene) found among all members of one or more phyla of
bacteria but not in others. A signature sequence is considered useful if
it is flanked on both sides by conserved regions, thus ensuring that the
changes are not due to sequence misalignment or other errors (Fig. 2).
Because indels are less likely to result from independent mutational
events, they are more reliable than base or amino acid substitutions.
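
The definition above can be sketched in code. The following is a minimal illustration, not the procedure used in this work: it assumes a gapped multiple alignment held as a dictionary of equal-length strings, and the function names, the flank width of 5 columns, the 0.8 identity cutoff, and the toy sequences are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple


def gap_runs(seq: str) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of consecutive '-' in an aligned sequence."""
    runs, start = [], None
    for i, ch in enumerate(seq):
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(seq)))
    return runs


def column_identity(alignment: Dict[str, str], col: int) -> float:
    """Fraction of sequences that share the most common residue in one column."""
    residues = [s[col] for s in alignment.values()]
    top = max(set(residues), key=residues.count)
    return residues.count(top) / len(residues)


def is_signature(alignment: Dict[str, str], start: int, end: int,
                 flank: int = 5, min_identity: float = 0.8) -> bool:
    """Accept the indel spanning [start, end) only if
    (1) every sequence is either fully gapped or fully ungapped across it, and
    (2) the `flank` columns on each side are gap-free and well conserved."""
    length = len(next(iter(alignment.values())))
    if start - flank < 0 or end + flank > length:
        return False
    for seq in alignment.values():
        region = seq[start:end]
        if set(region) != {"-"} and "-" in region:
            return False  # ragged gap: indel length is not well defined
    flank_cols = list(range(start - flank, start)) + list(range(end, end + flank))
    for col in flank_cols:
        if any(s[col] == "-" for s in alignment.values()):
            return False
        if column_identity(alignment, col) < min_identity:
            return False
    return True


def carriers(alignment: Dict[str, str], start: int, end: int) -> List[str]:
    """Names of the sequences that contain the insert (no gaps in the region)."""
    return sorted(name for name, s in alignment.items() if "-" not in s[start:end])


# Toy alignment (invented sequences): species_A and species_B carry a 4-residue insert.
toy_alignment = {
    "species_A": "MKVLAGDTGEEKPLIVTAR",
    "species_B": "MKVLAGDTGEEKPLIVTAR",
    "species_C": "MKVLAGD----KPLIVTAR",
    "species_D": "MKVLAGD----KPLIVSAR",
}

for start, end in gap_runs(toy_alignment["species_C"]):
    if is_signature(toy_alignment, start, end):
        print(start, end, carriers(toy_alignment, start, end))
```

Requiring conserved, gap-free flanks is what guards against an apparent indel that is really an alignment artifact; the cutoff values would need tuning for real protein families.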
The simplest explanation for a shared indel or signature
is that it was introduced only once in a common ancestor of the species
or phyla, an assumption that is implicit in most other evolutionary
analyses. Well-defined indels in genes and proteins also provide useful
milestones for evolutionary events, since species emerging from
ancestral cells in which the indel was introduced are expected to
contain the signature, whereas other independently arising species would
not. Thus, by identifying well-defined signatures that were introduced
at various stages during evolution, we can deduce the branching order of
different groups.
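
The logic of reading a branching order from shared signatures can be sketched as follows. This is a simplified illustration under stated assumptions, not the author's method: it assumes each indel has already been polarized (the species listed are those carrying the derived state), that the indels form a strictly nested series, and the species and indel names are hypothetical.

```python
from typing import Dict, List, Set


def branching_order(signatures: Dict[str, Set[str]],
                    all_species: Set[str]) -> List[Set[str]]:
    """Order groups from earliest- to latest-branching.

    An indel introduced early is carried by many descendants, and one
    introduced later by a subset of them, so carrier sets are processed
    from largest to smallest. Species that lack a given indel are taken
    to have branched off before it was introduced.
    """
    ordered = sorted(signatures.values(), key=len, reverse=True)
    groups: List[Set[str]] = []
    remaining = set(all_species)
    for carrier_set in ordered:
        if not carrier_set <= remaining:
            # Conflict of this kind is what independent origin or
            # lateral transfer of an indel would produce.
            raise ValueError("carrier sets are not nested")
        earlier = remaining - carrier_set
        if earlier:
            groups.append(earlier)
        remaining = set(carrier_set)
    groups.append(remaining)
    return groups


# Hypothetical data: three indels in three different proteins.
toy_species = {"sp_A", "sp_B", "sp_C", "sp_D", "sp_E"}
toy_signatures = {
    "indel_1": {"sp_B", "sp_C", "sp_D", "sp_E"},
    "indel_2": {"sp_D", "sp_E"},
    "indel_3": {"sp_C", "sp_D", "sp_E"},
}

print(branching_order(toy_signatures, toy_species))
```

Real data need not be strictly linear; the sketch raises an error rather than guessing when two signatures conflict.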
Figure 3
For instance, we recently identified a large number of
conserved indels in different proteins that appear to have been
introduced at specific stages of bacterial evolution (Fig. 3). By
analyzing various bacterial species in which these indels either occur
or are missing, we can distinguish all major groups within Bacteria and
also can deduce where each of them branched from a common ancestor.
Testing the Indel Model for Bacterial Evolution Using
Genomic Sequences
How can one objectively determine whether an
evolutionary model based on indels is accurate? One potential
problem is that indels, rather than being derived from a
common ancestor, might have been introduced independently and on multiple
occasions into different species within separate phyla. Moreover, an indel that
arose in one species might transfer to others by lateral gene transfer (LGT).
Recent analyses of genomic sequences suggest that LGT could be very
common among prokaryotes, complicating efforts to determine evolutionary
relationships.
Sequence data describing bacterial genomes provide a
means for evaluating the indel model. According to this model, indels
were introduced into particular proteins at specific stages during
evolution. Also, according to the model, once an indel is introduced,
all species derived from this ancestral species will also contain the
indel, whereas species in other groups are expected not to contain that
indel. However, if either the indels were introduced independently or
the genes containing these indels were frequently laterally transferred,
then the presence of indels will not follow any particular pattern. In
such a case, indels will be found among different groups of species or
even individual species from different groups. Thus, by comparing
distributions of indels in species with predictions of those
distributions, we can assess the reliability of this evolutionary model.
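
The comparison described above amounts to scoring each species-protein pair against the model's prediction. A minimal sketch follows, assuming predictions and observations are stored as dictionaries keyed by (species, protein); the function name `score_model` and all data shown are hypothetical, and `None` is used here to mark a protein that was not found in a genome.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (species, protein)


def score_model(predicted: Dict[Key, bool],
                observed: Dict[Key, Optional[bool]]) -> Tuple[int, List[Key]]:
    """Count testable cases and list those that contradict the model.

    predicted[key] is True when the model expects the indel to be present.
    observed[key] is True/False for indel present/absent, or None when the
    protein is missing from that genome and so cannot test the model.
    """
    cases = 0
    exceptions: List[Key] = []
    for key, expected in predicted.items():
        seen = observed.get(key)
        if seen is None:
            continue  # gene absent or not examined: not a test case
        cases += 1
        if seen != expected:
            exceptions.append(key)
    return cases, exceptions


# Invented example data, for illustration only.
toy_predicted = {
    ("sp_A", "protein_1"): False, ("sp_B", "protein_1"): True,
    ("sp_C", "protein_1"): True,  ("sp_A", "protein_2"): False,
    ("sp_B", "protein_2"): False, ("sp_C", "protein_2"): True,
}
toy_observed = {
    ("sp_A", "protein_1"): False, ("sp_B", "protein_1"): True,
    ("sp_C", "protein_1"): None,  ("sp_A", "protein_2"): True,
    ("sp_B", "protein_2"): False, ("sp_C", "protein_2"): True,
}

n_cases, n_exceptions = score_model(toy_predicted, toy_observed)
print(n_cases, n_exceptions)
```

If indels arose independently or moved freely by LGT, the list of exceptions would be long and scattered across groups; a near-empty list is what the model predicts.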
Table 1
Researchers have determined the genomic sequences of at
least 60 bacterial species. We evaluated these species in terms of
signature indels associated with 18 bacterial proteins (Fig. 3) by means
of BLAST analysis and sequence alignments (Table 1). The signature
sequence alignments for most of these proteins (some containing, others
lacking indels) can be found at my faculty website.
Many of the bacteria whose genomic sequences have been
determined contain proteins with these signatures, but the proteins are
missing from a few species, most commonly the mycoplasmas and
chlamydiae. Presumably the corresponding genes were lost from these
parasitic species when host cells began providing the missing function.
For all 18 signatures (Table 1), the distribution of indels among
various species corresponds to that predicted by the model, with only
one exception out of 936 cases. The observed exception is the presence
of an indel in the Rho protein of Thermotoga maritima (Table 1),
which according to the model should lack it.
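
The figures quoted above can be put in proportion with a short calculation. The 936 cases and single exception come from the text; the ceiling of 18 x 60 pairs is only an assumption for illustration, since the text says "at least 60" genomes and does not state how many species-protein pairs were unscorable.

```python
# Values stated in the text.
cases = 936
exceptions = 1

# Share of scored species-protein pairs that match the model's prediction.
consistent_fraction = (cases - exceptions) / cases
print(f"{consistent_fraction:.2%} of scored cases agree with the model")

# Assumption (not stated in the text): exactly 60 genomes and 18 proteins,
# which would give the largest possible number of pairs to score.
assumed_genomes = 60
proteins = 18
max_pairs = assumed_genomes * proteins
print(f"under that assumption, {max_pairs - cases} of {max_pairs} pairs were not scored")
```

On these numbers, roughly 99.9% of the scored cases follow the predicted pattern.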
For all 18 of these bacterial proteins, information is
also available for a large number of other species whose genomes have
not been fully sequenced. In almost all cases, the distributions of
these indels in these additional species follow the pattern predicted
by the indel model. These results provide strong evidence that the
phylogenetic placements and branching orders of different groups as
deduced by indel analysis are highly reliable and internally consistent.
Potentially Confusing Lateral Gene Transfers Less of
an Analytical Problem than Expected
These results also provide strong evidence that genes
containing these indels, most of which encode highly conserved
housekeeping functions, are not affected by LGTs. Indeed, for LGTs to
have a significant influence on these results, we would need to
postulate highly specific LGT events, differing for each indel and
involving all species belonging to a number of different groups of
bacteria. Such a scenario seems highly improbable.
Likewise, the probability that the observed indels in
various proteins could have occurred independently in different species
can also be shown to be astronomically low. In order to understand and
characterize LGT events, it is necessary to have a well-defined model
indicating the proposed relationships between different groups. Without
such a model, it is very difficult to identify and quantify LGT events.
However, the indels in the protein sequences described here, which
represent highly conserved molecular features, have enabled us to
develop a detailed model describing how different major groups within Bacteria
are related. This model should prove useful in interpreting
signature sequences in other proteins as well as in identifying LGT
events among bacterial groups.
Signature sequence analysis gives rise to a model of
bacterial evolution that is satisfying on several counts:
- The model is consistent with and accounts for
major ultrastructural differences seen among Bacteria (see box,
p. 285, and Fig. 1). For instance, gram-positive or monoderm bacteria
that are surrounded by a single membrane are phylogenetically distinct
from all true gram-negative or diderm bacteria that are surrounded by
inner and outer membranes, which are themselves separated by a
periplasmic compartment. Of these two distinct groups, the monoderm
bacteria appear to be ancestral.
- The model places Deinococcus-Thermus in
an intermediate position between monoderm and diderm bacteria (Fig. 3),
which is consistent with Deinococcus containing a thick
peptidoglycan layer characteristic of gram-positive bacteria, staining
gram positive, but also containing both inner and outer cell
membranes.
- According to the model, several bacterial
lineages emerged in an anoxic atmosphere ahead of Cyanobacteria,
whose appearance launched a large-scale release of oxygen into the
atmosphere. This view is consistent with geological findings
indicating that the earliest prokaryotic organisms emerged 3.5-3.8
billion years ago, well before the atmosphere became oxygenic 2.0-2.5
billion years ago.
Indel Data-Based Inferences Overcome Weaknesses of
16S rRNA Phylogenies
Phylogenetic inferences from these indel-based analyses
correlate remarkably well with those based on rRNA trees (Fig. 3). Thus,
most of the major groups within Bacteria that were identified on
the basis of rRNA trees can now be defined in molecular terms. The main
groups thus far identified on the basis of signature sequences
represent a minimal number. Identifying additional signatures may also
lead to subdividing some of these groups. For example, we expect
signature sequence-based analysis to reveal new subdivisions among
gram-positive bacteria and the Chlamydia-CFB-Green sulfur
bacteria groups.
Of the 60 bacterial species whose genomes have been
sequenced, almost all were assigned to the same groups by both methods.
Signature sequences also clearly identify different groups corresponding
to various proteobacterial subdivisions and indicate their relative
branching order. We suggest that all of these proteobacterial groups
should be accorded the same phylogenetic rank as the other main phyla
within Bacteria whose branching order can now be delineated (Fig.
3).
An important difference between the rRNA trees and the
results obtained using the signature approach concerns the phylogenetic
placement of Aquifex aeolicus. The signature analysis places this
species between Proteobacteria (δ, ε) and the Chlamydia-CFB
groups, whereas in rRNA trees, it is the deepest-branching species
within Bacteria. The signature sequences also indicate distinct
branching of the Clostridium group of species from other low G + C
gram-positive bacteria, with Fusobacterium nucleatum related to
them.
Lastly, the signature sequence approach has enabled us
to determine the relative branching orders of the main groups within Bacteria,
a vexing task that is central to understanding bacterial evolution but
which has proven difficult to resolve in the past.
SUGGESTED READING
Doolittle, W. F. 1999.
Phylogenetic classification and the universal tree. Science 284:2124-2128.
Gupta, R. S. 1998.
Protein phylogenies and signature sequences: a reappraisal of
evolutionary relationships among archaebacteria, eubacteria, and
eukaryotes. Microbiol. Mol. Biol. Rev. 62:1435-1491.
Gupta, R. S. 2000.
The natural evolutionary relationships among prokaryotes. CRC Crit. Rev.
Microbiol. 26:111-131.
Gupta, R. S. 2000.
The phylogeny of proteobacteria: relationships to other eubacterial
phyla and eukaryotes. FEMS Microbiol. Rev. 24:367-402.
Gupta, R. S. 2001.
The branching order and phylogenetic placements of species from
completed bacterial genomes based on conserved indels found in various
proteins. International Microbiol. 4:187-202.
Gupta, R. S., and E. Griffiths. 2002.
Critical issues in bacterial phylogenies. Theoret. Population Biol., in
press.
Ludwig, W., and H.-P. Klenk. 2001.
Overview: a phylogenetic backbone and taxonomic framework for
prokaryotic systematics, p. 49-65. In D. R. Boone and R. W.
Castenholz (ed.), Bergey's manual of systematic bacteriology, vol. 1,
second edition. Springer-Verlag, Berlin.
Ludwig, W., and K. H. Schleifer.
1999. Phylogeny of Bacteria beyond the 16S rRNA standard. ASM News 65:752-757.
Lyons, S. 2002.
Thomas Kuhn is alive and well: the evolutionary relationships of simple
life forms, a paradigm under siege. Perspect. Biol. Med., in press.
Stanier, R. Y. 1976.
The microbial world. Prentice Hall, Inc. Englewood Cliffs, N.J.
Woese, C. R. 1987. Bacterial
evolution. Microbiol. Rev. 51:221-271.
Woese, C. R. 1998. The universal
ancestor. Proc. Natl. Acad. Sci. USA 95:6854-6859.