ASM News

Karen E. Nelson is Assistant Investigator, Ian T. Paulsen is Assistant Investigator, and Claire M. Fraser is President at The Institute for Genomic Research, Rockville, Md.  

 

Microbial Genome Sequencing: a Window into Evolution and Physiology 

Surprises, such as the extent of lateral gene transfers, could be overlooked if microbial genome sequencers opt not to complete their analyses 

Karen E. Nelson, Ian T. Paulsen, and Claire M. Fraser

Who would have thought 20 years ago that we would be publishing completed microbial genomes at the current rate? Who would have thought that we would have this approach to analyzing microbes, and that it would reveal so quickly how little we really know about them?

The Institute for Genomic Research (TIGR) published the first complete microbial genomic sequence, that of Haemophilus influenzae, in 1995. Since then, 42 microbial genomes have been sequenced and published (see cover), and more than 100 are known to be in progress. These projects have yielded sequence information describing the genomes of a growing number of major human bacterial pathogens, including Mycobacterium tuberculosis, Neisseria meningitidis, Pseudomonas aeruginosa, and various Chlamydia species. 

In addition, sequence analyses for other important microorganisms such as Escherichia coli and Bacillus subtilis have been completed, as have those for many environmental species, including Thermotoga maritima, Deinococcus radiodurans, and Halobacterium sp. More recently, genomes of agricultural significance, such as those of Xylella fastidiosa, Pasteurella multocida, and Buchnera sp., have been completed, and a variety of other plant and animal pathogens, as well as rumen bacteria, are currently being sequenced. 

Genomic Sequencing Analysis Provides a Range of Insights 

Random shotgun sequencing has been the dominant method for generating genome sequences, having proven to be as useful as and more efficient than traditional methods such as “walking” step-by-step along bacterial artificial chromosomes (BACs). The random shotgun approach has been used to sequence genomes of microorganisms with variations in size, base composition, repeat elements, insertion sequence (IS) elements, and multiple chromosomal molecules and plasmids.

Figure 1

In the random shotgun method, total DNA of the organism of choice is isolated and used to construct multiple-sized insert libraries, clones of which are sequenced to provide approximately eightfold coverage of the genome. Once sequencing is completed, the data are assembled into segments called contigs; they are then ordered and grouped into ever-longer assemblages of such segments; other sequencing is done to bridge any physical gaps that may remain between those assembled contigs; and then the compiled sequence is edited, annotated, and published, while appropriate websites are constructed (Fig. 1). Overall, DNA sequencing approaches continue to improve, increasing the efficiency and speed with which each genome project is conducted (see Nelson et al., Nature Biotechnol. 18:1049-1054, 2000, and ASM News, May 2001, p. 247). 
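The relationship between read count, read length, and fold coverage can be sketched in a few lines. The sketch below is not part of any TIGR pipeline; it assumes the idealized Poisson (Lander-Waterman) model, in which reads land uniformly at random, and the function names, the 2-Mb genome size, and the 500-base read length are illustrative assumptions only.

```python
import math


def shotgun_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average fold coverage c = N * L / G."""
    return num_reads * read_length / genome_size


def reads_needed(genome_size: int, read_length: int, coverage: float) -> int:
    """Number of reads required to reach a target average fold coverage."""
    return math.ceil(coverage * genome_size / read_length)


def expected_unsequenced_fraction(coverage: float) -> float:
    """Poisson estimate of the fraction of bases covered by no read: e^(-c).

    This is an idealization; it ignores cloning bias and repeats.
    """
    return math.exp(-coverage)


if __name__ == "__main__":
    genome_size = 2_000_000   # hypothetical genome size in base pairs
    read_length = 500         # hypothetical average read length in bases
    n = reads_needed(genome_size, read_length, coverage=8.0)
    missing = expected_unsequenced_fraction(8.0) * genome_size
    print(f"reads for eightfold coverage: {n}")
    print(f"expected uncovered bases under the Poisson model: {missing:.0f}")
```

Under this idealized model, eightfold coverage leaves only a few hundred bases of a 2-Mb genome unsampled. Real projects leave more, because cloning bias and repeats violate the uniform-sampling assumption, which is one reason a dedicated gap-closure step is still required.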

Once genomic sequencing and assembly are completed, bioinformatic analyses, which are essential for interpreting and understanding such data, can begin in earnest. Several kinds of analytic approaches are followed. Typically, they include identifying all open reading frames (ORFs), transfer RNAs, ribosomal RNAs, and repetitive sequences, and then applying one or several gene prediction programs, such as those based on hidden Markov models or interpolated Markov models, which have proved effective in identifying microbial genes. 
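As a rough illustration of the first of these steps, the sketch below scans both strands of a sequence in all six reading frames for stretches that begin with ATG and end at a stop codon. It is a deliberately naive ORF scan, not one of the Markov-model gene finders mentioned above: it assumes the standard stop codons, considers only ATG as a start, and uses a hypothetical minimum-length cutoff.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 3) -> List[Tuple[int, int, str]]:
    """Return (start, end, strand) for each ATG...stop ORF in all six frames.

    Coordinates are 0-based, half-open, and relative to the strand scanned
    ("+" is the input sequence, "-" is its reverse complement). The stop
    codon is included in the reported span and in the codon count.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open an ORF at the first ATG in this frame
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((start, i + 3, strand))
                    start = None  # close the ORF and look for the next ATG
    return orfs


if __name__ == "__main__":
    toy = "CCATGAAATTTTAGGG"  # toy sequence, not from any real genome
    for start, end, strand in find_orfs(toy):
        print(start, end, strand, toy[start:end] if strand == "+" else "")
```

A production gene finder would also weigh codon usage, alternative start codons, and overlapping candidates, which is where the statistical models earn their keep.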

Figure 2

At this stage, many investigators use automated annotation methods, supplemented with “hand” curation to assign biological names and functions to as many of the predicted genes as possible. Names are assigned to those presumptive genes usually based on traditional methods such as BLAST or FASTA searches against sequence databases. Annotation also involves identifying novel genomic features such as nucleotide biases, origins of replication, putative regions of gene transfer, repeat structures, insertion elements, and plasmids. More detailed analyses of the genomic sequence, including metabolic reconstructions (Fig. 2), can provide a full or partially integrated view of cellular physiology, thus furnishing a more comprehensive description of the biology of the species. 

Genomics Especially Powerful for Comparative Microbiology 

By now the genomes of a diverse array of microorganisms have been sequenced, and these data sets are proving highly valuable for detailed comparative studies of genomic composition, gene organization, and gene families within and across the major domains of microbial life, as well as for systematic comparative analysis of representative organisms from different phylogenetic lineages. Such studies are helping to illustrate the role played by lateral gene transfer (LGT) among microorganisms as they adapt to varied environmental conditions. 

For example, consider Thermotoga maritima MSB8, a hyperthermophilic bacterium that was isolated from Vulcano, Italy, in 1986, by Karl Stetter of the University of Regensburg in Regensburg, Germany, and whose genome sequence was published by TIGR in 1999. Comparative genomics indicates there has been extensive LGT between this bacterium and members of the archaeal domain, in particular Archaeoglobus fulgidus and Pyrococcus furiosus.

We estimated the extent of LGT in T. maritima by combining several bioinformatic approaches, including similarity searches, phylogenetic analyses, a review of regions with atypical genome composition, and a comparison of gene order within T. maritima to that of archaeal species with similar genomic regions. Based on these indices, we estimate LGT accounts for 24% of the genes within the T. maritima genome. Independent analyses, including periodicity analysis of the DNA, subtractive hybridization studies, and phylogenetic reconstruction based on these archaeal-like genes, lend additional support to these conclusions. Moreover, extensive LGT is also observed in Aquifex aeolicus and Chlorobium tepidum. 
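One of these signals, atypical composition, can be illustrated with a simple sliding-window scan. The sketch below is an assumption-laden toy, not the analysis used for T. maritima: it uses G+C content as the only composition measure, and the window size, step, and z-score cutoff are hypothetical parameters chosen for illustration.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA string (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(genome: str, window: int, step: int,
                     z_cutoff: float = 2.0) -> List[Tuple[int, float]]:
    """Return (start, gc) for windows whose G+C content deviates from the
    genome-wide window mean by at least z_cutoff standard deviations."""
    starts = list(range(0, len(genome) - window + 1, step))
    values = [gc_fraction(genome[s:s + window]) for s in starts]
    if not values:
        return []  # genome shorter than one window
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # perfectly uniform composition, nothing stands out
    return [(s, v) for s, v in zip(starts, values)
            if abs(v - mu) / sigma >= z_cutoff]


if __name__ == "__main__":
    # Toy sequence: an AT-rich background with a short GC-rich island.
    toy = "AT" * 50 + "GC" * 5 + "AT" * 50
    print(atypical_windows(toy, window=10, step=10))
```

Composition alone is a weak indicator, since transferred genes drift toward the composition of their new host over time. That is why it is best read alongside similarity searches, phylogeny, and gene order rather than on its own.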

Many reports document single (or even several) gene exchanges within and between microorganisms in the bacterial and archaeal domains, between organellar and nuclear genomes, and by plasmids across different microbial species. However, the T. maritima genomic sequence analysis highlights the potential for LGT among microorganisms within the same or similar environments.

While genomics-based and other studies expand and deepen our understanding of evolution and microbial phylogenetics, these insights are leading to a growing awareness that phylogenetic trees do not seem to represent the best way to depict relationships among organisms. If anything, net-like patterns connecting individual microorganisms may provide a better means for reflecting their relatedness and the relative frequency and perhaps the biological importance of horizontal gene transfers among them.

Comparable genome-wide analyses of other sets of microorganisms reveal additional insights about genome plasticity. Pyrococcus species, for instance, appear to have experienced large-scale rearrangements—at least since the three such Pyrococcus species whose genomes are now sequenced diverged from a common ancestor. Similar rearrangements are seen among closely related Chlamydia sp. Meanwhile, Jocelyn DiRuggiero, currently at The University of Maryland at College Park, and her colleagues described an apparently recent LGT event between two genera of hyperthermophilic archaea, P. furiosus and Thermococcus litoralis. Both contain a nearly identical 16-kb DNA fragment flanked by insertion (IS) elements, suggesting this fragment was passed from one to the other of this pair recently, with not enough time for its sequence to have drifted or otherwise changed.

Transport Protein Analysis

Figure 3

Other kinds of comparative genomic studies are under way. One entails comparing all the identified membrane transporter genes among those bacteria whose genomes are sequenced. This analysis shows that global transport specificities correlate closely with the lifestyle of each organism, reflecting the concentration and diversity of nutrients available in their particular ecological niche (Fig. 3).

For example, intracellular parasites such as the chlamydias and Rickettsia prowazekii have an extensive set of transporters for importing amino acids and nucleotides but few that enable the uptake of free sugars. This preference for particular types of transporters almost certainly reflects the relative abundance of these types of compounds that these microbes encounter while in an intracellular environment.

Moreover, the energy coupling mechanism used to drive transport across any particular microbial membrane tends to match the overall mode that each organism uses in generating its own metabolic energy. For example, organisms such as the mycoplasmas and spirochetes, which lack a TCA cycle and an electron transfer chain and, hence, rely on substrate-level phosphorylation to generate a proton motive force, depend mainly on ATP-dependent rather than proton-dependent transporters. The converse is true of organisms such as E. coli that tend to be more metabolically versatile. Thus, such comparative analyses of transporters provide insights into both the physiology of the organism and the environment in which it dwells.

Specialized Approaches Help To Meet Genomics Data “Crunching” Demands 

With a growing need to analyze vast quantities of sequence data, investigators are developing and spinning off specialized data analysis sites that allow users to query information related to their organisms of choice. For instance, Owen White and his colleagues at TIGR recently constructed the Comprehensive Microbial Resource as part of an effort to reduce annotation inconsistencies across completed genomes. At this site, users may access data from a wide and steadily growing range of sequenced genomes, with annotation available from both the center that sequenced that genome and from TIGR, which conducts an additional automated annotation. This database allows the user to construct complex queries based on role assignments, database matches, protein families, membrane topology, and other features.

Other sites, EcoCyc and MetaCyc, allow for metabolic reconstruction of microbial genomes. Yet another site, the University of Minnesota Biocatalysis/Biodegradation Database, contains close to 100 pathways for microbial catabolic metabolism of mainly xenobiotic organic compounds. The HOBACGEN database contains all available protein-encoding genes from bacteria, archaea, and yeast classified into families, and includes multiple alignments and phylogenetic trees built from these families.

Bioinformatic approaches are limited in their ability to predict the true identities of the large portion of each genome composed of unknown genes and conserved hypothetical proteins, and to predict the regulatory networks in these organisms. New developments, however, are enabling researchers to delve ever more deeply into the basic biology and genetics of sequenced species. For instance, DNA microarrays allow investigators to measure expression patterns of thousands of genes in parallel. Probing with fluorescently labeled mRNA isolated from cells grown under different conditions allows investigators to determine gene expression levels, and probing with DNA from different strains or isolates allows them to detect environmental variability. Arrays also provide a reliable means for identifying genes associated with particular metabolic pathways.
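A comparison of expression between two growth conditions is commonly summarized as a log ratio per gene. The sketch below shows that arithmetic only; the gene names and intensities are invented, the pseudocount and twofold cutoff are assumptions, and real array analyses add background correction, normalization, and replicate statistics that are omitted here.

```python
import math
from typing import Dict


def log2_ratios(condition_a: Dict[str, float], condition_b: Dict[str, float],
                pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(B / A) for every gene measured in both conditions.

    The pseudocount avoids division by zero for spots with no signal.
    """
    shared = condition_a.keys() & condition_b.keys()
    return {
        gene: math.log2((condition_b[gene] + pseudocount)
                        / (condition_a[gene] + pseudocount))
        for gene in sorted(shared)
    }


def differentially_expressed(ratios: Dict[str, float],
                             min_fold: float = 2.0) -> Dict[str, float]:
    """Keep genes whose expression changed at least min_fold in either direction."""
    cutoff = math.log2(min_fold)
    return {gene: r for gene, r in ratios.items() if abs(r) >= cutoff}


if __name__ == "__main__":
    # Hypothetical spot intensities for three made-up genes.
    rich_medium = {"geneA": 99.0, "geneB": 199.0, "geneC": 49.0}
    minimal_medium = {"geneA": 399.0, "geneB": 199.0, "geneC": 24.0}
    ratios = log2_ratios(rich_medium, minimal_medium)
    print(differentially_expressed(ratios))
```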

The ability to monitor mRNA levels is providing valuable insights into a number of sequenced species. For instance, differential transcription profiles of genes being expressed from within the E. coli genome have been described for growth in different media and at exponential and transitional phases, thereby providing insight into the physiological differences between cell populations under these conditions. Microarray technology is now sensitive enough to study dynamic processes such as DNA replication at high resolution. Analysis with such microarrays significantly reduces an investigator's dependence on conducting extensive biochemical analyses to identify functions of unknown genes, a prospect that would be tedious and perhaps insurmountable in the face of the huge quantities of sequence data now being produced. 

Complementary to microarray analyses are proteomic studies using two-dimensional gel electrophoresis to examine the proteins a particular cell produces and where within cells those proteins may localize. In addition, matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) mass spectrometry provides a highly sensitive way of conducting high-throughput screening of protein samples derived from two-dimensional gel electrophoresis. Meanwhile, two-hybrid analytical systems are being used to determine all, or as many as possible, of the protein-protein interactions in selected microorganisms whose genomes are sequenced. 

New Analytic Tools Enable Investigators To Address Basic Biological Questions 

The availability of microbial genome sequences also enables investigators to design and conduct large-scale mutagenesis projects for the purpose of examining gene function on a genome-wide scale. For instance, using transposon mutagenesis, a team led by Clyde Hutchison and Scott Peterson at TIGR produced a large set of gene knockouts within Mycoplasma genitalium and Mycoplasma pneumoniae and used these mutants for the identification of nonessential genes when these organisms are grown under laboratory conditions. This approach provides an estimate of the minimal genome required for life, indicating some 265-330 of the genes in M. genitalium are essential for growth in a nutrient-rich broth.

Complete genomes have also increased our ability to address other biological questions. For instance, investigators interested in bioremediation and environmental engineering now can draw on information from the sequences of a range of biochemically versatile microorganisms, including Dehalococcoides ethenogenes, which efficiently degrades tetrachloroethene anaerobically; Deinococcus radiodurans, one of the most radiation-resistant organisms known; Pseudomonas putida, a widely distributed organism that is metabolically adept at degrading a range of organic compounds; and Thermotoga maritima, with numerous pathways for degrading complex plant polymers such as xylan and cellulose. 

Relative Values of Complete versus Incomplete Genome Sequences

We believe that closed and completed genome sequences are much more valuable to the scientific community than are partially sequenced genomes. While the latter can serve a useful purpose for those seeking to identify novel genes, the former are essential for understanding properties such as genome structure, repeat elements, and lateral gene transfer and, hence, provide a better framework for undertaking functional genomic studies. For instance, had we sequenced T. maritima only to eightfold coverage, we would not have observed the clustering of archaeal-like genes on the T. maritima genome, would not have spotted the conserved gene order between T. maritima and archaea, and might also have failed to appreciate the extent and significance of lateral gene transfer. 

Completed sequences are also essential for comparative genomic purposes, as it is not possible to draw conclusions about the presence or absence of genes in partially sequenced genomes. For instance, analysis of eightfold sequence coverage compared with the completed sequence of T. maritima suggests that we would have wholly or partly missed approximately 100 genes as well as the complete ribosomal RNA operon had we not closed this genome. Finally, it is far more time- and cost-effective to close a genome in one process than to subsequently attempt in an ad hoc manner to close gaps in a genome sequenced to only eightfold or lower coverage. 

Microbial genome sequencing continues to develop as a distinctive subdiscipline, and each new organism that is analyzed reveals new features about genome organization, gene regulation, gene content, and the biochemical potential embedded within the microbial world. We see a continuing scientific need to expand genomic sequencing efforts to include a wider array of organisms for biodiversity and evolutionary purposes. The large number of unknown and conserved hypothetical proteins that remain to be characterized illustrates our current limited understanding of microbial organisms. 

SUGGESTED READING 

DiRuggiero, J., et al. 2000. Evidence of recent lateral gene transfer among hyperthermophilic archaea. Mol. Microbiol. 38:684-693.

Frangeul, L., K. E. Nelson, C. Buchrieser, A. Danchin, P. Glaser, and F. Kunst. 1999. Cloning and assembly strategies in bacterial genome projects. Microbiology 145:2625-2634.

Hutchison, C. A., et al. 1999. Global transposon mutagenesis and a minimal Mycoplasma genome. Science 286:2165-2169.

Nelson, K. E., I. T. Paulsen, J. Heidelberg, and C. M. Fraser. 2000. Status of genome projects for nonpathogenic bacteria and archaea. Nature Biotechnol. 18:1049-1054.

Nelson, K. E., et al. 1999. Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima. Nature 399:323-329.

Paulsen, I. T., L. Nguyen, M. K. Sliwinski, R. Rabus, and M. H. Saier, Jr. 2000. Microbial genome analyses: comparative transport capabilities in eighteen prokaryotes. J. Mol. Biol. 301:75-101. 

Last Modified:June 13, 2001
Copyright © 2001 American Society for Microbiology. All rights reserved.