Microbial Genome Sequencing
Microbial Genomes: a Blueprint for Life
The American Academy of Microbiology (AAM) recently released a report
on microbial genome sequencing projects. This summary of a colloquium
convened by the AAM 19-21 March 2000 in New Orleans, La., raises a
number of important issues related to the recent developments in
microbial genome sequencing. The conclusions encourage ASM to "play
a proactive role in keeping the microbial genome sequencing endeavor
vigorous and productive." The outline and overall direction of the
report are laudable; however, before ASM accepts all the
recommendations, I think the data release policies promoted by the
authors deserve careful consideration.
In several places the report proposes allowing researchers to
withhold genomic sequence data from public purview. The conclusion that
"data should be released either at the time of publication or when
the grant that funded the sequencing expires" implies that sequence
data may be justifiably withheld from public examination and use for
essentially as long as the principal investigator desires. This advice
contradicts the NIH guidelines stating that microbial genome sequences
should be released within one month of reaching threefold coverage. In
practice, the majority of publicly funded genome sequencing projects
(with a few notable exceptions) opt for immediate or near-immediate data
release policies.
Advising the experimenter to withhold the sequence data until the
analysis is finished implies that the only important question about
that genome is the one the sequencer is asking. Furthermore,
involvement in a genome sequencing project has offered enough
scientific benefit to encourage many productive collaborations.
Experience has already shown that once a sequence is publicly
available, it can be used to answer a wider variety of questions than
a single researcher could address in a lifetime. Moreover, far
from limiting the ability of the individual researcher to analyze the
genome, the rapid and frequent dissemination of data results in
discussions between interested parties that ultimately benefit all
involved. Thus, the benefits of widespread dissemination outweigh the
marginal costs.
In spite of NIH guidelines, some project leaders have chosen to
retain all data from sequencing projects, or to allow only BLAST
searches, which leave novel portions of the data inaccessible. This results in
the sequencers becoming divorced from both related projects and the
biology behind their sequencing. Retaining sequencing information
"until publication" also allows sequencers to favor some
researchers. This provides an unfair advantage to "friends" of
the researcher over other members of the scientific community, a much
more egregious assault on scientific integrity and ethics than using
public sequence data with appropriate attribution. The biology must
drive the sequencing and not vice versa.
The report also criticizes the lack of central organization of
genomic sequencing, claiming it leads to redundancy. However,
duplication of sequencing effort has been fostered by delaying sequence
data release. Other groups that are considering sequencing related
organisms and do not know the status of "private" programs may
proceed unilaterally, resulting in duplicated effort. Many biologists
would rather duplicate a sequencing project if it ensured public data
release instead of waiting for the sequence to be released
island-by-island or never at all. Similarly, the report calls for a
central repository for genomic sequence and mentions the TIGR website as
a leading resource. However, the report completely ignores the critical
role of the NIH microbial genomes page in the dissemination of bacterial
genome sequencing data. The primary function of this site is to provide
a single point of access to data generated from the diverse microbial
sequencing projects regardless of funding source. Secondary functions
include informing others about current sequencing projects (to reduce
redundancy) and providing links to information centers. Currently, data
release to this site is voluntary and incomplete. For example, many
state- or privately funded projects are included, while some key NIH-funded
projects are not. Future funding should demand frequent data
release via this publicly funded national repository.
The joint Clinton-Blair statement mentioned in the report states that
[human] sequence data should be released "for the benefit of
researchers in every corner of the globe." This policy is required
for all projects funded by the National Human Genome Research Institute
and should be required of all publicly funded genomics projects. By
promoting frequent and public data release, ASM will demonstrate
appropriate public-spirited leadership. ASM needs to review the
recommendations of the AAM report, and consider whether a closed data
release policy is really an appropriate use of taxpayers' money given
the potential benefits to society. The only way ASM can play a
"proactive role in keeping the microbial genome sequencing endeavor
vigorous and productive" is to promote an open and instant data
release policy that benefits all researchers in every corner of the
globe.
Robert Edwards
University of Tennessee Health Sciences Center, Memphis
redwards@utmem.edu
P.S. The author is a collaborator on several Salmonella sequencing
projects. More information on these projects and all the sequence data
can be retrieved from salmonella.org.