Persisting Viruses Could Play Role in
Driving Host Evolution
Genetic parasites may be the legacy of punctuated
events during evolution, including eukaryotic replication and immune
Luis P. Villarreal
When a persistent virus or its defective counterpart
colonizes a host, it can have major consequences, perhaps enabling the
host to rapidly acquire a complex phenotype. Acquiring a complex
phenotype, which requires coordinated expression of several
complementing or interacting genes, raises challenging questions for
evolutionary biologists. Such events seem to mark major
breaks and new orders, such as development of the eukaryotic nucleus or
of the adaptive immune system. Yet such punctuated acquisitions seem
unlikely or even infeasible under conventional neo-Darwinian
models of evolution, which emphasize point mutations and sexual
recombination of host genes to create novel phenotypes.
However, these models ignore well-established microbial
phenomena. For instance, in a single step, a bacterium can acquire a
full set of genes conferring virulence or multiple drug resistance.
Although such events may involve horizontal transfers of genes by
transposons, they also may be mediated by phage following lysogenic
infections, as in the prophage-encoded virulence of Vibrio cholerae,
which actually involves two distinct phages. Moreover, some apparent transpositions closely
resemble transfers mediated by defective viruses and could well derive
from phage-mediated transfers, according to Allan M. Campbell, Stanford
University, Calif. Indeed, by comparing sequence data for the Escherichia
coli and Bacillus subtilis genomes, we find that they differ
in about 230 regions and that these differences are mostly adjacent to
tRNA sites, suggesting sites of prophage integration rather than of transposon insertion.
Phage Appear To Mediate Bacterial Evolution
Thus bacteria, the most adaptable of organisms, evolve
in part by infectious, phage-mediated processes that involve persisting
parasitic genomes occupying host genomes and introducing novel
phenotypes. Although many microbiologists continue to call this
"horizontal transmission," implying that viruses serve merely
as vehicles for moving genes between two cellular hosts, I argue that
bacteria derive many, if not most, of their novel gene sets from viral
sources rather than from other hosts. Given the higher relative rates of viral
variation, recombination, and adaptation, viruses, rather than bacteria,
can explore sequence space on a massive scale.
For example, T4 phage has 225 genes, only 69 of which
are essential for growth in E. coli and only 42 of which are
similar to genes in GenBank. Curiously, most of these GenBank-related genes
are more closely related to eukaryotic genes, including the
self-splicing group I introns (discovered in the T4 td gene). Other
viruses, including most eukaryotic DNA viruses, show a similar degree of
novelty and likewise lack substantial similarity to host genes. Thus,
the large majority of phage and viral genes are unique to various
families of virus, not to hosts, although some striking similarities to
eukaryotic genes exist.
To persistently occupy its host, a virus must be
highly competitive, which often requires very specific components. For
example, the rIIA gene of T4, used to elucidate the most basic molecular
aspects of genes, has no required function within its host, E. coli.
Instead, this viral gene is needed solely to enable T4 to infect an E.
coli cell that already carries a prophage (a lysogen). Similarly, other
T-even phages carry highly conserved early genes of unknown function
that apparently are not necessary for viral replication. Furthermore,
these genes generally have no host analogs, suggesting that they were
not "stolen" from host genomes, as some phage experts assert.
The Origin of Eukaryotes
If bacteria evolved by means of infectious mechanisms,
were similar mechanisms at work for eukaryotes? Are we overlooking
examples of infection- or transduction-mediated genetic adaptation?
Could persisting parasitic genetic agents somehow be involved in
eukaryotic evolution? Or do complex structures and physiologic processes
arise in eukaryotes solely by means of point mutations, sexual
exchanges, and recombination?
Many metazoan organisms carry a load of parasitic
genetic elements, often called junk DNA, that presumably accumulated
steadily during evolution. Comparing E. coli with humans, we see
only a surprisingly modest increase in gene number, from 2,350 to about
40,000. We also see gene density drop from 90% coding sequence within
the genome of E. coli to less than 2% within the human genome,
which is replete with noncoding, parasitic elements, including type I
and II transposons and other distinctive parasitic elements, such as
long and short interspersed elements (LINEs and SINEs). The genomes of
humans and other vertebrates also contain several apparently intact
endogenous retroviruses.
For example, human chromosome 21 carries 225
protein-encoding genes but also 2,000 endogenous retroviral
elements. In addition, investigators participating in the human genome
project identified 113 human genes that are not found in
several other, simpler eukaryotes but can be found in bacterial species, which
is consistent with direct horizontal transfers of genes between bacteria and vertebrates.
The biggest discontinuity in evolution, between
bacteria and eukaryotes, is reflected in differences between their
respective replication proteins. Both bacteria and eukaryotes use
specialized sets of proteins to replicate their DNA. Although these
complex, highly interacting sets of proteins perform very similar
functions, they have few or no sequences in common and show no evidence
of deriving from a common ancestor. Moreover, although eukaryotic
replication proteins share some features with their archaebacterial
counterparts, the two sets also differ markedly, including in their
DNA origin recognition complexes and single-stranded DNA-binding proteins.
Striking Similarities between Certain Phage and Eukaryotic Genes
However, there are some striking similarities between
certain phage and eukaryotic genes. For example, the DNA polymerase gene
of T4 phage is very similar to those of eukaryotes. The T4 DNA polymerase
is a member of the Pol B family and is sensitive to the same inhibitors as
the eukaryotic DNA polymerases delta (extension) and alpha (primase-associated),
leading Margarita Salas and Luis Blanco of the Universidad Autónoma de
Madrid, Spain, to suggest a common origin for them. T4 is a tailed
phage, much like those also found in archaebacteria (e.g., phage φH).
Although T4 is strictly lytic in E. coli,
distantly related viruses can infect some algal species. Some of these
viruses, such as Feldmannia species virus, are persistent and
inapparent, whereas others, such as the Chlorella species virus
CSV-1 of microalgae, are lytic. Chlorella species are
unicellular, parasitic, haploid, and asexual. CSV-1 is a double-stranded
DNA virus (380 kbp) whose genes encode a delta-like DNA polymerase, two
proliferating cell nuclear antigen (PCNA)-like proteins, thymidine kinase,
10 tRNAs, a tRNA synthetase, superoxide dismutase, several versions
of ribonucleotide reductase, 12 restriction and modification enzymes
(rare in eukaryotes), hyaluronic acid synthase, cellulose synthase, and
a bacterium-like transposase; the genome also contains about 80 introns.
The evolutionary junction between microalgae and the filamentous brown
algae also coincides with distinctions that developed after the
Precambrian radiation, when multicellular metazoans and sexual reproduction arose.
Could a persisting DNA virus of algae have been the
origin of the eukaryotic replication proteins and thus connect the
universal tree of life via a viral linkage? To examine this possibility,
we used the TBLASTN program to compare the DNA polymerase (Pol) sequence
of Feldmannia virus, which is specific to Feldmannia species
of filamentous brown algae, with a series of other Pols. These included
the DNA Pol genes from four families of DNA viruses (phycodnaviruses,
herpesviruses [with alpha, beta, and gamma subclades], poxviruses,
and baculoviruses), several bacteriophages, and two distinct sets
of archaebacterial DNA Pol IIs.
According to this analysis, all members of
the DNA polymerase B family are similar, sharing conserved functional
domains. Furthermore, an unrooted dendrogram built by neighbor-joining,
with strong bootstrap statistical support, identifies sets, or clades,
of related polymerases. The clades correspond to coherent biological
sets and include the host cell's extension polymerase, Pol delta, and
its primase-interacting Pol alpha.
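The neighbor-joining step can be illustrated with a minimal sketch. The Python below implements the textbook Saitou and Nei neighbor-joining algorithm on a small, invented distance matrix for four hypothetical sequences labeled A to D. It is not the pipeline used in this analysis: it assumes distances are already given, and it omits the sequence alignment, distance estimation, and bootstrap resampling that a real study needs. The function name `neighbor_joining` and the toy distances are illustrative assumptions.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return an unrooted tree in Newick format, built by neighbor-joining.

    names: list of taxon labels (hypothetical here).
    dist:  symmetric dict of dicts, dist[a][b] == dist[b][a].
    """
    nodes = list(names)
    # Copy each row so the caller's matrix is not modified.
    d = {a: dict(dist[a]) for a in nodes}
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums r_i = sum over k of d(i, k), for the nodes still unjoined.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimises Q(i, j) = (n - 2) d(i, j) - r_i - r_j;
        # min() keeps the first such pair (in input order) on ties.
        f, g = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from f and g to their new parent node u.
        len_f = d[f][g] / 2 + (r[f] - r[g]) / (2 * (n - 2))
        len_g = d[f][g] - len_f
        u = f"({f}:{len_f:g},{g}:{len_g:g})"
        # Distance from u to every other remaining node k.
        d[u] = {}
        for k in nodes:
            if k not in (f, g):
                d[u][k] = d[k][u] = (d[f][k] + d[g][k] - d[f][g]) / 2
        nodes = [k for k in nodes if k not in (f, g)] + [u]
    # Two nodes remain: connect them with the final branch.
    a, b = nodes
    return f"({a},{b}:{d[a][b]:g});"


# Hypothetical toy distances, read off the additive tree ((A:1,B:2):3,(C:4,D:5)).
names = ["A", "B", "C", "D"]
pairs = {("A", "B"): 3, ("A", "C"): 8, ("A", "D"): 9,
         ("B", "C"): 9, ("B", "D"): 10, ("C", "D"): 9}
dist = {a: {} for a in names}
for (a, b), value in pairs.items():
    dist[a][b] = dist[b][a] = value

print(neighbor_joining(names, dist))  # prints ((A:1,B:2),(C:4,D:5):3);
```

Because the toy distances were read off a known additive tree, neighbor-joining recovers that tree's branching order and branch lengths exactly. The result is unrooted (the outer parentheses are only a writing convention), which is why such a tree cannot by itself say which branch is ancestral. Real protein distances are noisy, so clade support is usually assessed by bootstrapping.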
Most of these clades are distinct from one another, and
many of them link to the unresolved center of the tree, with one
notable exception: the DNA Pol from Feldmannia virus is located at
the base of the clade that corresponds to the host cellular DNA
Pol delta. Because such unrooted trees do not establish polarity, this
result allows two very different interpretations: either (1) the DNA Pol
of Feldmannia virus represents the progenitor of all cellular DNA Pol
deltas, or (2) Feldmannia virus acquired its DNA Pol from a host
that resembled that progenitor.
A viral progenitor seems more likely. None of the other viral
DNA Pols in this analysis falls within its corresponding host
clade, even though these viruses appear as old on the dendrogram as
Feldmannia virus and are also related to phage. Also, if Feldmannia
virus obtained its Pol from its host, it would be unusual among DNA
viruses. Because viruses are transmissible, a transfer from virus to
host seems the simpler explanation. Moreover, proposing that a
virus was the original source of eukaryotic replication proteins could
help to explain the discontinuity between bacterial and eukaryotic
replication proteins while also explaining how eukaryotes are
linked to the bacterial world. Patrick Forterre of the Université
Paris-Sud in Orsay, France, has made similar proposals.
We also analyzed other phycodnavirus genes,
specifically superoxide dismutase (SOD) and PCNA. Although the results
are less compelling because the data sets are smaller and bootstrap
support is less robust, they are consistent with a viral origin for
these eukaryotic genes as well.
Mammals May Use Endogenous Retroviruses To Suppress
In Utero Immunity
Biologists face a major challenge in explaining how
mammals with adaptive immune systems tolerate carrying a sexually
produced allogeneic embryo and endure other complicated processes that
are part of live births. This complex phenomenon is the defining
phenotype for mammalian organisms, one that presumably arose abruptly.
All mammalian genomes have specific and distinct sets of
endogenous retroviruses (ERV) and much greater numbers of defective
retroviral derivatives, suggesting that mammalian genomes were colonized
by specific lineages of ERV soon after placental species radiated from
one another. Examples include the human LINE-1, SINE-R (related to
HERV-K), and the more distant and numerous Alu elements, as well as
mouse IAP, hamster IAP (distinct from the mouse version), feline RD114,
and the rhesus Mason-Pfizer monkey virus.
Retroposon and ERV nomenclature is confusing. The human genome project
indicates that there are several thousand human ERVs, which appear to
comprise 24 families. Humans have both ancient ERVs, such as ERV-L, and
newly acquired ones, such as eight HERV-K members, which
distinguish humans from close primate relatives.
Meanwhile, avian species lack such an intense level of
retroposon colonization, according to David Mindell of the University of
Michigan, Ann Arbor. Furthermore, mammals are phylogenetically congruent
with their ERVs, whereas birds are not. Mammals and birds also differ in
that, although the early embryos of both are susceptible to genomic
infection by retroviruses, mammals repress these viruses (via global
DNA methylation) in embryonic tissue. Although the more defective
derivatives of these ERVs (especially those whose env gene is deleted)
usually do not encode products, synonymous codon analysis indicates that
the much smaller number of intact ERVs are maintained with their coding
capacity intact.
Mammalian species develop a trophectoderm-derived
placenta, which enables the fertilized egg to invade and implant in the
uterine wall, establishes blood contact across the uterine wall,
controls hormonal levels during gestation, and also protects the embryo
from the mother's innate and adaptive immune responses. The trophectoderm
is the first cell lineage to differentiate from the fertilized egg, at
3.5 days, when the totipotent inner cell mass (embryonic stem cells) is
set aside.
In the 1970s, researchers observed that normal human and
other mammalian trophectoderm or placental cells produce endogenous
retroviral particles in large numbers. Later, others discovered that
most mammals express their corresponding ERVs in placental and embryonic
tissues. Furthermore, the human ERV sequences being expressed or
transcribed in the early embryo and placenta are diverse, with some of
them intact and others defective. Several ERV protein products are
produced, including various ones containing env sequences.
Placental and early embryo tissues are by far the most common and
abundant sites of ERV expression, followed by lymphatic and malignant
tissues.
Erik Larsson and colleagues at Uppsala University in
Sweden observed ERV-3 env expression in normal human placental
cells and suggested that intact ERV-3 is needed, possibly for immune
suppression. I subsequently generalized this concept, taking into
account the widespread colonization of placental species by intact ERVs
and the general immunosuppressive nature of the membrane-spanning
region of ERV env proteins. Although ERV-3 env is mutated in 1% of
Caucasians, which argues against ERV-3 alone being essential, the
complexity and number of other human ERV family members suggest that
this and other ERV env proteins are being expressed and could be
involved in immune suppression and other vital developmental processes.
Testing the Idea that ERVs Protect Mammalian Embryos
The idea that ERVs are somehow involved in protecting
mammalian embryos is plausible and attractive. But testing this idea by
suppressing all ERV families in early embryos is daunting, particularly
because there are thousands of ERV loci.
However, findings from polyomavirus studies during the
late 1970s provided a means of addressing these questions. Those
studies indicate that when embryonic carcinoma (EC) cells express the
large T antigen (T-Ag) of the SV40 polyomavirus, T-Ag blocks expression
of endogenous retroviruses without affecting EC cell differentiation.
Moreover, T-Ag can be expressed in embryonic stem (ES) cells without
affecting their differentiation. However, transgenic lineages fail to
become established, suggesting to us that T-Ag expression disrupts ERV
expression without affecting cellular differentiation.
To test this idea, Alex Espinosa and I chose to
selectively alter gene expression in an EC cell model of the mouse
embryo, because ES cells, although preferable in some respects, are no
longer capable of differentiating into trophectoderm. Thus, we used F9
EC cells to evaluate the effect of T-Ag on expression of IAPE-A, a mouse
ERV that expresses an env protein. This env sequence is
highly expressed in normal mouse blastocysts. However, in EC cells,
T-Ag prevents expression of IAPE-A without affecting EC differentiation
into embryoid bodies, which closely resemble 3.5-day blastocysts.
Although EC-derived embryoid bodies are not fully functional and cannot
produce viable offspring (due to lost totipotency), they undergo
implantation. However, the T-Ag-expressing embryoid bodies fail to implant.
These results are consistent with ERVs playing a role
during embryo implantation, perhaps enabling embryos to avoid
recognition by the mother's immune system. Although these results are
not definitive, the implication is that placental mammals may indeed
have evolved through colonization by endogenous retroviruses that
bestow complex phenotypes on the embryos, especially via their env proteins.
Thus, placental embryos apparently behave like parasites
that invade and infect their hosts (namely, their mothers), drawing
sustenance and producing local viruses to suppress host immune
responses. This relationship resembles a phenomenon observed among some
wasp species that implant their eggs in a caterpillar host. The
caterpillar's innate antiparasite defenses are neutralized by endogenous
genomic viruses (polydnaviruses) that are made within the wasp nurse
cells surrounding the egg. From these infected
host larvae hatch flying wasps, a very distinct morphological life form.
Thus, it is not so far-fetched to think that such parasitic mechanisms
might be the very basis for the morphogenesis so common to many flying
insects.
As we consider other events during evolution in which
organisms have acquired complex and highly adaptive phenotypes, we
should look for the footprints of persistent genetic parasites. After
all, the RAG1 and RAG2 recombinases, which are essential to the origin
and function of the vertebrate adaptive immune system, closely resemble
a retroviral integrase. The major histocompatibility complex locus
itself is suspiciously colonized with 10 times the usual genomic
frequency of ERVs. These and other footprints may well be the legacy of
viruses that helped to make us.
DeFilippis, V. R., and L. P. Villarreal. 2000. A hypothesis for DNA viruses as the origin of eukaryotic replication proteins. J.
Espinosa, A., and L. P. Villarreal. 2000. T-Ag inhibits implantation of EC derived embryoid bodies. Virus Genes 20:195-200.
Forterre, P. 1999. Displacement of cellular proteins by functional analogues from plasmids and viruses could explain puzzling phylogenies of many DNA informational proteins. Mol. Microbiol. 33:457-465.
Villarreal, L. P. 1999. DNA virus contribution to host evolution, p. 391-419. In E. Domingo, R. Webster, J. Holland, and T. Picknett (ed.), Origin and evolution of viruses. Academic Press, London.
Villarreal, L. P. 1997. On viruses, sex and motherhood. J. Virol. 71:859-865.
Villarreal, L. P., V. R. DeFilippis, and K. A. Gottlieb. 2000. Acute and persistent viral life strategies and their relationship to emerging disease.