February 19th, 2015. BUSTED, a new method for identifying gene-wide evidence of episodic positive selection, where the non-synonymous substitution rate is transiently greater than the synonymous rate, is available for use here.
February 9th, 2015. A new tool, a Newick Tree Viewer, is available to view, manipulate, and annotate Newick-formatted trees from within your browser.
December 23rd, 2014. A new method, RELAX, a general hypothesis testing framework for detecting relaxed selection in a codon-based phylogenetic framework, is available for use here.
Welcome to the free public server for comparative analysis of
sequence alignments using state-of-the-art statistical models.
This service is brought to you by the viral evolution
group at the School of Medicine of the University of
California, San Diego. Over its lifetime Datamonkey.org has processed 500594
analyses at a rate of 231 jobs/day (over the last 30 days).
Use our recommended method, MEME,
to look for evidence of both diversifying and, importantly,
episodic selection at individual sites.
Codon-based maximum likelihood methods, including SLAC,
FEL, and REL,
can be used to estimate the dN/dS (also known as Ka/Ks or
ω) ratio at every codon in the alignment. An
exhaustive discussion of each approach can be found in the
methodology paper. The codon-based
maximum likelihood IFEL
method can investigate whether sequences sampled from a
population (e.g. viral sequences from different hosts) have
been subject to selective pressure at the population level
(i.e. along internal branches). A discussion of the method
and its application can be found here. All six methods can also take
recombination into account. This is
done by screening the sequences for recombination
breakpoints, identifying non-recombinant regions and
allowing each to have its own phylogenetic tree.
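
As a rough illustration of the dN/dS (ω) ratio discussed above, the sketch below normalizes substitution counts by the number of available sites and forms the ratio. It is a simple counting-style calculation under stated assumptions, not the SLAC, FEL, or REL implementation; the function names and the example counts are hypothetical, and real methods attach a statistical test to each site rather than reading ω at face value.

```python
def dn_ds(nonsyn_subs, syn_subs, nonsyn_sites, syn_sites):
    """Return (dN, dS, omega) from substitution counts and site counts.

    dN = non-synonymous substitutions per non-synonymous site
    dS = synonymous substitutions per synonymous site
    omega = dN / dS
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    if ds > 0:
        omega = dn / ds
    else:
        # With no synonymous change the ratio is unbounded (or undefined).
        omega = float("inf") if dn > 0 else float("nan")
    return dn, ds, omega


def describe(omega):
    """Conventional reading of omega (no significance test is applied here)."""
    if omega > 1:
        return "consistent with diversifying (positive) selection"
    if omega < 1:
        return "consistent with purifying (negative) selection"
    return "consistent with neutral evolution"


# Hypothetical counts for a single codon site.
dn, ds, omega = dn_ds(nonsyn_subs=4, syn_subs=1, nonsyn_sites=2.0, syn_sites=1.0)
print(dn, ds, omega, describe(omega))
```
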
Protein sequences can be screened for evidence of
directional selection using the DEPS
method, described here, which is useful when one wants to detect
convergent evolution or selective sweeps. For coding
sequences, the TOGGLE model, developed by Wayne Delport and colleagues, can
detect selection-driven changes that result in amino-acid
toggling. A canonical example of this can be found in
immune-driven evolution of HIV-1 (escape and reversion).
Use the PRIME
method to look for site-specific amino-acid properties (e.g.
charge, polarity) which are being preserved or modified by
the evolutionary process. For example, when a site is
positively selected, evolution may be working to change
side-chain volume, while maintaining polarity.
Use the RELAX
method for detecting relaxed selection in a codon-based phylogenetic framework.
Given two subsets of branches in a phylogeny, RELAX can determine whether
selective strength was relaxed or intensified in one of these subsets relative
to the other.
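
One way to picture "relaxed" versus "intensified" selection is through a single exponent k applied to the ω values of the reference branches. The sketch below assumes the parameterization used in the published RELAX model, where each test-branch ω equals the reference ω raised to the power k; this detail is not stated on this page, and the function names and ω values are hypothetical.

```python
def transform_omegas(reference_omegas, k):
    """Apply the assumed RELAX-style transform: omega_test = omega_ref ** k."""
    if k < 0:
        raise ValueError("k is assumed to be non-negative")
    return [w ** k for w in reference_omegas]


def interpret(k):
    """k < 1 pulls every omega toward 1 (relaxation); k > 1 pushes away from 1."""
    if k < 1:
        return "relaxed"
    if k > 1:
        return "intensified"
    return "unchanged"


# Hypothetical omega classes on the reference branches:
# strong purifying, neutral, and positive selection.
reference = [0.1, 1.0, 4.0]
for k in (0.5, 2.0):
    print(k, interpret(k), transform_omegas(reference, k))
```

With k = 0.5 both the purifying class and the positively selected class move toward ω = 1, which is what relaxation of selective strength means in this setting.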
Using a modeling framework that allows
efficient estimation with models permitting dN/dS
variation across both sites and lineages, Datamonkey
implements a test for finding lineages subject to episodic
diversifying selection (EDS). The Branch-site
REL method identifies those branches where a
proportion of sites evolves under EDS. If you are primarily
interested in finding which lineages (but don't care
about which sites) have experienced EDS, use this method.
Deprecated in favor
of Branch-site REL. The codon-based genetic algorithm
method can automatically partition all branches of the
phylogeny describing non-recombinant data into groups
according to dN/dS. Robust multi-model inference is used to
collate results from all models examined during the run to
provide confidence intervals on dN/dS for each branch and
guard against model misspecification and overfitting.
The PARRIS method, developed by Konrad Scheffler and colleagues,
extends traditional codon-based likelihood ratio tests to
detect if a proportion of sites in the alignment evolve
with dN/dS>1. The method takes recombination and
synonymous rate variation into account.
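
The likelihood ratio test underlying this kind of analysis can be sketched in a few lines. The example compares the log-likelihood of a null model (no sites with dN/dS>1) against an alternative that allows such sites. The log-likelihood values are hypothetical, and the choice of 2 degrees of freedom is an illustrative assumption rather than a statement of what PARRIS uses.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood ratio test statistic: 2 * (lnL_alt - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square distribution with 2 degrees
    of freedom, which has the closed form exp(-x / 2)."""
    return math.exp(-x / 2.0) if x > 0 else 1.0


# Hypothetical fitted log-likelihoods for the two nested models.
lnl_null, lnl_alt = -1234.5, -1230.0
stat = lrt_statistic(lnl_null, lnl_alt)
p_value = chi2_sf_df2(stat)
print(f"LRT = {stat:.2f}, p = {p_value:.4f}")
```
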
The evolutionary fingerprinting method, described in a 2010 paper, fits a versatile
general discrete bivariate model of site-by-site selective
force variation to partition all sites into selective
classes, and obtains an approximate posterior distribution
of this partitioning. The resulting "noisy" distribution
of selective regimes is the evolutionary fingerprint of a
gene. The EVF (evolutionary fingerprinting) module
implements this procedure, and can also infer which
individual sites appear to be positively selected while
accounting for parameter estimation error (analogous to the
BEB methodology of the PAML package).
A Bayesian graphical model is deduced from reconstructed
substitutions at each branch/site combination to infer
conditional evolutionary dependencies of sites in the
alignments, i.e. whether a site is more or less likely to
experience a non-synonymous substitution at a branch when
certain other sites do (or do not) experience
non-synonymous substitutions at the same branch. The
SPIDERMONKEY method was introduced in the
evolutionary context in our paper
on the evolution of the phenotypically important and highly
variable V3 loop of the envelope glycoprotein in HIV-1.
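
The notion of conditional dependence between sites can be illustrated with raw frequencies. The sketch below is deliberately much simpler than a Bayesian graphical model: it only compares how often a target site has a non-synonymous substitution on branches where another site also has one against how often it does so overall. The data layout (one set of substituted sites per branch), the function names, and the example values are hypothetical.

```python
def marginal_frequency(branch_subs, target):
    """Fraction of branches on which `target` has a non-synonymous substitution."""
    if not branch_subs:
        return None
    return sum(1 for sites in branch_subs if target in sites) / len(branch_subs)


def conditional_frequency(branch_subs, target, given):
    """Fraction of branches with a substitution at `given` that also
    carry a substitution at `target`. Returns None if `given` never changes."""
    with_given = [sites for sites in branch_subs if given in sites]
    if not with_given:
        return None
    return sum(1 for sites in with_given if target in sites) / len(with_given)


# Hypothetical reconstructed substitutions: one set of site indices per branch.
branches = [{1, 2}, {1, 2}, {3}, {1}, set()]
print(marginal_frequency(branches, target=2))
print(conditional_frequency(branches, target=2, given=1))
```

A conditional frequency well above the marginal one hints that the two sites tend to change together; a proper analysis would also account for sampling error and indirect dependencies, which is what the graphical model is for.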
Recombination leaves an imprint on sequence alignments:
different segments of the alignment may be described by
different phylogenetic trees, called phylogenetic
discordance. Datamonkey.org implements two methods:
one suitable for answering the question "Is there evidence of
recombination in the alignment?", and
GARD, which attempts to find all the recombination
breakpoints. Both methods are described in this paper.
The output of GARD is accepted by most other analyses, and
because recombination can mislead phylogenetic analyses
that do not account for it, we strongly urge that
recombination testing be done on any alignment that is
going to be analyzed for positive selection.
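
To make the idea of non-recombinant regions concrete, the sketch below cuts an alignment into segments at a list of breakpoints, so that each segment could then be given its own phylogenetic tree. It does not read actual GARD output: representing breakpoints as 0-based column indices, the function name, and the toy alignment are all assumptions made for illustration.

```python
def split_by_breakpoints(alignment, breakpoints):
    """Split an alignment (dict of name -> sequence) into segments.

    `breakpoints` are 0-based column indices at which a new segment starts.
    Returns a list of alignments, one per putative non-recombinant region.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    length = lengths.pop()
    cuts = sorted(set(breakpoints))
    if any(b <= 0 or b >= length for b in cuts):
        raise ValueError("breakpoints must fall strictly inside the alignment")
    edges = [0] + cuts + [length]
    return [
        {name: seq[start:end] for name, seq in alignment.items()}
        for start, end in zip(edges, edges[1:])
    ]


# Toy alignment with two hypothetical breakpoints on codon boundaries.
toy = {"seq_a": "ATGAAACCC", "seq_b": "ATGAAGCCA"}
for segment in split_by_breakpoints(toy, [3, 6]):
    print(segment)
```
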
You can also submit
a collection of HIV-1 sequences for recombination screening
by a specialized recombination detection algorithm,
SCUEAL, described in this paper.
For each type of data, nucleotide, amino-acid and codon,
Datamonkey implements separate model selection procedures.
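
Two ideas from this paragraph lend themselves to a short sketch: enumerating every time-reversible nucleotide model, and scoring candidate models with AICc. A time-reversible model can be described by which of the six substitution rates are constrained to be equal, so the candidates correspond to the ways of grouping six rates. The rate ordering (AC, AG, AT, CG, CT, GT), the six-character model strings, the function names, and the numbers passed to `aicc` are assumptions for illustration; what counts as the sample size n for an alignment is also a modeling choice.

```python
def reversible_models(n_rates=6):
    """Yield every grouping of `n_rates` rates as a string of class labels,
    e.g. '010010' means rates 2 and 5 share one value and the rest another."""
    def extend(prefix, max_label):
        if len(prefix) == n_rates:
            yield "".join(str(label) for label in prefix)
            return
        # A new rate may join any existing class or open the next new class.
        for label in range(max_label + 2):
            yield from extend(prefix + [label], max(max_label, label))
    yield from extend([0], 0)


def aicc(log_likelihood, n_params, sample_size):
    """Small-sample corrected Akaike information criterion (lower is better)."""
    if sample_size - n_params - 1 <= 0:
        raise ValueError("sample_size must exceed n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (sample_size - n_params - 1)


models = list(reversible_models())
print(len(models))            # number of candidate rate groupings
print(models[0], models[-1])  # all rates equal, and all rates distinct

# Hypothetical fit: log-likelihood -1000 with 5 parameters and 100 sites.
print(round(aicc(-1000.0, 5, 100), 4))
```
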
An exhaustive search is performed for all possible (Markov,
time-reversible) models of nucleotide evolution. For
protein data, a collection of published empirical models
is fitted to the alignment and the best one is selected
using AICc. Finally, for coding data, a
sophisticated genetic-algorithm procedure described in our
recent paper is used to examine thousands of
potential models and report the best one and various
metrics based on the set of credible models; this feature
is implemented in the CMS module. A separate
module implements three different approaches to
reconstructing ancestral sequences: joint, marginal and
sampled - see this paper for a description and
original methodology attribution, from simple or