How does SLAC infer selection?

Complete method details can be found in our MBE paper.

Phase 1: Nucleotide model maximum likelihood (ML) fit

A nucleotide model (any model from the time-reversible class can be chosen) is fitted to the data and a tree (either neighbor-joining (NJ) or user-supplied) using maximum likelihood to obtain branch lengths and substitution rates. If the input alignment contains multiple segments, base frequencies and substitution rates are inferred jointly from the entire alignment, while branch lengths are fitted to each segment separately. The "best-fitting" model can be determined automatically by a model selection procedure or chosen by the user.

Phase 2: Codon model ML fit

Holding branch lengths and substitution rate parameters constant at the values estimated in Phase 1, a codon model obtained by crossing MG94 with the nucleotide model of Phase 1 is fitted to the data to obtain a global ω = dN/dS ratio. Optionally, a confidence interval on dN/dS can be obtained using profile likelihood.
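
As a rough illustration of what "crossing" MG94 with a nucleotide model means, the sketch below computes the instantaneous rate between two codons. It assumes the usual MG94-style parameterization: only single-nucleotide changes have non-zero rates, the rate is the nucleotide exchange rate times the frequency of the target nucleotide, and non-synonymous changes are additionally scaled by ω. The function name, the argument layout, and the use of a single set of nucleotide frequencies for all three codon positions are assumptions made for brevity, not a description of the HyPhy implementation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_rate(src, dst, omega, nuc_rates, nuc_freqs):
    """Hypothetical helper: rate of substitution from codon src to codon dst.

    nuc_rates maps frozenset({x, y}) to the symmetric nucleotide exchange
    rate; nuc_freqs maps each nucleotide to its equilibrium frequency.
    """
    diffs = [i for i in range(3) if src[i] != dst[i]]
    if len(diffs) != 1:
        return 0.0  # identical codons or multi-nucleotide changes
    if "*" in (GENETIC_CODE[src], GENETIC_CODE[dst]):
        return 0.0  # stop codons are excluded from the state space
    i = diffs[0]
    rate = nuc_rates[frozenset((src[i], dst[i]))] * nuc_freqs[dst[i]]
    if GENETIC_CODE[src] != GENETIC_CODE[dst]:
        rate *= omega  # non-synonymous changes are scaled by omega
    return rate


# Illustrative HKY85-like nucleotide model: transitions twice as fast
# as transversions, equal base frequencies (made-up values).
transitions = {frozenset("AG"), frozenset("CT")}
nuc_rates = {
    frozenset((x, y)): (2.0 if frozenset((x, y)) in transitions else 1.0)
    for x in BASES for y in BASES if x != y
}
nuc_freqs = {b: 0.25 for b in BASES}

syn = codon_rate("TTT", "TTC", 0.5, nuc_rates, nuc_freqs)     # Phe -> Phe
nonsyn = codon_rate("TTT", "TTA", 0.5, nuc_rates, nuc_freqs)  # Phe -> Leu
double = codon_rate("TTT", "CCC", 0.5, nuc_rates, nuc_freqs)  # three changes
```

In Phase 2 the nucleotide exchange rates and branch lengths would be held at their Phase 1 estimates, leaving ω as the parameter of interest.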

Phase 3: ML ancestral sequence reconstruction

Utilizing parameter estimates from Phases 1 and 2, codon ancestral sequences are reconstructed site by site using maximum likelihood, in such a way as to maximize the likelihood of the data at the site over all possible ancestral character states. Inferred ancestral sequences are treated as known for Phase 4. In HyPhy, it is also possible to weight over all possible ancestral sequences, or to draw a sample from possible ancestral sequences in proportion to their relative likelihood.

Phase 4: Inference of selection at each site

For every variable site, four quantities are computed: the normalized expected numbers (ES and EN) and the observed numbers (NS and NN) of synonymous and non-synonymous substitutions. SLAC estimates dN = NN/EN and dS = NS/ES; if dN < dS the codon is called negatively selected, and if dN > dS it is called positively selected. A p-value derived from a two-tailed extended binomial distribution is used to assess significance. The test assumes that under neutrality, a random substitution will be synonymous with probability P = ES/(ES+EN), and computes how likely it is, given that P, that NS out of NN+NS substitutions are synonymous.
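
The per-site calculation can be sketched in a few lines of Python. This is a simplified illustration rather than the SLAC implementation: it rounds the observed counts to integers and uses the ordinary binomial distribution in place of the extended binomial, it reports the tail that matches the direction of the call, and the function name and return format are hypothetical. It also assumes ES and EN are both positive.

```python
from math import comb


def slac_site_test(ES, EN, NS, NN):
    """Hypothetical helper: classify one site and compute a binomial tail p-value."""
    dS = NS / ES
    dN = NN / EN
    P = ES / (ES + EN)  # probability a random substitution is synonymous

    # Simplification: round counts so the ordinary binomial can be used.
    n = round(NS + NN)
    k = round(NS)

    def pmf(i):
        return comb(n, i) * P**i * (1 - P) ** (n - i)

    # Few synonymous changes relative to expectation suggests positive selection;
    # many synonymous changes suggests negative selection.
    p_few_syn = sum(pmf(i) for i in range(0, k + 1))
    p_many_syn = sum(pmf(i) for i in range(k, n + 1))

    if dN > dS:
        call, p_value = "positive", p_few_syn
    elif dN < dS:
        call, p_value = "negative", p_many_syn
    else:
        call, p_value = "neutral", 1.0
    return {"dN": dN, "dS": dS, "call": call, "p_value": p_value}


# Made-up counts: four non-synonymous and no synonymous substitutions
# at a site where one third of random substitutions would be synonymous.
result = slac_site_test(ES=1.0, EN=2.0, NS=0, NN=4)
```

With these made-up counts, P = 1/3, so the probability of seeing no synonymous changes among four substitutions is (2/3)^4, roughly 0.198, and the site would not be called significant at conventional thresholds despite dN > dS.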

UCSD Viral Evolution Group 2004-2024