How does FEL/IFEL infer selection?
Phase 1: Nucleotide model maximum likelihood (ML) fit
A nucleotide model (any model from the time-reversible class can be chosen) is fitted to the data and a tree
(either neighbor-joining (NJ) or user-supplied) using maximum likelihood to obtain branch lengths and substitution rates. If the input alignment contains multiple segments,
base frequencies and substitution rates are inferred jointly from the entire alignment, while branch lengths are fitted to each segment separately.
The "best-fitting" model can be determined automatically by a model selection procedure or chosen by the user.
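
As an illustration of what "time-reversible" means for the nucleotide model, the sketch below builds a rate matrix with entries q_ij = θ_ij π_j from symmetric exchangeabilities θ and base frequencies π. The function name and all numeric values are hypothetical and chosen only for the example; they are not estimates produced by the analysis.

```python
BASES = "ACGT"

def reversible_rate_matrix(exchange, freqs):
    """Build a 4x4 time-reversible rate matrix with q_ij = theta_ij * pi_j.

    exchange: dict mapping unordered base pairs such as "AC" to symmetric rates
    freqs:    dict mapping each base to its equilibrium frequency
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                theta = exchange["".join(sorted(a + b))]
                q[i][j] = theta * freqs[b]
        # Diagonal is set so that every row sums to zero.
        q[i][i] = -sum(q[i])
    return q

# Hypothetical values for illustration only (transitions faster than transversions).
exchange = {"AC": 1.0, "AG": 4.0, "AT": 1.0, "CG": 1.0, "CT": 4.0, "GT": 1.0}
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
Q = reversible_rate_matrix(exchange, freqs)
```

Reversibility shows up as detailed balance: π_i q_ij = π_j q_ji for every pair of bases, which holds by construction because θ is symmetric.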
Phase 2: Codon model ML fit
Holding branch lengths proportional to, and substitution rate parameters constant at, the values estimated in Phase 1, a codon model
obtained by crossing MG94 with the nucleotide model of Phase 1 is fitted to the data. This yields codon branch lengths, which are used to scale the dN and dS values subsequently estimated at each site.
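
One way to picture "crossing" MG94 with a nucleotide model is through the instantaneous rate between two codons. The sketch below assumes the standard genetic code, a single set of nucleotide frequencies shared by all three codon positions, and the usual MG94 convention that only single-nucleotide changes have non-zero rates; the function name and the numeric values are hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def codon_rate(src, dst, alpha, beta, theta, pi):
    """Instantaneous rate src -> dst under an MG94-style codon model.

    theta: symmetric nucleotide exchangeabilities keyed by sorted pair ("AC", ...)
    pi:    nucleotide frequencies keyed by base
    """
    if CODE[src] == "*" or CODE[dst] == "*":
        return 0.0  # stop codons are excluded from the state space
    diffs = [(x, y) for x, y in zip(src, dst) if x != y]
    if len(diffs) != 1:
        return 0.0  # identical codons or multi-nucleotide changes
    x, y = diffs[0]
    nucleotide_part = theta["".join(sorted(x + y))] * pi[y]
    # Synonymous changes are scaled by alpha, non-synonymous ones by beta.
    return (alpha if CODE[src] == CODE[dst] else beta) * nucleotide_part

# Hypothetical nucleotide parameters, standing in for Phase 1 estimates.
theta = {"AC": 1.0, "AG": 4.0, "AT": 1.0, "CG": 1.0, "CT": 4.0, "GT": 1.0}
pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
syn = codon_rate("TTT", "TTC", 1.0, 0.5, theta, pi)     # Phe -> Phe
nonsyn = codon_rate("TTT", "TTA", 1.0, 0.5, theta, pi)  # Phe -> Leu
```

The nucleotide part of each rate comes from Phase 1 and is held fixed, so the codon fit only has to estimate the synonymous and non-synonymous multipliers and the branch length scaling.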
Phase 3: Site by site likelihood ratio test
For every site, using the parameter estimates from Phases 1 and 2, the MG94-based codon model from Phase 2, now with two site-specific parameters,
α (instantaneous synonymous rate) and β (instantaneous non-synonymous rate),
is fitted twice: first with α and β estimated independently, and then under the constraint α=β. Next, a one degree of freedom likelihood ratio test is
performed to infer whether α is different from β, and a p-value is derived. If the p-value is significant, the site is classified based on
whether α>β (negative selection) or α<β (positive selection).
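
The per-site decision rule can be sketched as follows, assuming the two maximized log-likelihoods and the unconstrained estimates of α and β are already available. The function name and the default significance threshold are hypothetical; the p-value uses the exact identity that, for one degree of freedom, the chi-square tail probability equals erfc(sqrt(x/2)).

```python
import math

def fel_site_test(lnl_null, lnl_alt, alpha, beta, threshold=0.1):
    """Classify one site from a one degree of freedom likelihood ratio test.

    lnl_null:  log-likelihood under the constraint alpha = beta
    lnl_alt:   log-likelihood with alpha and beta fitted independently
    threshold: significance level (hypothetical default, chosen by the user)
    """
    # The statistic cannot be negative for nested models; clamp numerical noise.
    lrt = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lrt / 2.0))
    if p_value > threshold:
        call = "not significant"
    elif alpha > beta:
        call = "negative selection"
    else:
        call = "positive selection"
    return lrt, p_value, call

# Hypothetical values for a single site.
lrt, p_value, call = fel_site_test(-12.0, -10.0, alpha=0.2, beta=1.5)
```

With these made-up inputs the statistic is 4.0 and the p-value is about 0.046, so the site would be reported as positively selected at the assumed threshold.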
IFEL is essentially the same as FEL, except that selection is tested only along internal branches of the phylogeny. Each site has three parameters:
α (synonymous rate), β_I (non-synonymous rate for internal branches) and β_L (non-synonymous rate for terminal branches). The null model now assumes
that α=β_I (β_L is unconstrained in both models). This test is appropriate when 'population level' effects are sought.
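
To make the internal/terminal distinction concrete, the sketch below splits the branches of a small tree into the two classes that would receive β_I and β_L. The nested-tuple tree encoding, the function name and the taxon labels are assumptions made for this example; the tree is treated as rooted, so the two branches touching the root are counted separately even though they form a single branch in an unrooted tree.

```python
def classify_branches(tree):
    """Split branches into internal and terminal ones.

    tree: nested tuples with string leaves, e.g. (("A", "B"), "C").
    Each branch is identified by the node it leads to.
    """
    internal, terminal = [], []

    def visit(node, is_root):
        if isinstance(node, str):
            terminal.append(node)   # branch leading to a sampled sequence
            return
        if not is_root:
            internal.append(node)   # branch leading to an ancestral node
        for child in node:
            visit(child, False)

    visit(tree, True)
    return internal, terminal

# Hypothetical five-taxon tree.
tree = ((("A", "B"), "C"), ("D", "E"))
internal, terminal = classify_branches(tree)
# Internal branches share beta_I, terminal branches share beta_L.
rate_class = {**{node: "beta_I" for node in internal},
              **{leaf: "beta_L" for leaf in terminal}}
```

Only the β_I class enters the null hypothesis α=β_I, so substitutions along the terminal branches do not drive the test.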
UCSD Viral Evolution Group 2004-2021