Datamonkey automatically recognizes five aligned sequence data formats and also detects whether the data are nucleotide (codon) or amino acid. The recognized formats are:
- NEXUS: files in NEXUS format. The following NEXUS blocks are supported:
DATA, CHARACTERS, TAXA, ASSUMPTIONS (for data partitioning) and TREES.
See the notes on the HyPhy constants DATA_FILE_PARTITION_MATRIX and NEXUS_FILE_TREE_MATRIX
for details on how to access NEXUS trees and partitions once the file
has been read.
- PHYLIP Sequential and Interleaved: PHYLIP option characters in
the first line are ignored.
- FASTA (or #) Sequential: taxa names are preceded by '>' (or '#'), and the
complete sequence data follow the name of each taxon.
- FASTA (or #) Interleaved: a list of taxa names preceded by '>' (or '#'), followed by blocks of
sequence data in the same order as the names of the taxa.
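The autodetection described above can be approximated with a few simple heuristics. The sketch below is not Datamonkey's actual logic: the function names `sniff_format` and `sniff_alphabet`, the character set treated as nucleotide, and the 90% threshold are all assumptions made for illustration. It inspects only the first non-blank line to choose the format (testing `#NEXUS` before the bare `#` taxon marker, since both begin with `#`) and calls the data nucleotide when nearly all characters are A, C, G, T, U, N, gap or missing.

```python
import re

# Hypothetical heuristics for illustration; not Datamonkey's actual detection code.
NUCLEOTIDE_CHARS = set("ACGTUN-?")  # A/C/G/T/U/N plus gap '-' and missing '?'


def sniff_format(text: str) -> str:
    """Guess the alignment format from the first non-blank line."""
    first = next((line.strip() for line in text.splitlines() if line.strip()), "")
    # '#NEXUS' must be tested before the bare '#' taxon marker.
    if first.upper().startswith("#NEXUS"):
        return "NEXUS"
    if first.startswith((">", "#")):
        return "FASTA"
    # PHYLIP header: taxon count, site count, then optional option characters.
    if re.fullmatch(r"\d+\s+\d+(\s+\S+)*", first):
        return "PHYLIP"
    raise ValueError("unrecognized alignment format")


def sniff_alphabet(sequences, threshold: float = 0.9) -> str:
    """Call the data nucleotide if at least `threshold` of characters look like DNA/RNA."""
    chars = [c for seq in sequences for c in seq.upper()]
    if not chars:
        raise ValueError("no sequence data")
    share = sum(c in NUCLEOTIDE_CHARS for c in chars) / len(chars)
    return "nucleotide" if share >= threshold else "amino acid"
```

For instance, `sniff_format` returns `"PHYLIP"` for the header line `3 60 I`: the trailing option character is accepted but not interpreted, mirroring the rule that PHYLIP option characters are ignored.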
Here are small example files for each format, each with a tree string included in the file.
FASTA Sequential.
>a
ATGATTCAACCTCAGACCCTTTTAAATGTAGCAGATAACAGTGGAGCTCGAAAATTGATG
>b
ATGATTCAACCTCAGACCCATTTAAATGTAGCGGATAACAGCGGGGCTCGAGAATTGATG
>og
ATGATTCAACCTCAAACTTATTTAAATGTTGCAGATAATAGTGGAGCTCGAAAACTAATG
((a,b),og);
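A minimal sketch of reading the sequential layout shown above. It assumes, as in the example, that a trailing line beginning with `(` and ending with `;` is a Newick tree rather than sequence data; the function name `parse_fasta_sequential` and its `marker` argument (used to switch between `>` and `#`) are hypothetical, not part of Datamonkey or HyPhy.

```python
def parse_fasta_sequential(text: str, marker: str = ">"):
    """Return ({name: sequence}, tree_or_None) for sequential FASTA.

    Assumption: a line starting with '(' and ending with ';' is a Newick tree.
    """
    seqs, tree, current = {}, None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("(") and line.endswith(";"):
            tree = line
        elif line.startswith(marker):
            current = line[len(marker):].strip()
            seqs[current] = []
        elif current is not None:
            # Sequence data may wrap over several lines; whitespace is layout only.
            seqs[current].append("".join(line.split()))
        else:
            raise ValueError("sequence data before the first taxon name")
    return {name: "".join(parts) for name, parts in seqs.items()}, tree
```

Because Python dictionaries keep insertion order, the taxa come back in file order (`a`, `b`, `og` for the example), each with 60 sites.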
FASTA (#) Interleaved. (Line Width 30, Gap Width 10)
#a
#b
#og
ATGATTCAAC CTCAGACCCT TTTAAATGTA
ATGATTCAAC CTCAGACCCA TTTAAATGTA
ATGATTCAAC CTCAAACTTA TTTAAATGTT
GCAGATAACA GTGGAGCTCG AAAATTGATG
GCGGATAACA GCGGGGCTCG AGAATTGATG
GCAGATAATA GTGGAGCTCG AAAACTAATG
((a,b),og);
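In the interleaved layout all taxon names come first, and each later sequence line belongs to the taxa in that same cyclic order, so line i of the data goes to taxon i mod n. The sketch below illustrates this under the same tree-line assumption as before; `parse_hash_interleaved` is a hypothetical name, and the spaces inside each block (the "Gap Width") are treated as layout only.

```python
def parse_hash_interleaved(text: str, marker: str = "#"):
    """Return ({name: sequence}, tree_or_None) for interleaved '#' (or '>') files."""
    names, chunks, tree = [], [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("(") and line.endswith(";"):
            tree = line  # assumed Newick tree string
        elif line.startswith(marker):
            if chunks:
                raise ValueError("taxon name found after sequence data")
            names.append(line[len(marker):].strip())
        else:
            chunks.append("".join(line.split()))  # drop block-separating spaces
    if not names or len(chunks) % len(names):
        raise ValueError("sequence lines do not divide evenly among the taxa")
    n = len(names)
    # chunks[i::n] picks every n-th line starting at i: taxon i's piece of each block.
    seqs = {name: "".join(chunks[i::n]) for i, name in enumerate(names)}
    return seqs, tree
```

Applied to the example above, this reassembles the same 60-site sequences as the sequential FASTA file.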
PHYLIP Sequential.
3 60
a ATGATTCAAC CTCAGACCCT TTTAAATGTA GCAGATAACA GTGGAGCTCG
AAAATTGATG
b ATGATTCAAC CTCAGACCCA TTTAAATGTA GCGGATAACA GCGGGGCTCG
AGAATTGATG
og ATGATTCAAC CTCAAACTTA TTTAAATGTT GCAGATAATA GTGGAGCTCG
AAAACTAATG
1
((a,b),og);
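The sequential PHYLIP example starts with a header giving the number of taxa and sites; each taxon's name is then followed by its sites, which may wrap onto following lines. The sketch below assumes "relaxed" PHYLIP (names separated from the data by whitespace rather than padded to a fixed width) and treats anything after the matrix as an optional tree count followed by a Newick string. `parse_phylip_sequential` is a hypothetical name, not a HyPhy function.

```python
def parse_phylip_sequential(text: str):
    """Return ({name: sequence}, tree_or_None) for relaxed sequential PHYLIP."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    ntax, nchar = int(header[0]), int(header[1])  # option characters are ignored
    seqs, i = {}, 1
    for _ in range(ntax):
        fields = lines[i].split(None, 1)  # taxon name, then its first run of sites
        name = fields[0]
        seq = "".join(fields[1].split()) if len(fields) > 1 else ""
        i += 1
        # Sites continue on the following lines until nchar have been read.
        while len(seq) < nchar:
            if i >= len(lines):
                raise ValueError(f"{name}: file ends before {nchar} sites")
            seq += "".join(lines[i].split())
            i += 1
        if len(seq) != nchar:
            raise ValueError(f"{name}: expected {nchar} sites, got {len(seq)}")
        seqs[name] = seq
    # Assumed trailer: an optional tree count line (the '1'), then a Newick tree.
    tree = next((l for l in lines[i:] if l.startswith("(") and l.endswith(";")), None)
    return seqs, tree
```

Counting sites against the header value is what lets the parser tell where one taxon's wrapped data ends and the next name begins.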
PHYLIP Interleaved. (Line Width 30, Gap Width 10)
3 60 I
a ATGATTCAAC CTCAGACCCT TTTAAATGTA
b ATGATTCAAC CTCAGACCCA TTTAAATGTA
og ATGATTCAAC CTCAAACTTA TTTAAATGTT
GCAGATAACA GTGGAGCTCG AAAATTGATG
GCGGATAACA GCGGGGCTCG AGAATTGATG
GCAGATAATA GTGGAGCTCG AAAACTAATG
1
((a,b),og);
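In interleaved PHYLIP only the first block carries taxon names; later blocks list the remaining sites in the same taxon order. A minimal sketch under the same assumptions as the sequential case (relaxed whitespace-separated names, a trailing optional tree count and Newick tree); `parse_phylip_interleaved` is a hypothetical name. The `I` in the header is read but, like other option characters, not interpreted.

```python
def parse_phylip_interleaved(text: str):
    """Return ({name: sequence}, tree_or_None) for relaxed interleaved PHYLIP."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    ntax, nchar = int(header[0]), int(header[1])  # trailing 'I' etc. ignored
    names, seqs = [], []
    # First block: each line holds a taxon name plus its first sites.
    for line in lines[1:1 + ntax]:
        fields = line.split(None, 1)
        names.append(fields[0])
        seqs.append("".join(fields[1].split()) if len(fields) > 1 else "")
    i, k = 1 + ntax, 0
    # Later blocks: unnamed lines assigned round-robin in first-block order.
    while min(len(s) for s in seqs) < nchar:
        if i >= len(lines):
            raise ValueError(f"alignment ends before {nchar} sites were read")
        seqs[k % ntax] += "".join(lines[i].split())
        i, k = i + 1, k + 1
    # Assumed trailer: an optional tree count line, then a Newick tree.
    tree = next((l for l in lines[i:] if l.startswith("(") and l.endswith(";")), None)
    return dict(zip(names, seqs)), tree
```

Stopping as soon as every taxon has `nchar` sites keeps the tree-count line `1` from being mistaken for sequence data.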
NEXUS.
#NEXUS
BEGIN TAXA;
DIMENSIONS NTAX = 3;
TAXLABELS
'a' 'b' 'og' ;
END;
BEGIN CHARACTERS;
DIMENSIONS NCHAR = 60;
FORMAT
DATATYPE = DNA
GAP=-
MISSING=?
;
MATRIX
'a' ATGATTCAACCTCAGACCCTTTTAAATGTAGCAGATAACAGTGGAGCTCGAAAATTGATG
'b' ATGATTCAACCTCAGACCCATTTAAATGTAGCGGATAACAGCGGGGCTCGAGAATTGATG
'og' ATGATTCAACCTCAAACTTATTTAAATGTTGCAGATAATAGTGGAGCTCGAAAACTAATG;
END;
BEGIN TREES;
TREE tree = ((a,b),og);
END;
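For NEXUS, the sequences live in the MATRIX command of the CHARACTERS (or DATA) block and the trees in TREE statements of the TREES block. The sketch below extracts just those two pieces with regular expressions; it is far from a full NEXUS reader (no ASSUMPTIONS partitions, no quoted names containing spaces, no comments) and is not how HyPhy parses the file. `parse_nexus_minimal` is a hypothetical name.

```python
import re


def parse_nexus_minimal(text: str):
    """Return ({taxon: sequence}, {tree_name: newick}) from a simple NEXUS file."""
    matrix = re.search(r"\bMATRIX\b(.*?);", text, re.IGNORECASE | re.DOTALL)
    if matrix is None:
        raise ValueError("no MATRIX command found")
    seqs = {}
    for line in matrix.group(1).splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[0].strip("'")  # drop single quotes around the taxon label
        # Appending (rather than assigning) also tolerates an interleaved matrix.
        seqs[name] = seqs.get(name, "") + "".join(fields[1:])
    trees = {
        name: newick.strip()
        for name, newick in re.findall(
            r"\bTREE\s+(\S+)\s*=\s*([^;]+;)", text, re.IGNORECASE
        )
    }
    return seqs, trees
```

`\bTREE\s+` requires whitespace right after `TREE`, so the `BEGIN TREES;` header is not mistaken for a tree statement.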