PAML primate HKY85

This example just uses PAML to estimate parameters of an HKY85 process and compute a log likelihood, for a small alignment and phylogenetic tree.

Newick tree

1
2
9    1
(TamarinEDN, ((MacaqueECP, (OrangutanECP, (ChimpanzeeECP, GorillaECP))), (MacaqueEDN, (OrangutanEDN, (ChimpanzeeEDN, GorillaEDN)))));

Fasta file

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
>OrangutanECP
ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTAGTGGTGTGGGGGGCTCACTCCATGCCAAACCCCGACAGTTTACGAGGGCTCAGTGGTTTGCCATCCAGCACGTCAGTCTGAACCCTCCTCAATGCACCACTGCAATGCGGGTAATTAACAATTATCAACGGCGTTGCAAAGACCAAAATACTTTTCTTCGTACAACTTTTGCTAATGTAGTTAATGTTTGTGGTAACCCAAATATAACCTGTCCTCGTAACAGAACTCTCCACAATTGTCATCGGAGTAGATTCCAGGTGCCTTTACTCCACTGTAACCTCACAGGTCAGAATATTTCAAACTGCAAGTATGCAGACAGAACAGAAAGGAGGTTCTATGTAGTTGCATGTGACAACAGAGATCCACGGGATTCTCCACGGTATCCTGTGGTTCCAGTTCACCTGGATACCACCATCTAA
>OrangutanEDN
ATGGTTCCAAAACTGTTCACTTCTCAAATTTCCCTGCTTCTTCTGTTGGGGCTTCTGGCTGTGGACGGCTCACTCCATGTCAAACCTCCACAGTTTACCTGGGCTCAATGGTTTGAAACCCAGCACATCAATATGACCTCCCAGCAATGCAACAATGCAATGCAGGTCATTAACAATTTTCAACGGCGTTGCAAAAACCAAAATACTTTTCTGCGTACAACTTTTGCTAATGTAGTTAATGTTTGTGGTAACCCAAATATAACCTGTCCTAGTAACAGAAGTCGCAACAATTGTCATCATAGTGGAGTCCAGGTGCCTTTAATCCACTGTAACCTCACAAGTCAGAATATTTCAAACTGCAGGTATGCGCAGACACCAGCAAACATGTTCTATATAGTTGCATGTGACAACAGGGATCCACGGGACCCTCCACAGTATCCGGTGGTTCCAGTTCACCTGGATAGAATCATCTAA
>MacaqueECP
ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTGTGGAGGGCTCACTCCATGCCAGACCCCCACAGTTTACAAAGGCTCAGTGGTTTGCCATCCAGCACATCAATGTGAACCCCCCTCGATGCACCATTGCAATGCGGGTAATAAATAATTATCAACGGCGTTGCAAAAACCAAAATACTTTTCTTCGTACAACTTTTGCATATACAGCTAATGTTTGTCGTAACGAACGTATACGCTGCCCTCGTAACAGAACTCTCCACAATTGTCATCGTAGTAGATACCGGGTGCCTTTACTCCACTGTGACCTCACAGGTCAGAATATTTCAACCTGCAGGTATGCAGACAGACCAGGACGGAGGTTCTATGTAGTTGCATGTGAAAGCAGAGATCCACGGGATTCTCCACGGTATCCAGTGGTTCCAGTTCACCTGGATACCACCATCTAA
>ChimpanzeeEDN
ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTCTGGCTGTGGAGGGCTCACTCCATGTCAAACCTCCACAGTTTACCTGGGCTCAATGGTTTGAAACCCAGCACATCAATATGACATCCCAGCAATGCACCAATGCAATGCAGGTCATTAACAATTATCAACGGCGATGCAAAAACCAAAATACTTTCCTTCTTACAACTTTTGCTAACGTAGTTAATGTTTGTGGTAACCCAAATATGACCTGTCCTAGTAACAAAACTCGCAAAAATTGTCATCAAAGTGGAAGCCAGGTGCCTTTAATCCACTGTAACCTCACAAGTCAGAATATTTCAAACTGCAGGTATGCGCAGACACCAGCAAACATGTTCTATATAGTTGCATGTGACAACAGAGATCAACGGGACCCTCCACAGTATCCGGTGGTTCCAGTTCACCTGGATAGAATCATCTAA
>MacaqueEDN
ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTGTGGAAGGCTCACTTCATGCCAAACCCGGACAATTTACCTGGGCTCAGTGGTTTGAAATCCAGCATATAAATATGACCTCTGGCCAATGCACCAATGCAATGCAGGTCATTAACAATTATCAACGGCGATGCAAAAATCAAAATACTTTTCTTCTTACAACTTTTGCTGATGTAGTTCATGTCTGTGGTAACCCAAGCATGCCCTGCCCTAGCAACACAAGTCTCAACAATTGTCATCATAGTGGAGTCCAGGTGCCTTTAATCCACTGTAACCTCACAAGTCGAAGGATTTCAAATTGCAGGTATACACAGACAACAGCAAACAAGTACTACATAGTTGCATGTAACAACAGCGATCCACGGGACCCTCCACAGTATCCAGTGGTTCCAGTTCACCTGGATAGAATCATCTAA
>GorillaEDN
ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTCTGGCAGTGGAGGGCTCACTCCATGTCAAACCTCCACAGTTTACCTGGGCTCAATGGTTTGAAACCCAGCACATCAATATGACCTCCCAGCAATGCACCAATGCAATGCGGGTCATTAACAATTATCAACGGCGATGCAAAAACCAAAATACTTTCCTTCTTACAACTTTTGCTAACGTAGTTAATGTTTGTGGTAACCCAAATATGACCTGTCCTAGTAACAAAACTCGCAAAAATTGTCATCACAGTGGAAGCCAGGTGCCTTTAATCCACTGTAACCTCACAAGTCAGAATATTTCAAACTGCAGGTATGCGCAGACACCAGCAAACATGTTCTATATAGTTGCATGTGACAACAGAGATCAACGGGACCCTCCACAGTATCCAGTGGTTCCAGTTCACCTGGATAGAATCATCTAA
>GorillaECP
ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTGTGGAGGGCTCACTCCATGCCAGACCCCCACAGTTTACGAGGGCTCAGTGGTTTGCCATCCAGCACATCAGTCTGAACCCCCCTCGATGCACCATTGCAATGCGGGTAATTAACAATTATCGATGGCGTTGCAAAAACCAAAATACTTTTCTTCGTACAACTTTTGCTAATGTAGTTAATGTTTGTGGTAACCAAAGTATACGCTGCCTTCATAACAGAACTCTCAACAATTGTCATCGGAGTAGATTCCGGGTGCCTTTACTCCACTGTGACCTCACAGGTCAGAATATTTCAAACTGCAGGTATGCAGACAGACCAGGAAGGAGGTTCTATGTAGTTGCATGTGACAACAGAGATCCACAGGATTCTCCACGGTATCCTGTGGTTCCTGTTCACCTGGATACCACCATCTAA
>TamarinEDN
ATGGTTCCAAAACTGTTCACTTCCCAAATTTGCGTGCTTCTTCTTTTCGGGCTTTTGAGTGTGGAGGTCTCACTCCAGGTCAAACCCCAGCAGTTTTCCTGGGCTCAGTGGTTTAGCATCCAGCACATCCAAACAACTCCCCTCCACTGCACCTCTGCAATGCGGGCAATTAACAGGTATCAACCTCGATGCAAAAACCAAAATACTTTTCTTCATACAACTTTTGCTAATGTAGTTAATGTTTGTGGTAACACAAATATCACCTGCCCTCGTAATGCATCTCTCAACAATTGTCATCACAGTGGAGTCCAGGTGCCTTTAACCTACTGTAACCTCACAGGTCAGACTATTTCAAACTGTGTGTATTCCTCGACTCAGGCAAACATGTTCTATGTAGTTGCATGTGACAACAGAGATCCACGGGATCCTCCACAGTATCCAGTGGTCCCGGTTCACCTGGATACCACCATCTAA
>ChimpanzeeECP
ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTGTGGAGGGCTCACTCCATGCCAGACCCCCACAGTTTACGAGGGCTCAGTGGTTTGCCATCCAGCACATCAGTCTGAACCCCCCTCGATGCACCATTGCAATGCGGGTAATTAACAATTATCGATGGCGTTGCAAAAACCAAAATACTTTTCTTCGTACAACTTTTGCTAATGTAGTTAATGTTTGTGGTAACCAAAGTATACGCTGCCCTCATAACAGAACCCTCAACAATTGTCATCAGAGTAGATTCCGGGTGCCTTTACTCCACTGTGACCTCACAGGTCAGAATATTTCAAACTGCGGGTATGCAGACAGACCAGGAAGGAGGTTCTATGTAGTTGCATGTGACAACAGAGATCCACGGGATTCTCCACGGTATCCTGTGGTTCCAGTTCACCTGGATACCACCATCTAA

Baseml control file

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
      seqfile = for.paml.ECP_EDN.fasta
     treefile = for.paml.newick.tree

      outfile = mlb           * main result file name
        noisy = 3  * 0,1,2,3,9: how much rubbish on the screen
      verbose = 0  * 0: concise; 1: detailed, 2: too much
      runmode = 0  * 0: user tree;  1: semi-automatic;  2: automatic

        model = 4   * 0:JC69, 1:K80, 2:F81, 3:F84, 4:HKY85, 5:T92, 6:TN93, 7:REV
                    * 8:UNREST, 9:REVu; 10:UNRESTu
        Mgene = 1   * 0:rates, 1:separate; 2:diff pi, 3:diff kappa, 4:all diff

        clock = 0   * 0:no clock, 1:clock; 2:local clock; 3:CombinedAnalysis
    fix_kappa = 0   * 0: estimate kappa; 1: fix kappa at value below
        kappa = 2.5   * initial or fixed kappa
 
    fix_alpha = 1   * 0: estimate alpha; 1: fix alpha at value below
        alpha = 0.   * initial or fixed alpha, 0:infinity (constant rate)
        ncatG = 5   * # of categories in the dG, AdG, or nparK models of rates
      fix_rho = 1   * 0: estimate rho; 1: fix rho at value below
          rho = 0   * initial or fixed rho, 0:no correlation
        nparK = 0   * rate-class models. 1:rK, 2:rK&fK, 3:rK&MK(1/K), 4:rK&MK 
        nhomo = 1   * 0 & 1: homogeneous, 2: kappa for branches, 3:N1, 4:N2, 5:user
        getSE = 0   * 0: don't want SEs of estimates, 1: want SEs
 RateAncestor = 0   * (0,1,2): rates (alpha>0) or ancestral states
       method = 0   * 0: simultaneous; 1: one branch at a time

   Small_Diff = 1e-6
    cleandata = 0  * remove sites with ambiguity data (1:yes, 0:no)?
*   fix_blength = 1  * 0: ignore, -1: random, 1: initial, 2: fixed

Output file (mlb)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
BASEML (in paml version 4.6, August 2012)  for.paml.ECP_EDN.fasta  HKY85 
Frequencies..
                                    T      C      A      G
OrangutanECP                   0.27848 0.25105 0.26582 0.20464
OrangutanEDN                   0.27637 0.25105 0.28692 0.18565
MacaqueECP                     0.27004 0.25949 0.26371 0.20675
ChimpanzeeEDN                  0.26793 0.24895 0.30169 0.18143
MacaqueEDN                     0.27004 0.24684 0.29536 0.18776
GorillaEDN                     0.26582 0.25316 0.29958 0.18143
GorillaECP                     0.28270 0.24895 0.25738 0.21097
TamarinEDN                     0.28692 0.27426 0.25949 0.17932
ChimpanzeeECP                  0.27637 0.25316 0.25738 0.21308

Homogeneity statistic: X2 = 0.02231 G = 0.02218 

Average                        0.27496 0.25410 0.27637 0.19456

# constant sites:    318 (67.09%)
ln Lmax (unconstrained) = -1446.712071

Distances:HKY85 (kappa)  (alpha set at 0.00)
This matrix is not used in later m.l. analysis.

OrangutanECP     
OrangutanEDN       0.1471( 1.4771)
MacaqueECP         0.0965( 2.3785)  0.1857( 1.7205)
ChimpanzeeEDN      0.1501( 1.6664)  0.0459( 1.2841)  0.1889( 1.8947)
MacaqueEDN         0.1784( 2.3576)  0.1171( 2.6393)  0.2106( 2.1212)  0.1174( 2.8899)
GorillaEDN         0.1467( 1.4737)  0.0509( 1.9570)  0.1821( 1.8540)  0.0111( 1.5375)  0.1173( 3.1303)
GorillaECP         0.0632( 4.5261)  0.1617( 1.9359)  0.0631( 1.8565)  0.1648( 2.1473)  0.1926( 2.5604)  0.1611( 1.9297)
TamarinEDN         0.1955( 1.7006)  0.1872( 1.8382)  0.2381( 1.9132)  0.1805( 1.8140)  0.2177( 2.3613)  0.1735( 1.7645)  0.2117( 1.9055)
ChimpanzeeECP      0.0636( 5.4030)  0.1557( 1.9072)  0.0634( 2.1638)  0.1588( 2.1223)  0.1863( 2.5434)  0.1552( 1.9020)  0.0133(10.5181)  0.2012( 1.8041)

TREE #  1:  (8, ((3, (1, (9, 7))), (5, (2, (4, 6)))));  MP score: 214.00
lnL(ntime: 16  np: 20):  -1727.569162    +0.000000
  10..8    10..11   11..12   12..3    12..13   13..1    13..14   14..9    14..7    11..15   15..5    15..16   16..2    16..17   17..4    17..6  
 0.075656 0.041862 0.069114 0.042439 0.006624 0.043808 0.008219 0.006325 0.006377 0.029546 0.072769 0.024503 0.025128 0.018754 0.004233 0.006750 2.121452 0.252198 0.254579 0.287265

tree length =   0.48211

(TamarinEDN, ((MacaqueECP, (OrangutanECP, (ChimpanzeeECP, GorillaECP))), (MacaqueEDN, (OrangutanEDN, (ChimpanzeeEDN, GorillaEDN)))));

(TamarinEDN: 0.075656, ((MacaqueECP: 0.042439, (OrangutanECP: 0.043808, (ChimpanzeeECP: 0.006325, GorillaECP: 0.006377): 0.008219): 0.006624): 0.069114, (MacaqueEDN: 0.072769, (OrangutanEDN: 0.025128, (ChimpanzeeEDN: 0.004233, GorillaEDN: 0.006750): 0.018754): 0.024503): 0.029546): 0.041862);

Detailed output identifying parameters
kappa under HKY85:  2.12145
Base frequencies:
  0.25220  0.25458  0.28726  0.20596