section 4.4.2.1 marginal reconstruction¶

This is an implementation of the marginal reconstruction in section 4.4.2.1 of Molecular Evolution: A Statistical Approach, pages 127–128.

The stochastic process is called K80 whose state space is {T, C, A, G} which we will encode as {0, 1, 2, 3}. The example uses \(\kappa = 2\), but with a slightly different parameterization of the rate matrix, so all rates in this example are scaled by 0.25 relative to the parameterization on Wikipedia.

{
	"scene" : {
		"node_count" : 9,
		"process_count" : 1,
		"state_space_shape" : [4],
		"tree" : {
			"row_nodes" : [7, 7, 6, 8, 8, 0, 6, 0],
			"column_nodes" : [1, 2, 3, 4, 5, 6, 7, 8],
			"edge_rate_scaling_factors" : [
				0.2, 0.2, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1],
			"edge_processes" : [0, 0, 0, 0, 0, 0, 0, 0]
		},
		"root_prior" : {
			"states" : [[0], [1], [2], [3]],
			"probabilities" : [0.25, 0.25, 0.25, 0.25]
		},
		"process_definitions" : [{
			"row_states" : [
				[0], [0], [0],
				[1], [1], [1],
				[2], [2], [2],
				[3], [3], [3]],
			"column_states" : [
				[1], [2], [3],
				[0], [2], [3],
				[0], [1], [3],
				[0], [1], [2]],
			"transition_rates" : [
				0.50, 0.25, 0.25,
				0.50, 0.25, 0.25,
				0.25, 0.25, 0.50,
				0.25, 0.25, 0.50
				]
		}],
		"observed_data" : {
			"nodes" : [1, 2, 3, 4, 5],
			"variables" : [0, 0, 0, 0, 0],
			"iid_observations" : [[0, 1, 2, 1, 1]]
		}
	},
	"requests" : [
		{"property" : "SNNLOGL"},
		{"property" : "SNDNODE"}
	]
}

{
	"status": "feasible",
	"responses": [
		-7.5814075725577,
		[
			[
05510083401271332,
0,
0,
0,
0,
0,
09346268217301874,
15305678030512224,
010273623631543357],
			[
9013686377853353,
0,
0000000000000002,
0,
9999999999999998,
9999999999999998,
8289772905233035,
8167675701783329,
9847938471666741],
			[
03684321097168115,
0,
0,
0000000000000002,
0,
0,
07037231027182819,
025695339385952767,
003920975661947274],
			[
006687317230270187,
0,
0,
0,
0,
0,
007187717031849643,
0044803101305921095,
0010115535398349186]
		]
	]
}

The output is probably transposed vs. the order you expect; it gives an array of four arrays, each array representing a state, whereas you might expect it to give nine arrays (one for each node) with each of these nine arrays having length four. The reason it is transposed this way is for indexing consistency, which may have been a bad design choice.

In any case, you can see that the ancestral marginal reconstructions in the json output agree with those given in the text:

Node 0 (root):
    T : 0.055
    C : 0.901
    A : 0.037
    G : 0.007

Node 6:
    T : 0.093
    C : 0.829
    A : 0.070
    G : 0.007

Node 7:
    T : 0.153
    C : 0.817
    A : 0.026
    G : 0.004

Node 8:
    T : 0.010
    C : 0.985
    A : 0.004
    G : 0.001