Garage of Code: Sampling: Computing stationary distribution

Sampling a series with the box tree defines a Markov chain. A valid set of states is obtained by splitting the first dimension wherever a box ends.

Given a value on the x-axis, the sampling distribution is uniquely determined by which interval (bounded by the red lines) the value lies in.

Recognizing this equivalence with respect to sampling gives us a finite number of states. We can calculate the transition probabilities from a given state by integrating the conditional distribution over each state's interval. The conditional distribution is piecewise uniform, so integrating it over an interval is simple. This gives us the transition matrix P.
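The construction of P can be sketched as follows. This is a minimal sketch under assumptions that are mine, not the post's: each box is a hypothetical tuple `(x_lo, x_hi, y_lo, y_hi, weight)`, the boxes tile the unit square, the joint density of (current value, next value) is uniform inside each box, and the next value is drawn from the conditional distribution given the current one. The helper names `state_edges` and `transition_matrix` are illustrative only. P is built column-stochastic so that it matches the equation $Px = x$ used for the stationary distribution.

```python
import numpy as np

# Hypothetical box representation (an assumption, not the post's data structure):
# (x_lo, x_hi, y_lo, y_hi, weight), with weight = probability mass of the box.
Box = tuple[float, float, float, float, float]


def state_edges(boxes: list[Box]) -> np.ndarray:
    """Split the first dimension wherever a box ends."""
    edges = {b[0] for b in boxes} | {b[1] for b in boxes}
    return np.array(sorted(edges))


def transition_matrix(boxes: list[Box]) -> np.ndarray:
    """Column-stochastic P: P[j, i] = Pr(next value in state j | current value in state i)."""
    edges = state_edges(boxes)
    n = len(edges) - 1
    P = np.zeros((n, n))
    for i in range(n):
        # Every x inside state i sees the same boxes, so one point represents the state.
        mid = 0.5 * (edges[i] + edges[i + 1])
        for x_lo, x_hi, y_lo, y_hi, w in boxes:
            if not (x_lo <= mid < x_hi):
                continue
            density = w / ((x_hi - x_lo) * (y_hi - y_lo))  # uniform joint density in the box
            for j in range(n):
                # Integrating a constant density over the overlap with state j's interval.
                overlap = max(0.0, min(y_hi, edges[j + 1]) - max(y_lo, edges[j]))
                P[j, i] += density * overlap
    # Dividing each column by its sum turns the joint density into the conditional.
    return P / P.sum(axis=0, keepdims=True)


# Example: three boxes tiling the unit square.
boxes = [(0.0, 0.5, 0.0, 0.5, 0.3), (0.0, 0.5, 0.5, 1.0, 0.1), (0.5, 1.0, 0.0, 1.0, 0.6)]
P = transition_matrix(boxes)  # ≈ [[0.75, 0.5], [0.25, 0.5]]
```

Representing each state by its midpoint works only because the states are exactly the intervals between box edges in the first dimension, so the set of boxes covering a value is constant within a state.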

A finite Markov chain always has at least one stationary distribution, and it is unique if the chain is irreducible. It can be found by solving:

$Px = x \Rightarrow (P - I)x = 0$

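Here P is column-stochastic ($P_{ij}$ is the probability of moving from state $j$ to state $i$), and x is additionally constrained to sum to 1. The system can be solved with scipy.linalg.null_space, normalizing the resulting basis vector so that its entries sum to 1. Using this stationary distribution instead of relying on series sampling gives a faster and exact measure of the series entropy.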
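A minimal sketch of this step, assuming a column-stochastic P as above. `scipy.linalg.null_space` returns an orthonormal basis (unit norm, arbitrary sign), so the single basis vector is divided by its sum to satisfy the constraint. The function name `stationary_distribution` and the 2-state matrix are illustrative.

```python
import numpy as np
from scipy.linalg import null_space


def stationary_distribution(P: np.ndarray) -> np.ndarray:
    """Solve (P - I) x = 0 with sum(x) = 1 for a column-stochastic P."""
    basis = null_space(P - np.eye(P.shape[0]))
    if basis.shape[1] != 1:
        # More than one column: the chain is not irreducible and x is not unique.
        # Zero columns: numerical trouble (P is probably not stochastic).
        raise ValueError(f"expected a 1-dimensional null space, got {basis.shape[1]}")
    x = basis[:, 0]
    return x / x.sum()  # fixes both the scale and the arbitrary sign


P = np.array([[0.75, 0.5],
              [0.25, 0.5]])
x = stationary_distribution(P)  # ≈ [2/3, 1/3]
```

Raising on a null space of dimension other than one makes a non-unique stationary distribution visible instead of silently returning one of several solutions.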

Greedy search for the transition function, using theoretical entropy values instead of sampled ones. An improvement over before is that the optimization no longer stalls because it cannot find a better neighbour. However, it clearly gets stuck in a local valley.

This leads to a better search for interesting transition functions (low prior entropy, high posterior entropy), in the sense that it keeps improving for a long time. The shortcoming of the sampling-based measure was that a finite sample sequence was sometimes a "lucky" one that gave a good score but was not representative of the ensemble average. The score of the current state was then overestimated, which made it hard to find a better neighbour. Using the Markov chain model solves this. However, the search clearly gets stuck in a local valley.
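A generic greedy (steepest-ascent) search can be sketched as below. This is not the post's code: `greedy_search`, `neighbours` and `score` are hypothetical, and the real score (low prior entropy, high posterior entropy) is replaced by a toy 1-D objective with two peaks. Note that `current_score` is computed once and cached; with a sampled score, a lucky overestimate at that point blocks every neighbour, whereas an exact score does not have this failure mode. The search still stops at the first local optimum it reaches.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def greedy_search(
    initial: T,
    neighbours: Callable[[T], Iterable[T]],
    score: Callable[[T], float],
    max_steps: int = 1000,
) -> tuple[T, float]:
    """Move to the best-scoring neighbour as long as it strictly improves the score."""
    current, current_score = initial, score(initial)  # scored once, then cached
    for _ in range(max_steps):
        candidates = [(score(n), n) for n in neighbours(current)]
        if not candidates:
            break
        best_score, best = max(candidates, key=lambda c: c[0])
        if best_score <= current_score:
            break  # no better neighbour: a local optimum
        current, current_score = best, best_score
    return current, current_score


def bumpy(x: int) -> float:
    """Toy objective: local peak at x = 2 (height 4), global peak at x = 10 (height 9)."""
    return max(4 - (x - 2) ** 2, 9 - (x - 10) ** 2)


step = lambda x: [x - 1, x + 1]
print(greedy_search(0, step, bumpy))  # (2, 4): stuck on the local peak
print(greedy_search(8, step, bumpy))  # (10, 9): reaches the global peak
```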

The measure itself also needs an additional term. I would like to favour transition distributions with many almost-stationary states, i.e. states the series tends to stay in for a long time before transitioning to another state. Still, we don't want the series to get stuck in those states for good. The property of not getting stuck in a subset of states is called ergodicity, so we should look for almost non-ergodic processes.
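Two standard quantities could serve as such a term; which one to use is my assumption, not something the post settles. The diagonal of P gives the expected stay length in each state, $1/(1 - P_{ii})$, since the chain leaves state $i$ with probability $1 - P_{ii}$ per step. The second-largest eigenvalue modulus $|\lambda_2|$ of P measures how slowly the chain mixes: it approaches 1 as the chain gets close to splitting into separate groups of states, i.e. close to being non-ergodic. The helper names are hypothetical.

```python
import numpy as np


def holding_times(P: np.ndarray) -> np.ndarray:
    """Expected number of consecutive steps spent in each state (geometric stay length)."""
    stay = np.diag(P)
    with np.errstate(divide="ignore"):
        return 1.0 / (1.0 - stay)  # inf for an absorbing state


def second_eigenvalue_modulus(P: np.ndarray) -> float:
    """|lambda_2| of P; values close to 1 indicate slow mixing (nearly non-ergodic)."""
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return float(moduli[1])


# Two "sticky" states that still exchange probability (column-stochastic).
P = np.array([[0.99, 0.02],
              [0.01, 0.98]])
print(holding_times(P))              # ≈ [100, 50]
print(second_eigenvalue_modulus(P))  # ≈ 0.97
```

A score could, for example, reward large holding times while requiring $|\lambda_2| < 1$, so that the chain stays ergodic.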

Friday, 1 March 2019
