TDPY

PyPI version

If you spend most of your time playing with data, a homebrew sampler is your friend. This is because any type of inference (given some data) can almost always be performed with the use of a likelihood function, which encodes the probability of obtaining the observed data under the assumption of a model. Potentially folding our priors in, then, the fundamental problem of inference is essentially reduced to that of sampling from a non-analytical and potentially multimodal and/or large dimensional posterior probability density. This lead me to write my own Markov Chain Monte Carlo (MCMC) sampler, tdpy.mcmc. Along with other numerical analysis and visualization routines in tdpy.util, tdpy is the library I have been maintaining for my own research.

The primary product is a general purpose, Metropolis-Hastings MCMC sampler with adaptive burn-in, tdpy.mcmc. Given a likelihood function and a prior distribution in the parameter space of interest, it takes heavy-tailed Gaussian steps to construct a Markov chain of states, whose stationary distribution is the target probability density. The sampler takes steps in a transformed parameter space where the prior density is uniform. By setting the keyword optiprop=True, the sampler takes its initial steps while optimizing its proposal scale until the acceptance ratio is ~ 25%. These samples are later discarded along with the first numbburn samples. The chain can also be thinned down by a factor of factthin.

In MCMC, the essential idea is to explore the target probability distribution by proposing updates to a Markovian (memoryless) chain that respect detailed balance. The most common problem that samplers face is that of scale-variance, i.e., when the scale at which the target (posterior) distribution changes depends on where you are in the parameter space. If this is the case, the scale of the proposal distribution, which needs to be constant in order for the chain to converge, will miserably fail to explore the target distribution. A famous example is that of a banana-shaped target function. Here as an illustration I run the sampler against a likelihood function in the form of the Turkish flag, i.e., the crescent and star. The flag is first filtered by a Gaussian kernel and then scaled exponentially to make it harder to sample from.

PCAT

The resulting chain from a run with a million samples shows that the crescent is well sampled.

PCAT

For the full API, refer to the GitHub project page.