# PCAT

An elementary operation in astrophysics is making a catalog of point sources in a given image, i.e., a two-dimensional array of photon counts. The traditional approach addresses the following question: what set of point source positions and fluxes maximizes the Poisson likelihood that the observed photon count map was generated by the point source model?
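To make that objective concrete, here is a minimal sketch of the Poisson log-likelihood of a count map under a point source model. It assumes a flat background and a circular Gaussian PSF evaluated at pixel centers. `model_image` and `poisson_loglike` are hypothetical helper names, not part of any library. The model-independent $$-\sum\log n!$$ term is dropped because it does not change which catalog maximizes the likelihood.

```python
import numpy as np


def model_image(xs, ys, fluxes, bkg, shape, sigma):
    """Expected counts per pixel: flat background plus point sources
    convolved with a circular Gaussian PSF of width `sigma` (pixels)."""
    ny, nx = shape
    yy, xx = np.mgrid[0:ny, 0:nx]  # row (y) and column (x) pixel-center coordinates
    model = np.full(shape, float(bkg))
    norm = 1.0 / (2.0 * np.pi * sigma**2)  # normalizes the PSF to unit integral
    for x, y, f in zip(xs, ys, fluxes):
        r2 = (xx - x) ** 2 + (yy - y) ** 2
        model += f * norm * np.exp(-0.5 * r2 / sigma**2)
    return model


def poisson_loglike(counts, model):
    """Poisson log-likelihood, up to the model-independent -sum(log n!) term."""
    return float(np.sum(counts * np.log(model) - model))


# Usage: simulate one bright source and compare two hypotheses.
rng = np.random.default_rng(0)
shape, bkg, sigma = (32, 32), 2.0, 1.5
truth = model_image([16.0], [16.0], [500.0], bkg, shape, sigma)
counts = rng.poisson(truth)
bkg_only = model_image([], [], [], bkg, shape, sigma)
print(poisson_loglike(counts, truth), poisson_loglike(counts, bkg_only))
```

A maximum likelihood catalog is the set of `(xs, ys, fluxes)` that maximizes `poisson_loglike` for the observed `counts`.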

However, there is something fundamentally wrong with this method (pardon my Bayesian priors here). In a deterministic catalog, point sources in a crowded field can be blended together when their separation is comparable to the width of the point spread function (PSF). Similarly, background fluctuations or mismodeling can fake multiple point sources when there is actually a single real source. Given an observed photon count map, the maximum likelihood solution will commit to either a single source or multiple sources. Therefore, depending on what the reality is, it will either miss an existing point source or overfit by introducing a spurious one. This type of across-model covariance cannot easily be accounted for in a frequentist approach.
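The blending problem can be quantified with a small sketch. It generates noiseless (expected-count) data from two equal sources and reports how much Poisson log-likelihood is lost by replacing the pair with a single merged source at their midpoint carrying the total flux. The Gaussian PSF, the flat background and all parameter values are illustrative assumptions. When the separation is small compared with the PSF width, the loss is tiny, so the data carry little information to tell the two hypotheses apart.

```python
import numpy as np


def psf_image(sources, shape=(32, 32), sigma=2.0, bkg=1.0):
    """Expected counts: flat background plus Gaussian-PSF point sources (x, y, flux)."""
    ny, nx = shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    img = np.full(shape, bkg)
    for x, y, f in sources:
        r2 = (xx - x) ** 2 + (yy - y) ** 2
        img += f * np.exp(-0.5 * r2 / sigma**2) / (2.0 * np.pi * sigma**2)
    return img


def loglike(counts, model):
    """Poisson log-likelihood without the constant -sum(log n!) term."""
    return float(np.sum(counts * np.log(model) - model))


def blend_penalty(sep, flux=100.0, sigma=2.0):
    """log L(pair) - log L(merged single source), evaluated on the pair's expected counts."""
    cx, cy = 16.0, 16.0
    pair = [(cx - sep / 2, cy, flux / 2), (cx + sep / 2, cy, flux / 2)]
    merged = [(cx, cy, flux)]
    data = psf_image(pair, sigma=sigma)
    return loglike(data, data) - loglike(data, psf_image(merged, sigma=sigma))


for sep in (0.5, 1.0, 2.0, 4.0, 8.0):  # separations in pixels; PSF sigma = 2 pixels
    print(f"separation {sep:4.1f} px: delta log L = {blend_penalty(sep):.3f}")
```

Since $$n\log m - m$$ is maximized at $$m=n$$, the penalty is always positive, but it shrinks rapidly as the pair moves inside the PSF core.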

Second, a deterministic catalog will, by definition, only include point sources that survive a hard significance threshold, e.g., $$TS=-2\Delta\log\mathcal{L}\geq 25$$. These point sources are then taken as the truth, while the point source candidates below the detection threshold are discarded. The output product is then a flux-limited catalog, whose flux distribution rolls off towards the dim end. As a result, constructing a deterministic catalog causes a considerable loss of information that could otherwise be fed into further analyses relying on the catalog.
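For reference, here is a minimal sketch of how such a test statistic can be computed for a single candidate at a fixed position. It assumes a known flat background and a Gaussian PSF, profiles the flux over a grid that includes zero, and takes $$\Delta\log\mathcal{L}=\log\mathcal{L}_\mathrm{null}-\log\mathcal{L}_\mathrm{source}$$ so that $$TS\geq 0$$. The function names and parameter values are illustrative, not taken from any catalog pipeline.

```python
import numpy as np


def psf_template(x0, y0, shape, sigma):
    """Unit-flux Gaussian PSF centered at (x0, y0), sampled at pixel centers."""
    ny, nx = shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    r2 = (xx - x0) ** 2 + (yy - y0) ** 2
    return np.exp(-0.5 * r2 / sigma**2) / (2.0 * np.pi * sigma**2)


def loglike(counts, model):
    """Poisson log-likelihood without the constant -sum(log n!) term."""
    return float(np.sum(counts * np.log(model) - model))


def ts_at_position(counts, x0, y0, bkg, sigma, flux_grid):
    """TS = -2 [log L(background only) - max_f log L(background + f * PSF)]."""
    template = psf_template(x0, y0, counts.shape, sigma)
    null = loglike(counts, np.full(counts.shape, bkg))
    best = max(loglike(counts, bkg + f * template) for f in flux_grid)
    return -2.0 * (null - best)


rng = np.random.default_rng(1)
shape, bkg, sigma = (32, 32), 2.0, 1.5
fluxes = np.linspace(0.0, 1000.0, 2001)  # flux grid; f = 0 reproduces the null model
bright = rng.poisson(bkg + 300.0 * psf_template(16.0, 16.0, shape, sigma))
empty = rng.poisson(np.full(shape, bkg))
ts_bright = ts_at_position(bright, 16.0, 16.0, bkg, sigma, fluxes)
ts_empty = ts_at_position(empty, 16.0, 16.0, bkg, sigma, fluxes)
print(ts_bright, ts_bright >= 25, ts_empty, ts_empty >= 25)
```

A deterministic catalog keeps a candidate only when this number clears the threshold; everything below it is thrown away.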

These concerns led me, together with Stephen Portillo, a fellow graduate student at the Harvard-CfA, and my advisor Douglas Finkbeiner, to construct a Bayesian framework that draws samples from the space of catalogs consistent with the image. Together, the ensemble of such fair samples represents our state of knowledge about the point sources in the image. This is in stark contrast with the frequentist approach, which attempts to represent that knowledge with an estimator of the most likely solution.

## Trans-dimensional sampling

By construction, the catalog space is trans-dimensional: it encompasses the parameter spaces of point source models of every dimension. Constructing a Markov chain whose stationary distribution is the target probability distribution over the catalog space therefore requires across-model moves. We perform these state transitions using reversible jumps that respect detailed balance. Multiple types of proposals, such as birth, death, split and merge, are used along with the usual heavy-tailed within-model transitions in order to span the catalog space.
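Below is a minimal sketch of reversible-jump MCMC on a one-dimensional toy image. It assumes a Poisson prior on the number of sources, uniform priors on position and flux, and a Gaussian PSF, and it implements only birth, death and a heavy-tailed (Cauchy) within-model move; split and merge proposals, which need a dimension-matching Jacobian, are omitted. All names and parameter values are hypothetical and do not reflect PCAT's internals.

```python
import numpy as np

rng = np.random.default_rng(42)
NPIX, SIGMA, BKG = 64, 2.0, 1.0  # 1D toy image: pixels, PSF width, background per pixel
MU = 3.0                         # Poisson prior mean on the number of sources
FMIN, FMAX = 10.0, 500.0         # uniform flux prior bounds
XGRID = np.arange(NPIX)


def model(srcs):
    """Expected counts for a list of (position, flux) sources."""
    m = np.full(NPIX, BKG)
    for x, f in srcs:
        m += f * np.exp(-0.5 * ((XGRID - x) / SIGMA) ** 2) / (np.sqrt(2 * np.pi) * SIGMA)
    return m


def loglike(counts, srcs):
    m = model(srcs)
    return float(np.sum(counts * np.log(m) - m))


def step(counts, srcs, ll):
    """One Metropolis-Hastings step: birth (p=1/4), death (p=1/4) or within-model move (p=1/2).

    Births draw from the prior and deaths pick a source uniformly, so with equal birth
    and death probabilities the acceptance ratio is likelihood ratio * MU/(n+1) for a
    birth, and the inverse for the matching death."""
    n, u = len(srcs), rng.random()
    if u < 0.25:  # birth
        new = srcs + [(rng.uniform(0.0, NPIX), rng.uniform(FMIN, FMAX))]
        ll_new = loglike(counts, new)
        log_a = ll_new - ll + np.log(MU / (n + 1))
    elif u < 0.5:  # death
        if n == 0:
            return srcs, ll
        k = rng.integers(n)
        new = srcs[:k] + srcs[k + 1:]
        ll_new = loglike(counts, new)
        log_a = ll_new - ll + np.log(n / MU)
    else:  # within-model: heavy-tailed (Cauchy) symmetric kick to one source
        if n == 0:
            return srcs, ll
        k = rng.integers(n)
        x = srcs[k][0] + 0.5 * rng.standard_cauchy()
        f = srcs[k][1] + 5.0 * rng.standard_cauchy()
        if not (0.0 <= x < NPIX and FMIN <= f <= FMAX):  # zero prior density: reject
            return srcs, ll
        new = srcs[:k] + [(x, f)] + srcs[k + 1:]
        ll_new = loglike(counts, new)
        log_a = ll_new - ll  # symmetric proposal, flat prior inside the bounds
    if np.log(rng.random()) < log_a:
        return new, ll_new
    return srcs, ll


truth = [(20.0, 200.0), (23.0, 150.0), (45.0, 300.0)]  # a blended pair plus an isolated source
counts = rng.poisson(model(truth))
srcs, ll = [], loglike(counts, [])
n_samples = []
for it in range(20000):
    srcs, ll = step(counts, srcs, ll)
    if it >= 5000:  # discard burn-in
        n_samples.append(len(srcs))
print("posterior P(N):", np.bincount(n_samples) / len(n_samples))
```

Because births are drawn from the prior, the prior and proposal densities cancel, which keeps the acceptance ratio simple. The printed distribution over the number of sources is exactly the kind of across-model uncertainty that a single best-fit catalog cannot express.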

## Hierarchical modeling

A given point source can be regarded as a realization of an underlying Poisson process unique to the population to which it belongs. The priors on its position, flux or color can then be made conditional on a few hyperparameters, which characterize the spatial, spectral or color distribution of its population and are themselves assigned hyperpriors. This type of hierarchical modeling lets the data constrain the form of the priors, while the individual point sources remain independent realizations of the population. Note that the hyperparameters only control the shape of the priors on the individual point sources, so the likelihood is still invariant with respect to them!
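As a minimal sketch of how a hyperparameter can be updated, assume the fluxes of one population follow a power law $$p(f\mid\alpha)\propto f^{-\alpha}$$ on a fixed range, with a uniform hyperprior on the slope $$\alpha$$. Because the image likelihood depends on the fluxes but not on $$\alpha$$, a Metropolis update of $$\alpha$$ only needs the ratio of flux priors. The flux range, hyperprior bounds and helper names are assumptions; in a full sampler the fluxes would also be updated, whereas here a fixed simulated set stands in for the current catalog sample.

```python
import numpy as np

FMIN, FMAX = 10.0, 1000.0      # flux range of the population
ALPHA_LO, ALPHA_HI = 1.2, 3.0  # uniform hyperprior on the slope (excludes alpha = 1)


def powerlaw_logpdf(f, alpha):
    """log p(f | alpha) for p(f) proportional to f**-alpha on [FMIN, FMAX]."""
    a = 1.0 - alpha
    norm = a / (FMAX**a - FMIN**a)  # positive for any alpha != 1
    return np.log(norm) - alpha * np.log(f)


def sample_powerlaw(alpha, size, rng):
    """Inverse-CDF draws from the truncated power law."""
    a = 1.0 - alpha
    u = rng.random(size)
    return (FMIN**a + u * (FMAX**a - FMIN**a)) ** (1.0 / a)


def update_alpha(alpha, fluxes, rng, step=0.1):
    """Metropolis update of the slope given the current fluxes.

    The image likelihood does not depend on alpha, so it cancels from the ratio."""
    prop = alpha + step * rng.standard_normal()
    if not ALPHA_LO <= prop <= ALPHA_HI:
        return alpha  # zero hyperprior density outside the bounds
    log_r = np.sum(powerlaw_logpdf(fluxes, prop) - powerlaw_logpdf(fluxes, alpha))
    return prop if np.log(rng.random()) < log_r else alpha


rng = np.random.default_rng(0)
fluxes = sample_powerlaw(2.0, 200, rng)  # stand-in for the fluxes in one catalog sample
alpha, chain = 1.5, []
for _ in range(5000):
    alpha = update_alpha(alpha, fluxes, rng)
    chain.append(alpha)
print("posterior mean slope:", np.mean(chain[1000:]))
```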

## Making all this possible

Despite its advantages over deterministic catalogs, probabilistic cataloging has not been the mainstream approach. This is to be expected, since probabilistic cataloging is computationally demanding: the dimensionality of the hypothesis space is variable and large, so drawing independent samples from it with MCMC requires long simulations. We therefore optimize the time performance of the sampler by making the model prediction efficient and introducing approximations where possible.
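One common way to make the model prediction cheap is sketched here under assumptions: a Gaussian PSF truncated to a square stamp of half-width `RCUT`, a flat background, and a hypothetical `IncrementalModel` helper. The idea is that a birth, death or move changes only one source, so its effect on the log-likelihood can be computed from the pixels under that source's truncated PSF stamp instead of the whole image. Truncating the PSF is itself an approximation. This illustrates the idea and is not a description of how PCAT does it.

```python
import numpy as np

SIGMA, RCUT = 1.5, 5  # PSF width and truncation half-width, in pixels
DY, DX = np.mgrid[-RCUT:RCUT + 1, -RCUT:RCUT + 1]  # stamp offsets (rows, columns)


class IncrementalModel:
    """Model image updated one source at a time (hypothetical helper).

    Arrays are padded by RCUT so stamps of sources near the edge stay in bounds;
    padded pixels carry zero weight in the likelihood."""

    def __init__(self, counts, bkg):
        self.counts = np.pad(counts.astype(float), RCUT)
        self.weight = np.pad(np.ones(counts.shape), RCUT)
        self.model = np.full(self.counts.shape, float(bkg))

    def _stamp(self, x, y, f):
        # Assumes -0.5 <= x < nx - 0.5 and -0.5 <= y < ny - 0.5 (image coordinates).
        ix, iy = int(round(x)), int(round(y))
        g = np.exp(-0.5 * ((ix + DX - x) ** 2 + (iy + DY - y) ** 2) / SIGMA**2)
        # Image pixel (iy, ix) sits at padded index (iy + RCUT, ix + RCUT).
        window = (slice(iy, iy + 2 * RCUT + 1), slice(ix, ix + 2 * RCUT + 1))
        return window, f * g / (2.0 * np.pi * SIGMA**2)

    def delta_loglike(self, x, y, f, sign=1.0):
        """Change in Poisson log L from adding (sign=+1) or removing (sign=-1) one source."""
        window, s = self._stamp(x, y, f)
        n, w, m_old = self.counts[window], self.weight[window], self.model[window]
        m_new = m_old + sign * s
        return float(np.sum(w * (n * np.log(m_new / m_old) - (m_new - m_old))))

    def apply(self, x, y, f, sign=1.0):
        """Commit the change once a proposal is accepted."""
        window, s = self._stamp(x, y, f)
        self.model[window] += sign * s

    def loglike(self):
        """Full-image log L (up to a constant), for checking the incremental updates."""
        m = self.model[RCUT:-RCUT, RCUT:-RCUT]
        n = self.counts[RCUT:-RCUT, RCUT:-RCUT]
        return float(np.sum(n * np.log(m) - m))


rng = np.random.default_rng(3)
inc = IncrementalModel(rng.poisson(2.0, size=(32, 32)), bkg=2.0)
before = inc.loglike()
d_add = inc.delta_loglike(10.3, 0.2, 80.0)  # a source right at the image edge
inc.apply(10.3, 0.2, 80.0)
print(d_add, inc.loglike() - before)  # should agree up to floating-point rounding
```

The cost of evaluating a proposal then scales with the stamp area rather than the image area, which matters when the chain needs many millions of proposals.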

# Software

The resulting software, PCAT (Probabilistic CATaloger), is designed to be a general-purpose catalog sampler with a user-friendly API. You can download it from its GitHub repository, where you can also learn about its user interface; if you have any concerns or questions, feel free to file an issue there. The project is currently in an unstable pre-release stage and is under active development.

## Probabilistic cataloging at work

Here is a sneak peek at the output of probabilistic cataloging. The image shows fair samples drawn from the catalog space consistent (up to the Poisson likelihood) with the gamma-ray sky towards the North Galactic Polar Cap, i.e., the region around the zenith if you take the galactic plane as the horizontal datum.