Background¶

For the full background and theory underlying this project, please see (link to papers). We provide here a basic summary of the function of the code.

The end goal of the project is to determine, given a star (real or hypothetical) with magnitude \(m\) and position on the sky \((\alpha,\delta)\), the probability that such a star would make it into the data pipeline of the Gaia Space Telescope. In order to do so, we must determine as a function of time, \(t\), and position on the sky, \(\vec{\theta}\), the operating efficiency of the telescope.

We do this by maximising the likelihood of the observed stellar data given a proposed efficiency vector, \(\vec{x}\), together with the appropriate priors:

\[\mathcal{L}(\vec{x} | ~\text{data}) \propto p(\text{data} | \vec{x}) \times \text{prior}(\vec{x})\]

The maximal likelihood is found using a modified ADAM optimizer, which makes use of standard Stochastic Gradient Descent methods.
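As a point of reference, a single update of the standard (unmodified) Adam rule, written for ascent because we are maximising, might look like the sketch below. The function name `adam_ascent_step`, the default hyperparameters and the toy objective are illustrative assumptions; the modifications used in this project are not shown here.

```python
import math

def adam_ascent_step(z, grad, m, v, step, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update that moves z uphill along grad.

    z, grad, m, v are equal-length lists; step is the 1-based iteration count.
    Returns the updated (z, m, v). This is the textbook rule, not the
    project's modified variant.
    """
    new_z, new_m, new_v = [], [], []
    for zi, gi, mi, vi in zip(z, grad, m, v):
        mi = beta1 * mi + (1.0 - beta1) * gi        # first-moment estimate
        vi = beta2 * vi + (1.0 - beta2) * gi * gi   # second-moment estimate
        m_hat = mi / (1.0 - beta1 ** step)          # bias corrections
        v_hat = vi / (1.0 - beta2 ** step)
        new_z.append(zi + lr * m_hat / (math.sqrt(v_hat) + eps))  # '+' for ascent
        new_m.append(mi)
        new_v.append(vi)
    return new_z, new_m, new_v

# Toy usage: maximise f(z) = -(z - 3)^2, whose gradient is -2 (z - 3).
z, m, v = [0.0], [0.0], [0.0]
for step in range(1, 2001):
    grad = [-2.0 * (z[0] - 3.0)]
    z, m, v = adam_ascent_step(z, grad, m, v, step)
print(z[0])  # should have moved from 0 towards the maximum at 3
```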

Probability Model¶

A proposed efficiency \(\vec{x}\) contains a list of efficiency parameters, \(\{x_{t_i}\}\), for each of the time bins \(t_i\), as well as a set of spatial efficiencies, \(\{x_{m\ell}\}\), for each magnitude bin, \(m\), and HEALPix location \(\ell\).

Each star is determined to have been ‘visited’ by Gaia at a set of times \(\{\tau_j\}\), which are determined using the nominal Scanning Law and the location of the star on the sky. The probability that a visitation at time \(\tau_j\) leads to a detection within the Gaia pipeline is therefore given as:

\[p(\text{detect} | \text{visited}, m, \tau_j) = \frac{1}{1 + \exp{(x_{\tau_j})}} \times \exp\left( - 0.5\log(2) \left( e^{x_{m\ell_1(\tau_j)}} + e^{x_{m\ell_2(\tau_j)}} \right) \right)\]

The terms \(\ell_1\) and \(\ell_2\) are both included because Gaia has two fields of view separated by \(106.5^\circ\): \(\ell_1(\tau_j)\) is the HEALPix location being viewed by the first Field of View at the time of the visitation, whilst \(\ell_2(\tau_j)\) is the viewing location of the second FoV.
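The detection probability above can be evaluated directly. The following is a minimal sketch for a single visitation; the function and argument names are illustrative assumptions, and the three inputs are taken to be the already-looked-up values \(x_{\tau_j}\), \(x_{m\ell_1(\tau_j)}\) and \(x_{m\ell_2(\tau_j)}\).

```python
import math

def detection_probability(x_time, x_ml1, x_ml2):
    """p(detect | visited, m, tau_j) for one visitation.

    x_time : temporal efficiency parameter x_{tau_j}
    x_ml1  : spatial parameter for the pixel seen by the first field of view
    x_ml2  : spatial parameter for the pixel seen by the second field of view
    """
    # Temporal factor: 1 / (1 + exp(x_time)), always in (0, 1).
    temporal = 1.0 / (1.0 + math.exp(x_time))
    # Spatial factor: exp(-0.5 * log(2) * (e^{x_ml1} + e^{x_ml2})), always in (0, 1).
    spatial = math.exp(-0.5 * math.log(2.0) * (math.exp(x_ml1) + math.exp(x_ml2)))
    return temporal * spatial

# With all three parameters at zero each factor is one half,
# so the result is approximately 0.25.
print(detection_probability(0.0, 0.0, 0.0))
```

Both factors lie strictly between 0 and 1 for any real parameter values, so the product is always a valid probability. Very large positive inputs would overflow `math.exp`, which a production implementation would need to guard against.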

Given the set \(\{\tau_j\}\) of visitations for a star, we may therefore generate a set \(\{p_j\}\) of associated probabilities. The observational data associated with each star tells us how many times the star was entered into the astrometric pipelines: \(k\) successful detections, in contrast to our \(n\) visitations.

In general, if there are \(n\) events each with a probability \(p_i\) of success, the total number of successes, \(K\), follows a Poisson Binomial Distribution:

\[p(K = k | \{p_i\}, n) = \sum_{A \in F_k^n} \prod_{i \in A} p_i \prod_{j \in A^c} (1 - p_j)\]

where \(F_k^n\) is the set of all subsets of \(k\) integers that can be picked from \(\{1,...,n\}\) and \(A^c\) denotes the complement of set \(A\).
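To make the sum over subsets concrete, the sketch below evaluates it literally and compares it against a standard \(O(n^2)\) recurrence that builds the full distribution one event at a time. The recurrence is a well-known technique included for illustration only, and both function names are assumptions rather than part of this project.

```python
from itertools import combinations

def poisson_binomial_pmf_bruteforce(probs, k):
    """p(K = k) by summing over every k-subset A of the n events."""
    n = len(probs)
    total = 0.0
    for subset in combinations(range(n), k):
        chosen = set(subset)
        term = 1.0
        for i, p in enumerate(probs):
            # Events in A succeed, events in the complement fail.
            term *= p if i in chosen else (1.0 - p)
        total += term
    return total

def poisson_binomial_pmf(probs):
    """Full distribution [p(K=0), ..., p(K=n)] via an O(n^2) recurrence."""
    pmf = [1.0]  # with zero events, K = 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this event fails
            nxt[k + 1] += mass * p       # this event succeeds
        pmf = nxt
    return pmf

probs = [0.2, 0.5, 0.9]
print(poisson_binomial_pmf(probs))
print([poisson_binomial_pmf_bruteforce(probs, k) for k in range(len(probs) + 1)])
```

The brute-force version visits every one of the \(\binom{n}{k}\) subsets, which is why the exact expression becomes expensive for stars with many visitations. For \(k > n\) there are no subsets to sum over and the result is exactly zero.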

However, whilst analytically correct, this poses two problems for us. Firstly, it is expensive to compute. Secondly, it is extremely unforgiving: if \(k = n\), the allowed values of \(p_i\) are extremely constrained, and if \(k > n\), the probability is exactly zero and the distribution becomes meaningless. This matters because the nominal scanning law is imperfect, and our inferred visitation times may well diverge strongly from the actual visitations.

We therefore elect to use the more lenient normal approximation to the Poisson Binomial, which allows for some additional variance.
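A minimal sketch of such an approximation is given below. It uses the exact Poisson Binomial mean, \(\sum_i p_i\), and variance, \(\sum_i p_i (1 - p_i)\), and evaluates a Gaussian log-density at \(k\). The `extra_variance` argument is a hypothetical stand-in for the additional variance mentioned above; the form actually used by this project is not reproduced here.

```python
import math

def normal_approx_log_density(k, probs, extra_variance=0.0):
    """Gaussian log-density at k, matched to the Poisson Binomial moments.

    extra_variance is a hypothetical additive term (an assumption for this
    sketch) that widens the distribution. The total variance must be > 0.
    """
    mean = sum(probs)
    variance = sum(p * (1.0 - p) for p in probs) + extra_variance
    return -0.5 * math.log(2.0 * math.pi * variance) - 0.5 * (k - mean) ** 2 / variance

probs = [0.5, 0.5, 0.5, 0.5]   # n = 4 visitations
# k = 5 > n has zero probability under the exact distribution,
# but remains finite here and becomes less penalised as the variance grows.
print(normal_approx_log_density(5, probs))
print(normal_approx_log_density(5, probs, extra_variance=1.0))
```

Because the Gaussian has support on the whole real line, a star with more recorded detections than predicted visitations is penalised rather than ruled out.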

Variance Model¶

Property Spaces¶

Within this project, we encode our Efficiency Vector on two different spaces, termed the Raw and Transformed spaces:

Raw space is the version handled by the optimizer, and the space on which the actual optimization occurs. This space is often referred to as \(z\)-space, and the variables associated with it are named accordingly.
Transformed space is the more physically/mathematically meaningful space, and the space within which the Likelihood function operates. This space is often referred to as \(x\)-space, or \(p\)-space.

The spaces are linked by Forward and Backward transforms. The splitting of the spaces grants us a number of advantages:

We avoid a complex, interconnected prior by having a simple prior in Raw space
We can enforce bounds on our parameters with appropriate transforms (e.g. \(x_i > 0\) can be enforced by \(x_i = e^{z_i}\))

Forward Transform¶

The Forward Transform converts the Raw vector into the Transformed vector, such that \(\vec{x} = \text{ForwardTransform}(\vec{z})\).

The Forward Transform has 3 components: Temporal, Spatial and Hyper.

Temporal Forward Transform¶

The temporal parts of \(\vec{x}\) and \(\vec{z}\) (denoted \(\vec{x}^t\) and \(\vec{z}^t\) respectively) each have \(Nt\) components, indexed \(i = 0, \dots, Nt-1\). The transform is given by:

\[\begin{split}\begin{align} q_{Nt-1} & = z^t_{Nt-1} \\ q_i & = \sqrt{1 - e^{- 2/\ell_t} } \times z^t_i + e^{-1/\ell_t} q_{i+1} \\ x^t_i & = \mu_t + \sigma_t \times q_i \end{align}\end{split}\]

As the prior on \(\vec{z}^t\) is simply the zero-mean, unit-variance Gaussian, \(\mu_t\) and \(\sigma_t\) are the corresponding mean and standard deviation of the prior on \(\vec{x}^t\). The quantity \(\ell_t\) is the coupling lengthscale, which enforces correlation between the temporal components.
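The recursion runs backwards from the final time bin. A minimal sketch, assuming a non-empty list `z` and using illustrative names for the function and its arguments, is:

```python
import math

def temporal_forward(z, mu_t, sigma_t, ell_t):
    """Map the Raw temporal vector z^t onto the Transformed vector x^t."""
    n = len(z)
    decay = math.exp(-1.0 / ell_t)             # e^{-1/ell_t}
    innovation = math.sqrt(1.0 - decay ** 2)   # sqrt(1 - e^{-2/ell_t})
    q = [0.0] * n
    q[n - 1] = z[n - 1]                        # last bin starts the recursion
    for i in range(n - 2, -1, -1):
        q[i] = innovation * z[i] + decay * q[i + 1]
    return [mu_t + sigma_t * qi for qi in q]

# A single unit impulse in the last bin decays by e^{-1/ell_t} per bin.
print(temporal_forward([0.0, 0.0, 1.0], mu_t=0.0, sigma_t=1.0, ell_t=1.0))
```

The two weights are chosen so that their squares sum to one. If every \(z^t_i\) is an independent unit normal, each \(q_i\) therefore also has unit variance, which is why \(\sigma_t\) can be read directly as the prior standard deviation of \(x^t_i\).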

Spatial Forward Transform¶

We use spherical needlets to decompose the HEALPix-mapped sky into correlated units: our Raw spatial vector, \(\vec{z}_{ms}\), contains a needlet-weighting for the \(m\)-band sky map, whilst the corresponding \(\vec{x}_{ml}\) contains the efficiency parameter for the \(l\)-th HEALPix location of the \(m\)-band sky map.

They are related to each other by:

\[x_{ml, p} = \mu_p + \sigma \sum_{j = 0}^{\texttt{needlet_order}} \sum_{k = 0}^{N_j}\]

Need to go over this in some more detail

Hyper Forward Transform¶

The hyperparameters associated with the coefficients of the Variance Model are unconstrained and hence unaltered by the transform:

\[x_{\text{coef}~i}^h = z_{\text{coef}~i}^h\]

The hyperparameters associated with the population weightings, however, are constrained: they must satisfy \(x_{\text{frac}~i}^h > 0\) and \(\sum_i x_{\text{frac}~i}^h = 1\). The transform maps the unconstrained \(\vec{z}\) such that:

\[x_{\text{frac}~i}^h = \frac{\exp(z_{\text{frac}~i}^h)}{\sum_j \exp(z_{\text{frac}~j}^h)}\]

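This necessarily removes a degree of freedom, so there is an inherent degeneracy in this transform: adding the same constant to every component of \(\vec{z}_{\text{frac}}^h\) leaves \(\vec{x}_{\text{frac}}^h\) unchanged.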
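The population-weighting transform and its degeneracy can be illustrated with a short sketch. The function name `fraction_forward` is an assumption, and the subtraction of the maximum is a standard numerical-stability step that does not alter the result.

```python
import math

def fraction_forward(z_frac):
    """Map unconstrained z onto positive fractions that sum to one."""
    shift = max(z_frac)  # stabilises exp(); cancels in the ratio
    weights = [math.exp(z - shift) for z in z_frac]
    total = sum(weights)
    return [w / total for w in weights]

z = [0.1, -2.0, 3.0]
x = fraction_forward(z)
x_shifted = fraction_forward([zi + 5.0 for zi in z])

print(x, sum(x))
# Shifting every component of z by the same constant gives the same fractions
# up to floating-point rounding: this is the degeneracy in the transform.
print(x_shifted)
```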

Backward Transform¶

The Backward Transform is not quite the inverse of the Forward Transform – instead of recovering \(\vec{z}\) from \(\vec{x}\), it maps gradients with respect to \(\vec{x}\) onto gradients with respect to \(\vec{z}\), such that \(\nabla_{\vec{z}} \mathcal{L} = \text{BackwardTransform}(\nabla_{\vec{x}} \mathcal{L})\).
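As an illustration, the sketch below applies the chain rule to the temporal recursion \(q_i = \sqrt{1 - e^{-2/\ell_t}}\, z^t_i + e^{-1/\ell_t} q_{i+1}\), \(x^t_i = \mu_t + \sigma_t q_i\), and checks the result against finite differences. The function names and the toy linear objective are assumptions made for this example, not the project's implementation.

```python
import math

def temporal_forward(z, mu_t, sigma_t, ell_t):
    """Forward recursion: q_{n-1} = z_{n-1}, q_i = u z_i + d q_{i+1}, x_i = mu + sigma q_i."""
    n = len(z)
    decay = math.exp(-1.0 / ell_t)
    innovation = math.sqrt(1.0 - decay ** 2)
    q = [0.0] * n
    q[n - 1] = z[n - 1]
    for i in range(n - 2, -1, -1):
        q[i] = innovation * z[i] + decay * q[i + 1]
    return [mu_t + sigma_t * qi for qi in q]

def temporal_backward(grad_x, sigma_t, ell_t):
    """Map dL/dx onto dL/dz for the temporal recursion via the chain rule."""
    n = len(grad_x)
    decay = math.exp(-1.0 / ell_t)
    innovation = math.sqrt(1.0 - decay ** 2)
    grad_z = [0.0] * n
    grad_q = 0.0  # running dL/dq_{i-1}; bin 0 has no predecessor
    for i in range(n):
        # q_i affects x_i directly, and q_{i-1} through the decay term.
        grad_q = sigma_t * grad_x[i] + decay * grad_q
        # The last bin enters q with weight 1, every other bin with 'innovation'.
        scale = 1.0 if i == n - 1 else innovation
        grad_z[i] = scale * grad_q
    return grad_z

# Toy objective L = sum_i w_i x_i, so dL/dx = w.
w = [1.0, -2.0, 0.5]
z = [0.3, -1.2, 0.7]
mu_t, sigma_t, ell_t = -1.0, 2.0, 1.5

def toy_objective(zv):
    return sum(wi * xi for wi, xi in zip(w, temporal_forward(zv, mu_t, sigma_t, ell_t)))

analytic = temporal_backward(w, sigma_t, ell_t)
step = 1e-6
numeric = []
for i in range(len(z)):
    bumped = list(z)
    bumped[i] += step
    numeric.append((toy_objective(bumped) - toy_objective(z)) / step)

print(analytic)
print(numeric)  # should agree with the analytic gradient to within rounding
```

Note that the backward pass needs only the gradient with respect to \(\vec{x}\) and the transform parameters. Because this particular transform is linear in \(\vec{z}\), the value of \(\vec{z}\) itself is not required.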