Paper II

What are the odds that Gaia DR2 missed a star?

Douglas Boubert and Andrew Everall

Submitted to MNRAS (awaiting the second round of comments).


The second data release of the Gaia mission contained astrometry and photometry for an incredible 1,692,919,135 sources, but how many sources did Gaia miss and where do they lie on the sky? The answer to this question will be crucial for any astronomer attempting to map the Milky Way with Gaia DR2. We infer the completeness of Gaia DR2 by exploiting the fact that it only contains sources with at least five astrometric detections. The odds that a source achieves those five detections depend on both the number of observations and the probability that an observation of that source results in a detection. We predict the number of times that each source was observed by Gaia and assume that the probability of detection is either a function of magnitude or is drawn from a distribution whose parameters depend on magnitude. We fit both these models to the 1.7 billion stars of Gaia DR2, and are thus able to robustly predict the completeness of Gaia across the sky as a function of magnitude. We extend our selection function to account for crowding in dense regions of the sky, and show that this is vitally important, particularly in the Galactic bulge and the Large and Small Magellanic Clouds. We find that the magnitude limit at which Gaia is still 99% complete varies over the sky from G = 18.9 to 21.3. We have created a new Python package, selectionfunctions, which provides easy access to our selection functions.

We know how many times Gaia observed each star, but how many times was each star seen?

Gaia DR2 contains every star that was detected at least five times.

Flipping Coins

Our methodology is rooted in a simple problem that is often used to introduce Bayesian statistics: how do you determine the bias of a weighted coin?

Suppose we flip a coin n times and observe k heads and n-k tails. We can use the number of heads and tails to infer the bias of the coin, i.e. the probability that the coin lands heads when flipped. The connection between this problem and the Gaia selection function is immediate: if a star is detected k ≥ 5 times out of the n times that Gaia observes it, then that star will be in Gaia DR2. Working out the probability that Gaia detects a star k times out of n observations is the same statistical problem as quantifying the probability that a coin comes up heads k times out of n flips.
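The coin-flipping analogy can be made concrete with a few lines of Python. This is a minimal sketch, not the paper's code: it assumes a single, known detection probability p per star, computes the binomial probability of k detections in n observations, and sums the tail k ≥ 5 to get the probability of making it into Gaia DR2. The helper names (`prob_k_detections`, `prob_in_catalogue`, `coin_bias_posterior`) are hypothetical, and the posterior on the coin's bias assumes a uniform Beta(1, 1) prior.

```python
from math import comb


def prob_k_detections(k: int, n: int, p: float) -> float:
    """Binomial probability of exactly k detections (heads) in n observations (flips)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def prob_in_catalogue(n: int, p: float, k_min: int = 5) -> float:
    """Probability of at least k_min detections out of n observations."""
    if n < k_min:
        return 0.0  # too few observations to ever reach k_min detections
    return sum(prob_k_detections(k, n, p) for k in range(k_min, n + 1))


def coin_bias_posterior(n: int, k: int) -> tuple[float, float]:
    """Beta(alpha, beta) posterior on the bias after k heads in n flips, uniform prior."""
    return (k + 1.0, n - k + 1.0)


if __name__ == "__main__":
    # Same detection probability, different numbers of observations.
    for n in (5, 10, 20, 40):
        print(f"n = {n:2d}: P(k >= 5) = {prob_in_catalogue(n, 0.5):.4f}")
```

With p = 0.5, a star observed only five times has a 1-in-32 chance of appearing, while one observed ten times has a 638/1024 chance, which is why the number of observations matters so much.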

We considered two models. Our simple model (Model T) assumed that the detection probability is a function of the magnitude G alone, so that all stars with the same magnitude have the same detection probability. Our more realistic model (Model AB) assumed that the detection probability of each star is drawn from a Beta distribution with parameters A(G) and B(G), each of which is a function of magnitude. On the left we show the Model AB posterior on the detection probability and the resulting completeness map.
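Under Model AB the detection probability is marginalised over its Beta distribution, which turns the binomial into a beta-binomial distribution. The sketch below assumes A and B have already been evaluated at the star's magnitude (it does not implement the fitted A(G) and B(G)), and the function names are hypothetical.

```python
from math import comb, exp, lgamma


def log_beta_fn(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b), computed via log-gamma for stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k: int, n: int, a: float, b: float) -> float:
    """P(k detections | n observations) with p ~ Beta(a, b) integrated out."""
    return comb(n, k) * exp(log_beta_fn(k + a, n - k + b) - log_beta_fn(a, b))


def completeness_model_ab(n: int, a: float, b: float, k_min: int = 5) -> float:
    """Probability that a star observed n times has at least k_min detections."""
    if n < k_min:
        return 0.0
    return sum(beta_binomial_pmf(k, n, a, b) for k in range(k_min, n + 1))


if __name__ == "__main__":
    # Uniform Beta(1, 1) versus a sharply peaked Beta with the same mean of 0.5.
    print(completeness_model_ab(10, 1.0, 1.0))
    print(completeness_model_ab(10, 5e5, 5e5))
```

With A = B = 1 the Beta distribution is uniform and every k from 0 to n is equally likely, so a star observed ten times is included with probability 6/11; as A and B grow with A/(A + B) held fixed, the result approaches the single-probability Model T value.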

One million's a crowd

The probability that a Gaia observation of a point source results in a detection can be much lower in crowded regions of the sky, because Gaia can only simultaneously observe 1,000,000 sources per square degree. This drop in detection probability reduces the number of sources that achieve at least five detections, causing the completeness of Gaia DR2 to drop in crowded regions.

We modelled the detection probability separately in ten regions of the sky grouped by their source density and show the resulting Model T and AB posteriors on the right. We assumed that crowding doesn't affect stars brighter than G = 16.
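Grouping sources by local density and switching parameters only for stars fainter than G = 16 can be sketched as follows. Everything here is illustrative: the nine log-spaced edges between 10^4 and 10^6 sources per square degree are hypothetical placeholders rather than the paper's bin boundaries, and giving every source brighter than G = 16 the sparsest bin's parameters is just one way to encode the assumption that crowding does not affect bright stars.

```python
import bisect

# HYPOTHETICAL bin edges in sources per square degree, log-spaced from 1e4 to 1e6.
# These are placeholders for illustration, not the boundaries used in the paper.
DENSITY_EDGES = [10 ** (4.0 + 0.25 * i) for i in range(9)]  # 9 edges -> 10 bins


def density_bin(density: float) -> int:
    """Index (0 = sparsest, 9 = most crowded) of the density bin containing a source."""
    return bisect.bisect_right(DENSITY_EDGES, density)


def select_parameters(g_mag, density, params_by_bin, g_crowding=16.0):
    """Pick the detection-probability parameters for one source.

    params_by_bin holds one parameter set per density bin (10 entries).
    Sources brighter than g_crowding are assumed unaffected by crowding, so in
    this sketch they all use the sparsest bin's parameters.
    """
    if g_mag < g_crowding:
        return params_by_bin[0]
    return params_by_bin[density_bin(density)]


if __name__ == "__main__":
    labels = [f"bin {i}" for i in range(10)]
    print(select_parameters(15.0, 2e6, labels))  # bright star: crowding ignored
    print(select_parameters(19.0, 2e6, labels))  # faint star in a dense field
```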

Below we show maps of the last magnitude at which Gaia is 99% complete under Model AB, ignoring crowding (left) and accounting for crowding (right).

More observations means more detections ...

This is a map across the sky of the last magnitude at which Gaia is 99% complete. Gaia is more complete in parts of the sky with more observations.

... but remember that Gaia doesn't like crowds.

Accounting for the lower detection probability in crowded regions changes the map completely. Crowding is a non-negotiable aspect of Gaia's completeness.
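Both trends can be reproduced with a toy calculation of the faintest magnitude at which completeness is still 99%. This is a Model T-style sketch under loud assumptions: the logistic detection-probability curve (`toy_detection_prob`, with its 50% point at G = 20.5) and the factor of 0.8 used to mimic crowding are hypothetical, chosen only to show the qualitative behaviour, and are not the values fitted in the paper.

```python
from math import comb, exp


def completeness(n: int, p: float, k_min: int = 5) -> float:
    """P(at least k_min detections | n observations, detection probability p)."""
    if n < k_min:
        return 0.0
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(k_min, n + 1))


def toy_detection_prob(g: float, g50: float = 20.5, width: float = 0.3) -> float:
    """HYPOTHETICAL logistic fall-off of detection probability with G (not the fitted curve)."""
    return 1.0 / (1.0 + exp((g - g50) / width))


def crowded(g: float) -> float:
    """HYPOTHETICAL crowding: every detection probability is scaled down by 0.8."""
    return 0.8 * toy_detection_prob(g)


def limit_99(n, p_of_g, g_grid):
    """Faintest grid magnitude at which completeness is still >= 99%, or None.

    Assumes p_of_g decreases with G, so completeness does too and the scan
    can stop at the first magnitude that falls below 99%.
    """
    last = None
    for g in g_grid:
        if completeness(n, p_of_g(g)) < 0.99:
            break
        last = g
    return last


G_GRID = [15.0 + 0.01 * i for i in range(700)]  # G = 15.00 ... 21.99

if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, limit_99(n, toy_detection_prob, G_GRID), limit_99(n, crowded, G_GRID))
```

Raising n from 10 to 20 pushes the 99% limit fainter, while scaling down the detection probability at fixed n pulls it brighter, mirroring the two maps.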


selectionfunctions is a new Python package that allows you to download and query our selection functions with ease. All future selection functions from the Completeness of the Gaia-verse series will be added to it, and we plan to add the selection functions of other surveys as the community requests them.