Radio Galaxy Zoo: Machine learning for radio source host galaxy cross-identification

M. J. Alger, J. K. Banfield, C. S. Ong, L. Rudnick, O. I. Wong, C. Wolf, H. Andernach, R. P. Norris, S. S. Shabala

Research output: Contribution to journal › Article › peer-review

14 Scopus citations


We consider the problem of determining the host galaxies of radio sources by cross-identification. This has traditionally been done manually, which will be intractable for wide-area radio surveys like the Evolutionary Map of the Universe. Automated cross-identification will be critical for these future surveys, and machine learning may provide the tools to develop such methods. We apply a standard approach from computer vision to cross-identification, introducing one possible way of automating this problem, and explore the pros and cons of this approach. We apply our method to the 1.4 GHz Australia Telescope Large Area Survey (ATLAS) observations of the Chandra Deep Field South (CDFS) and the ESO Large Area ISO Survey South 1 fields by cross-identifying them with the Spitzer Wide-area Infrared Extragalactic survey. We train our method with two sets of data: expert cross-identifications of CDFS from the initial ATLAS data release and crowdsourced cross-identifications of CDFS from Radio Galaxy Zoo. We found that a simple strategy of cross-identifying a radio component with the nearest galaxy performs comparably to our more complex methods, though our estimated best-case performance is near 100 per cent. ATLAS contains 87 complex radio sources that have been cross-identified by experts, so there are not enough complex examples to learn how to cross-identify them accurately. Much larger data sets are therefore required for training methods like ours. We also show that training our method on Radio Galaxy Zoo cross-identifications gives comparable results to training on expert cross-identifications, demonstrating the value of crowdsourced training data.
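The nearest-galaxy baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the brute-force search, and the coordinates below are all assumptions made for illustration, and a real pipeline would likely use a catalogue-matching library and a spatial index instead.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    # Haversine formula, numerically stable for small separations.
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def nearest_galaxy(radio_components, galaxies):
    """For each radio component (ra, dec), return (index of nearest galaxy, separation in degrees)."""
    matches = []
    for ra, dec in radio_components:
        seps = [angular_separation_deg(ra, dec, g_ra, g_dec) for g_ra, g_dec in galaxies]
        best = min(range(len(seps)), key=seps.__getitem__)
        matches.append((best, seps[best]))
    return matches


# Hypothetical positions (RA, Dec in degrees); not taken from ATLAS or SWIRE.
radio = [(52.000, -28.000), (52.500, -27.500)]
infrared = [(52.001, -28.001), (52.400, -27.600), (52.499, -27.501)]
matches = nearest_galaxy(radio, infrared)
```

Each entry of `matches` pairs a radio component with the index of its closest infrared candidate. Such a baseline treats every radio component independently, which is why it cannot by itself handle complex sources whose components share a single host.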

Original language: English (US)
Pages (from-to): 5556-5572
Number of pages: 17
Journal: Monthly Notices of the Royal Astronomical Society
Issue number: 4
State: Published - Aug 21 2018

Bibliographical note

Funding Information:
This publication has been made possible by the participation of more than 11 000 volunteers in the Radio Galaxy Zoo project. Their contributions are individually acknowledged at http://rgzauthors.galax Parts of this research were conducted by the Australian Research Council Centre of Excellence for All-sky Astrophysics, through project number CE110001020. Partial support for LR was provided by U.S. National Science Foundation grants AST1211595 and 1714205 to the University of Minnesota. HA benefitted from grant 980/2016-2017 of Universidad de Guanajuato. We thank A. Tran and the reviewer for their comments on this manuscript. Radio Galaxy Zoo makes use of data products from the Wide-field Infrared Survey Explorer and the Very Large Array. The Wide-field Infrared Survey Explorer is a joint project of the University of California, Los Angeles, and the Jet Propulsion Laboratory/California Institute of Technology, funded by the National Aeronautics and Space Administration. The National Radio Astronomy Observatory is a facility of the National Science Foundation operated under cooperative agreement by Associated Universities, Inc. The figures in this work made use of ASTROPY, a community-developed core PYTHON package for Astronomy (Astropy Collaboration et al. 2013). The Australia Telescope Compact Array is part of the Australia Telescope, which is funded by the Commonwealth of Australia for operation as a National Facility managed by the CSIRO.


  • Galaxies: active
  • Infrared: galaxies
  • Methods: statistical
  • Radio continuum: galaxies
  • Techniques: miscellaneous
