Machine learning redshift

9/1/2023

The next wave of large radio telescopes is being commissioned, with plans to observe deeper and over wider areas than ever before. The Evolutionary Map of the Universe (EMU) project is expected to increase the number of known radio galaxies from ∼2.5 million to ∼70 million, allowing for statistical studies of unprecedented size in the radio regime. However, most of the studies planned by the EMU project require redshift estimates. While the redshifts do not need to be measured to high precision and can be roughly binned, they do require a low outlier rate. Even with recent advancements in multi-object spectroscopy, spectroscopic redshifts will only be possible for a small fraction of sources. The majority of the newly discovered radio sources will have limited multi-wavelength photometry, whereas traditional photometric template fitting methods require high-quality, complete multi-wavelength photometry.

Previous research has used Machine Learning (ML) to estimate redshift, but has primarily focused on trying to match the best results provided by photometric template fitting, using the best and most complete data available. For the most part, the datasets used are limited in redshift and are not radio-selected, even though radio-selected sources are the ones on which photometric template fitting typically fails. While ML techniques have proved to be effective, most have not been conclusively tested on radio-selected datasets at the higher redshift ranges expected from the EMU project.

In this thesis, I examine the utility of the k-Nearest Neighbours (kNN) and Random Forest (RF) regression and classification algorithms for estimating the redshift of a source from its features. The kNN tests include five different distance metrics. I use a radio-selected dataset built from the Australia Telescope Large Area Survey (ATLAS) 1.4 GHz radio survey, which was completed in anticipation of the EMU project and has been observed to around the depth of the EMU project. The 1.4 GHz flux measured by ATLAS was combined with infrared (IR) fluxes from the Spitzer Wide-area Infrared Extragalactic Survey (SWIRE), optical magnitudes from the DES, and spectroscopic redshifts.

Based on the combined multi-wavelength catalogue, I create three datasets. Dataset A consists of all sources with a spectroscopic redshift, including those with missing observations, with the missing values filled with the mean of that feature across the entire dataset. Dataset B is a subset of Dataset A, with the sources lacking complete multi-wavelength photometry removed. Dataset C is a subset of Dataset B, with the sources removed that have optical or IR photometry below the detection limits of all-sky surveys.

To test the generalisation of the algorithms across the sky, I use three different training and test sets. Set 1 uses a training set randomly selected from the dataset. Set 2 uses a training set made up entirely of sources from the European Large Area ISO Survey-South 1 (ELAIS-S1) field, with the test set made up from the Extended Chandra Deep Field South (eCDFS) field. Set 3 uses a training set made up entirely of sources from the eCDFS field, with the test set made up from the ELAIS-S1 field.
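
As a rough illustration of the dataset construction and the field-based training and test sets described above, here is a minimal sketch in plain Python. It is not the code used in the thesis: the toy catalogue, the column names (`radio`, `ir`, `optical`, `z_spec`), the 70% training fraction and the random seed are all assumptions made for the example. Dataset C is left out because the all-sky survey detection limits are not given here.

```python
import random
from statistics import mean

# Toy catalogue: hypothetical column names and values, for illustration only.
# None marks a missing observation.
catalogue = [
    {"field": "ELAIS-S1", "radio": 1.2, "ir": 0.30, "optical": 21.5, "z_spec": 0.45},
    {"field": "ELAIS-S1", "radio": 0.8, "ir": None, "optical": 22.1, "z_spec": 1.10},
    {"field": "eCDFS", "radio": 2.5, "ir": 0.90, "optical": None, "z_spec": 0.20},
    {"field": "eCDFS", "radio": 0.5, "ir": 0.10, "optical": 23.0, "z_spec": 2.30},
]
FEATURES = ["radio", "ir", "optical"]


def build_dataset_a(rows):
    """Dataset A: keep every source, fill missing values with the feature mean."""
    means = {f: mean(r[f] for r in rows if r[f] is not None) for f in FEATURES}
    return [
        {**r, **{f: means[f] if r[f] is None else r[f] for f in FEATURES}}
        for r in rows
    ]


def build_dataset_b(rows):
    """Dataset B: keep only sources with complete multi-wavelength photometry."""
    return [r for r in rows if all(r[f] is not None for f in FEATURES)]


def split_random(rows, train_fraction=0.7, seed=0):
    """Set 1: random training sample (fraction and seed are assumptions)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def split_by_field(rows, train_field):
    """Sets 2 and 3: train on one field, test on all sources from the other."""
    train = [r for r in rows if r["field"] == train_field]
    test = [r for r in rows if r["field"] != train_field]
    return train, test


dataset_a = build_dataset_a(catalogue)
dataset_b = build_dataset_b(catalogue)
set1_train, set1_test = split_random(dataset_a)
set2_train, set2_test = split_by_field(dataset_a, "ELAIS-S1")  # test on eCDFS
set3_train, set3_test = split_by_field(dataset_a, "eCDFS")     # test on ELAIS-S1
```

Splitting by field rather than at random means the test sources come from a different patch of sky than the training sources, which is what makes Sets 2 and 3 a test of generalisation.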
