Stress Testing Machine Learning in Astronomy

Can we harness machine learning properly to accelerate astronomy for the next decade?

Inset: The "first" JWST color image | Credit: NASA, ESA, CSA, STScI

Figure: Total number of papers in physics and astronomy with machine learning in their title or abstract over the previous nine years. The growth is almost exponential. Data Source: ADS

Over the last decade, machine learning (ML) has been increasingly employed by astronomers for a wide variety of tasks, from identifying exoplanets to studying galaxies and black holes. In particular, Convolutional Neural Networks (CNNs) have revolutionized the field of image processing and have become increasingly popular for determining galaxy morphology.

However, despite all this work, a few challenges remain that are crucial to address in order to harness ML fully for the next generation of surveys (like Rubin, Roman, and Euclid). I have focused part of my research on addressing these challenges:

How stable are predictions to rotational transformations?
Can we train ML models without gigantic real datasets and still obtain good results on real data?
Can the same algorithms be applied to data over a range of redshifts and from different surveys?
Can we interpret/investigate the decision-making process of our algorithms?

Stability Under Rotations
Training with Minimal Real Data
Applicability Across Surveys
AstroML Interpretability

Prediction Stability Against Rotational Transformations

Video: Stability of the predictions made by our neural network GaMPEN against rotations.

Although Convolutional Neural Networks (CNNs) learn to recognize features that are invariant under translation, the learned features are typically not rotationally invariant. This is a problem if CNNs are to be used in astronomy, especially for determining the morphology of galaxies. A CNN should be able to identify the same galaxy at two different orientations and return the same values. But is this true? To what level are the predictions stable?

The above video shows the stability of predictions by GaMPEN (Galaxy Morphology Posterior Estimation Network) when an input galaxy from the Hyper Suprime-Cam Wide survey is fed into the framework and slowly rotated. GaMPEN's predictions of all three output parameters, bulge-to-total light ratio ($L_B/L_T$), effective radius ($R_e$), and flux, are stable against rotations. The modes of the predicted values deviate by $\lesssim 5\%$, which quantifies the level to which the predictions are stable.
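
One way to quantify this kind of stability is to compare a model's outputs on rotated copies of an image against its output on the unrotated image. The sketch below is only an illustration and is not GaMPEN's evaluation code: `rotation_stability`, `total_flux`, and `top_half_flux` are hypothetical names, the two toy predictors stand in for a trained network, and only 90-degree rotations are used so that no interpolation is needed.

```python
import numpy as np


def rotation_stability(predict, image):
    """Largest fractional change in predict(image) over 90-degree rotations.

    `predict` is any callable mapping a 2D image to a sequence of non-zero
    output values (a stand-in for a trained network).
    """
    reference = np.asarray(predict(image), dtype=float)
    worst = 0.0
    for quarter_turns in (1, 2, 3):
        rotated = np.rot90(image, quarter_turns)
        prediction = np.asarray(predict(rotated), dtype=float)
        deviation = np.max(np.abs(prediction - reference) / np.abs(reference))
        worst = max(worst, float(deviation))
    return worst


# Toy predictors (assumptions for illustration only).
def total_flux(image):
    return [image.sum()]  # unchanged by any rotation


def top_half_flux(image):
    return [image[: image.shape[0] // 2].sum()]  # depends on orientation


# Toy image: the top half is twice as bright as the bottom half.
image = np.ones((8, 8))
image[:4] += 1.0

stable = rotation_stability(total_flux, image)
unstable = rotation_stability(top_half_flux, image)
```

With a measure like this, a returned value below 0.05 would correspond to the 5% level quoted above; the orientation-dependent toy predictor fails that bar by a wide margin, while the rotation-invariant one passes trivially.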

But how do we make this happen? Our approach is two-fold:

We include simulated galaxies with the same structural parameters but different orientations in our training dataset.
We augment the number of real galaxies in our dataset by applying different random rotational transformations on the input data during training.

The above two steps ensure that the network sees enough examples of the same galaxy at different orientations. We are thus able to induce rotational invariance in our networks even though they do not inherently possess it.
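
The augmentation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the training pipeline itself: `augment_with_rotations` is a hypothetical helper, the cutouts are assumed to be square, and rotations are restricted to multiples of 90 degrees to avoid interpolation.

```python
import numpy as np


def augment_with_rotations(images, labels, copies, rng):
    """Return each image plus `copies` randomly rotated versions of it.

    images: array of shape (N, H, H) holding square cutouts.
    labels: array of shape (N, ...) holding per-galaxy targets, which a
            rotation leaves unchanged.
    """
    out_images, out_labels = [], []
    for image, label in zip(images, labels):
        out_images.append(image)
        out_labels.append(label)
        for _ in range(copies):
            quarter_turns = int(rng.integers(0, 4))  # 0, 90, 180 or 270 degrees
            out_images.append(np.rot90(image, quarter_turns))
            out_labels.append(label)
    return np.stack(out_images), np.asarray(out_labels)


rng = np.random.default_rng(42)
images = rng.random((5, 16, 16))  # five fake 16x16 galaxy cutouts
labels = rng.random((5, 3))       # e.g. three structural parameters each
aug_images, aug_labels = augment_with_rotations(images, labels, copies=3, rng=rng)
```

Rotating by arbitrary angles would need an interpolating routine such as `scipy.ndimage.rotate`, at the cost of slightly resampling the pixel values.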

Training ML Algorithms Without Vast Amounts of Real Data

Figure: The chicken-and-egg problem between machine learning models and the catalogs they produce. If ML models need catalogs to train, how are we going to produce the next generation of catalogs with them? Base Image: Vector Stock

One of the challenges in training ML models is the vast amount of data that they need. In astronomy, this leads to a specific challenge: if parameter catalogs for the next generation of surveys are to be produced using ML, how can we train these algorithms before any catalogs from these surveys exist?

To address this, we use a two-step strategy:

Train all the layers in a CNN using semi-realistic simulations.
Fine-tune the entire network or only the last few layers using a small amount of hand-annotated real images.

Since semi-realistic simulations are cheap to make, we are able to perform most of the heavy training workload using large numbers of simulated galaxies. However, a network trained only on simulations fails to deliver satisfactory results on real data. Thus, we fine-tune the trained model using a small number of galaxies that we analyze manually. Depending on the model and the amount of real data being used, we have sometimes found it prudent to fine-tune only the later layers. The logic behind this approach is that, in a CNN, the deeper feature maps identify more complicated features while the earlier layers identify more basic features like lines, shapes, and edges (see the later sections on this page).
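
The pretrain-then-fine-tune idea can be illustrated with a toy model. The sketch below is not GaMorNet or GaMPEN: it uses a tiny two-layer regression network in NumPy instead of a CNN, the "simulated" and "real" datasets are synthetic, and `forward`, `mse`, `train`, and `freeze_early` are hypothetical names. It shows only the mechanics of training every layer on plentiful simulations and then updating just the last layer on a small real sample.

```python
import numpy as np


def forward(params, X):
    hidden = np.tanh(X @ params["W1"])                    # "early layer"
    return hidden, hidden @ params["w2"] + params["b2"]   # "last layer"


def mse(params, X, y):
    return float(np.mean((forward(params, X)[1] - y) ** 2))


def train(params, X, y, steps, lr=0.05, freeze_early=False):
    """Plain gradient descent on the mean squared error.

    With freeze_early=True only the last layer (w2, b2) is updated.
    """
    params = {name: value.copy() for name, value in params.items()}
    n = len(y)
    for _ in range(steps):
        hidden, pred = forward(params, X)
        err = 2.0 * (pred - y) / n                         # dLoss/dpred
        grad_w2 = hidden.T @ err
        grad_b2 = err.sum()
        if not freeze_early:
            grad_hidden = np.outer(err, params["w2"]) * (1.0 - hidden ** 2)
            params["W1"] -= lr * (X.T @ grad_hidden)
        params["w2"] -= lr * grad_w2
        params["b2"] -= lr * grad_b2
    return params


rng = np.random.default_rng(0)

# Plentiful "simulations" and a small "real" sample with a systematic offset.
X_sim = rng.normal(size=(2000, 4))
y_sim = np.sin(X_sim[:, 0]) + 0.5 * X_sim[:, 1]
X_real = rng.normal(size=(50, 4))
y_real = 1.2 * (np.sin(X_real[:, 0]) + 0.5 * X_real[:, 1]) + 0.3

init = {"W1": 0.5 * rng.normal(size=(4, 8)), "w2": np.zeros(8), "b2": np.zeros(1)}

# Step 1: train all layers on simulations.
pretrained = train(init, X_sim, y_sim, steps=500)
# Step 2: fine-tune only the last layer on the small real sample.
finetuned = train(pretrained, X_real, y_real, steps=200, freeze_early=True)

loss_before = mse(pretrained, X_real, y_real)
loss_after = mse(finetuned, X_real, y_real)
```

Freezing the early layer keeps the features learned from simulations intact, so the small real sample only has to correct the final mapping. Whether to unfreeze more layers is the judgment call described above.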

This two-step approach has allowed us to obtain excellent results for both classification and parameter/posterior estimation problems using GaMorNet and GaMPEN. Using it, we have classified millions of galaxies across SDSS, CANDELS, and HSC, and we have reduced the amount of real data needed for training by 80-90%.

Applicability Across Surveys/Redshifts

Figure: The same region of the sky imaged by SDSS, HSC, and CANDELS. Individual images from Melchior et al. 2021

Over the next decade, new extragalactic data will be generated by different telescopes at a wide variety of redshifts, wavelengths, depths, and pixel scales. It would therefore be a problem if we needed to develop a separate model architecture every time our target dataset changed while the underlying task stayed the same.

One of my pushes has been to develop and test the applicability of models with the same underlying architecture across data of a wide variety of imaging qualities. In our papers, we have successfully demonstrated the applicability of our algorithms across datasets of different depths and pixel scales at different redshifts, as outlined below. The two-step approach of first training on simulations and then fine-tuning using real data has enabled us to achieve this.

ML Tool	SDSS	HSC	HST Legacy Fields
GaMorNet	Done	Done	Done
GaMPEN		Done	In Progress

AstroML Interpretability -- Investigating ML decision making

It is often said that deep-learning models are "black boxes" that learn features which are difficult to understand. Although this might be true for certain types of deep-learning models, it's definitely not true for CNNs. The representations learned by CNNs are highly amenable to visualization, primarily because they are representations of visual concepts. Since the mid-2010s, many techniques have been developed for visualizing and interpreting these representations. We have used these techniques to better understand the decision-making process of our algorithms.

Class Activation Mapping

Figure: Heatmaps of class activation for four different galaxies, showing regions of the images (in red) that are heavily used by GaMorNet in its decision-making process.

One of the techniques we used to investigate GaMorNet's decision-making process is class activation mapping (CAM), shown above. A class activation heatmap is a 2D grid of scores associated with a specific output class, computed for every location in an input image, indicating how important each location is with respect to the class under consideration. As the above image shows, for the galaxies with spiral arms, GaMorNet heavily uses the presence of the spiral arms to infer that these are disk-dominated galaxies. The right two images demonstrate that despite secondary objects being in frame, GaMorNet correctly focuses on the galaxy of interest at the center of the image. This gives us an interesting sneak peek into how GaMorNet classifies different galaxies.
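
As a rough illustration of how such a heatmap can be computed, the sketch below implements the simplest form of class activation mapping: a weighted sum of the final convolutional feature maps, using the weights that connect each (globally pooled) feature map to the class of interest. This assumes a network that ends in global average pooling followed by a dense layer; the page does not state whether GaMorNet's heatmaps were produced this way or with a gradient-based variant. `class_activation_map` and the toy arrays are hypothetical.

```python
import numpy as np


def class_activation_map(feature_maps, class_weights):
    """Weighted sum of the last convolutional feature maps for one class.

    feature_maps:  (H, W, K) activations of the final convolutional layer
    class_weights: (K,) weights linking each pooled feature map to the class
    returns:       (H, W) heatmap scaled to [0, 1]
    """
    heatmap = np.tensordot(feature_maps, class_weights, axes=([2], [0]))
    heatmap = np.maximum(heatmap, 0.0)  # keep only evidence in favor of the class
    peak = heatmap.max()
    return heatmap / peak if peak > 0 else heatmap


# Toy example: channel 0 fires on the central galaxy, channel 1 on a
# secondary object in the corner of the frame.
feature_maps = np.zeros((7, 7, 2))
feature_maps[3, 3, 0] = 2.0
feature_maps[0, 6, 1] = 5.0
class_weights = np.array([1.5, -0.5])  # the class relies on channel 0 only
heatmap = class_activation_map(feature_maps, class_weights)
```

In practice the coarse heatmap is upsampled to the input resolution and overlaid on the galaxy image, which is what produces figures like the one above.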

Visualizing Intermediate Activations

Figure: Intermediate activations from the first two convolutional layers of GaMorNet when a spiral galaxy is fed into it.

Another technique we used to investigate GaMorNet's decision-making process is to visualize the intermediate activations that are output by the different convolutional layers in the network. This helps us understand how an input to GaMorNet is decomposed as it propagates through the network to the final output value.

From the image shown above, it is clear that GaMorNet first almost perfectly separates the galaxy from the background and then focuses on subparts of the galaxy in question.
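
Collecting intermediate activations amounts to running the input through the network one layer at a time and keeping every layer's output. The sketch below shows this for a toy stack of convolution-plus-ReLU layers in NumPy. It is not GaMorNet: the weights are random (apart from one hand-set smoothing kernel), there are no biases or pooling layers, and `conv_layer` and `intermediate_activations` are hypothetical names.

```python
import numpy as np


def conv_layer(inputs, kernels):
    """Valid cross-correlation followed by ReLU.

    inputs:  (C, H, W) input channels
    kernels: (K, C, kh, kw) one kernel stack per output channel
    returns: (K, H - kh + 1, W - kw + 1) activation maps
    """
    n_out, _, kh, kw = kernels.shape
    out_h, out_w = inputs.shape[1] - kh + 1, inputs.shape[2] - kw + 1
    maps = np.zeros((n_out, out_h, out_w))
    for i in range(kh):
        for j in range(kw):
            patch = inputs[:, i:i + out_h, j:j + out_w]
            maps += np.einsum("kc,chw->khw", kernels[:, :, i, j], patch)
    return np.maximum(maps, 0.0)


def intermediate_activations(image, layers):
    """Return the activation maps produced by every layer, in order."""
    activations = []
    current = image[None, :, :]  # add a channel axis
    for kernels in layers:
        current = conv_layer(current, kernels)
        activations.append(current)
    return activations


# Toy "galaxy": a Gaussian blob centered in a 32x32 cutout.
yy, xx = np.mgrid[0:32, 0:32]
image = np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / (2 * 4.0 ** 2))

rng = np.random.default_rng(1)
layers = [rng.normal(size=(4, 1, 3, 3)), rng.normal(size=(8, 4, 3, 3))]
layers[0][0, 0] = 1.0 / 9.0  # make the first filter a simple smoothing kernel

activations = intermediate_activations(image, layers)
```

Each entry of `activations` can then be plotted channel by channel, which is how figures like the one above are typically assembled.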

Visualizing Filter Patterns

Figure: Learned filter patterns at different depths of a trained GaMorNet framework.

One more way to inspect the filters learned by CNNs is to display the visual pattern that each filter in the convolutional layers is meant to respond to. This can be done with gradient ascent in the input space: starting from a blank input image, we repeatedly adjust the pixel values of the input along the gradient of a specific filter's response so as to maximize that response. The resulting input image is one that the chosen filter is maximally responsive to.
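
The mechanics of gradient ascent in input space can be sketched for a single convolutional filter followed by a ReLU. This is a toy under stated assumptions: the filter is a hand-written edge detector and not a learned GaMorNet filter, the gradient is derived by hand where a deep-learning framework would use automatic differentiation through all layers up to the chosen filter, and the starting image is near-blank (small noise), since a ReLU unit has zero gradient at an exactly blank input. All function names are hypothetical.

```python
import numpy as np


def filter_map(image, kernel):
    """ReLU of the valid cross-correlation of `image` with `kernel`."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * image[i:i + out_h, j:j + out_w]
    return np.maximum(out, 0.0)


def response_and_gradient(image, kernel):
    """Mean filter response and its gradient with respect to the image."""
    activation = filter_map(image, kernel)
    active = (activation > 0).astype(float) / activation.size
    kh, kw = kernel.shape
    out_h, out_w = activation.shape
    grad = np.zeros_like(image)
    for i in range(kh):
        for j in range(kw):
            grad[i:i + out_h, j:j + out_w] += kernel[i, j] * active
    return float(activation.mean()), grad


def maximize_filter(kernel, size=16, steps=30, step_size=1.0, seed=0):
    """Gradient ascent on the input image to maximize the mean response."""
    rng = np.random.default_rng(seed)
    image = 0.01 * rng.normal(size=(size, size))  # near-blank starting image
    for _ in range(steps):
        _, grad = response_and_gradient(image, kernel)
        grad /= np.sqrt(np.mean(grad ** 2)) + 1e-8  # normalize the step size
        image += step_size * grad
    return image


# A vertical-edge detector standing in for a learned filter.
kernel = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])

start_image = maximize_filter(kernel, steps=0)
final_image = maximize_filter(kernel, steps=30)
start_response, _ = response_and_gradient(start_image, kernel)
final_response, _ = response_and_gradient(final_image, kernel)
```

Plotting `final_image` shows the pattern this filter prefers; repeating the procedure for filters at different depths of a trained network gives a grid of patterns like the one in the figure.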

We apply this technique to GaMorNet, and the patterns that the filters are most responsive to are shown at different depths within the network (the leftmost set of images is from the shallowest layer, and the rightmost set is from the deepest layer). As is clear from these images, shallower layers in the network detect simple features in the input image such as edges and lines. As the network gets progressively deeper, it deals with higher-level, more complicated features. As can be seen from the second-from-the-right image, after detecting lines and edges, GaMorNet starts to focus on elliptical/circular patterns: galaxies! And in the last set of images, it can be seen that GaMorNet is looking for sub-features within galaxies, such as spiral arms! The fact that only the deeper layers learn the most complicated features is what allows us to achieve good performance by fine-tuning only the last few layers of our CNNs.