Schematic Diagram showing
									the basic inner working of GaMPEN. It takes an input image to first crop
									it and then estimate posterior distributions of different morphological 

Galaxy Morphology Posterior Estimation Network

The first ML framework for automatic cropping & estimating posteriors of different galaxy morphological parameters

What does GaMPEN do & Why was it needed?

Although Convolutional Neural Networks (CNNs) have used for galaxy morphology determination for quite some time now, a few challenges associated with the process has persisted. From the early attempts at using a CNN to classify galaxies morphologically to the largest ML-produced morphological catalogs currently available, most CNNs have provided broad morphological classifications. There has been very limited work on estimating morphological parameters using CNNs; and there has been no work on estimating robust uncertainties of CNN determined morphological parameters. Even popular non-ML tools like Galfit severely underestimate uncertainties by values as high as $\sim75\%$. Meanwhile, the computation of full Bayesian posteriors for different morphological parameters is crucial for drawing scientific inferences that account for uncertainty and thus are indispensable in the derivation of robust scaling relations or tests of theoretical models using morphology.

One other challenge of using CNNs in astronomy, is the necessity ot use fixed cutout sizes. Most previous work has resorted to selecting a large cutout size for which "most galaxies" would remain in the frame. However, this means that for many galaxies in the dataset, especially smaller ones, typical cutouts contain other galaxies in the frame, often leading to less accurate results. Thus, this becomes a problem when applying CNNs to galaxies over a wide magnitude and redshift range.

In order to address both these challenges, we developed GaMPEN.

  • GaMPEN automatically crops input images to an optimal size before determining their morphology.
  • GaMPEN estimates Bayesian posterior distributions for a galaxy's bulge-to-total light ratio ($L_B/L_T$), radius($R_e$), and flux. (It can be easily adapted to predict other parameters).
  • GaMPEN takes into account both aleatoric & epistemic uncertainties
  • GaMPEN's produced uncertainties/posteriors are extremely well-calibrated and accurate. ($\lesssim 5\%$ derivation)
  • GaMPEN incorporates the full covariance matrix in its loss function allowing it to achieve well-calibrated uncertainties for all three variables simultaneously.
GaMPEN is the first machine learning framework for determining joint posterior distributions of multiple morphological parameters and is also the first application of an STN to optical imaging in astronomy.

GaMPEN's Architecture

Image showing
									a schematic diagram of GaMPEN's architecture GaMPEN consists of an upstream Spatial Transformer Network (STN) followed by a downstream Convolutional Neural Network (CNN) GaMPEN's architecture consists of two separate entities:- a) an upstream Spatial Transformer Network (STN) which enables GaMPEN to automatically crop galaxies to an optimal size; b) a downstream Convolutional Neural Network (CNN) which enables GaMPEN to predict posterior distributions for various morphological parameters.

GaMPEN's design is based on our previously successful classification CNN, GaMorNet, as well as as different variants of the Oxford Visual Geometry Group networks. We tried a variety of different architectures before finally converging on this design.

How does GaMPEN automatically crop galaxies?

Image showing
									the STN crops applied to six randomly chosen simulated galaxies Examples of the transformation applied by the STN to six randomly selected simulated galaxy images.

In GaMPEN, the STN is upstream of the CNN, where it applies a two-dimensional affine transformation to the input image. Each input image is transformed differently by the STN, which learns the appropriate cropping during the training of the downstream CNN without additional supervision.

To perform the transformation the STN predicts the six-parameter matrix of the affine transformation $A_{\theta}$ to be applied to the input image. $$ \left(\begin{array}{c} x_{i}^{s} \\ y_{i}^{s} \end{array}\right) = A_{\theta}\left(\begin{array}{c} x_{i}^{t} \\ y_{i}^{t} \\ 1 \end{array}\right) = \left[\begin{array}{lll} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{array}\right]\left(\begin{array}{c} x_{i}^{t} \\ y_{i}^{t} \\ 1 \end{array}\right) $$ where $ \left(x_i^s,y_i^s\right)$ and $ \left( x_i^t, y_i^t \right) $ are the source and target co-ordinates respectively. As the network trains, the predicted transformation alters slowly based on the loss function.

Image showing the performance of the STN on three galaxies 
										from the Hyper Suprime-Cam Wide Survey. Examples of the transformation applied by the STN to three randomly selected $z < 0.25 $ galaxies in the Hyper Suprime-Cam Wide Survey Placing the STN upstream in the GaMPEN framework allows the network to learn how to actively transform the input image in such a way that minimizes the overall loss function (i.e., the predicted transformation is guided by and helps the downstream morphological parameter estimation). Because the transformation we use is differentiable with respect to the parameters, gradients can be backpropagated to the predicted parameters $A_{\theta}$. This crucial property allows the STN to be trained using standard backpropagation along with the downstream CNN, without any additional supervision.

The primary benefit of using an STN is that:- without any additional training needed, the STN learns to systematically crop out secondary galaxies in the cutouts and focus on the galaxy of interest at the center of the cutout.

Prediction Stability Against Rotational Transformations

Video showing how stable predictions by our neural network
									GaMPEN are against rotations. This video is optimized for fast loading. For a higher resolution version, watch it on YouTube

Although CNNs learn to recognize features that are invariant under translation -- the learned features are typically not rotationally invariant. However, this is a problem if CNNs are to be used in astronomy -- especially, for determining the morphology of galaxies. A CNN should be able to identify the same galaxy at two different orientations and return the same values. But is this true? To what level are the predictions stable?

The above video shows the stability of predictions by GaMPEN when an input galaxy image is rotated through various angles. GaMPEN's predictions of all three output parameters -- bulge-to-total light ratio ($L_B/L_T$), effective radius ($R_e$), and flux -- are stable against rotations.The modes of the predicted values deviate by $\lesssim 5\%$. This importantly shows to what level the predictions are stable against rotations.

Posterior Estimation/Uncertainty Prediction

Diagram showing how uncertainties are predicted by GaMPEN Workflow diagram demonstrating how GaMPEN estimates posterior distributions To create a Bayesian framework while predicting morphological parameters, we have to treat the model itself as a random variable---or more precisely, the weights of our network $\boldsymbol{\omega}$, must be probabilistic distributions instead of single deterministic values. For a network with weights, $\boldsymbol{\omega}$, and a training dataset, $\mathcal{D}$, of size $N$ with input images $\left\{\boldsymbol{X}_{1}, \ldots, \boldsymbol{X}_{N}\right\}$ and output parameters $\left\{\boldsymbol{Y}_{1}, \ldots, \boldsymbol{Y}_{N}\right\}$, the posterior of the network weights, $p(\boldsymbol{\omega} \mid \mathcal{D}) $ represents the plausible network parameters. To predict the probability distribution of the output variable $\boldsymbol{\hat{Y}}$ given a new test image $\boldsymbol{\hat{X}}$, we need to marginalize over all possible weights $\boldsymbol{\omega}$: $$ p(\boldsymbol{\hat{Y}} \mid \boldsymbol{\hat{X}}, \mathcal{D})=\int p(\boldsymbol{\hat{Y}} \mid \boldsymbol{\hat{X}}, \boldsymbol{\omega}) p(\boldsymbol{\omega} \mid \mathcal{D}) d \boldsymbol{\omega} . $$ Using variational inference and dropout, we can approximate the integral in as $$ \int p(\boldsymbol{\hat{Y}} \mid \boldsymbol{\hat{X}}, \boldsymbol{\omega}) p(\boldsymbol{\omega} \mid \mathcal{D}) d \boldsymbol{\omega} \approx \frac{1}{T} \sum_{t=1}^{T} p(\boldsymbol{\hat{Y}} \mid \boldsymbol{\hat{X}}, \boldsymbol{\omega}_t) , $$ wherein we perform $T$ forward passes with dropout enabled and $\boldsymbol{\omega}_t$ is the set of weights during the $t^{th}$ forward pass. This procedure is what is referred to as Monte Carlo Dropout. However, only performing Monte Carlo dropout is not enough!

Our training data consists of noisy input images by design, but we know the corresponding morphological parameters with perfect accuracy. However, due to the different amounts of noise in each image, the predictions of GaMPEN at test time should have different levels of uncertainties. Thus, in this situation, we train GaPEN to predict aleatoric uncertainties.

Although we would like to use GaMPEN to predict aleatoric uncertainties, the covariance matrix, $\boldsymbol{\Sigma}$, is not known {\it a priori}. Instead, we train GaMPEN to learn these values by minimizing the negative log-likelihood of the output parameters for the training set, which can be written as $$ \begin{split} - \log \mathcal{L}_{VI} \propto \sum_{n} & \frac{1}{2}\left[\boldsymbol{Y}_{n}-\boldsymbol{\hat{\mu}}_{n}\right]^{\top} \boldsymbol{\hat{\Sigma_n}}^{-1}\left[\boldsymbol{Y}_{n}-\boldsymbol{\hat{\mu}}_{n}\right] \\ & + \frac{1}{2} \log [\operatorname{det}(\boldsymbol{\hat{\Sigma_n}})] + \lambda \sum_{i}\left\|\boldsymbol{\omega_{i}}\right\|^{2} . \end{split} $$ where $\boldsymbol{\hat{\mu}}_n$ and $\boldsymbol{\hat{\Sigma}}_n$ are the mean and covariance matrix of the multivariate Gaussian distribution predicted by GaMPEN for an image, $\boldsymbol{X}_n$. $\lambda$ is the strength of the regularization term, and $\boldsymbol{\omega}_i$ are sampled from $q(\boldsymbol{\omega})$. Note that the above function contains the inverse and determinant of the covariance matrix -- calculating can be numerically unstable. Refer to the GaMPEN paper on how to navigate this.

Thus to predict uncertainties:-

  • For every image, GaMPEN predicts the parameters of a multivariate Gaussian distribution ($\boldsymbol{\mu}$,$\boldsymbol{\Sigma}$). We then draw a sample from this distribution.
  • Now the network is slightly altered using Monte Carlo Dropout, and the above step is re-performed with a slightly different estimate of ($\boldsymbol{\mu}$,$\boldsymbol{\Sigma}$).
  • The last step is now repeated 1000 times for each galaxy

The combination of the above two steps allows us to estimate robust uncertainties. By testing GaMPEN on simulated Hyper Suprime-Cam galaxies, we have verified that GaMPEN's predicted posteriors are well-calibrated and accurate. ($\lesssim 5\%$ derivation).

Data & Code Release

GaMPEN's source code and documentation can be accessed on GitHub.

The HSC morphological parameter catalogs are accessible at this link