disco.core.cnn_inference

This module defines the DiscoNet convolutional neural network and the predict_with_cnn() inference function used to generate prior estimates of disk inclination and position angle from a FITS image patch.


Architecture

DiscoNet is a residual convolutional encoder with a multi-layer perceptron head. It accepts a 3-channel \(128 \times 128\) tensor and, in the current inference configuration (n_out=5), returns 5 scalar outputs, of which only the first three are used to decode the geometric parameters.

class disco.core.cnn_inference.ResBlock(ch)

A residual block consisting of two \(3 \times 3\) convolutional layers with batch normalisation and ReLU activations, plus a skip connection.

\[\text{out} = \text{ReLU}\!\left(x + \text{BN}(\text{Conv}(\text{ReLU}(\text{BN}(\text{Conv}(x)))))\right)\]
Parameters:

ch (int) – Number of input and output channels.

forward(x)
Parameters:

x (torch.Tensor) – Input feature map.

Returns:

Output feature map of the same shape as input.

Return type:

torch.Tensor

class disco.core.cnn_inference.DiscoNet(n_out=6)

Residual convolutional encoder with five downsampling stages and an MLP regression head. At inference time the model is loaded with n_out=5.

Architecture summary:

| Stage | Layer(s) | Output channels |
|-------|----------|-----------------|
| stem | Conv 3×3, BN, ReLU | 32 |
| enc1 | ResBlock(32) + Conv 3×3 stride 2 | 64 |
| enc2 | ResBlock(64) + Conv 3×3 stride 2 | 128 |
| enc3 | ResBlock(128) + Conv 3×3 stride 2 | 256 |
| enc4 | ResBlock(256) + Conv 3×3 stride 2 | 512 |
| enc5 | ResBlock(512) + Conv 3×3 stride 2 | 512 |
| pool | AdaptiveAvgPool2d(4×4) | 512 × 4 × 4 = 8192 |
| head | Linear(8192→1024), ReLU, Dropout(0.45), Linear(1024→512), ReLU, Dropout(0.30), Linear(512→n_out) | n_out |

Parameters:

n_out (int) – Number of output scalars. Default: 6. The CLI pipeline loads the model with n_out=5.

forward(x)
Parameters:

x (torch.Tensor) – Input tensor of shape (N, 3, H, W).

Returns:

Output tensor of shape (N, n_out).

Return type:

torch.Tensor
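The architecture summary implies that a \(128 \times 128\) input reaches the pooling stage at \(4 \times 4\), so the flattened feature vector has 8192 entries. The sketch below traces those shapes with plain arithmetic. It assumes that every 3×3 convolution uses padding 1 and that the stem uses stride 1; neither is stated in this reference, and `conv_out` and `trace_shapes` are hypothetical helper names.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(h=128, w=128):
    """Return (stage, channels, height, width) for each encoder stage.

    Assumption: padding=1 everywhere and a stride-1 stem.
    ResBlocks preserve shape, so only the stride-2 convolutions matter.
    """
    shapes = []
    h, w = conv_out(h), conv_out(w)  # stem keeps the spatial size
    shapes.append(("stem", 32, h, w))
    stages = [("enc1", 64), ("enc2", 128), ("enc3", 256),
              ("enc4", 512), ("enc5", 512)]
    for name, channels in stages:
        # Each encoder stage halves the spatial size.
        h, w = conv_out(h, stride=2), conv_out(w, stride=2)
        shapes.append((name, channels, h, w))
    # AdaptiveAvgPool2d(4x4) fixes the spatial size regardless of input.
    shapes.append(("pool", 512, 4, 4))
    return shapes


shapes = trace_shapes()
for stage, channels, height, width in shapes:
    print(f"{stage:5s} {channels:4d} x {height:3d} x {width:3d}")

# Size of the vector fed to the first Linear layer of the head.
flat_features = shapes[-1][1] * shapes[-1][2] * shapes[-1][3]
```

Under these assumptions the five stride-2 stages alone already reduce 128 to 4, so for the nominal input size the adaptive pooling only guarantees the \(4 \times 4\) shape rather than changing it.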


Output Encoding

The network outputs are encoded as follows (as documented in the training checkpoint outputs field):

| Index | Symbol | Encoding |
|-------|--------|----------|
| 0 | incl/90 | Inclination normalised to [0, 1] (multiply by 90° to recover degrees) |
| 1 | sin2PA | \(\sin(2\phi)\): PA encoded as double-angle sine |
| 2 | cos2PA | \(\cos(2\phi)\): PA encoded as double-angle cosine |
| 3 | dx/0.14 | Centre x-offset normalised by 0.14 arcsec (not used in inference) |
| 4 | dy/0.14 | Centre y-offset normalised by 0.14 arcsec (not used in inference) |

The position angle is decoded as:

\[\hat{\phi} = \left(\frac{1}{2}\arctan_2\!\left(\hat{y}_1,\, \hat{y}_2\right) \times \frac{180}{\pi}\right) \bmod 180°\]

This double-angle encoding is used to ensure continuity across the \(0° / 180°\) boundary of position angle.
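A small numerical check can illustrate the continuity argument. The sketch below encodes position angles as \((\sin 2\phi, \cos 2\phi)\) and compares distances in the encoded space; `encode_pa` and `encoded_distance` are hypothetical helpers written for this illustration and are not part of the module.

```python
import math


def encode_pa(pa_deg):
    """Return (sin 2*PA, cos 2*PA) for a position angle in degrees."""
    angle = math.radians(2.0 * pa_deg)
    return math.sin(angle), math.cos(angle)


def encoded_distance(pa_a, pa_b):
    """Euclidean distance between two PAs in the encoded space."""
    sin_a, cos_a = encode_pa(pa_a)
    sin_b, cos_b = encode_pa(pa_b)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


# 179 deg and 1 deg differ by 178 as raw numbers, but describe
# orientations only 2 deg apart; the encoding reflects the latter.
across_wrap = encoded_distance(179.0, 1.0)
away_from_wrap = encoded_distance(44.0, 46.0)

# 0 deg and 180 deg are the same orientation and encode (almost)
# identically, up to floating-point rounding.
same_orientation = encoded_distance(0.0, 180.0)
```

With a raw-degree regression target, 179° and 1° would look far apart to the loss even though the disks are nearly aligned; in the encoded space the two pairs above are equally close.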


Inference Function

disco.core.cnn_inference.predict_with_cnn(data, header, pixel_scale, cx, cy, search_rad, model)

Generate a prior estimate of disk inclination and position angle from a FITS image patch using a pre-loaded DiscoNet model.

Preprocessing:

  1. A rectangular crop of half-width \(1.5 \times r_{\rm search} / \delta_{\rm pix}\) pixels is extracted around (cx, cy). The crop is zero-padded if it extends beyond the image boundary.

  2. The crop is resampled to \(128 \times 128\) pixels using scipy.ndimage.zoom (order 1).

  3. Intensity normalisation: the crop is clipped to its 1st and 99.9th percentiles, \([p_1, p_{99.9}]\), then rescaled to [0, 1]. The result forms the first channel.

  4. A beam map (2D elliptical Gaussian at the image centre, normalised to peak 1) is constructed for the second channel.

  5. A scalar map filled with \(\text{clip}(b_{\rm maj} / \text{FOV}, 0, 1)\) is used as the third channel.
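Steps 1 and 3 above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the module's code: `crop_zero_padded` and `normalise` are hypothetical names, the centre is rounded to the nearest pixel, and the resampling of step 2 (scipy.ndimage.zoom) is left out.

```python
import numpy as np


def crop_zero_padded(data, cx, cy, half):
    """Cut a (2*half, 2*half) window centred on (cx, cy).

    Pixels that fall outside the image are left at zero.
    Assumption: the centre is rounded to the nearest integer pixel.
    """
    ny, nx = data.shape
    x0 = int(round(cx)) - half
    y0 = int(round(cy)) - half
    out = np.zeros((2 * half, 2 * half), dtype=np.float32)
    # Overlap of the requested window with the image, in image coordinates.
    xa, xb = max(x0, 0), min(x0 + 2 * half, nx)
    ya, yb = max(y0, 0), min(y0 + 2 * half, ny)
    if xa < xb and ya < yb:
        out[ya - y0:yb - y0, xa - x0:xb - x0] = data[ya:yb, xa:xb]
    return out


def normalise(img, lo=1.0, hi=99.9):
    """Clip to the [lo, hi] percentiles and rescale to [0, 1]."""
    img = np.asarray(img, dtype=np.float64)
    p_lo, p_hi = np.percentile(img, [lo, hi])
    if p_hi <= p_lo:
        # Flat image: nothing to stretch.
        return np.zeros(img.shape, dtype=np.float32)
    scaled = (np.clip(img, p_lo, p_hi) - p_lo) / (p_hi - p_lo)
    return scaled.astype(np.float32)
```

The half-width would follow from the step 1 formula, for example `half = int(round(1.5 * search_rad / pixel_scale))`; how the module rounds this value is not documented here.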

The resulting 3-channel tensor of shape (1, 3, 128, 128) is forwarded through the model in eval mode with gradient computation disabled.
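The assembly of the three channels might look like the sketch below. Several details are assumptions made for illustration: the beam axes are passed in arcseconds (FITS headers usually store BMAJ and BMIN in degrees, so a conversion would be needed), the FOV is taken to be the full width of the crop in arcseconds, the beam position angle is measured from the +y axis, and `beam_channels` and `build_input` are hypothetical names.

```python
import numpy as np


def beam_channels(bmaj, bmin, bpa_deg, fov, size=128):
    """Return (beam_map, scalar_map), each of shape (size, size).

    bmaj, bmin and fov are in arcsec (assumption); bpa_deg is in degrees,
    measured from the +y axis (assumption).
    """
    pix = fov / size  # arcsec per resampled pixel
    fwhm_to_sigma = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    sigma_maj = bmaj * fwhm_to_sigma / pix
    sigma_min = bmin * fwhm_to_sigma / pix

    yy, xx = np.mgrid[:size, :size].astype(np.float64)
    centre = (size - 1) / 2.0
    dx, dy = xx - centre, yy - centre

    # Rotate pixel offsets into the beam frame (u: major axis, v: minor).
    theta = np.deg2rad(bpa_deg)
    u = -dx * np.sin(theta) + dy * np.cos(theta)
    v = dx * np.cos(theta) + dy * np.sin(theta)

    beam = np.exp(-0.5 * ((u / sigma_maj) ** 2 + (v / sigma_min) ** 2))
    beam /= beam.max()  # normalise to peak 1

    ratio = float(np.clip(bmaj / fov, 0.0, 1.0))
    scalar = np.full((size, size), ratio, dtype=np.float32)
    return beam.astype(np.float32), scalar


def build_input(img, bmaj, bmin, bpa_deg, fov):
    """Stack image, beam map and scalar map into shape (1, 3, H, W)."""
    beam, scalar = beam_channels(bmaj, bmin, bpa_deg, fov, size=img.shape[0])
    stacked = np.stack([img.astype(np.float32), beam, scalar])
    return stacked[None]  # add the batch axis
```

The array returned by `build_input` has the documented (1, 3, 128, 128) layout for a \(128 \times 128\) image and could then be converted to a tensor before the forward pass.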

Output decoding:

\[\begin{aligned}
\hat{i} &= \text{clip}(\hat{y}_0 \times 90°,\ 0°,\ 85°) \\
\hat{\phi} &= \left(\frac{1}{2}\arctan_2\!\left(\hat{y}_1,\, \hat{y}_2\right) \times \frac{180}{\pi}\right) \bmod 180°
\end{aligned}\]
Parameters:
  • data (numpy.ndarray) – 2D FITS image array (float32).

  • header (dict) – FITS header (used for BMAJ, BMIN, BPA).

  • pixel_scale (float) – Pixel scale in arcseconds per pixel.

  • cx (float) – Centroid column coordinate in pixels.

  • cy (float) – Centroid row coordinate in pixels.

  • search_rad (float) – Search radius in arcseconds defining the crop region.

  • model (DiscoNet) – Pre-loaded DiscoNet model in evaluation mode.

Returns:

(cnn_incl, cnn_pa) — estimated inclination (degrees) and position angle (degrees).

Return type:

tuple[float, float]
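The output decoding described for predict_with_cnn can be written in a few lines. The sketch below follows the two decoding formulas using only the standard library; `decode_outputs` is a hypothetical name, and it takes the raw network outputs as a plain sequence rather than a tensor.

```python
import math


def decode_outputs(y, max_incl=85.0):
    """Decode (inclination, position angle) in degrees from raw outputs.

    y[0] is incl/90, y[1] is sin(2*PA), y[2] is cos(2*PA).
    Any further entries (centre offsets) are ignored.
    """
    # Inclination: undo the /90 normalisation, then clip to [0, max_incl].
    incl = min(max(y[0] * 90.0, 0.0), max_incl)
    # Position angle: halve the double angle and wrap into [0, 180).
    pa = math.degrees(0.5 * math.atan2(y[1], y[2])) % 180.0
    return incl, pa


# Example: outputs corresponding to incl = 45 deg and PA = 170 deg.
double_angle = math.radians(2.0 * 170.0)
example = [0.5, math.sin(double_angle), math.cos(double_angle), 0.0, 0.0]
cnn_incl, cnn_pa = decode_outputs(example)
```

Because only the direction of the vector \((\hat{y}_1, \hat{y}_2)\) enters atan2, the decoded position angle does not depend on the network producing an exactly unit-length pair.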