rail.estimation.algos.dnf module

Implementation of the DNF algorithm

DNF (Directional Neighbourhood Fitting) is a nearest-neighbor approach for photometric redshift estimation developed at CIEMAT (Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas) in Madrid. DNF computes the photo-z hyperplane that best fits the directional neighbourhood of a photometric galaxy in the training sample.

See https://academic.oup.com/mnras/article/459/3/3078/2595234 for more details.

class rail.estimation.algos.dnf.DNFEstimator

Bases: CatEstimator

A class for estimating photometric redshifts using the DNF method.

This class extends CatEstimator and predicts redshifts from photometric data. It supports multiple selection modes (distance metrics) for redshift estimation, handles missing data, and generates probability density functions (PDFs) for photometric redshifts.

Metrics (selection_mode):

  • ENF (0): Euclidean neighbourhood. The Euclidean distance is a common metric in kNN (k-nearest neighbors) photometric redshift prediction.

  • ANF (1): uses the normalized inner product for more accurate photo-z predictions. It is particularly recommended for datasets with more than four filters.

  • DNF (2): combines the Euclidean and angular metrics, improving accuracy, especially for larger neighbourhoods, while maintaining proportionality in the observable content.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing

  • hdf5_groupname ([str] default=photometry) – Name of the HDF5 group for the data; if None, it is set to ‘’

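  • zmin ([float] default=0.0) – Minimum redshift of the redshift grid

  • zmax ([float] default=3.0) – Maximum redshift of the redshift grid

  • nzbins ([int] default=301) – Number of points in the redshift grid

  • id_col ([str] default=object_id) – Name of the object ID column

  • redshift_col ([str] default=redshift) – Name of the redshift column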

  • calc_summary_stats ([bool] default=False) – Compute summary statistics

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, and ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – Names of the magnitude columns, one per filter band

  • err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst']) – Names of the magnitude-error columns, one per filter band

  • nondetect_val ([float] default=99.0) – Magnitude value that flags a non-detection

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05}) – Limiting magnitude for each band, used to replace non-detections

  • selection_mode ([int] default=1) – Distance metric used for the redshift estimate: 0: ENF, 1: ANF, 2: DNF

  • model (ModelHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))
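The parameters above imply two small pieces of configuration logic: a redshift grid built from zmin, zmax and nzbins, and a mapping from selection_mode to a metric name. The sketch below shows both under stated assumptions: the grid is assumed to be evenly spaced with both endpoints included (np.linspace semantics), and `metric_name` and `make_zgrid` are hypothetical helpers, not functions of this module.

```python
import numpy as np

# Hypothetical mapping, following the selection_mode description (0: ENF, 1: ANF, 2: DNF).
SELECTION_MODES = {0: "ENF", 1: "ANF", 2: "DNF"}


def metric_name(selection_mode: int) -> str:
    """Return the metric name for a selection_mode value; reject unknown values."""
    if selection_mode not in SELECTION_MODES:
        raise ValueError(f"selection_mode must be one of {sorted(SELECTION_MODES)}")
    return SELECTION_MODES[selection_mode]


def make_zgrid(zmin: float = 0.0, zmax: float = 3.0, nzbins: int = 301) -> np.ndarray:
    """Evenly spaced redshift grid including both endpoints (assumed linspace semantics)."""
    return np.linspace(zmin, zmax, nzbins)


zgrid = make_zgrid()
print(metric_name(1))       # ANF, the metric selected by the default selection_mode
print(zgrid[1] - zgrid[0])  # spacing of the default grid, (3.0 - 0.0) / 300
```

With the defaults, 301 points between 0 and 3 give a redshift spacing of 0.01.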

__init__(args, **kwargs)

Constructor: do Estimator-specific initialization.

entrypoint_function: str | None = 'estimate'
interactive_function: str | None = 'dnf_estimator'
name = 'DNFEstimator'
open_model(**kwargs)

Load the model and/or attach it to this Stage.

Parameters:
  • tag – Input tag associated with the model

  • **kwargs – Should include ‘model’, see notes

Notes

The keyword argument ‘model’ should be either

  1. an object with a trained model,

  2. a path pointing to a file that can be read to obtain the trained model,

  3. or a ModelHandle providing access to the trained model.

Returns:

The object encapsulating the trained model.

Return type:

Any

class rail.estimation.algos.dnf.DNFInformer

Bases: CatInformer

A class for training (informing) the DNF photometric redshift estimator.

This class extends CatInformer and prepares photometric training data for redshift estimation. It handles missing data by replacing non-detections with predefined magnitude limits and assigning errors accordingly.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

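  • hdf5_groupname ([str] default=photometry) – Name of the HDF5 group for the data

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – Names of the magnitude columns, one per filter band

  • err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst']) – Names of the magnitude-error columns, one per filter band

  • redshift_col ([str] default=redshift) – Name of the redshift column

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05}) – Limiting magnitude for each band, used to replace non-detections

  • nondetect_val ([float] default=99.0) – Magnitude value that flags a non-detection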

  • input (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))
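The informer replaces non-detections with predefined magnitude limits and assigns errors to them. A minimal NumPy sketch of that replacement, assuming magnitudes and errors arrive as (N, nbands) arrays in the same band order as the per-band limits. The replacement error of 1.0 and the helper name `replace_nondetections` are assumptions for illustration, not values taken from this module.

```python
import numpy as np


def replace_nondetections(mags, errs, limits, nondetect_val=99.0, nondetect_err=1.0):
    """Return copies of mags/errs with non-detections set to the band limits.

    mags, errs: (N, nbands) arrays; limits: (nbands,) limiting magnitudes.
    nondetect_err is a hypothetical placeholder error for replaced entries.
    """
    mags = np.array(mags, dtype=float)  # np.array copies, so the inputs stay untouched
    errs = np.array(errs, dtype=float)
    limits = np.asarray(limits, dtype=float)

    # A NaN flag cannot be matched with isclose, so handle it separately.
    if np.isnan(nondetect_val):
        mask = np.isnan(mags)
    else:
        mask = np.isclose(mags, nondetect_val)

    # Broadcast the per-band limits to the table shape and copy them into flagged cells.
    mags[mask] = np.broadcast_to(limits, mags.shape)[mask]
    errs[mask] = nondetect_err
    return mags, errs


mags = [[22.0, 99.0], [99.0, 23.5]]
errs = [[0.05, 0.10], [0.20, 0.07]]
new_mags, new_errs = replace_nondetections(mags, errs, limits=[27.79, 29.04])
print(new_mags)  # 99.0 entries replaced by the limit of their band
```

With the dict-valued mag_limits parameter, the limits array would be built in band order, for example `[mag_limits[b] for b in bands]`.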

__init__(args, **kwargs)

Constructor: do CatInformer-specific initialization, then check the bands.

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'dnf_informer'
name = 'DNFInformer'
run()

Run the stage and return the execution status.

Subclasses must implement this method.

rail.estimation.algos.dnf.compute_angular_distance(V, Ts, Tsnorm)

Compute distances based on the angular (ANF) metric.

rail.estimation.algos.dnf.compute_directional_distance(V, Ts, Tsnorm)

Compute distances based on the directional (DNF) metric.

rail.estimation.algos.dnf.compute_euclidean_distance(V, Ts)

Compute distances based on the Euclidean metric.
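These three functions correspond to the ENF, ANF and DNF metrics selected by selection_mode. The sketch below computes all three for a single object against a set of training objects. It assumes the angular term is sin α = sqrt(1 - NIP²), where NIP is the normalized inner product, and that the directional distance multiplies the Euclidean distance by that term; both are illustrative assumptions, and the module's vectorized functions (which take preselected inputs such as Ts and Tsnorm) may differ in detail. The function names are hypothetical.

```python
import numpy as np


def euclidean_distance(v, T):
    """ENF-style distance from one object v (nf,) to each training row of T (n, nf)."""
    return np.linalg.norm(T - v, axis=1)


def angular_distance(v, T):
    """ANF-style distance: sine of the angle between v and each row of T."""
    nip = (T @ v) / (np.linalg.norm(T, axis=1) * np.linalg.norm(v))  # normalized inner product
    nip = np.clip(nip, -1.0, 1.0)  # guard against rounding just outside [-1, 1]
    return np.sqrt(1.0 - nip**2)


def directional_distance(v, T):
    """Assumed DNF-style combination: Euclidean distance scaled by the angular term."""
    return euclidean_distance(v, T) * angular_distance(v, T)


v = np.array([1.0, 0.0])
T = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(euclidean_distance(v, T))    # values 1, sqrt(2), 1
print(angular_distance(v, T))      # values 0, 1, sqrt(0.5)
print(directional_distance(v, T))  # values 0, sqrt(2), sqrt(0.5)
```

The first training row is parallel to v, so its angular distance is zero even though its Euclidean distance is not; the directional form then ranks it as the closest neighbor.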

rail.estimation.algos.dnf.compute_pdfs(zpdf, wpdf, pdf, Nvalid, zgrid)

Compute the PDFs from neighbor redshifts and weights

Parameters:
  • zpdf – (Nvalid, Nneighbors) array with the redshifts of the neighbors.

  • wpdf – (Nvalid, Nneighbors) array with the corresponding weights.

  • pdf – bool; if True, compute the PDFs.

  • Nvalid – int, number of galaxies.

  • zgrid – (Nz,) array, the redshift grid.

Returns:

  • Vpdf – (Nvalid, Nz) array with the probability distributions.
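compute_pdfs turns neighbor redshifts (zpdf) and weights (wpdf) into one distribution per galaxy on zgrid. A hedged sketch of one way to do this: each neighbor's weight is added to the nearest grid point, and each row is normalized to unit sum. The nearest-grid-point binning, the normalization and the name `neighbor_pdfs` are assumptions for illustration.

```python
import numpy as np


def neighbor_pdfs(zpdf, wpdf, zgrid):
    """Weighted histogram of neighbor redshifts on zgrid, one row per galaxy."""
    zpdf = np.asarray(zpdf, dtype=float)    # (Nvalid, Nneighbors)
    wpdf = np.asarray(wpdf, dtype=float)    # (Nvalid, Nneighbors)
    zgrid = np.asarray(zgrid, dtype=float)  # (Nz,)
    nvalid, nneigh = zpdf.shape

    # Index of the grid point nearest to each neighbor redshift.
    idx = np.abs(zpdf[..., None] - zgrid).argmin(axis=-1)

    # Accumulate weights; np.add.at handles repeated (row, bin) pairs correctly.
    Vpdf = np.zeros((nvalid, zgrid.size))
    rows = np.repeat(np.arange(nvalid), nneigh)
    np.add.at(Vpdf, (rows, idx.ravel()), wpdf.ravel())

    # Normalize each row to unit sum; rows with zero total weight stay zero.
    sums = Vpdf.sum(axis=1, keepdims=True)
    np.divide(Vpdf, sums, out=Vpdf, where=sums > 0)
    return Vpdf


zgrid = np.linspace(0.0, 1.0, 11)
Vpdf = neighbor_pdfs([[0.21, 0.19, 0.52]], [[1.0, 1.0, 2.0]], zgrid)
print(Vpdf[0, 2], Vpdf[0, 5])  # 0.5 0.5: total weight 2 near z=0.2 and 2 near z=0.5
```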

rail.estimation.algos.dnf.compute_pdfs_fit(photoz, photozerr, zgrid)

Compute Gaussian PDFs for the objects.

Parameters:
  • photoz – mean redshift values

  • photozerr – redshift error values

  • zgrid – redshift grid

Returns:

pdfs (np.ndarray) – the PDFs evaluated on zgrid
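compute_pdfs_fit builds Gaussian PDFs from photoz and photozerr. A minimal sketch, assuming each PDF is a normal density with mean photoz and standard deviation photozerr (strictly positive) evaluated on zgrid; the name `gaussian_pdfs` is hypothetical, and the module may normalize the result differently.

```python
import numpy as np


def gaussian_pdfs(photoz, photozerr, zgrid):
    """Normal densities N(photoz, photozerr**2) evaluated on zgrid; shape (Nobj, Nz)."""
    mu = np.asarray(photoz, dtype=float)[:, None]
    sigma = np.asarray(photozerr, dtype=float)[:, None]  # assumed strictly positive
    z = np.asarray(zgrid, dtype=float)[None, :]
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)


zgrid = np.linspace(0.0, 3.0, 301)
pdfs = gaussian_pdfs([0.5, 1.2], [0.05, 0.1], zgrid)
dz = zgrid[1] - zgrid[0]
print(pdfs.sum(axis=1) * dz)  # each close to 1, since both Gaussians lie well inside the grid
```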

rail.estimation.algos.dnf.compute_photoz_fit(NEIGHBORS, V, Verr, T, z, fit, photoz, photozerr, photozerr_param, photozerr_fit, pdf, zgrid)

Compute the photometric redshift fit by iteratively removing outliers.
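As a sketch of the general technique named here (not necessarily the module's exact procedure), the code below fits a hyperplane z ≈ c0 + c·m to the neighbors' photometry by least squares, clips neighbors whose residual exceeds nsigma standard deviations, refits, and evaluates the plane at the target object. The 3σ threshold, the iteration cap and the name `hyperplane_fit` are assumptions.

```python
import numpy as np


def hyperplane_fit(mags, z, v, nsigma=3.0, max_iter=5):
    """Least-squares fit z ~ c0 + c . m over neighbors with iterative sigma clipping.

    mags: (Nneighbors, nf) neighbor photometry; z: (Nneighbors,) neighbor redshifts;
    v: (nf,) photometry of the object being estimated.
    Returns the fitted redshift at v and the boolean mask of neighbors kept.
    """
    mags = np.asarray(mags, dtype=float)
    z = np.asarray(z, dtype=float)
    v = np.asarray(v, dtype=float)
    A = np.column_stack([np.ones(z.size), mags])  # design matrix with an intercept column
    keep = np.ones(z.size, dtype=bool)

    for _ in range(max_iter):
        coef, *_ = np.linalg.lstsq(A[keep], z[keep], rcond=None)
        resid = z - A @ coef
        sigma = resid[keep].std()
        outliers = keep & (np.abs(resid) > nsigma * sigma)
        # Stop when nothing new is clipped, or clipping would leave too few points to fit.
        if not outliers.any() or (keep & ~outliers).sum() < A.shape[1]:
            break
        keep &= ~outliers
    else:
        # Iteration budget used up: refit on the final set of kept neighbors.
        coef, *_ = np.linalg.lstsq(A[keep], z[keep], rcond=None)

    return float(coef[0] + v @ coef[1:]), keep


m = np.arange(20, dtype=float)[:, None]                    # a single "band" for readability
z = 0.1 + 0.2 * m[:, 0] + 0.01 * (-1.0) ** np.arange(20)  # near-linear relation, small scatter
z[10] += 1.0                                               # one catastrophic outlier
zhat, keep = hyperplane_fit(m, z, np.array([4.0]))
print(zhat, keep.sum())  # close to 0.9, with 19 of 20 neighbors kept
```

Clipping before the final evaluation keeps a single catastrophic neighbor from tilting the plane; with too few neighbors left for the number of filters the sketch stops clipping instead of fitting an underdetermined system.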

rail.estimation.algos.dnf.compute_photoz_mean_routliers(NEIGHBORS, Verr, pdf, Nvalid, zgrid)

Compute the mean photometric redshift after removing outliers.
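A sketch of an outlier-resistant mean in the spirit of compute_photoz_mean_routliers: neighbors whose redshift lies more than nsigma robust standard deviations (1.4826 times the median absolute deviation) from the median are discarded, and the rest are averaged with inverse-distance weights. The MAD-based clipping, the inverse-distance weights and the name `robust_mean_photoz` are assumptions chosen for illustration, not the module's documented behaviour.

```python
import numpy as np


def robust_mean_photoz(zneigh, dist, nsigma=3.0):
    """Inverse-distance weighted mean of neighbor redshifts after MAD-based clipping."""
    zneigh = np.asarray(zneigh, dtype=float)
    dist = np.asarray(dist, dtype=float)
    med = np.median(zneigh)
    scale = 1.4826 * np.median(np.abs(zneigh - med))  # robust sigma estimate
    # If scale is 0, only neighbors exactly at the median survive.
    keep = np.abs(zneigh - med) <= nsigma * scale
    w = 1.0 / np.maximum(dist[keep], 1e-12)  # floor avoids division by zero for exact matches
    return float(np.sum(w * zneigh[keep]) / np.sum(w)), keep


zmean, keep = robust_mean_photoz([0.50, 0.52, 0.48, 0.51, 0.49, 1.60], [1.0] * 6)
print(round(zmean, 6), keep.tolist())  # 0.5 [True, True, True, True, True, False]
```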

rail.estimation.algos.dnf.dnf_photometric_redshift(T, Terr, z, clf, Tnorm, V, Verr, zgrid, metric='ANF', fit=True, pdf=True, Nneighbors=80, presel=500)

Compute the photometric redshifts for the validation or science sample.

Returns:

  • photoz – Estimated photometric redshift.

  • photozerr – Error on the photometric redshift.

  • photozerr_param – Redshift error due to the parameters.

  • photozerr_fit – Redshift error due to the fit.

  • z1 – Redshift estimate from the closest neighbor.

  • nneighbors – Number of neighbors considered.

  • de1 – Euclidean distance to the closest neighbor.

  • d1 – Distance to the closest neighbor in the selected metric.

  • id1 – Index of the closest neighbor.

  • C – Additional computed parameters.

  • zpdf – Matrix containing the redshifts of the neighboring galaxies.

  • wpdf – Matrix of weights corresponding to the neighboring redshifts.

  • Vpdf – Probability density functions (PDFs) for the photometric redshifts of the validation set.

rail.estimation.algos.dnf.manage_nan(V, Verr)

Replace NaNs with 0 in V and Verr so that only valid measurements are used.
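A minimal sketch of the NaN replacement, returning copies rather than modifying the inputs in place (an assumption; the module may modify them in place). The name `zero_nans` is hypothetical.

```python
import numpy as np


def zero_nans(V, Verr):
    """Return copies of V and Verr with every NaN entry set to 0."""
    V = np.asarray(V, dtype=float)
    Verr = np.asarray(Verr, dtype=float)
    # np.where builds new arrays; finite values and infinities pass through unchanged.
    return np.where(np.isnan(V), 0.0, V), np.where(np.isnan(Verr), 0.0, Verr)


V, Verr = zero_nans([[21.3, np.nan]], [[0.02, np.nan]])
print(V.tolist(), Verr.tolist())  # [[21.3, 0.0]] [[0.02, 0.0]]
```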

rail.estimation.algos.dnf.metric_computation(V, NEIGHBORS, Ts, Tsnorm, metric, Nneighbors)

Compute distances based on the selected metric, sort neighbors, and store the closest neighbors.
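A sketch of the sort-and-keep step, assuming a precomputed (Nvalid, Ncandidates) distance matrix: np.argpartition finds the Nneighbors smallest distances per row without a full sort, and only those candidates are then sorted. The name `closest_neighbors` is hypothetical, and Nneighbors is assumed to be at least 1.

```python
import numpy as np


def closest_neighbors(D, Nneighbors):
    """Indices and distances of the Nneighbors smallest entries per row, sorted ascending."""
    D = np.asarray(D, dtype=float)
    k = min(Nneighbors, D.shape[1])
    # Partial sort: the first k columns hold the k smallest distances, in arbitrary order.
    part = np.argpartition(D, k - 1, axis=1)[:, :k]
    # Sort only those k candidates by distance.
    order = np.argsort(np.take_along_axis(D, part, axis=1), axis=1)
    idx = np.take_along_axis(part, order, axis=1)
    return idx, np.take_along_axis(D, idx, axis=1)


D = np.array([[0.5, 0.1, 0.9, 0.3],
              [0.2, 0.8, 0.05, 0.6]])
idx, dist = closest_neighbors(D, 2)
print(idx.tolist())   # [[1, 3], [2, 0]]
print(dist.tolist())  # [[0.1, 0.3], [0.05, 0.2]]
```

The partition step costs linear time per row, which matters when each object carries hundreds of preselected candidates but only a few dozen neighbors are kept.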

rail.estimation.algos.dnf.preselection(V, Verr, Nneighbors, presel, T, clf, Tnorm, z)

Perform the preselection process for photometric redshift estimation.
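Preselection narrows the training sample for each object before the more expensive metric is applied. The clf argument suggests a fitted nearest-neighbour model (for example scikit-learn's NearestNeighbors, whose kneighbors method returns distances and indices); to stay dependency-free, this sketch uses brute-force Euclidean distances instead. The helper name `preselect` and its return values are assumptions.

```python
import numpy as np


def preselect(V, T, z, presel):
    """For each row of V, gather the presel nearest training rows (Euclidean) and their redshifts.

    V: (Nvalid, nf); T: (Ntrain, nf); z: (Ntrain,).
    Returns Ts (Nvalid, presel, nf), zs (Nvalid, presel) and the training indices.
    """
    V = np.asarray(V, dtype=float)
    T = np.asarray(T, dtype=float)
    z = np.asarray(z, dtype=float)
    presel = min(presel, T.shape[0])
    # Squared Euclidean distances between every object and every training galaxy.
    d2 = ((V[:, None, :] - T[None, :, :]) ** 2).sum(axis=2)
    # Keep the presel closest training galaxies per object (unordered).
    idx = np.argpartition(d2, presel - 1, axis=1)[:, :presel]
    return T[idx], z[idx], idx


T = np.array([[20.0, 19.5], [22.0, 21.0], [20.1, 19.6], [25.0, 24.0]])
z = np.array([0.3, 0.8, 0.32, 1.5])
Ts, zs, idx = preselect([[20.05, 19.55]], T, z, presel=2)
print(sorted(idx[0].tolist()))  # [0, 2]: the two training galaxies closest in magnitude space
```

The brute-force distance array has Nvalid × Ntrain × nf entries, so in practice it would be evaluated chunk by chunk (as chunk_size suggests) or replaced by a tree-based neighbour model.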

rail.estimation.algos.dnf.validate_columns(V, T)

Validates that the columns of T and V have the same names.

Parameters:
  • T (np.ndarray) – Training data.

  • V (np.ndarray) – Validation data.

Raises:

ValueError – If the column names of T and V do not match.
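A sketch of this check for NumPy structured arrays, whose field names live in dtype.names. It assumes the names must match in order (on the assumption that feature vectors are built positionally from the columns) and uses the hypothetical name `check_same_columns`; for pandas DataFrames the comparison would use the columns attribute instead.

```python
import numpy as np


def check_same_columns(V, T):
    """Raise ValueError unless structured arrays V and T have identical field names, in order."""
    v_names, t_names = V.dtype.names, T.dtype.names
    if v_names != t_names:
        raise ValueError(f"Column mismatch: V has {v_names}, T has {t_names}")


dtype = [("mag_g_lsst", float), ("mag_r_lsst", float)]
T = np.zeros(3, dtype=dtype)
V = np.zeros(2, dtype=dtype)
check_same_columns(V, T)  # passes silently

W = np.zeros(2, dtype=[("mag_r_lsst", float), ("mag_g_lsst", float)])
try:
    check_same_columns(W, T)
except ValueError as err:
    print(err)  # the field order differs, so this raises
```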