rail.estimation.algos.dnf module
Implementation of the DNF algorithm
DNF (Directional Neighbourhood Fitting) is a nearest-neighbor approach to photometric redshift estimation developed at CIEMAT (Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas) in Madrid. DNF estimates the photo-z of a photometric galaxy from the hyperplane that best fits its directional neighbourhood in the training sample.
See https://academic.oup.com/mnras/article/459/3/3078/2595234 for more details.
- class rail.estimation.algos.dnf.DNFEstimator
Bases: CatEstimator
A class for estimating photometric redshifts using the DNF method.
This class extends CatEstimator and predicts redshifts from photometric data. It supports multiple selection modes (distance metrics) for redshift estimation, handles missing data, and generates probability density functions (PDFs) for the photometric redshifts.
Metrics (selection_mode):
- ENF (0): Euclidean neighbourhood, the common distance metric used in kNN (k-nearest neighbors) photometric redshift prediction.
- ANF (1): angular neighbourhood, based on the normalized inner product, for more accurate photo-z predictions. It is particularly recommended for datasets with more than four filters.
- DNF (2): directional neighbourhood, combining the Euclidean and angular metrics. It improves accuracy, especially for larger neighbourhoods, while maintaining proportionality in observable content.
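The contrast between the Euclidean and angular metrics can be seen in a minimal NumPy sketch. It assumes the photometry is expressed as fluxes and uses made-up values; the helpers `euclidean` and `normalized_inner_product` are illustrative, not functions of this module. Scaling a galaxy's fluxes by a constant leaves its colours, and therefore its direction in flux space, unchanged: the normalized inner product stays at 1 while the Euclidean distance does not vanish.

```python
import numpy as np

# Made-up six-band fluxes for a training galaxy, and a target galaxy with the
# same colours (flux ratios) but half the brightness.
train = np.array([1.0, 2.0, 4.0, 5.0, 6.0, 6.5])
target = 0.5 * train


def euclidean(a, b):
    """ENF-style distance: sensitive to overall brightness."""
    return float(np.linalg.norm(a - b))


def normalized_inner_product(a, b):
    """Cosine of the angle between two photometric vectors (ANF-style)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


print(euclidean(train, target))                 # > 0: brightness differs
print(normalized_inner_product(train, target))  # 1.0 up to round-off: same direction
```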
- Parameters:
output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
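zmin ([float] default=0.0) – Minimum redshift of the z grid
zmax ([float] default=3.0) – Maximum redshift of the z grid
nzbins ([int] default=301) – Number of grid points in the z grid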
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift) – Name of the redshift column
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
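bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – Names of the magnitude columns, one per filter band
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst']) – Names of the magnitude-error columns, one per filter band
nondetect_val ([float] default=99.0) – Magnitude value that flags a non-detection
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05}) – Limiting magnitude for each band, used in place of non-detections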
selection_mode ([int] default=1) – Distance metric used to select the neighbors for the redshift estimate: 0: ENF, 1: ANF, 2: DNF
model (ModelHandle (INPUT))
input (TableHandle (INPUT))
output (QPHandle (OUTPUT))
- __init__(args, **kwargs)
Constructor: performs estimator-specific initialization.
- entrypoint_function: str | None = 'estimate'
- interactive_function: str | None = 'dnf_estimator'
- name = 'DNFEstimator'
- open_model(**kwargs)
Load the model and/or attach it to this Stage
- Parameters:
tag – Input tag associated with the model
**kwargs – Should include ‘model’, see notes
Notes
The keyword argument ‘model’ should be either
an object with a trained model,
a path pointing to a file that can be read to obtain the trained model,
or a ModelHandle providing access to the trained model.
- Returns:
The object encapsulating the trained model.
- Return type:
Any
- class rail.estimation.algos.dnf.DNFInformer
Bases: CatInformer
A class for training the DNF photometric redshift estimator.
This class extends CatInformer and processes the training photometry for DNF redshift estimation. It handles missing data by replacing non-detections with predefined magnitude limits and assigning corresponding errors.
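A minimal sketch of the non-detection handling described above, assuming magnitudes and errors are stored as 2-D NumPy arrays with one column per band. The function `replace_nondetections` and its `nondetect_err` value are hypothetical; the error DNFInformer actually assigns to replaced entries is not stated here.

```python
import numpy as np


def replace_nondetections(mags, errs, bands, mag_limits, nondetect_val=99.0, nondetect_err=1.0):
    """Replace non-detections (nondetect_val or NaN) by each band's limiting magnitude.

    nondetect_err is a hypothetical placeholder error for the replaced entries.
    """
    mags = np.array(mags, dtype=float)  # np.array copies, so the inputs stay untouched
    errs = np.array(errs, dtype=float)
    for j, band in enumerate(bands):
        bad = np.isnan(mags[:, j]) | (mags[:, j] == nondetect_val)
        mags[bad, j] = mag_limits[band]
        errs[bad, j] = nondetect_err
    return mags, errs


bands = ["mag_u_lsst", "mag_g_lsst"]
mag_limits = {"mag_u_lsst": 27.79, "mag_g_lsst": 29.04}
mags = np.array([[25.1, 99.0], [np.nan, 24.3]])
errs = np.array([[0.05, 99.0], [np.nan, 0.02]])
new_mags, new_errs = replace_nondetections(mags, errs, bands, mag_limits)
```

Copying the inputs keeps the caller's table intact, which matters if the same arrays are reused for other stages.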
- Parameters:
output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
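hdf5_groupname ([str] default=photometry) – Name of the hdf5 group for data
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – Names of the magnitude columns, one per filter band
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst']) – Names of the magnitude-error columns, one per filter band
redshift_col ([str] default=redshift) – Name of the redshift column
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05}) – Limiting magnitude for each band, used in place of non-detections
nondetect_val ([float] default=99.0) – Magnitude value that flags a non-detection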
input (TableHandle (INPUT))
model (ModelHandle (OUTPUT))
- __init__(args, **kwargs)
Constructor: performs CatInformer-specific initialization, then checks the bands.
- entrypoint_function: str | None = 'inform'
- interactive_function: str | None = 'dnf_informer'
- name = 'DNFInformer'
- run()
Run the stage and return the execution status.
Subclasses must implement this method.
- rail.estimation.algos.dnf.compute_angular_distance(V, Ts, Tsnorm)
Compute distances based on angular (ANF) metric.
- rail.estimation.algos.dnf.compute_directional_distance(V, Ts, Tsnorm)
Compute distances based on directional (DNF) metric.
- rail.estimation.algos.dnf.compute_euclidean_distance(V, Ts)
Compute distances based on Euclidean metric.
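The three metrics can be sketched in vectorized NumPy as follows. The formulas are assumptions for illustration: the angular distance is taken as the sine of the angle between photometric vectors (derived from the normalized inner product), and the directional distance as the Euclidean distance scaled by that sine, so it is zero for vectors pointing in the same direction whatever their brightness. The helper names are hypothetical and may not match the module's internals; `Tsnorm` is assumed to hold the row norms of `Ts`.

```python
import numpy as np


def euclidean_distance(V, Ts):
    """(Nv, Nt) matrix of Euclidean distances between rows of V and rows of Ts."""
    diff = V[:, None, :] - Ts[None, :, :]
    return np.sqrt(np.sum(diff**2, axis=-1))


def sin_angle(V, Ts, Tsnorm):
    """Sine of the angle between each V row and each Ts row, from the normalized inner product."""
    Vnorm = np.linalg.norm(V, axis=1)
    nip = (V @ Ts.T) / np.outer(Vnorm, Tsnorm)
    # clip guards against tiny negative values caused by round-off when nip is about 1
    return np.sqrt(np.clip(1.0 - nip**2, 0.0, None))


def angular_distance(V, Ts, Tsnorm):
    return sin_angle(V, Ts, Tsnorm)


def directional_distance(V, Ts, Tsnorm):
    # Assumed combination: Euclidean distance weighted by the angular term.
    return euclidean_distance(V, Ts) * sin_angle(V, Ts, Tsnorm)


V = np.array([[1.0, 2.0, 3.0]])
Ts = np.array([[2.0, 4.0, 6.0], [3.0, 2.0, 1.0]])
Tsnorm = np.linalg.norm(Ts, axis=1)
D_euc = euclidean_distance(V, Ts)
D_ang = angular_distance(V, Ts, Tsnorm)
D_dir = directional_distance(V, Ts, Tsnorm)
```

The first training row is a scaled copy of `V`, so its angular and directional distances are close to zero even though its Euclidean distance is not.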
- rail.estimation.algos.dnf.compute_pdfs(zpdf, wpdf, pdf, Nvalid, zgrid)
Compute the PDFs from neighbor redshifts and weights
- Parameters:
zpdf – (Nvalid, Nneighbors) array with the redshifts of the neighbors.
wpdf – (Nvalid, Nneighbors) array with the corresponding weights.
pdf – bool; if True, compute the PDFs.
Nvalid – int, number of galaxies.
zgrid – (Nz,) array, redshift grid.
- Returns:
Vpdf – (Nvalid, Nz) array with the probability distributions.
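One way to turn neighbor redshifts and weights into PDFs is a weighted histogram on the redshift grid. This sketch assumes a uniformly spaced `zgrid` whose points act as bin centres; `neighbour_pdfs` is a hypothetical stand-in, and the module's binning and normalization may differ.

```python
import numpy as np


def neighbour_pdfs(zpdf, wpdf, zgrid):
    """Weighted histogram of neighbor redshifts on zgrid, one normalized row per galaxy."""
    dz = zgrid[1] - zgrid[0]
    # Bin edges halfway between grid points, so each grid point is a bin centre.
    edges = np.append(zgrid - dz / 2, zgrid[-1] + dz / 2)
    Vpdf = np.zeros((zpdf.shape[0], zgrid.size))
    for i in range(zpdf.shape[0]):
        hist, _ = np.histogram(zpdf[i], bins=edges, weights=wpdf[i])
        total = hist.sum()
        if total > 0:
            Vpdf[i] = hist / (total * dz)  # integrates to 1 over the grid
    return Vpdf


zgrid = np.linspace(0.0, 3.0, 301)
zpdf = np.array([[0.5, 0.5, 1.0]])   # neighbor redshifts for one galaxy
wpdf = np.array([[1.0, 1.0, 3.0]])   # their weights
Vpdf = neighbour_pdfs(zpdf, wpdf, zgrid)
```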
- rail.estimation.algos.dnf.compute_pdfs_fit(photoz, photozerr, zgrid)
Compute Gaussian PDFs for the objects.
- Parameters:
photoz – mean redshift values
photozerr – redshift error values
zgrid – redshift grid
- Returns:
pdfs – np.ndarray
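A minimal sketch of Gaussian PDFs evaluated on the grid, one row per object, assuming each object's PDF is a normal distribution with mean `photoz` and standard deviation `photozerr`. The name `gaussian_pdfs` is hypothetical, and no renormalization is applied for probability that falls outside the grid.

```python
import numpy as np


def gaussian_pdfs(photoz, photozerr, zgrid):
    """Evaluate N(photoz_i, photozerr_i) on zgrid for every object; shape (Nobj, Nz)."""
    mu = photoz[:, None]
    sigma = photozerr[:, None]
    return np.exp(-0.5 * ((zgrid[None, :] - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


zgrid = np.linspace(0.0, 3.0, 301)
pdfs = gaussian_pdfs(np.array([1.0, 0.5]), np.array([0.1, 0.05]), zgrid)
```

Broadcasting the (Nobj, 1) means against the (1, Nz) grid avoids a Python loop over objects.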
- rail.estimation.algos.dnf.compute_photoz_fit(NEIGHBORS, V, Verr, T, z, fit, photoz, photozerr, photozerr_param, photozerr_fit, pdf, zgrid)
Compute the photometric redshift fit by iteratively removing outliers.
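The fit can be sketched as a least-squares hyperplane z = c0 + c·m over the neighbors' photometry m, refitted after discarding neighbors whose residuals exceed a clipping threshold. `hyperplane_photoz`, `n_iter` and `clip` are hypothetical; the module's iteration count, clipping rule, and error terms (photozerr_param, photozerr_fit) are not reproduced here.

```python
import numpy as np


def hyperplane_photoz(T_nb, z_nb, v, n_iter=3, clip=3.0):
    """Fit z = c0 + T_nb @ c over the neighbors, clip outliers, and evaluate at v.

    Returns the predicted redshift and a boolean mask of neighbors kept in the final fit.
    """
    def fit(mask):
        # Design matrix: an intercept column followed by the photometry.
        A = np.column_stack([np.ones(mask.sum()), T_nb[mask]])
        coef, *_ = np.linalg.lstsq(A, z_nb[mask], rcond=None)
        return coef

    keep = np.ones(z_nb.size, dtype=bool)
    coef = fit(keep)
    for _ in range(n_iter):
        resid = z_nb - (coef[0] + T_nb @ coef[1:])
        sigma = np.std(resid[keep])
        if sigma == 0:
            break
        new_keep = np.abs(resid) <= clip * sigma
        # Stop if the mask is stable or too few points remain to constrain the plane.
        if np.array_equal(new_keep, keep) or new_keep.sum() <= T_nb.shape[1] + 1:
            break
        keep = new_keep
        coef = fit(keep)
    return float(coef[0] + v @ coef[1:]), keep


rng = np.random.default_rng(0)
T_nb = rng.uniform(0.0, 1.0, size=(80, 3))  # synthetic neighbor photometry
z_nb = 0.2 + T_nb @ np.array([0.5, -0.3, 0.1]) + rng.normal(0.0, 0.01, 80)
z_nb[0] += 2.0                               # one gross outlier
z_hat, kept = hyperplane_photoz(T_nb, z_nb, np.array([0.5, 0.5, 0.5]))
```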
- rail.estimation.algos.dnf.compute_photoz_mean_routliers(NEIGHBORS, Verr, pdf, Nvalid, zgrid)
Compute the mean photometric redshift removing outliers
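A sketch of a mean redshift with outlier removal, assuming inverse-distance weights and a single sigma-clipping pass around the weighted mean. Both choices, and the name `mean_photoz_clipped`, are assumptions rather than the module's documented behaviour.

```python
import numpy as np


def mean_photoz_clipped(z_nb, d_nb, clip=3.0, eps=1e-12):
    """Inverse-distance weighted mean redshift of the neighbors after one clipping pass.

    Returns (mean, weighted standard deviation) of the retained neighbors.
    """
    w = 1.0 / (d_nb + eps)  # eps avoids division by zero for exact matches
    zmean = np.average(z_nb, weights=w)
    zstd = np.sqrt(np.average((z_nb - zmean) ** 2, weights=w))
    if zstd > 0:
        keep = np.abs(z_nb - zmean) <= clip * zstd
    else:
        keep = np.ones(z_nb.size, dtype=bool)
    zmean = np.average(z_nb[keep], weights=w[keep])
    zerr = np.sqrt(np.average((z_nb[keep] - zmean) ** 2, weights=w[keep]))
    return float(zmean), float(zerr)


z_nb = np.array([0.5] * 19 + [2.5])  # 19 consistent neighbors and one outlier
d_nb = np.ones(20)
print(mean_photoz_clipped(z_nb, d_nb))  # mean close to 0.5 once the outlier is clipped
```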
- rail.estimation.algos.dnf.dnf_photometric_redshift(T, Terr, z, clf, Tnorm, V, Verr, zgrid, metric='ANF', fit=True, pdf=True, Nneighbors=80, presel=500)
Compute the photometric redshifts for the validation or science sample.
- Returns:
photoz – Estimated photometric redshift.
photozerr – Error on the photometric redshift.
photozerr_param – Redshift error due to parameters.
photozerr_fit – Redshift error due to the fit.
z1 – Redshift of the closest neighbor.
nneighbors – Number of neighbors considered.
de1 – Euclidean distance to the closest neighbor.
d1 – Distance to the closest neighbor.
id1 – Index of the closest neighbor.
C – Additional computed parameters.
zpdf – Matrix containing the redshifts of the neighboring galaxies.
wpdf – Matrix of weights corresponding to the neighboring redshifts.
Vpdf – Probability density functions (PDFs) for the photometric redshifts of the validation set.
- rail.estimation.algos.dnf.manage_nan(V, Verr)
Replace NaNs with 0 in V and Verr so that only valid measurements are used.
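A minimal equivalent of the NaN replacement, assuming `V` and `Verr` are floating-point NumPy arrays. Unlike `np.nan_to_num`, `np.where` combined with `np.isnan` touches only NaNs and leaves infinities as they are; `replace_nans` is a hypothetical name.

```python
import numpy as np


def replace_nans(V, Verr):
    """Return copies of V and Verr with every NaN set to 0; other values are unchanged."""
    return np.where(np.isnan(V), 0.0, V), np.where(np.isnan(Verr), 0.0, Verr)


V = np.array([[1.2, np.nan], [np.inf, 3.4]])
Verr = np.array([[0.1, np.nan], [0.2, 0.3]])
V_clean, Verr_clean = replace_nans(V, Verr)
```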
- rail.estimation.algos.dnf.metric_computation(V, NEIGHBORS, Ts, Tsnorm, metric, Nneighbors)
Compute distances based on the selected metric, sort neighbors, and store the closest neighbors.
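Selecting the closest neighbors from a distance matrix can be sketched with `np.argpartition`, which finds the k smallest entries per row without a full sort, followed by sorting only those k. The matrix layout (one row per target galaxy, one column per training galaxy) and the name `closest_neighbours` are assumptions.

```python
import numpy as np


def closest_neighbours(D, k):
    """Return (indices, distances) of the k smallest entries in each row of D, sorted ascending."""
    idx = np.argpartition(D, kth=k - 1, axis=1)[:, :k]  # k smallest per row, unordered
    d = np.take_along_axis(D, idx, axis=1)
    order = np.argsort(d, axis=1)                       # sort only those k
    return np.take_along_axis(idx, order, axis=1), np.take_along_axis(d, order, axis=1)


D = np.array([[0.4, 0.1, 0.3, 0.2],
              [1.0, 3.0, 2.0, 0.5]])
idx, dist = closest_neighbours(D, 2)
```

For a large training set and a small number of neighbors, partitioning first costs roughly linear time per row instead of a full sort.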
- rail.estimation.algos.dnf.preselection(V, Verr, Nneighbors, presel, T, clf, Tnorm, z)
Perform the preselection process for photometric redshift estimation.
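Preselection can be sketched as a two-stage search: a cheap Euclidean pass keeps `presel` candidates, and a costlier directional distance ranks only those. In this sketch a brute-force Euclidean search stands in for the fitted `clf` object the module receives, and the directional distance form (Euclidean distance times the sine of the angle between vectors) is an assumption; `two_stage_neighbours` is hypothetical.

```python
import numpy as np


def two_stage_neighbours(v, T, presel, nneighbors):
    """Preselect `presel` candidates by Euclidean distance, then keep the
    `nneighbors` candidates with the smallest assumed directional distance."""
    d_euc = np.linalg.norm(T - v, axis=1)
    cand = np.argsort(d_euc)[:presel]  # stage 1: cheap preselection
    Tc = T[cand]
    cos = (Tc @ v) / (np.linalg.norm(Tc, axis=1) * np.linalg.norm(v))
    sin = np.sqrt(np.clip(1.0 - cos**2, 0.0, None))
    d_dir = d_euc[cand] * sin          # stage 2: directional distance on candidates only
    best = np.argsort(d_dir)[:nneighbors]
    return cand[best]


rng = np.random.default_rng(1)
T = rng.uniform(1.0, 2.0, size=(1000, 4))  # synthetic training photometry
res = two_stage_neighbours(T[10], T, presel=50, nneighbors=5)
```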
- rail.estimation.algos.dnf.validate_columns(V, T)
Validates that the columns of T and V have the same names.
- Parameters:
T (np.ndarray) – Training data.
V (np.ndarray) – Validation data.
- Raises:
ValueError – If the column names of T and V do not match.
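Plain NumPy arrays carry no column names, so this sketch assumes `V` and `T` are structured arrays and compares their field names, including their order. `validate_column_names` is a hypothetical name; the module's actual check may operate on a different container.

```python
import numpy as np


def validate_column_names(V, T):
    """Raise ValueError unless structured arrays V and T have the same field names in the same order."""
    if V.dtype.names != T.dtype.names:
        raise ValueError(f"Column names differ: V has {V.dtype.names}, T has {T.dtype.names}")


dt = [("mag_g_lsst", float), ("mag_r_lsst", float)]
validate_column_names(np.zeros(3, dtype=dt), np.zeros(5, dtype=dt))  # passes silently
```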