rail.estimation.algos.cmnn module
Implementation of the color-matched nearest neighbor (CMNN) algorithm. See https://ui.adsabs.harvard.edu/abs/2018AJ....155....1G/abstract for more details.
- class rail.estimation.algos.cmnn.CMNNEstimator
Bases:
CatEstimator
Color Matched Nearest Neighbor Estimator. Note that there are several modifications from the original CMNN, mainly that the original estimator dropped non-detections from the Mahalanobis distance calculation. However, there is information in a non-detection, so here I’ve instead replaced the non-detections with the 1 sigma limit and a magnitude uncertainty of 1.0, and fixed the degrees of freedom to be the number of magnitude bands minus one.
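The non-detection replacement described above can be sketched as follows. This is a minimal illustration, not the module's own code: the helper name `replace_nondetections` is hypothetical, and it assumes a non-detection is flagged by a magnitude equal to `nondetect_val` (or NaN), which this page does not spell out.

```python
import math


def replace_nondetections(bands, mags, mag_errs, mag_limits, nondetect_val=99.0):
    """Return copies of mags and mag_errs with non-detections replaced.

    bands: list of band column names, in the same order as mags/mag_errs.
    mag_limits: dict mapping band name -> 1 sigma limiting magnitude.
    """
    new_mags, new_errs = list(mags), list(mag_errs)
    for i, band in enumerate(bands):
        # Assumption: a non-detection is flagged by nondetect_val or NaN.
        if math.isnan(mags[i]) or math.isclose(mags[i], nondetect_val):
            new_mags[i] = mag_limits[band]  # substitute the 1 sigma limit
            new_errs[i] = 1.0               # fixed magnitude uncertainty
    return new_mags, new_errs


bands = ["mag_u_lsst", "mag_g_lsst"]
mag_limits = {"mag_u_lsst": 27.79, "mag_g_lsst": 29.04}
new_mags, new_errs = replace_nondetections(
    bands, [99.0, 24.3], [99.0, 0.02], mag_limits
)
```

Here the undetected u band takes its limiting magnitude and an uncertainty of 1.0, while the detected g band is left unchanged.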
Current implementation returns a single Gaussian for each galaxy with a width determined by the std deviation of all galaxies within the range set by the ppf value.
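As a rough sketch of how such a width could be computed, the example below selects training galaxies by a diagonal Mahalanobis distance in color space and takes the standard deviation of their redshifts. Several things here are assumptions rather than statements about the module: the function names are hypothetical, the distance uses only the test galaxy's color errors, the cutoff is taken to be the chi-squared PPF at `ppf_value`, and the population standard deviation is used. The example uses three bands (two colors, so two degrees of freedom) because the chi-squared PPF then has the closed form -2 ln(1 - p); for other degrees of freedom a routine such as `scipy.stats.chi2.ppf` would be needed.

```python
import math
import statistics


def mahalanobis_color_distance(test_colors, test_color_errs, train_colors):
    """Sum of squared color differences scaled by the test color errors."""
    return sum(
        ((ct - c) / e) ** 2
        for c, e, ct in zip(test_colors, test_color_errs, train_colors)
    )


def neighbor_width(test_colors, test_color_errs, training_set, threshold):
    """Std deviation of the redshifts of training galaxies within the cutoff.

    training_set: list of (colors, redshift) pairs.
    Returns (width, matched_redshifts).
    """
    matched = [
        z
        for colors, z in training_set
        if mahalanobis_color_distance(test_colors, test_color_errs, colors)
        <= threshold
    ]
    # Assumption: population standard deviation of the matched redshifts.
    return statistics.pstdev(matched), matched


ppf_value = 0.68
# Closed-form chi-squared PPF, valid only for 2 degrees of freedom.
threshold = -2.0 * math.log(1.0 - ppf_value)

# Hypothetical training set: (colors, redshift) for three galaxies.
training_set = [([0.5, 0.3], 0.40), ([0.6, 0.3], 0.50), ([2.0, 1.5], 1.80)]
width, matched = neighbor_width([0.55, 0.3], [0.1, 0.1], training_set, threshold)
```

Only the first two training galaxies fall inside the cutoff, so the width is the spread of their two redshifts.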
There are three options for how to choose the central value of the Gaussian, and the option is set using the selection_mode config parameter (integer):
option 0: randomly choose one of the neighbors within the PPF cutoff
option 1: choose the value with the smallest Mahalanobis distance
option 2: random choice as in option 0, but weighted by distance
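The three selection modes can be illustrated with a short sketch. The function `choose_redshift` is hypothetical, and the weighting for option 2 is an assumption (inverse distance, with distances floored at `min_dist` to avoid division by zero); the module may weight neighbors differently.

```python
import random


def choose_redshift(redshifts, distances, selection_mode=1, seed=66, min_dist=1e-4):
    """Pick the central redshift from neighbors within the PPF cutoff."""
    rng = random.Random(seed)
    if selection_mode == 0:
        # Option 0: uniform random choice among the neighbors.
        return rng.choice(redshifts)
    if selection_mode == 1:
        # Option 1: neighbor with the smallest Mahalanobis distance.
        nearest = min(range(len(distances)), key=distances.__getitem__)
        return redshifts[nearest]
    if selection_mode == 2:
        # Option 2: random choice weighted by distance.
        # Assumption: weights are inverse distances, floored at min_dist.
        weights = [1.0 / max(d, min_dist) for d in distances]
        return rng.choices(redshifts, weights=weights, k=1)[0]
    raise ValueError(f"unknown selection_mode: {selection_mode}")


neighbor_z = [0.40, 0.50, 1.80]
neighbor_d = [0.90, 0.20, 1.50]
nearest_z = choose_redshift(neighbor_z, neighbor_d, selection_mode=1)
random_z = choose_redshift(neighbor_z, neighbor_d, selection_mode=0)
weighted_z = choose_redshift(neighbor_z, neighbor_d, selection_mode=2)
```

Options 0 and 2 depend on the random seed, so a fixed `seed` makes their output reproducible from run to run.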
If a test galaxy does not have enough training galaxies it is assigned a redshift bad_redshift_val and a width bad_redshift_err, both of which are config parameters that can be set by the user. Note that this should only happen if the number of training galaxies is smaller than min_n, which is unlikely, but is included here for completeness.
- Parameters:
output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0)
zmax ([float] default=3.0)
nzbins ([int] default=301)
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift)
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
seed ([int] default=66) – random seed used in selection mode
ppf_value ([float] default=0.68) – PPF value used in Mahalanobis distance
selection_mode ([int] default=1) – select which mode to use to choose the redshift estimate: 0: randomly choose, 1: nearest neighbor, 2: weighted random
min_n ([int] default=25) – minimum number of training galaxies to use
min_thresh ([float] default=0.0001) – minimum threshold cutoff
min_dist ([float] default=0.0001) – minimum Mahalanobis distance
bad_redshift_val ([float] default=99.0) – redshift to assign bad redshifts
bad_redshift_err ([float] default=10.0) – Gauss error width to assign to bad redshifts
model (ModelHandle (INPUT))
input (TableHandle (INPUT))
output (QPHandle (OUTPUT))
- __init__(args, **kwargs)
Constructor: Do Estimator specific initialization
- entrypoint_function: str | None = 'estimate'
- interactive_function: str | None = 'cmnn_estimator'
- name = 'CMNNEstimator'
- open_model(**kwargs)
Load the model and/or attach it to this Stage
- Parameters:
tag – Input tag associated to the model
**kwargs – Should include ‘model’, see notes
Notes
The keyword argument ‘model’ should be either
an object with a trained model,
a path pointing to a file that can be read to obtain the trained model,
or a ModelHandle providing access to the trained model.
- Returns:
The object encapsulating the trained model.
- Return type:
Any
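To illustrate the kind of dispatch the notes describe, here is a minimal sketch covering only the first two cases (an in-memory model object or a file path). It is not the stage's implementation: `load_model` is a hypothetical helper, the pickle file format is an assumption, and the ModelHandle case is left out because its interface is not described on this page.

```python
import os
import pickle
import tempfile


def load_model(model):
    """Return a trained model given either the object itself or a file path."""
    if isinstance(model, (str, os.PathLike)):
        # Assumption: the model file is a pickle.
        with open(model, "rb") as handle:
            return pickle.load(handle)
    # Otherwise treat the argument as the trained model object.
    return model


# Hypothetical stand-in for a trained model.
trained = {"train_colors": [[0.5, 0.3]], "train_redshifts": [0.4]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    with open(path, "wb") as handle:
        pickle.dump(trained, handle)
    from_path = load_model(path)

from_object = load_model(trained)
```

Only unpickle model files from sources you trust, since loading a pickle can execute arbitrary code.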
- class rail.estimation.algos.cmnn.CMNNInformer
Bases:
CatInformer
Compute colors and color errors for the CMNN training set and store them in a model file that will be used by the CMNNEstimator stage.
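A minimal sketch of the color computation is shown below. It assumes that colors are differences between adjacent bands, taken in the order given by `bands`, and that color errors are the two magnitude errors added in quadrature; the helper name `colors_and_errors` is hypothetical and the module may differ in detail.

```python
import math


def colors_and_errors(mags, mag_errs):
    """Adjacent-band colors and their errors for one galaxy.

    mags, mag_errs: lists ordered like the bands and err_bands parameters.
    """
    # Color k is the magnitude in band k minus the magnitude in band k+1.
    colors = [m1 - m2 for m1, m2 in zip(mags[:-1], mags[1:])]
    # Assumption: independent errors, so they add in quadrature.
    color_errs = [math.hypot(e1, e2) for e1, e2 in zip(mag_errs[:-1], mag_errs[1:])]
    return colors, color_errs


colors, color_errs = colors_and_errors([25.0, 24.5, 24.0], [0.3, 0.4, 0.0])
```

With N magnitude bands this gives N - 1 colors, which matches the degrees of freedom quoted for the estimator.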
- Parameters:
output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])
redshift_col ([str] default=redshift)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
nondetect_val ([float] default=99.0)
nondetect_replace ([bool] default=False) – set to True to replace non-detects, False to ignore in distance calculation
input (TableHandle (INPUT))
model (ModelHandle (OUTPUT))
- __init__(args, **kwargs)
Constructor: Do CatInformer specific initialization, then check on bands
- entrypoint_function: str | None = 'inform'
- interactive_function: str | None = 'cmnn_informer'
- name = 'CMNNInformer'
- run()
Run the stage and return the execution status.
Subclasses must implement this method.