rail.estimation.algos.k_nearneigh module

Quick implementation of a k-nearest-neighbor estimator. This first pass ignores photometric errors and works only in terms of magnitudes; we will expand on this in a future update.

class rail.estimation.algos.k_nearneigh.KNearNeighEstimator

Bases: CatEstimator

KNN-based estimator

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • zmin ([float] default=0.0)

  • zmax ([float] default=3.0)

  • nzbins ([int] default=301)

  • id_col ([str] default=object_id) – name of the object ID column

  • redshift_col ([str] default=redshift)

  • calc_summary_stats ([bool] default=False) – Compute summary statistics

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • ref_band ([str] default=mag_i_lsst)

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • model (ModelHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))
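The zmin, zmax, and nzbins parameters describe the redshift grid on which the output PDFs are evaluated. A minimal sketch of such a grid, assuming it is evenly spaced and includes both endpoints (the text above does not state this, and the helper name make_zgrid is hypothetical):

```python
def make_zgrid(zmin=0.0, zmax=3.0, nzbins=301):
    """Evenly spaced redshift grid; including both endpoints is an assumption."""
    step = (zmax - zmin) / (nzbins - 1)
    return [zmin + i * step for i in range(nzbins)]


zgrid = make_zgrid()
```

Under that assumption, the defaults give a grid spacing of 0.01 in redshift.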

__init__(args, **kwargs)

Constructor: Do Estimator specific initialization

entrypoint_function: str | None = 'estimate'
interactive_function: str | None = 'k_near_neigh_estimator'
name = 'KNearNeighEstimator'

open_model(**kwargs)

Load the model and/or attach it to this Stage

Parameters:
  • tag – Input tag associated to the model

  • **kwargs – Should include ‘model’, see notes

Notes

The keyword argument ‘model’ should be one of the following:

  1. an object with a trained model,

  2. a path pointing to a file that can be read to obtain the trained model,

  3. or a ModelHandle providing access to the trained model.

Returns:

The object encapsulating the trained model.

Return type:

Any
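The three accepted forms of ‘model’ can be illustrated with a small dispatch function. This is only a sketch of the idea, not the library's code: DummyHandle is a hypothetical stand-in for a ModelHandle, resolve_model is a hypothetical helper, and reading the file with pickle is an assumption about the on-disk format.

```python
import pickle
from pathlib import Path


class DummyHandle:
    """Hypothetical stand-in for a ModelHandle; not the real class."""

    def __init__(self, model):
        self._model = model

    def read(self):
        return self._model


def resolve_model(model):
    """Return a trained model from a handle-like wrapper, a file path, or an object."""
    if isinstance(model, DummyHandle):
        # Case 3: a handle that provides access to the trained model.
        return model.read()
    if isinstance(model, (str, Path)):
        # Case 2: a path to a file; a pickle file is assumed here.
        with open(model, "rb") as fh:
            return pickle.load(fh)
    # Case 1: already an object holding the trained model.
    return model
```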

class rail.estimation.algos.k_nearneigh.KNearNeighInformer

Bases: CatInformer

Train a KNN-based estimator

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry)

  • zmin ([float] default=0.0)

  • zmax ([float] default=3.0)

  • nzbins ([int] default=301)

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • ref_band ([str] default=mag_i_lsst)

  • redshift_col ([str] default=redshift)

  • trainfrac ([float] default=0.75) – fraction of training data used to make tree, rest used to set best sigma

  • seed ([int] default=0) – Random number seed for NN training

  • sigma_grid_min ([float] default=0.01) – minimum value of sigma for grid check

  • sigma_grid_max ([float] default=0.075) – maximum value of sigma for grid check

  • ngrid_sigma ([int] default=10) – number of grid points in sigma check

  • leaf_size ([int] default=15) – min leaf size for KDTree

  • nneigh_min ([int] default=3) – minimum number of nearest neighbors to use for PDF fit

  • nneigh_max ([int] default=7) – maximum number of nearest neighbors to use for PDF fit

  • only_colors ([bool] default=False) – if only_colors True, then do not use ref_band mag, only use colors

  • input (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))
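The bands, ref_band, nondetect_val, mag_limits, and only_colors parameters together determine the feature vector built for each object. The sketch below shows one plausible reading, with several assumptions: non-detections are replaced by the band's entry in mag_limits, colors are differences of adjacent bands in the order listed, and the ref_band magnitude is placed first unless only_colors is True. The helper name make_features and the example row are hypothetical.

```python
import math

BANDS = ["mag_u_lsst", "mag_g_lsst", "mag_r_lsst",
         "mag_i_lsst", "mag_z_lsst", "mag_y_lsst"]
MAG_LIMITS = {"mag_u_lsst": 27.79, "mag_g_lsst": 29.04, "mag_r_lsst": 29.06,
              "mag_i_lsst": 28.62, "mag_z_lsst": 27.98, "mag_y_lsst": 27.05}


def make_features(row, bands=BANDS, ref_band="mag_i_lsst", nondetect_val=99.0,
                  mag_limits=MAG_LIMITS, only_colors=False):
    """Build [ref_band magnitude, adjacent-band colors...] for one object."""
    mags = []
    for band in bands:
        value = row[band]
        # A non-detection is flagged either by NaN or by the sentinel value.
        if math.isnan(nondetect_val):
            missing = math.isnan(value)
        else:
            missing = math.isclose(value, nondetect_val)
        # Assumption: non-detections are replaced by the band's limiting magnitude.
        mags.append(mag_limits[band] if missing else value)
    # Colors are differences of neighboring bands, e.g. u-g, g-r, ...
    colors = [mags[i] - mags[i + 1] for i in range(len(mags) - 1)]
    if only_colors:
        return colors
    return [mags[bands.index(ref_band)]] + colors


# Hypothetical object with a non-detection in the u band.
row = {"mag_u_lsst": 99.0, "mag_g_lsst": 25.0, "mag_r_lsst": 24.5,
       "mag_i_lsst": 24.0, "mag_z_lsst": 23.8, "mag_y_lsst": 23.7}
features = make_features(row)
color_features = make_features(row, only_colors=True)
```

With six bands this gives six features by default (one magnitude plus five colors) and five when only_colors is True.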

__init__(args, **kwargs)

Constructor: Do CatInformer specific initialization, then check on bands

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'k_near_neigh_informer'
name = 'KNearNeighInformer'

run()

Train a KDTree on a fraction of the training data; the remainder is used to set the best sigma.
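The training procedure described by trainfrac, seed, the sigma grid parameters, and nneigh_min/nneigh_max can be sketched as a split followed by a grid search. This is an illustration under stated assumptions, not the module's implementation: a brute-force neighbor search stands in for the KDTree, neighbors are weighted equally, and the summed log-density at the true redshift is used as a stand-in score. All function names and the synthetic data are hypothetical.

```python
import math
import random


def split_training(n_objects, trainfrac=0.75, seed=0):
    """Shuffle indices reproducibly; the first part builds the tree, the rest validates."""
    indices = list(range(n_objects))
    random.Random(seed).shuffle(indices)
    n_tree = round(trainfrac * n_objects)
    return indices[:n_tree], indices[n_tree:]


def make_sigma_grid(sigma_grid_min=0.01, sigma_grid_max=0.075, ngrid_sigma=10):
    """Evenly spaced candidate kernel widths (even spacing is an assumption)."""
    step = (sigma_grid_max - sigma_grid_min) / (ngrid_sigma - 1)
    return [sigma_grid_min + i * step for i in range(ngrid_sigma)]


def nearest_redshifts(query, tree_x, tree_z, k):
    """Brute-force k nearest neighbors; a stand-in for a KDTree query."""
    order = sorted(range(len(tree_x)), key=lambda i: math.dist(query, tree_x[i]))
    return [tree_z[i] for i in order[:k]]


def log_density(z_true, neighbor_z, sigma):
    """Log of an equal-weight Gaussian mixture centred on the neighbor redshifts."""
    norm = len(neighbor_z) * sigma * math.sqrt(2.0 * math.pi)
    density = sum(math.exp(-0.5 * ((z_true - z) / sigma) ** 2)
                  for z in neighbor_z) / norm
    return math.log(max(density, 1e-300))  # floor avoids log(0)


def pick_sigma_and_k(tree_x, tree_z, val_x, val_z, nneigh_min=3, nneigh_max=7):
    """Grid search over neighbor count and sigma using the validation objects."""
    best = None
    for k in range(nneigh_min, nneigh_max + 1):
        # Neighbor lookups depend only on k, so compute them once per k.
        neighbors = [nearest_redshifts(x, tree_x, tree_z, k) for x in val_x]
        for sigma in make_sigma_grid():
            score = sum(log_density(z, nz, sigma)
                        for z, nz in zip(val_z, neighbors))
            if best is None or score > best[0]:
                best = (score, sigma, k)
    return best[1], best[2]


# Synthetic one-feature data set, purely for illustration.
rng = random.Random(1)
redshifts = [rng.uniform(0.0, 3.0) for _ in range(100)]
features = [[z + rng.gauss(0.0, 0.05)] for z in redshifts]

tree_idx, val_idx = split_training(len(redshifts))
best_sigma, best_k = pick_sigma_and_k(
    [features[i] for i in tree_idx], [redshifts[i] for i in tree_idx],
    [features[i] for i in val_idx], [redshifts[i] for i in val_idx],
)
```

With trainfrac=0.75 and 100 objects, 75 are used to build the neighbor structure and 25 to choose sigma and the neighbor count.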