rail.estimation.algos.flexzboost module

Implementation of the FlexZBoost algorithm. It uses training data and XGBoost to learn the relation between the input photometry and redshift, splits the training data into training and validation sets, and finds the best “bump_thresh” (which eliminates small peaks in p(z) below the threshold) and sharpening parameter (which determines the peakiness of the p(z) shape) by minimizing the CDE loss over a grid.
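The two post-processing steps named above can be illustrated on a gridded p(z). The following is a minimal sketch, not the library's implementation: the function names `sketch_remove_bumps` and `sketch_sharpen` are hypothetical, and it assumes that the bump threshold is compared against the area of each contiguous bump and that sharpening raises the density to a power before renormalizing. Plain lists are used to keep the example dependency-free.

```python
def sketch_sharpen(pdf, dz, alpha):
    """Raise a gridded density to the power alpha and renormalize to unit area.

    Assumption: alpha > 1 makes peaks narrower, alpha < 1 makes them broader.
    """
    powered = [p ** alpha for p in pdf]
    area = sum(powered) * dz
    return [p / area for p in powered]


def sketch_remove_bumps(pdf, dz, bump_thresh):
    """Zero out contiguous positive runs whose area is below bump_thresh.

    Assumption: the threshold applies to the area of each run, and the
    result is renormalized to unit area afterwards.
    """
    cleaned = list(pdf)
    start = None
    # The trailing 0.0 is a sentinel that closes a run ending at the last bin.
    for i, p in enumerate(list(pdf) + [0.0]):
        if p > 0 and start is None:
            start = i
        elif p <= 0 and start is not None:
            if sum(pdf[start:i]) * dz < bump_thresh:
                for j in range(start, i):
                    cleaned[j] = 0.0
            start = None
    area = sum(cleaned) * dz
    return [p / area for p in cleaned] if area > 0 else cleaned


# Toy p(z): a small spurious bump at index 1 and a main peak at indices 3-4.
dz = 0.1
pdf = [0.0, 0.5, 0.0, 4.0, 5.5, 0.0]
cleaned = sketch_remove_bumps(pdf, dz, bump_thresh=0.1)
sharpened = sketch_sharpen(pdf, dz, alpha=2.0)
```

In this toy case the small bump has area 0.05, so a threshold of 0.1 removes it, while sharpening with alpha=2 increases the contrast between the two bins of the main peak.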

class rail.estimation.algos.flexzboost.FlexZBoostEstimator

Bases: CatEstimator

FlexZBoost-based CatEstimator

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing

  • hdf5_groupname ([str] default=photometry) – Name of the HDF5 group for data; if None, it is set to ‘’

  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301)

  • id_col ([str] default=object_id) – name of the object ID column

  • redshift_col ([str] default=redshift) – name of redshift column

  • calc_summary_stats ([bool] default=False) – Compute summary statistics

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, and ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])

  • ref_band ([str] default=mag_i_lsst)

  • qp_representation ([str] default=interp) – qp generator to use. [interp|flexzboost]

  • include_mag_err ([bool] default=False) – Include magnitude errors in the training and estimation process

  • model (ModelHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))
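The zmin, zmax, and nzbins parameters listed above describe a redshift grid. As a minimal sketch, assuming the grid is evenly spaced and includes both endpoints (the helper name `make_z_grid` is hypothetical), the defaults give a spacing of 0.01:

```python
def make_z_grid(zmin=0.0, zmax=3.0, nzbins=301):
    """Return nzbins evenly spaced redshifts from zmin to zmax inclusive.

    Assumption: endpoints are included, so the spacing is
    (zmax - zmin) / (nzbins - 1).
    """
    step = (zmax - zmin) / (nzbins - 1)
    return [zmin + i * step for i in range(nzbins)]


z_grid = make_z_grid()
dz = z_grid[1] - z_grid[0]
```

Choosing nzbins as one more than a round number of intervals (301 rather than 300) is what keeps the spacing at a round value under this assumption.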

__init__(args, **kwargs)

Constructor: Do CatEstimator specific initialization

entrypoint_function: str | None = 'estimate'
interactive_function: str | None = 'flex_z_boost_estimator'
name = 'FlexZBoostEstimator'

class rail.estimation.algos.flexzboost.FlexZBoostInformer

Bases: CatInformer

Train a FlexZBoost CatInformer

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – Name of the HDF5 group for data; if None, it is set to ‘’

  • zmin ([float] default=0.0)

  • zmax ([float] default=3.0)

  • nzbins ([int] default=301)

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])

  • ref_band ([str] default=mag_i_lsst)

  • redshift_col ([str] default=redshift)

  • retrain_full ([bool] default=True) – If True, re-run the fit with the full training set, including the data set aside for bump/sharpen validation. If False, use only the subset defined by the trainfrac fraction

  • trainfrac ([float] default=0.75) – fraction of training data to use for training (rest used for bump thresh and sharpening determination)

  • seed ([int] default=1138) – Random number seed

  • bumpmin ([float] default=0.02) – minimum value in grid of thresholds checked to optimize removal of spurious small bumps

  • bumpmax ([float] default=0.35) – max value in grid checked for removal of small bumps

  • nbump ([int] default=20) – number of grid points in bumpthresh grid search

  • sharpmin ([float] default=0.7) – min value in grid checked in optimal sharpening parameter fit

  • sharpmax ([float] default=2.1) – max value in grid checked in optimal sharpening parameter fit

  • nsharp ([int] default=15) – number of search points in sharpening fit

  • max_basis ([int] default=35) – maximum number of basis functions to use in density estimate

  • basis_system ([str] default=cosine) – type of basis system to use with flexcode

  • regression_params ([dict] default={'max_depth': 8, 'objective': 'reg:squarederror'}) – dictionary of options passed to flexcode, includes max_depth (int), and objective, which should be set to reg:squarederror

  • include_mag_err ([bool] default=False) – Include magnitude errors in the training and estimation process

  • input (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))
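The bumpmin/bumpmax/nbump and sharpmin/sharpmax/nsharp parameters listed above each define a one-dimensional search grid. The sketch below shows one way such a search could look, assuming evenly spaced grids with both endpoints included. The helpers `linear_grid` and `best_on_grid` are hypothetical, and `toy_loss` is only a stand-in for the CDE loss that would be evaluated on the validation set.

```python
def linear_grid(lo, hi, n):
    """Return n evenly spaced values from lo to hi, endpoints included."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def best_on_grid(grid, loss):
    """Return the grid value with the smallest loss."""
    return min(grid, key=loss)


def toy_loss(value):
    # Placeholder only: a real run would score validation p(z) estimates
    # with the CDE loss for each candidate parameter value.
    return (value - 1.0) ** 2


# Grids built from the documented defaults.
bump_grid = linear_grid(0.02, 0.35, 20)
sharp_grid = linear_grid(0.7, 2.1, 15)

# With the toy loss, the selected sharpening value is the grid point nearest 1.0.
best_sharp = best_on_grid(sharp_grid, toy_loss)
```

The cost of such a search grows with nbump and nsharp, since every grid point requires re-scoring the validation set.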

__init__(args, **kwargs)

Constructor: Do CatInformer specific initialization, then check on bands

divide_array(grid)
entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'flex_z_boost_informer'
name = 'FlexZBoostInformer'
run()

Train the flexzboost model

static split_data(fz_data, sz_data, trainfrac, seed)

Make a random partition of the training data into training and validation sets. The validation data will be used to determine the bump threshold and sharpening parameters.
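A seeded random partition of this kind can be sketched as follows. This is an illustration under stated assumptions, not the method's actual code: the name `sketch_split`, the use of Python's `random` module, the rounding of the training size, and the order of the returned values are all choices made for this example.

```python
import random


def sketch_split(features, redshifts, trainfrac, seed):
    """Randomly partition paired features and redshifts.

    Assumptions: the training size is round(n * trainfrac), and the return
    order is (train features, validation features, train z, validation z).
    """
    n = len(redshifts)
    order = list(range(n))
    # A dedicated generator keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(order)
    n_train = round(n * trainfrac)
    train_idx, val_idx = order[:n_train], order[n_train:]
    x_train = [features[i] for i in train_idx]
    x_val = [features[i] for i in val_idx]
    z_train = [redshifts[i] for i in train_idx]
    z_val = [redshifts[i] for i in val_idx]
    return x_train, x_val, z_train, z_val


# Eight toy objects with one feature each, split with the documented defaults.
features = [[float(i)] for i in range(8)]
redshifts = [0.25 * i for i in range(8)]
x_train, x_val, z_train, z_val = sketch_split(features, redshifts, 0.75, 1138)
```

Because the same index permutation is applied to both inputs, each feature row stays paired with its redshift after the split.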

rail.estimation.algos.flexzboost.make_color_data(data_dict, bands, err_bands, ref_band, include_mag_err=False)

Make a dataset consisting of the i-band magnitude and the five colors.

Parameters:

data_dict (ndarray) – array of magnitudes and errors, with names mag_{bands[i]}_lsst and mag_err_{bands[i]}_lsst respectively.

Returns:

input_data – array of the i-band magnitude and 5 colors

Return type:

ndarray
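The construction described above can be sketched in plain Python. The function `sketch_color_data` is hypothetical and rests on two assumptions that the description does not confirm: colors are differences of adjacent bands in the order given (u-g, g-r, and so on), and, when include_mag_err is set, each color is followed by an error formed by adding the two magnitude errors in quadrature.

```python
import math


def sketch_color_data(data, bands, err_bands, ref_band, include_mag_err=False):
    """Build one row per object: reference magnitude, then adjacent-band colors.

    Assumptions: colors are bands[i] - bands[i + 1]; color errors are the
    quadrature sum of the two magnitude errors.
    """
    columns = [list(data[ref_band])]
    pairs = zip(bands[:-1], bands[1:], err_bands[:-1], err_bands[1:])
    for b1, b2, e1, e2 in pairs:
        columns.append([m1 - m2 for m1, m2 in zip(data[b1], data[b2])])
        if include_mag_err:
            columns.append([math.hypot(s1, s2) for s1, s2 in zip(data[e1], data[e2])])
    # Transpose the column list so that each row describes one object.
    return [list(row) for row in zip(*columns)]


bands = [f"mag_{b}_lsst" for b in "ugrizy"]
err_bands = [f"mag_err_{b}_lsst" for b in "ugrizy"]
# One toy object; the magnitudes and errors are made-up illustrative values.
data = dict(zip(bands, ([25.0], [24.0], [23.5], [23.0], [22.8], [22.7])))
data.update({name: [0.3] for name in err_bands})

rows = sketch_color_data(data, bands, err_bands, "mag_i_lsst")
rows_with_err = sketch_color_data(data, bands, err_bands, "mag_i_lsst", include_mag_err=True)
```

With six bands this gives six columns per object (one magnitude and five colors), or eleven when the assumed color errors are included.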