rail.estimation.algos.flexzboost module

Implementation of the FlexZBoost algorithm. It uses training data and XGBoost to learn the relation between photometry and redshift. The training data are split into training and validation sets, and the best “bump_thresh” (which eliminates small peaks in p(z) below the threshold) and sharpening parameter (which determines the peakiness of the p(z) shape) are found by minimizing the CDE loss over a grid.
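The validation-based tuning described above can be illustrated with a minimal NumPy sketch. It is not the module's implementation: the helper names (remove_bumps, sharpen, cde_loss, grid_search) are hypothetical, the PDFs are assumed to be tabulated on a uniform redshift grid, and the two parameters are searched jointly here, which may differ from how the module orders its search.

```python
import numpy as np


def normalize(pdf, dz):
    """Rescale a gridded PDF to unit area (simple Riemann sum)."""
    area = pdf.sum() * dz
    return pdf / area if area > 0 else pdf


def remove_bumps(pdf, dz, bump_thresh):
    """Zero out contiguous positive regions whose area is below bump_thresh."""
    out = pdf.copy()
    start = None
    # Append a False sentinel so a region touching the last bin is closed.
    for i, flag in enumerate(np.append(out > 0, False)):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if out[start:i].sum() * dz < bump_thresh:
                out[start:i] = 0.0
            start = None
    return normalize(out, dz)


def sharpen(pdf, dz, alpha):
    """Raise the PDF to the power alpha and renormalize (alpha > 1 sharpens)."""
    return normalize(pdf ** alpha, dz)


def cde_loss(pdfs, zgrid, z_true):
    """Empirical CDE loss: mean of the integral of p^2, minus 2 * mean p(z_true)."""
    dz = zgrid[1] - zgrid[0]
    term1 = np.mean(np.sum(pdfs ** 2, axis=1) * dz)
    # Evaluate each PDF at the grid point nearest to its true redshift.
    idx = np.abs(zgrid[None, :] - z_true[:, None]).argmin(axis=1)
    term2 = np.mean(pdfs[np.arange(len(z_true)), idx])
    return term1 - 2.0 * term2


def grid_search(pdfs, zgrid, z_true, bump_grid, sharp_grid):
    """Return (best_loss, best_bump_thresh, best_sharpen) over the grid."""
    dz = zgrid[1] - zgrid[0]
    best = (np.inf, None, None)
    for bump in bump_grid:
        for alpha in sharp_grid:
            post = np.array(
                [sharpen(remove_bumps(p, dz, bump), dz, alpha) for p in pdfs]
            )
            loss = cde_loss(post, zgrid, z_true)
            if loss < best[0]:
                best = (loss, bump, alpha)
    return best
```

Lower CDE loss is better, so the search keeps the parameter pair with the smallest value on the validation set.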

class rail.estimation.algos.flexzboost.FlexZBoostEstimator

Bases: CatEstimator

FlexZBoost-based CatEstimator

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0) – The minimum redshift of the z grid or sample
zmax ([float] default=3.0) – The maximum redshift of the z grid or sample
nzbins ([int] default=301)
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift) – name of redshift column
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])
ref_band ([str] default=mag_i_lsst)
qp_representation ([str] default=interp) – qp generator to use. [interp|flexzboost]
include_mag_err ([bool] default=False) – Include magnitude error in the training and estimation process
model (ModelHandle (INPUT))
input (TableHandle (INPUT))
output (QPHandle (OUTPUT))
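As a rough illustration of how some of these defaults fit together, the sketch below builds the redshift grid from zmin, zmax, and nzbins, and replaces non-detections with per-band limits. The parameter list does not describe nondetect_val or mag_limits, so the replacement behavior is an assumption, and replace_nondetects is a hypothetical helper, not part of the module.

```python
import numpy as np

# Redshift grid from the documented defaults: 301 points from 0.0 to 3.0,
# which gives a spacing of 0.01.
zmin, zmax, nzbins = 0.0, 3.0, 301
zgrid = np.linspace(zmin, zmax, nzbins)


def replace_nondetects(data, mag_limits, nondetect_val=99.0):
    """Assumed behavior: swap the non-detection flag for the band's limit."""
    out = {}
    for band, values in data.items():
        mags = np.asarray(values, dtype=float).copy()
        if band in mag_limits:
            mags[np.isclose(mags, nondetect_val)] = mag_limits[band]
        out[band] = mags
    return out
```

For example, a u-band value of 99.0 would become 27.79 under the default mag_limits, while detected magnitudes are left unchanged.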

__init__(args, **kwargs): Constructor: Do CatEstimator specific initialization

entrypoint_function: str | None = 'estimate'

interactive_function: str | None = 'flex_z_boost_estimator'

name = 'FlexZBoostEstimator'

class rail.estimation.algos.flexzboost.FlexZBoostInformer

Bases: CatInformer

Train a FlexZBoost CatInformer

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0)
zmax ([float] default=3.0)
nzbins ([int] default=301)
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])
ref_band ([str] default=mag_i_lsst)
redshift_col ([str] default=redshift)
retrain_full ([bool] default=True) – if True, re-run the fit with the full training set, including data set aside for bump/sharpen validation. If False, only use the subset defined by trainfrac
trainfrac ([float] default=0.75) – fraction of training data to use for training (rest used for bump thresh and sharpening determination)
seed ([int] default=1138) – Random number seed
bumpmin ([float] default=0.02) – minimum value in grid of thresholds checked to optimize removal of spurious small bumps
bumpmax ([float] default=0.35) – max value in grid checked for removal of small bumps
nbump ([int] default=20) – number of grid points in bumpthresh grid search
sharpmin ([float] default=0.7) – min value in grid checked in optimal sharpening parameter fit
sharpmax ([float] default=2.1) – max value in grid checked in optimal sharpening parameter fit
nsharp ([int] default=15) – number of search points in sharpening fit
max_basis ([int] default=35) – maximum number of basis functions to use in density estimate
basis_system ([str] default=cosine) – type of basis system to use with flexcode
regression_params ([dict] default={'max_depth': 8, 'objective': 'reg:squarederror'}) – dictionary of options passed to flexcode, includes max_depth (int), and objective, which should be set to reg:squarederror
include_mag_err ([bool] default=False) – Include magnitude error in the training and estimation process
input (TableHandle (INPUT))
model (ModelHandle (OUTPUT))
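The bump and sharpening grid parameters above each define a minimum, a maximum, and a number of points. A minimal sketch of turning them into search grids follows; it assumes evenly spaced grids that include both endpoints, and make_search_grids is a hypothetical helper rather than a function of this module.

```python
import numpy as np

# Default values copied from the parameter list above.
config = {
    "bumpmin": 0.02, "bumpmax": 0.35, "nbump": 20,
    "sharpmin": 0.7, "sharpmax": 2.1, "nsharp": 15,
}


def make_search_grids(cfg):
    """Assumed: linear spacing between the min and max, endpoints included."""
    bump_grid = np.linspace(cfg["bumpmin"], cfg["bumpmax"], cfg["nbump"])
    sharp_grid = np.linspace(cfg["sharpmin"], cfg["sharpmax"], cfg["nsharp"])
    return bump_grid, sharp_grid


bump_grid, sharp_grid = make_search_grids(config)
```

Under this assumption the default sharpening grid steps by 0.1, from 0.7 to 2.1.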

__init__(args, **kwargs): Constructor Do CatInformer specific initialization, then check on bands

divide_array(grid)

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'flex_z_boost_informer'

name = 'FlexZBoostInformer'

run(): Train flexzboost model model

static split_data(fz_data, sz_data, trainfrac, seed): Make a random partition of the training data into training and validation sets. The validation data will be used to determine the bump thresh and sharpen parameters.
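A seeded random partition of this kind might look like the sketch below. It is not the module's code: split_data_sketch is a hypothetical name, fz_data is assumed to hold the input features and sz_data the matching redshifts, and both the rounding of the training size and the order of the returned arrays are assumptions.

```python
import numpy as np


def split_data_sketch(fz_data, sz_data, trainfrac, seed):
    """Randomly partition features and redshifts into train/validation sets."""
    fz_data = np.asarray(fz_data)
    sz_data = np.asarray(sz_data)
    nobs = len(sz_data)
    ntrain = round(nobs * trainfrac)
    # A seeded generator makes the partition reproducible.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(nobs)
    train_idx, val_idx = perm[:ntrain], perm[ntrain:]
    return (
        fz_data[train_idx], sz_data[train_idx],
        fz_data[val_idx], sz_data[val_idx],
    )
```

With the documented defaults (trainfrac=0.75, seed=1138), 100 objects would be split into 75 for training and 25 for validation.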

rail.estimation.algos.flexzboost.make_color_data(data_dict, bands, err_bands, ref_band, include_mag_err=False)

Make a dataset consisting of the i-band magnitude and the five colors.

Parameters: data_dict (ndarray) – array of magnitudes and errors, with names mag_{bands[i]}_lsst and mag_err_{bands[i]}_lsst respectively.
Returns: input_data – array of i-band magnitude and 5 colors
Return type: ndarray
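The shape of the returned array can be illustrated with a short sketch. It assumes that the colors are differences of adjacent bands in the order given by bands, and it leaves out the err_bands and include_mag_err handling entirely; make_color_data_sketch is a hypothetical stand-in, not the function documented here.

```python
import numpy as np


def make_color_data_sketch(data_dict, bands, ref_band):
    """Stack the reference-band magnitude with adjacent-band colors."""
    columns = [np.asarray(data_dict[ref_band], dtype=float)]
    # Assumed color definition: bands[i] - bands[i + 1], e.g. u-g, g-r, ...
    for blue, red in zip(bands[:-1], bands[1:]):
        blue_mag = np.asarray(data_dict[blue], dtype=float)
        red_mag = np.asarray(data_dict[red], dtype=float)
        columns.append(blue_mag - red_mag)
    # One row per object: [ref magnitude, color_1, ..., color_{nbands-1}]
    return np.column_stack(columns)
```

With the six default LSST bands this yields one row per object and six columns: the i-band magnitude followed by five colors.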