rail.estimation.algos.pzflow_nf module

First-pass implementation of the pzflow estimator. This first pass ignores photometric errors and works purely in terms of magnitudes; it will be expanded in a future update.

class rail.estimation.algos.pzflow_nf.PZFlowEstimator

Bases: CatEstimator

CatEstimator which uses PZFlow

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing

  • hdf5_groupname ([str] default=photometry) – name of the HDF5 group for data; if None, it is set to ‘’

  • zmin ([float] default=0.0)

  • zmax ([float] default=3.0)

  • nzbins ([int] default=301)

  • id_col ([str] default=object_id) – name of the object ID column

  • redshift_col ([str] default=redshift)

  • calc_summary_stats ([bool] default=False) – Compute summary statistics

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, and ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • seed ([int] default=0) – seed for flow

  • ref_band ([str] default=mag_i_lsst)

  • column_names ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – column names to be used in flow

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • include_mag_errors ([bool] default=False) – Boolean flag on whether to marginalize over mag errors (NOTE: much slower on CPU!)

  • err_names_dict ([dict] default={'mag_err_u_lsst': 'mag_u_lsst_err', 'mag_err_g_lsst': 'mag_g_lsst_err', 'mag_err_r_lsst': 'mag_r_lsst_err', 'mag_err_i_lsst': 'mag_i_lsst_err', 'mag_err_z_lsst': 'mag_z_lsst_err', 'mag_err_y_lsst': 'mag_y_lsst_err'}) – dictionary to rename error columns

  • n_error_samples ([int] default=1000) – number of error samples in marginalization

  • model (FlowHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))
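The zmin, zmax and nzbins parameters define the redshift grid for the output PDFs, and calculated_point_estimates names summaries derived from those PDFs. The numpy sketch below assumes an evenly spaced grid that includes both endpoints (so the defaults give a spacing of 0.01) and uses a made-up Gaussian PDF. It only illustrates what ‘mean’, ‘mode’ and ‘median’ mean for a gridded PDF; it is not how qp.Ensemble computes them.

```python
import numpy as np

# Defaults from the parameter list above.
zmin, zmax, nzbins = 0.0, 3.0, 301

# Assumption: an evenly spaced grid that includes both endpoints.
zgrid = np.linspace(zmin, zmax, nzbins)
dz = zgrid[1] - zgrid[0]  # 0.01 for the defaults

# Hypothetical PDF for one object: a Gaussian centred at z = 0.8 with width 0.1,
# normalized so that its Riemann sum on the grid equals 1.
pdf = np.exp(-0.5 * ((zgrid - 0.8) / 0.1) ** 2)
pdf /= pdf.sum() * dz

# Point estimates from the gridded PDF.
mean_z = np.sum(zgrid * pdf) * dz          # probability-weighted average
mode_z = zgrid[np.argmax(pdf)]             # grid point with the highest density
cdf = np.cumsum(pdf) * dz                  # cumulative distribution on the grid
median_z = zgrid[np.searchsorted(cdf, 0.5)]  # first grid point with CDF >= 0.5
```

For a symmetric PDF like this one the three estimates coincide; for skewed or multimodal PDFs they can differ substantially, which is why the choice is configurable.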

__init__(args, **kwargs)

Initialize Estimator

entrypoint_function: str | None = 'estimate'
inputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>), ('input', <class 'rail.core.data.TableHandle'>)]
interactive_function: str | None = 'pz_flow_estimator'
name = 'PZFlowEstimator'
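When include_mag_errors is True, the estimate is marginalized over magnitude errors using n_error_samples draws, after error columns are renamed via err_names_dict. The sketch below shows the general Monte Carlo idea under stated assumptions: a single band, Gaussian errors, and a hypothetical toy_posterior standing in for the trained flow. The helper names are illustrative only and are not part of RAIL or PZFlow.

```python
import numpy as np

# Subset of the err_names_dict default: map RAIL-style error column names
# to the "<magnitude column>_err" form.
err_names_dict = {"mag_err_i_lsst": "mag_i_lsst_err"}

# Hypothetical one-object catalog stored as a dict of column arrays.
data = {"mag_i_lsst": np.array([23.0]), "mag_err_i_lsst": np.array([0.05])}
data = {err_names_dict.get(name, name): col for name, col in data.items()}

zgrid = np.linspace(0.0, 3.0, 301)


def toy_posterior(mag_i, zgrid):
    """Hypothetical stand-in for p(z | mag): a Gaussian in z whose centre
    moves with magnitude. This is NOT the PZFlow model."""
    dz = zgrid[1] - zgrid[0]
    centre = 0.1 * (mag_i - 15.0)  # mag 23.0 -> z = 0.8
    pdf = np.exp(-0.5 * ((zgrid - centre) / 0.1) ** 2)
    return pdf / (pdf.sum() * dz)


def marginalized_posterior(mag, mag_err, zgrid, n_error_samples=1000, seed=0):
    """Monte Carlo marginalization over a Gaussian magnitude error."""
    rng = np.random.default_rng(seed)
    dz = zgrid[1] - zgrid[0]
    # Draw magnitude realizations consistent with the quoted 1-sigma error.
    draws = rng.normal(mag, mag_err, size=n_error_samples)
    # Average the per-draw posteriors; cost grows linearly with n_error_samples.
    pdf = np.mean([toy_posterior(m, zgrid) for m in draws], axis=0)
    return pdf / (pdf.sum() * dz)


pdf = marginalized_posterior(data["mag_i_lsst"][0], data["mag_i_lsst_err"][0], zgrid)
```

The averaged PDF is slightly broader than the error-free one, and evaluating the posterior n_error_samples times per object is what makes this mode slower.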
class rail.estimation.algos.pzflow_nf.PZFlowInformer

Bases: CatInformer

Subclass to train a pzflow-based estimator

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – name of the HDF5 group for data; if None, it is set to ‘’

  • zmin ([float] default=0.0)

  • zmax ([float] default=3.0)

  • nzbins ([int] default=301)

  • seed ([int] default=0) – seed for flow

  • ref_band ([str] default=mag_i_lsst)

  • column_names ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – column names to be used in flow

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • include_mag_errors ([bool] default=False) – Boolean flag on whether to marginalize over mag errors (NOTE: much slower on CPU!)

  • err_names_dict ([dict] default={'mag_err_u_lsst': 'mag_u_lsst_err', 'mag_err_g_lsst': 'mag_g_lsst_err', 'mag_err_r_lsst': 'mag_r_lsst_err', 'mag_err_i_lsst': 'mag_i_lsst_err', 'mag_err_z_lsst': 'mag_z_lsst_err', 'mag_err_y_lsst': 'mag_y_lsst_err'}) – dictionary to rename error columns

  • n_error_samples ([int] default=1000) – number of error samples in marginalization

  • soft_sharpness ([int] default=10) – sharpening parameter for SoftPlus

  • soft_idx_col ([int] default=0) – index column for SoftPlus

  • redshift_col ([str] default=redshift)

  • n_training_epochs ([int] default=50) – number of flow training epochs

  • input (TableHandle (INPUT))

  • model (FlowHandle (OUTPUT))
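soft_sharpness and soft_idx_col configure a SoftPlus transform. As an assumption, the sketch below uses the common sharpened form softplus_k(x) = log(1 + e^(k x)) / k with k = soft_sharpness, which maps any real number to a positive one; this is useful for a column that must stay positive, such as redshift, though which column soft_idx_col selects in the real code is not shown here. The inverse is included because a flow bijector must be invertible. PZFlow's own parameterization may differ.

```python
import numpy as np


def softplus(x, sharpness=10.0):
    """Sharpened softplus log(1 + exp(k x)) / k, computed stably.
    Maps any real x to a positive value; approaches max(x, 0) as k grows."""
    return np.logaddexp(0.0, sharpness * x) / sharpness


def inv_softplus(y, sharpness=10.0):
    """Inverse of the sharpened softplus for y > 0: log(exp(k y) - 1) / k.
    Written as k y + log(1 - exp(-k y)) to avoid overflow for large y."""
    ky = sharpness * y
    return (ky + np.log(-np.expm1(-ky))) / sharpness


x = np.array([-1.0, 0.0, 0.5, 2.0])
y = softplus(x)            # all strictly positive
x_back = inv_softplus(y)   # recovers x up to floating-point error
```

A larger sharpness makes the transform closer to the identity for positive inputs while still keeping outputs strictly above zero.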

__init__(args, **kwargs)

Constructor: build the CatInformer, then do PZFlow-specific setup.

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'pz_flow_informer'
name = 'PZFlowInformer'
outputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>)]
run()

Train a flow on the training data. This is mostly based on the pzflow example notebook.
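A normalizing flow cannot be trained on non-finite values, so undetected or overly faint magnitudes need a finite stand-in before training. One plausible use of mag_limits is to replace such values with the per-band limit. The apply_mag_limits helper and the 99.0 non-detection sentinel below are hypothetical; the exact preprocessing in run() may differ.

```python
import numpy as np

# mag_limits default from the parameter list above.
MAG_LIMITS = {
    "mag_u_lsst": 27.79, "mag_g_lsst": 29.04, "mag_r_lsst": 29.06,
    "mag_i_lsst": 28.62, "mag_z_lsst": 27.98, "mag_y_lsst": 27.05,
}


def apply_mag_limits(data, mag_limits):
    """Return a shallow copy of a dict-of-arrays catalog in which magnitudes
    that are non-finite or fainter than the band limit are set to the limit.
    Hypothetical helper for illustration only."""
    out = dict(data)
    for col, lim in mag_limits.items():
        if col in out:
            mags = np.asarray(out[col], dtype=float)
            bad = ~np.isfinite(mags) | (mags > lim)
            out[col] = np.where(bad, lim, mags)  # new array; input untouched
    return out


# Hypothetical catalog: 99.0 and NaN both mark non-detections here.
catalog = {
    "mag_u_lsst": np.array([25.0, 99.0, np.nan]),
    "redshift": np.array([0.1, 0.2, 0.3]),
}
clean = apply_mag_limits(catalog, MAG_LIMITS)
```

Columns without a limit, such as redshift, pass through unchanged.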

rail.estimation.algos.pzflow_nf.computemeanstd(df)

Compute colors from the magnitudes and compute their means and stddevs for data whitening

Parameters:

df (pandas DataFrame or ordered dict) – raw input data

Returns:

means, stds – means and stddevs for the mags and colors

Return type:

numpy arrays
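Whitening rescales each feature to zero mean and unit variance before it is fed to the flow. The sketch below assumes the features are the reference magnitude (ref_band) plus adjacent-band colors (u-g, g-r, r-i, i-z, z-y) built from column_names; the real function may use a different set of columns (for example, it may also include redshift). The names color_features and compute_mean_std_sketch are hypothetical. Input is a dict of column arrays; a pandas DataFrame indexed by column name works the same way.

```python
import numpy as np

# column_names and ref_band defaults from the parameter lists above.
COLUMN_NAMES = ["mag_u_lsst", "mag_g_lsst", "mag_r_lsst",
                "mag_i_lsst", "mag_z_lsst", "mag_y_lsst"]


def color_features(data, column_names=COLUMN_NAMES, ref_band="mag_i_lsst"):
    """Stack the reference magnitude with adjacent-band colors (u-g, g-r, ...).
    Hypothetical feature layout; the real function's columns may differ."""
    ref = np.asarray(data[ref_band], dtype=float)
    colors = [
        np.asarray(data[blue], dtype=float) - np.asarray(data[red], dtype=float)
        for blue, red in zip(column_names[:-1], column_names[1:])
    ]
    return np.column_stack([ref] + colors)  # shape (n_objects, len(column_names))


def compute_mean_std_sketch(data, **kwargs):
    """Per-column means and standard deviations used for whitening."""
    features = color_features(data, **kwargs)
    return features.mean(axis=0), features.std(axis=0)


# Hypothetical random catalog of 100 objects.
rng = np.random.default_rng(1)
data = {name: rng.normal(24.0, 1.0, size=100) for name in COLUMN_NAMES}
means, stds = compute_mean_std_sketch(data)
# Whitening: each feature column ends up with zero mean and unit variance.
whitened = (color_features(data) - means) / stds
```

Storing the means and standard deviations (rather than whitening in place) lets the same transform be applied later to new data at estimation time.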