rail.estimation.algos.pzflow_nf module

First-pass implementation of a PZFlow estimator. This first pass ignores photometric errors and works only in terms of magnitudes; this will be expanded in a future update.

class rail.estimation.algos.pzflow_nf.PZFlowEstimator

Bases: CatEstimator

CatEstimator which uses PZFlow

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing

  • hdf5_groupname ([str] default=photometry) – Name of the HDF5 group for data; if None, it is set to ‘’

  • zmin ([float] default=0.0) – The minimum redshift of the z grid

  • zmax ([float] default=3.0) – The maximum redshift of the z grid

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • id_col ([str] default=object_id) – name of the object ID column

  • redshift_col ([str] default=redshift) – name of redshift column

  • calc_summary_stats ([bool] default=False) – Compute summary statistics

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • flow_seed ([int] default=0) – seed for flow

  • ref_column_name ([str] default=mag_i_lsst) – name for reference column

  • column_names ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – column names to be used in flow

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05}) – 1 sigma mag limits

  • include_mag_errors ([bool] default=False) – Boolean flag on whether to marginalize over mag errors (NOTE: much slower on CPU!)

  • error_names_dict ([dict] default={'mag_err_u_lsst': 'mag_u_lsst_err', 'mag_err_g_lsst': 'mag_g_lsst_err', 'mag_err_r_lsst': 'mag_r_lsst_err', 'mag_err_i_lsst': 'mag_i_lsst_err', 'mag_err_z_lsst': 'mag_z_lsst_err', 'mag_err_y_lsst': 'mag_y_lsst_err'}) – dictionary to rename error columns

  • n_error_samples ([int] default=1000) – Number of error samples in marginalization

  • redshift_column_name ([str] default=redshift) – name of redshift column

  • model (FlowHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))
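The zmin, zmax, and nzbins parameters together define the redshift grid. A minimal sketch of how such a grid could be built is shown here. It assumes the grid is evenly spaced and includes both endpoints, which this page does not confirm; make_z_grid is a hypothetical helper, not part of rail.

```python
def make_z_grid(zmin=0.0, zmax=3.0, nzbins=301):
    """Return nzbins evenly spaced redshifts from zmin to zmax.

    Assumption: both endpoints are included, so the spacing is
    (zmax - zmin) / (nzbins - 1). This helper is illustrative only.
    """
    if nzbins < 2:
        raise ValueError("nzbins must be at least 2")
    step = (zmax - zmin) / (nzbins - 1)
    return [zmin + i * step for i in range(nzbins)]


# With the documented defaults the spacing works out to roughly 0.01.
z_grid = make_z_grid()
print(len(z_grid), z_grid[0], z_grid[-1])
```

Under that assumption, the default of 301 gridpoints (rather than 300) gives a round spacing in redshift.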

__init__(args, **kwargs)

Initialize Estimator

entrypoint_function: str | None = 'estimate'
inputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>), ('input', <class 'rail.core.data.TableHandle'>)]
interactive_function: str | None = 'pz_flow_estimator'
name = 'PZFlowEstimator'

class rail.estimation.algos.pzflow_nf.PZFlowInformer

Bases: CatInformer

Subclass to train a pzflow-based estimator

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – Name of the HDF5 group for data; if None, it is set to ‘’

  • zmin ([float] default=0.0) – The minimum redshift of the z grid

  • zmax ([float] default=3.0) – The maximum redshift of the z grid

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • flow_seed ([int] default=0) – seed for flow

  • ref_column_name ([str] default=mag_i_lsst) – name for reference column

  • column_names ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – column names to be used in flow

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05}) – 1 sigma mag limits

  • include_mag_errors ([bool] default=False) – Boolean flag on whether to marginalize over mag errors (NOTE: much slower on CPU!)

  • error_names_dict ([dict] default={'mag_err_u_lsst': 'mag_u_lsst_err', 'mag_err_g_lsst': 'mag_g_lsst_err', 'mag_err_r_lsst': 'mag_r_lsst_err', 'mag_err_i_lsst': 'mag_i_lsst_err', 'mag_err_z_lsst': 'mag_z_lsst_err', 'mag_err_y_lsst': 'mag_y_lsst_err'}) – dictionary to rename error columns

  • n_error_samples ([int] default=1000) – Number of error samples in marginalization

  • soft_sharpness ([int] default=10) – sharpening parameter for SoftPlus

  • soft_idx_col ([int] default=0) – index column for SoftPlus

  • redshift_column_name ([str] default=redshift) – name of redshift column

  • num_training_epochs ([int] default=50) – number of flow training epochs

  • input (TableHandle (INPUT))

  • model (FlowHandle (OUTPUT))
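The error_names_dict parameter is described as a dictionary to rename error columns. The sketch below illustrates one plausible reading, in which the keys are the incoming column names and the values are the names they are renamed to. That direction is an assumption, and rename_columns is a hypothetical helper operating on a plain dict of columns rather than on the table type rail actually uses.

```python
# Rebuild the documented default mapping for the six LSST bands.
error_names_dict = {f"mag_err_{band}_lsst": f"mag_{band}_lsst_err" for band in "ugrizy"}


def rename_columns(table, mapping):
    """Return a new dict of columns with keys renamed via mapping.

    Columns not listed in mapping keep their original names.
    Hypothetical helper for illustration; not part of rail.
    """
    return {mapping.get(name, name): values for name, values in table.items()}


# Toy catalog with one magnitude column, its error column, and a redshift.
catalog = {
    "mag_u_lsst": [24.1, 25.3],
    "mag_err_u_lsst": [0.05, 0.12],
    "redshift": [0.4, 1.1],
}
renamed = rename_columns(catalog, error_names_dict)
print(sorted(renamed))
```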

__init__(args, **kwargs)

Constructor: build the CatInformer, then do PZFlow-specific setup

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'pz_flow_informer'
name = 'PZFlowInformer'
outputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>)]
run()

Train a flow based on the training data. This is mostly based on the pzflow example notebook.

rail.estimation.algos.pzflow_nf.computemeanstd(df)

Compute colors from the magnitudes and compute their means and stddevs for data whitening

Parameters:

df (pandas dataframe) – ordered dict of raw input data

Returns:

means, stds – means and stddevs for the mags and colors

Return type:

numpy arrays
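As a rough illustration of the whitening statistics described above, the sketch below computes adjacent-band colors from magnitudes and then the mean and standard deviation of each feature. It is not the module's implementation: the choice of adjacent-band colors, the use of one reference magnitude, the population standard deviation, and the name color_means_stds are all assumptions. It uses plain Python lists in place of a pandas dataframe and returns dicts instead of numpy arrays.

```python
import statistics

BANDS = ["u", "g", "r", "i", "z", "y"]


def color_means_stds(table, ref_band="i"):
    """Compute means and stddevs of a reference magnitude and adjacent colors.

    table maps 'mag_<band>_lsst' to a list of floats, one per object.
    Hypothetical stand-in for illustration only.
    """
    ref_name = f"mag_{ref_band}_lsst"
    features = {ref_name: list(table[ref_name])}
    # Colors as differences of neighboring bands: u-g, g-r, r-i, i-z, z-y.
    for blue, red in zip(BANDS[:-1], BANDS[1:]):
        blue_mags = table[f"mag_{blue}_lsst"]
        red_mags = table[f"mag_{red}_lsst"]
        features[f"{blue}-{red}"] = [b - r for b, r in zip(blue_mags, red_mags)]
    means = {name: statistics.mean(vals) for name, vals in features.items()}
    # Population standard deviation (divides by N); an assumption.
    stds = {name: statistics.pstdev(vals) for name, vals in features.items()}
    return means, stds


# Toy two-object catalog.
toy = {
    "mag_u_lsst": [25.0, 26.0],
    "mag_g_lsst": [24.0, 25.5],
    "mag_r_lsst": [23.5, 24.5],
    "mag_i_lsst": [23.0, 24.0],
    "mag_z_lsst": [22.8, 23.9],
    "mag_y_lsst": [22.7, 23.7],
}
means, stds = color_means_stds(toy)
print(means)
print(stds)
```

Whitening a feature then amounts to subtracting its mean and dividing by its standard deviation before passing it to the flow.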