rail.estimation.algos.pzflow_nf module

First-pass implementation of the pzflow estimator. This first pass ignores photometric errors and works purely in terms of magnitudes; this will be expanded in a future update.

class rail.estimation.algos.pzflow_nf.PZFlowEstimator

Bases: CatEstimator

CatEstimator which uses PZFlow

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0)
zmax ([float] default=3.0)
nzbins ([int] default=301)
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift)
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
seed ([int] default=0) – seed for flow
ref_band ([str] default=mag_i_lsst)
column_names ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – column names to be used in flow
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
include_mag_errors ([bool] default=False) – Boolean flag on whether to marginalize over mag errors (NOTE: much slower on CPU!)
err_names_dict ([dict] default={'mag_err_u_lsst': 'mag_u_lsst_err', 'mag_err_g_lsst': 'mag_g_lsst_err', 'mag_err_r_lsst': 'mag_r_lsst_err', 'mag_err_i_lsst': 'mag_i_lsst_err', 'mag_err_z_lsst': 'mag_z_lsst_err', 'mag_err_y_lsst': 'mag_y_lsst_err'}) – dictionary to rename error columns
n_error_samples ([int] default=1000) – number of error samples in marginalization
model (FlowHandle (INPUT))
input (TableHandle (INPUT))
output (QPHandle (OUTPUT))
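
The zmin, zmax, and nzbins parameters together describe a redshift grid. A minimal sketch of such a grid, assuming evenly spaced points that include both endpoints (the helper name make_zgrid is hypothetical and not part of RAIL):

```python
def make_zgrid(zmin=0.0, zmax=3.0, nzbins=301):
    """Return nzbins evenly spaced redshifts from zmin to zmax inclusive.

    Assumption: the grid includes both endpoints, which is what makes the
    documented defaults give a round spacing of 0.01.
    """
    if nzbins < 2:
        raise ValueError("nzbins must be at least 2")
    step = (zmax - zmin) / (nzbins - 1)
    return [zmin + i * step for i in range(nzbins)]


# Grid built from the documented default values.
zgrid = make_zgrid()
```

Under this assumption the defaults give 301 points spaced 0.01 apart in redshift.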

__init__(args, **kwargs): Initialize Estimator

entrypoint_function: str | None = 'estimate'

inputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>), ('input', <class 'rail.core.data.TableHandle'>)]

interactive_function: str | None = 'pz_flow_estimator'

name = 'PZFlowEstimator'

class rail.estimation.algos.pzflow_nf.PZFlowInformer

Bases: CatInformer

Subclass to train a pzflow-based estimator

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0)
zmax ([float] default=3.0)
nzbins ([int] default=301)
seed ([int] default=0) – seed for flow
ref_band ([str] default=mag_i_lsst)
column_names ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – column names to be used in flow
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
include_mag_errors ([bool] default=False) – Boolean flag on whether to marginalize over mag errors (NOTE: much slower on CPU!)
err_names_dict ([dict] default={'mag_err_u_lsst': 'mag_u_lsst_err', 'mag_err_g_lsst': 'mag_g_lsst_err', 'mag_err_r_lsst': 'mag_r_lsst_err', 'mag_err_i_lsst': 'mag_i_lsst_err', 'mag_err_z_lsst': 'mag_z_lsst_err', 'mag_err_y_lsst': 'mag_y_lsst_err'}) – dictionary to rename error columns
n_error_samples ([int] default=1000) – number of error samples in marginalization
soft_sharpness ([int] default=10) – sharpening parameter for SoftPlus
soft_idx_col ([int] default=0) – index column for SoftPlus
redshift_col ([str] default=redshift)
n_training_epochs ([int] default=50) – number of flow training epochs
input (TableHandle (INPUT))
model (FlowHandle (OUTPUT))
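
The soft_sharpness parameter is described as a sharpening parameter for SoftPlus. As an illustration only, and assuming the common sharpened form log(1 + exp(k x)) / k with k = soft_sharpness, the sketch below shows how a larger k brings the function closer to max(x, 0). The function name sharp_softplus is hypothetical; this is not the pzflow implementation.

```python
import math


def sharp_softplus(x, sharpness=10.0):
    """Sharpened softplus: log(1 + exp(sharpness * x)) / sharpness.

    Written in a numerically stable form so that a large value of
    |sharpness * x| does not overflow math.exp.
    """
    z = sharpness * x
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / sharpness


# Compare a gentle and a sharp setting at x = 0 and x = 2.
for k in (1.0, 10.0):
    print(k, sharp_softplus(0.0, k), sharp_softplus(2.0, k))
```

At x = 0 the value is log(2) / k, so the default sharpness of 10 leaves only a small offset from zero there.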

__init__(args, **kwargs): Constructor; builds the CatInformer, then does PZFlow-specific setup

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'pz_flow_informer'

name = 'PZFlowInformer'

outputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>)]

run(): train a flow based on the training data This is mostly based off of the pzflow example notebook

rail.estimation.algos.pzflow_nf.computemeanstd(df)

Compute colors from the magnitudes and compute their means and stddevs for data whitening

Parameters: df (pandas dataframe) – ordered dict of raw input data
Returns: means, stds – means and stddevs for the mags and colors
Return type: numpy arrays
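
The whitening statistics described above can be sketched as follows. This is not the RAIL implementation: it uses plain Python dicts instead of a pandas dataframe to stay dependency-free, assumes that colors are differences of adjacent bands in column order, and uses the population standard deviation. The function name color_mean_std and the sample values are made up for illustration.

```python
from statistics import fmean, pstdev


def color_mean_std(columns, band_order):
    """Return (means, stds) for every magnitude and adjacent-band color.

    columns: dict mapping band name -> list of magnitudes (equal lengths).
    band_order: band names in the order used to form colors.
    """
    features = {band: list(columns[band]) for band in band_order}
    # Assumption: a color is the difference between neighbouring bands.
    for blue, red in zip(band_order, band_order[1:]):
        features[f"{blue}-{red}"] = [
            b - r for b, r in zip(columns[blue], columns[red])
        ]
    means = {name: fmean(vals) for name, vals in features.items()}
    stds = {name: pstdev(vals) for name, vals in features.items()}
    return means, stds


# Made-up magnitudes for three objects in two bands.
sample = {"mag_g_lsst": [24.0, 25.0, 26.0], "mag_r_lsst": [23.5, 24.0, 24.5]}
means, stds = color_mean_std(sample, ["mag_g_lsst", "mag_r_lsst"])
```

Whitening a feature then amounts to subtracting its mean and dividing by its standard deviation.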