rail.estimation.algos.pzflow_nf module

First-pass implementation of a pzflow estimator. This first pass ignores photometric errors and works purely in terms of magnitudes; it will be expanded in a future update.

class rail.estimation.algos.pzflow_nf.PZFlowEstimator

Bases: CatEstimator

CatEstimator which uses PZFlow

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0) – The minimum redshift of the z grid
zmax ([float] default=3.0) – The maximum redshift of the z grid
nzbins ([int] default=301) – The number of gridpoints in the z grid
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift) – name of redshift column
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, and ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
flow_seed ([int] default=0) – seed for flow
ref_column_name ([str] default=mag_i_lsst) – name for reference column
column_names ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – column names to be used in flow
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05}) – 1 sigma mag limits
include_mag_errors ([bool] default=False) – Boolean flag on whether to marginalize over mag errors (NOTE: much slower on CPU!)
error_names_dict ([dict] default={'mag_err_u_lsst': 'mag_u_lsst_err', 'mag_err_g_lsst': 'mag_g_lsst_err', 'mag_err_r_lsst': 'mag_r_lsst_err', 'mag_err_i_lsst': 'mag_i_lsst_err', 'mag_err_z_lsst': 'mag_z_lsst_err', 'mag_err_y_lsst': 'mag_y_lsst_err'}) – dictionary to rename error columns
n_error_samples ([int] default=1000) – number of error samples in marginalization
redshift_column_name ([str] default=redshift) – name of redshift column
model (FlowHandle (INPUT))
input (TableHandle (INPUT))
output (QPHandle (OUTPUT))
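
As a rough illustration of how zmin, zmax, and nzbins relate, the sketch below builds an evenly spaced redshift grid with NumPy from the defaults listed above. It assumes the grid includes both endpoints; this page does not show how the estimator actually constructs its grid, so treat the snippet as an assumption rather than the library's code.

```python
import numpy as np

# Defaults copied from the parameter list above.
zmin, zmax, nzbins = 0.0, 3.0, 301

# Assumption: the grid is evenly spaced and includes both zmin and zmax.
zgrid = np.linspace(zmin, zmax, nzbins)

# Spacing between adjacent grid points.
dz = zgrid[1] - zgrid[0]
print(zgrid.shape, dz)
```

Under this assumption, 301 grid points over [0, 3] give a spacing of 0.01, which is why nzbins counts grid points rather than intervals.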

__init__(args, **kwargs): Initialize Estimator

entrypoint_function: str | None = 'estimate'

inputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>), ('input', <class 'rail.core.data.TableHandle'>)]

interactive_function: str | None = 'pz_flow_estimator'

name = 'PZFlowEstimator'

class rail.estimation.algos.pzflow_nf.PZFlowInformer

Bases: CatInformer

Subclass to train a pzflow-based estimator

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0) – minimum z
zmax ([float] default=3.0) – maximum z
nzbins ([int] default=301) – number of z bins
flow_seed ([int] default=0) – seed for flow
ref_column_name ([str] default=mag_i_lsst) – name for reference column
column_names ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – column names to be used in flow
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05}) – 1 sigma mag limits
include_mag_errors ([bool] default=False) – Boolean flag on whether to marginalize over mag errors (NOTE: much slower on CPU!)
error_names_dict ([dict] default={'mag_err_u_lsst': 'mag_u_lsst_err', 'mag_err_g_lsst': 'mag_g_lsst_err', 'mag_err_r_lsst': 'mag_r_lsst_err', 'mag_err_i_lsst': 'mag_i_lsst_err', 'mag_err_z_lsst': 'mag_z_lsst_err', 'mag_err_y_lsst': 'mag_y_lsst_err'}) – dictionary to rename error columns
n_error_samples ([int] default=1000) – number of error samples in marginalization
soft_sharpness ([int] default=10) – sharpening parameter for SoftPlus
soft_idx_col ([int] default=0) – index column for SoftPlus
redshift_column_name ([str] default=redshift) – name of redshift column
num_training_epochs ([int] default=50) – number of flow training epochs
input (TableHandle (INPUT))
model (FlowHandle (OUTPUT))
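
To make the role of error_names_dict concrete, here is a minimal sketch of renaming error columns with pandas. It assumes the dictionary keys are the column names found in the input catalog and the values are the names they are renamed to; the toy catalog and its values are invented purely for illustration.

```python
import pandas as pd

# Subset of the default error_names_dict listed above.
error_names_dict = {
    "mag_err_u_lsst": "mag_u_lsst_err",
    "mag_err_g_lsst": "mag_g_lsst_err",
}

# Toy catalog; the numbers are made up for illustration only.
catalog = pd.DataFrame(
    {
        "mag_u_lsst": [24.1, 25.3],
        "mag_err_u_lsst": [0.05, 0.12],
        "mag_g_lsst": [23.8, 24.9],
        "mag_err_g_lsst": [0.03, 0.08],
    }
)

# Assumption: keys are existing column names, values are the new names.
renamed = catalog.rename(columns=error_names_dict)
print(list(renamed.columns))
```

Columns that do not appear as keys in the dictionary (here, the magnitude columns) are left unchanged by `DataFrame.rename`.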

__init__(args, **kwargs): Constructor, build the CatInformer, then do PZFlow specific setup

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'pz_flow_informer'

name = 'PZFlowInformer'

outputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>)]

run(): train a flow based on the training data This is mostly based off of the pzflow example notebook

rail.estimation.algos.pzflow_nf.computemeanstd(df)

Compute colors from the magnitudes and compute their means and stddevs for data whitening

Parameters: df (pandas dataframe) – ordered dict of raw input data
Returns: means, stds – means and stddevs for the mags and colors
Return type: numpy arrays
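
The idea of whitening magnitudes and colors can be sketched as follows. This is not the library's computemeanstd: the helper name `color_means_stds`, the choice of adjacent-band colors, and the ordering of the output are all assumptions made for illustration, and the toy magnitudes are invented.

```python
import numpy as np
import pandas as pd


def color_means_stds(df, bands):
    """Hypothetical helper, not the library's computemeanstd."""
    mags = df[bands].to_numpy(dtype=float)
    # Assumption: colors are differences of adjacent bands (u-g, g-r, ...).
    colors = mags[:, :-1] - mags[:, 1:]
    # Stack magnitudes and colors into one feature matrix.
    features = np.hstack([mags, colors])
    # Per-column mean and population standard deviation.
    return features, features.mean(axis=0), features.std(axis=0)


bands = ["mag_u_lsst", "mag_g_lsst", "mag_r_lsst"]

# Toy data; the numbers are made up for illustration only.
toy = pd.DataFrame(
    {
        "mag_u_lsst": [25.0, 26.0],
        "mag_g_lsst": [24.0, 24.5],
        "mag_r_lsst": [23.5, 23.0],
    }
)

features, means, stds = color_means_stds(toy, bands)

# Whitening: shift each column to zero mean and scale to unit variance.
whitened = (features - means) / stds
```

After whitening, every column of `whitened` has zero mean and unit standard deviation, which puts magnitudes and colors on a comparable scale before they are passed to a model.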