rail.estimation.algos.pzflow_nf module
First-pass implementation of the pzflow estimator. This first pass ignores photometric errors and works purely in terms of magnitudes; it will be expanded in a future update.
- class rail.estimation.algos.pzflow_nf.PZFlowEstimator
Bases: CatEstimator
CatEstimator which uses PZFlow
- Parameters:
output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0)
zmax ([float] default=3.0)
nzbins ([int] default=301)
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift)
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
seed ([int] default=0) – seed for flow
ref_band ([str] default=mag_i_lsst)
column_names ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – column names to be used in flow
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
include_mag_errors ([bool] default=False) – Boolean flag on whether to marginalize over mag errors (NOTE: much slower on CPU!)
err_names_dict ([dict] default={'mag_err_u_lsst': 'mag_u_lsst_err', 'mag_err_g_lsst': 'mag_g_lsst_err', 'mag_err_r_lsst': 'mag_r_lsst_err', 'mag_err_i_lsst': 'mag_i_lsst_err', 'mag_err_z_lsst': 'mag_z_lsst_err', 'mag_err_y_lsst': 'mag_y_lsst_err'}) – dictionary to rename error columns
n_error_samples ([int] default=1000) – number of error samples in marginalization
model (FlowHandle (INPUT))
input (TableHandle (INPUT))
output (QPHandle (OUTPUT))
- __init__(args, **kwargs)
Initialize Estimator
- entrypoint_function: str | None = 'estimate'
- inputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>), ('input', <class 'rail.core.data.TableHandle'>)]
- interactive_function: str | None = 'pz_flow_estimator'
- name = 'PZFlowEstimator'
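As an illustration of how the zmin, zmax and nzbins parameters above fit together, the following minimal sketch builds an evenly spaced redshift grid from the documented defaults. That the estimator evaluates its posteriors on exactly this kind of np.linspace grid is an assumption made for illustration, not something this page states.

```python
import numpy as np

# Documented defaults for PZFlowEstimator
zmin, zmax, nzbins = 0.0, 3.0, 301

# Assumption: the grid is evenly spaced and includes both endpoints
zgrid = np.linspace(zmin, zmax, nzbins)

# Spacing between neighbouring grid points
dz = zgrid[1] - zgrid[0]
print(len(zgrid), zgrid[0], zgrid[-1], dz)
```

Under this assumption, 301 points over [0, 3] give a grid spacing of about 0.01 in redshift, which is why nzbins is an odd number here.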
- class rail.estimation.algos.pzflow_nf.PZFlowInformer
Bases: CatInformer
Subclass to train a pzflow-based estimator
- Parameters:
output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0)
zmax ([float] default=3.0)
nzbins ([int] default=301)
seed ([int] default=0) – seed for flow
ref_band ([str] default=mag_i_lsst)
column_names ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – column names to be used in flow
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
include_mag_errors ([bool] default=False) – Boolean flag on whether to marginalize over mag errors (NOTE: much slower on CPU!)
err_names_dict ([dict] default={'mag_err_u_lsst': 'mag_u_lsst_err', 'mag_err_g_lsst': 'mag_g_lsst_err', 'mag_err_r_lsst': 'mag_r_lsst_err', 'mag_err_i_lsst': 'mag_i_lsst_err', 'mag_err_z_lsst': 'mag_z_lsst_err', 'mag_err_y_lsst': 'mag_y_lsst_err'}) – dictionary to rename error columns
n_error_samples ([int] default=1000) – number of error samples in marginalization
soft_sharpness ([int] default=10) – sharpening parameter for SoftPlus
soft_idx_col ([int] default=0) – index column for SoftPlus
redshift_col ([str] default=redshift)
n_training_epochs ([int] default=50) – number of flow training epochs
input (TableHandle (INPUT))
model (FlowHandle (OUTPUT))
- __init__(args, **kwargs)
Constructor, build the CatInformer, then do PZFlow specific setup
- entrypoint_function: str | None = 'inform'
- interactive_function: str | None = 'pz_flow_informer'
- name = 'PZFlowInformer'
- outputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>)]
- run()
Train a flow based on the training data. This is mostly based on the pzflow example notebook.
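The err_names_dict parameter listed above is described as a dictionary to rename error columns. A minimal sketch of what such a renaming could look like with pandas follows. It assumes the dictionary keys are the incoming column names and the values are the target names, and the small table is made up for illustration; how the class applies the dictionary internally is not shown on this page.

```python
import pandas as pd

# Subset of the documented default err_names_dict
err_names_dict = {
    "mag_err_i_lsst": "mag_i_lsst_err",
    "mag_err_r_lsst": "mag_r_lsst_err",
}

# Made-up two-object catalogue, for illustration only
df = pd.DataFrame(
    {
        "mag_i_lsst": [24.1, 25.3],
        "mag_err_i_lsst": [0.02, 0.05],
    }
)

# Keys missing from the table (here mag_err_r_lsst) are ignored by rename
renamed = df.rename(columns=err_names_dict)
print(list(renamed.columns))
```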
- rail.estimation.algos.pzflow_nf.computemeanstd(df)
Compute colors from the magnitudes and compute their means and stddevs for data whitening
- Parameters:
df (pandas dataframe) – ordered dict of raw input data
- Returns:
means, stds – means and stddevs for the mags and colors
- Return type:
numpy arrays
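The description of computemeanstd above can be illustrated with a minimal sketch. The helper name color_mean_std, its signature, the choice of adjacent-band colors, and the decision to keep the reference band as the only magnitude are all assumptions made for this example; the actual function may order or select its features differently.

```python
import numpy as np
import pandas as pd


def color_mean_std(df, column_names, ref_band):
    """Hypothetical helper: means and stddevs of one magnitude plus colors.

    Assumption: colors are differences of adjacent bands in column_names,
    and the reference-band magnitude is kept as the first feature.
    """
    features = [df[ref_band].to_numpy()]
    for blue, red in zip(column_names[:-1], column_names[1:]):
        features.append((df[blue] - df[red]).to_numpy())
    table = np.column_stack(features)
    # Population standard deviation (ddof=0), numpy's default
    return table.mean(axis=0), table.std(axis=0)


# Made-up magnitudes for two objects, for illustration only
toy = pd.DataFrame(
    {
        "mag_g_lsst": [22.0, 24.0],
        "mag_r_lsst": [21.0, 22.0],
        "mag_i_lsst": [20.0, 22.0],
    }
)
bands = ["mag_g_lsst", "mag_r_lsst", "mag_i_lsst"]
means, stds = color_mean_std(toy, bands, ref_band="mag_i_lsst")

# Whitening then amounts to (feature - mean) / std, column by column
print(means, stds)
```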