rail.creation.engines.flowEngine module
This is the subclass of Creator that wraps a PZFlow Flow so that it can be used to generate synthetic data and calculate posteriors.
- class rail.creation.engines.flowEngine.FlowCreator
Bases: Creator
Creator wrapper for a PZFlow Flow object.
- Parameters:
output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
n_samples ([int] (required)) – Number of samples to create
seed ([int] default=12345) – Random number seed
model (FlowHandle (INPUT))
output (PqHandle (OUTPUT))
- __init__(args, **kwargs)
Constructor
Does standard Creator initialization and also gets the Flow object
- entrypoint_function: str | None = 'sample'
- inputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>)]
- interactive_function: str | None = 'flow_creator'
- name = 'FlowCreator'
- outputs = [('output', <class 'rail.core.data.PqHandle'>)]
- run()
Run method
Calls Flow.sample to use the Flow object to generate photometric data
Notes
Puts the data into the data store under this stage's ‘output’ tag.
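The FlowCreator parameters listed above can be collected into a plain dictionary before a stage is built. The following is a minimal sketch and not RAIL code: the helper flow_creator_config and the constant FLOW_CREATOR_DEFAULTS are hypothetical names, and only the parameter names, the defaults, and the two output_mode options are taken from the listing above.

```python
# Hypothetical helper: merges user overrides with the documented defaults.
# Nothing here imports or calls RAIL itself.
FLOW_CREATOR_DEFAULTS = {"output_mode": "default", "seed": 12345}
VALID_OUTPUT_MODES = ("default", "return")


def flow_creator_config(**overrides):
    """Return a config dict for a FlowCreator-like stage."""
    config = {**FLOW_CREATOR_DEFAULTS, **overrides}
    # n_samples is documented as required, so it has no default.
    if "n_samples" not in config:
        raise ValueError("n_samples is required")
    if config["output_mode"] not in VALID_OUTPUT_MODES:
        raise ValueError(f"output_mode must be one of {VALID_OUTPUT_MODES}")
    return config


config = flow_creator_config(n_samples=1000, output_mode="return")
```

With output_mode set to ‘return’, the listing above says that outputs are only returned and not written to files.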
- class rail.creation.engines.flowEngine.FlowModeler
Bases: Modeler
Modeler wrapper for a PZFlow Flow object.
This class trains the flow.
- Parameters:
output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
seed ([int] default=0) – The random seed for training.
phys_cols ([dict] default={'redshift': [0, 3]}) – Names of non-photometry columns and their corresponding [min, max] values.
phot_cols ([dict] default={'mag_u_lsst': [17, 35], 'mag_g_lsst': [16, 32], 'mag_r_lsst': [15, 30], 'mag_i_lsst': [15, 30], 'mag_z_lsst': [14, 29], 'mag_y_lsst': [14, 28]}) – Names of photometry columns and their corresponding [min, max] values.
calc_colors ([dict] default={'ref_column_name': 'mag_i_lsst'}) – Whether to internally calculate colors (if phot_cols are magnitudes). Assumes that you want to calculate colors from adjacent columns in phot_cols. If you do not want to calculate colors, set False. Else, provide a dictionary {‘ref_column_name’: band}, where band is a string corresponding to the column in phot_cols you want to save as the overall galaxy magnitude.
spline_knots ([int] default=16) – The number of spline knots in the normalizing flow.
n_training_epochs ([int] default=30) – The number of training epochs.
input (TableHandle (INPUT))
model (FlowHandle (OUTPUT))
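The calc_colors option above describes colors built from adjacent columns in phot_cols, with one reference magnitude kept as the overall galaxy magnitude. The following is a minimal sketch of that idea for a single row, assuming a color is the difference between two neighbouring magnitudes. The function name adjacent_colors and the "a-b" naming of the color keys are hypothetical, and the sketch does not claim to reproduce what FlowModeler does internally.

```python
# Column order follows the documented phot_cols default.
PHOT_COLS = ["mag_u_lsst", "mag_g_lsst", "mag_r_lsst",
             "mag_i_lsst", "mag_z_lsst", "mag_y_lsst"]


def adjacent_colors(row, phot_cols, ref_column_name):
    """Replace magnitudes by one reference magnitude plus adjacent colors."""
    # Keep the reference band as the overall magnitude.
    out = {ref_column_name: row[ref_column_name]}
    # Each color is the difference between neighbouring bands.
    for first, second in zip(phot_cols[:-1], phot_cols[1:]):
        out[f"{first}-{second}"] = row[first] - row[second]
    return out


row = {"mag_u_lsst": 25.0, "mag_g_lsst": 24.5, "mag_r_lsst": 24.0,
       "mag_i_lsst": 23.75, "mag_z_lsst": 23.5, "mag_y_lsst": 23.25}
colors = adjacent_colors(row, PHOT_COLS, ref_column_name="mag_i_lsst")
```

Six magnitudes become one reference magnitude and five colors, so the number of columns is unchanged.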
- __init__(args, **kwargs)
Constructor
Does standard Modeler initialization.
- entrypoint_function: str | None = 'fit_model'
- inputs = [('input', <class 'rail.core.data.TableHandle'>)]
- interactive_function: str | None = 'flow_modeler'
- name = 'FlowModeler'
- outputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>)]
- run()
Run method
Calls Flow.train to train a normalizing flow using PZFlow.
Notes
Puts the trained model into the data store under this stage's ‘model’ tag.
- validate()
Check that the inputs actually have the data needed for execution. This is called before the run method. It is an optional step, meant for checking that the input to the stage is actually in the form and shape needed before an expensive run is executed.
- class rail.creation.engines.flowEngine.FlowPosterior
Bases: PosteriorCalculator
PosteriorCalculator wrapper for a PZFlow Flow object
data : pd.DataFrame
    Pandas dataframe of the data on which the posteriors are conditioned. Must have all columns in self.flow.data_columns, *except* for the column specified for the posterior (see below).
column : str
    Name of the column for which the posterior is calculated. Must be one of the columns in self.flow.data_columns. However, whether or not this column is present in `data` is irrelevant.
grid : np.ndarray
    Grid over which the posterior is calculated.
err_samples : int, optional
    Number of samples from the error distribution to average over for the posterior calculation. If provided, Gaussian errors are assumed, and method will look for error columns in `inputs`. Error columns must end in `_err`. E.g. the error column for the variable `u` must be `u_err`. Zero error assumed for any missing error columns.
seed: int, optional
    Random seed for drawing samples from the error distribution.
marg_rules : dict, optional
    Dictionary with rules for marginalizing over missing variables. The dictionary must contain the key "flag", which gives the flag that indicates a missing value. E.g. if missing values are given the value 99, the dictionary should contain {"flag": 99}. The dictionary must also contain {"name": callable} for any variables that will need to be marginalized over, where name is the name of the variable, and callable is a callable that takes the row of variables and returns a grid over which to marginalize the variable. E.g. {"y": lambda row: np.linspace(0, row["x"], 10)}. Note: the callable for a given name must *always* return an array of the same length, regardless of the input row. DEFAULT: the default marg_rules dict is {"flag": np.nan, "u": np.linspace(25, 31, 10),}
batch_size: int, default=None
    Size of batches in which to calculate posteriors. If None, all posteriors are calculated simultaneously. This is faster, but requires more memory.
nan_to_zero : bool, default=True
    Whether to convert NaN's to zero probability in the final pdfs.
- Parameters:
output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
column ([str] (required)) – Column to compute posterior for
grid ([list] default=[]) – Grid over which the posterior is calculated
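err_samples ([int] default=10) – Number of samples from the error distribution to average over for the posterior calculation
seed ([int] default=12345) – Random seed for drawing samples from the error distribution
marg_rules ([dict] default={'flag': nan, 'mag_u_lsst': <function FlowPosterior.<lambda>>}) – Rules for marginalizing over missing variables
batch_size ([int] default=10000) – Size of batches in which to calculate posteriors
nan_to_zero ([bool] default=True) – Whether to convert NaNs to zero probability in the final pdfs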
model (FlowHandle (INPUT))
input (PqHandle (INPUT))
output (QPHandle (OUTPUT))
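The marg_rules description above has two requirements: the dictionary must contain a "flag" key, and each per-variable callable must return a grid of the same length for every row. The following is a minimal sketch of checking both before a long run. The helpers is_flagged, check_marg_rules, and linspace are hypothetical and are not part of RAIL or PZFlow; linspace is a pure-Python stand-in for np.linspace so that the block has no dependencies.

```python
import math


def linspace(start, stop, num):
    """Pure-Python stand-in for np.linspace (endpoint included)."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def is_flagged(value, flag):
    """True if value marks a missing entry. NaN needs isnan, not ==."""
    if isinstance(flag, float) and math.isnan(flag):
        return isinstance(value, float) and math.isnan(value)
    return value == flag


def check_marg_rules(marg_rules, sample_rows):
    """Return {name: grid_length}; raise if a rule breaks the contract."""
    if "flag" not in marg_rules:
        raise KeyError('marg_rules must contain the key "flag"')
    lengths = {}
    for name, rule in marg_rules.items():
        if name == "flag":
            continue
        if not callable(rule):
            raise TypeError(f"rule for {name!r} must be callable")
        # The grid length must not depend on the row (needs >= 1 row).
        sizes = {len(rule(row)) for row in sample_rows}
        if len(sizes) != 1:
            raise ValueError(f"rule for {name!r} returns varying lengths")
        lengths[name] = sizes.pop()
    return lengths


rows = [{"x": 1.0}, {"x": 5.0}]
rules = {"flag": float("nan"),
         "mag_u_lsst": lambda row: linspace(25, 31, 10)}
grid_lengths = check_marg_rules(rules, rows)
```

The NaN branch in is_flagged matters because NaN never compares equal to itself, so a plain equality test would miss every missing value when the flag is NaN.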
- __init__(args, **kwargs)
Constructor
Does standard PosteriorCalculator initialization
- entrypoint_function: str | None = 'get_posterior'
- inputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>), ('input', <class 'rail.core.data.PqHandle'>)]
- interactive_function: str | None = 'flow_posterior'
- name = 'FlowPosterior'
- outputs = [('output', <class 'rail.core.data.QPHandle'>)]
- run()
Run method
Calls Flow.posterior to use the Flow object to get the posterior distribution.
Notes
Gets the input data from the data store under this stage's ‘input’ tag. Puts the data into the data store under this stage's ‘output’ tag.
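The err_samples behaviour documented for FlowPosterior (Gaussian errors, error columns ending in `_err`, zero error for any missing error column) can be illustrated for one row. The following is a minimal sketch using only the standard library. The function draw_error_samples is hypothetical and only shows how noisy copies of a row could be drawn. It does not compute or average posteriors, and it is not how PZFlow implements this step.

```python
import random


def draw_error_samples(row, err_samples, seed=None):
    """Draw err_samples Gaussian-perturbed copies of the values in row."""
    rng = random.Random(seed)
    # Columns ending in "_err" are uncertainties, not variables.
    values = {k: v for k, v in row.items() if not k.endswith("_err")}
    samples = []
    for _ in range(err_samples):
        samples.append({
            # A missing "<name>_err" column means zero error for <name>.
            name: rng.gauss(value, row.get(f"{name}_err", 0.0))
            for name, value in values.items()
        })
    return samples


row = {"u": 25.0, "u_err": 0.1, "g": 24.0}
samples = draw_error_samples(row, err_samples=10, seed=12345)
```

Here `g` has no `g_err` column, so every sample keeps `g` at its original value, while `u` is scattered with a standard deviation of 0.1.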