rail.creation.engines.flowEngine module

This module provides subclasses of Creator, Modeler, and PosteriorCalculator that wrap a PZFlow Flow so that it can be trained, used to generate synthetic data, and used to calculate posteriors.

class rail.creation.engines.flowEngine.FlowCreator(args, comm=None)[source]

Bases: Creator

Creator wrapper for a PZFlow Flow object.

inputs = [('model', <class 'rail.core.data.FlowHandle'>)]
name = 'FlowCreator'
outputs = [('output', <class 'rail.core.data.PqHandle'>)]
run()[source]

Run method

Calls Flow.sample to generate photometric data from the Flow object.

Notes

Puts the data into the data store under this stage's ‘output’ tag.

class rail.creation.engines.flowEngine.FlowModeler(args, comm=None)[source]

Bases: Modeler

Modeler wrapper for a PZFlow Flow object.

This class trains the flow.

config_options = {'calc_colors': <ceci.config.StageParameter object>, 'num_training_epochs': <ceci.config.StageParameter object>, 'output_mode': <ceci.config.StageParameter object>, 'phot_cols': <ceci.config.StageParameter object>, 'phys_cols': <ceci.config.StageParameter object>, 'seed': <ceci.config.StageParameter object>, 'spline_knots': <ceci.config.StageParameter object>}
inputs = [('input', <class 'rail.core.data.TableHandle'>)]
name = 'FlowModeler'
outputs = [('model', <class 'rail.core.data.FlowHandle'>)]
run()[source]

Run method

Calls Flow.train to train a normalizing flow using PZFlow.

Notes

Puts the trained flow into the data store under this stage's ‘model’ tag.
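The inputs and outputs attributes listed on this page imply how the three stages can be chained. The sketch below only restates those tags as plain data; the `STAGES` table and the `compatible_links` helper are hypothetical illustrations and do not reflect how RAIL or ceci actually resolve connections between stages.

```python
# Tags and handle names copied from the inputs/outputs attributes on this page.
# Handle classes are written as plain strings so that RAIL is not required.
STAGES = {
    "FlowModeler": {
        "inputs": {"input": "TableHandle"},
        "outputs": {"model": "FlowHandle"},
    },
    "FlowCreator": {
        "inputs": {"model": "FlowHandle"},
        "outputs": {"output": "PqHandle"},
    },
    "FlowPosterior": {
        "inputs": {"model": "FlowHandle", "input": "PqHandle"},
        "outputs": {"output": "QPHandle"},
    },
}


def compatible_links(producer, consumer):
    """List (output_tag, input_tag) pairs whose handle names match.

    Hypothetical helper for illustration only.
    """
    outputs = STAGES[producer]["outputs"]
    inputs = STAGES[consumer]["inputs"]
    return [
        (out_tag, in_tag)
        for out_tag, out_type in outputs.items()
        for in_tag, in_type in inputs.items()
        if out_type == in_type
    ]


modeler_to_creator = compatible_links("FlowModeler", "FlowCreator")
modeler_to_posterior = compatible_links("FlowModeler", "FlowPosterior")
creator_to_posterior = compatible_links("FlowCreator", "FlowPosterior")
```

Read this way, the flow trained by FlowModeler feeds the ‘model’ input of both FlowCreator and FlowPosterior, and the table that FlowCreator writes has the handle type that FlowPosterior declares for its ‘input’ tag.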

class rail.creation.engines.flowEngine.FlowPosterior(args, comm=None)[source]

Bases: PosteriorCalculator

PosteriorCalculator wrapper for a PZFlow Flow object.

Parameters:
  • data (pd.DataFrame) – Pandas dataframe of the data on which the posteriors are conditioned. Must have all columns in self.flow.data_columns, except for the column specified for the posterior (see below).

  • column (str) – Name of the column for which the posterior is calculated. Must be one of the columns in self.flow.data_columns. The column does not need to be present in data.

  • grid (np.ndarray) – Grid over which the posterior is calculated.

  • err_samples (int, optional) – Number of samples from the error distribution to average over for the posterior calculation. If provided, Gaussian errors are assumed, and the method looks for error columns in inputs. Error columns must end in _err, e.g. the error column for the variable u must be u_err. Zero error is assumed for any missing error columns.

  • seed (int, optional) – Random seed for drawing samples from the error distribution.

  • marg_rules (dict, optional) – Dictionary with rules for marginalizing over missing variables. The dictionary must contain the key "flag", which gives the flag that indicates a missing value. E.g. if missing values are given the value 99, the dictionary should contain {"flag": 99}. The dictionary must also contain {"name": callable} for any variables that will need to be marginalized over, where name is the name of the variable and callable takes the row of variables and returns a grid over which to marginalize the variable. E.g. {"y": lambda row: np.linspace(0, row["x"], 10)}. Note: the callable for a given name must always return an array of the same length, regardless of the input row. DEFAULT: the default marg_rules dict is {"flag": np.nan, "mag_u_lsst": lambda row: np.linspace(25, 31, 10)}.

  • batch_size (int, default=None) – Size of batches in which to calculate posteriors. If None, all posteriors are calculated simultaneously. This is faster, but requires more memory.

  • nan_to_zero (bool, default=True) – Whether to convert NaNs to zero probability in the final PDFs.
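The marg_rules description above can be made concrete with a small sketch. It builds a dictionary in the documented shape and checks the stated constraint that each callable returns a grid of the same length for every row. The column names "x" and "y" come from the example in the parameter description; `check_grid_lengths` is a hypothetical helper written for this illustration, not part of RAIL or PZFlow.

```python
import numpy as np

# Flag marking a missing value; 99 is the example value used in the text.
MISSING_FLAG = 99

# A "flag" entry plus one callable per variable that may need marginalizing.
marg_rules = {
    "flag": MISSING_FLAG,
    "y": lambda row: np.linspace(0, row["x"], 10),
}


def check_grid_lengths(rules, rows):
    """Return the grid length of each rule, raising if it varies between rows.

    Hypothetical helper for illustration only.
    """
    lengths = {}
    for name, rule in rules.items():
        if name == "flag":
            continue  # the flag entry is a value, not a callable
        sizes = {len(rule(row)) for row in rows}
        if len(sizes) != 1:
            raise ValueError(f"rule for {name!r} returns grids of varying length")
        lengths[name] = sizes.pop()
    return lengths


# Two rows in which "y" is flagged as missing.
rows = [{"x": 1.0, "y": MISSING_FLAG}, {"x": 4.0, "y": MISSING_FLAG}]
grid_lengths = check_grid_lengths(marg_rules, rows)
```

A rule such as `lambda row: np.arange(0, row["x"])` would fail this check, because the length of its grid depends on the value of "x" in each row.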

config_options = {'batch_size': 10000, 'column': <class 'str'>, 'err_samples': 10, 'grid': <class 'list'>, 'marg_rules': {'flag': nan, 'mag_u_lsst': <function FlowPosterior.<lambda>>}, 'nan_to_zero': True, 'output_mode': <ceci.config.StageParameter object>, 'seed': 12345}
inputs = [('model', <class 'rail.core.data.FlowHandle'>), ('input', <class 'rail.core.data.PqHandle'>)]
name = 'FlowPosterior'
outputs = [('output', <class 'rail.core.data.QPHandle'>)]
run()[source]

Run method

Calls Flow.posterior to use the Flow object to get the posterior distribution.

Notes

Gets the input data from the data store under this stage's ‘input’ tag.

Puts the data into the data store under this stage's ‘output’ tag.
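The batch_size and nan_to_zero parameters described above can be illustrated without a trained flow. This is a minimal sketch, assuming at least one input row: `batched_pdfs` and `toy_pdf` are hypothetical stand-ins, with `toy_pdf` playing the role of the posterior call, and none of it is taken from the RAIL or PZFlow source.

```python
import numpy as np


def batched_pdfs(rows, grid, pdf_fn, batch_size=None, nan_to_zero=True):
    """Evaluate pdf_fn on rows in batches and stack the results.

    pdf_fn is a hypothetical stand-in for the posterior call: it takes a 2-D
    array of rows and the grid, and returns one PDF per row.
    """
    n_rows = len(rows)
    # batch_size=None means everything is evaluated in a single batch.
    step = n_rows if batch_size is None else batch_size
    chunks = [
        pdf_fn(rows[start:start + step], grid)
        for start in range(0, n_rows, step)
    ]
    pdfs = np.vstack(chunks)
    if nan_to_zero:
        # Replace NaN entries with zero probability.
        pdfs = np.where(np.isnan(pdfs), 0.0, pdfs)
    return pdfs


def toy_pdf(batch, grid):
    """Unnormalized Gaussian bump centred on each row's first column."""
    centres = batch[:, :1]
    return np.exp(-0.5 * (grid[None, :] - centres) ** 2)


rows = np.array([[0.5], [1.0], [np.nan]])  # the last row yields NaNs
grid = np.linspace(0, 2, 5)

all_at_once = batched_pdfs(rows, grid, toy_pdf)
in_batches = batched_pdfs(rows, grid, toy_pdf, batch_size=2)
raw = batched_pdfs(rows, grid, toy_pdf, nan_to_zero=False)
```

Batching changes only peak memory use, not the result, so `all_at_once` and `in_batches` agree; the row containing NaN becomes a row of zeros unless nan_to_zero is disabled.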