rail.stages package

class rail.stages.AddColumnOfRandom

Bases: Noisifier

Add a column of random numbers to a dataframe

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
seed ([type not specified] default=None) – Set to an int to force reproducible results.
col_name ([str] default=chaos_bunny) – Name of the column with random numbers
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

__init__(args, **kwargs)

Constructor

Does standard Noisifier initialization

Parameters:

args (Any)
kwargs (Any)

Return type:

None

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'add_column_of_random'

name = 'AddColumnOfRandom'

stage_columns: list[str] | None
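
As a rough illustration of what the col_name and seed options imply, the sketch below appends a column of random numbers to a table held as a plain dict of columns. The helper add_random_column, the dict-of-lists table, and the choice of a uniform distribution are assumptions made for this example and are not taken from RAIL; only the col_name and seed defaults come from the parameter list above.

```python
import random


def add_random_column(table, col_name="chaos_bunny", seed=None):
    """Return a copy of `table` with one extra column of random numbers.

    `table` is a dict mapping column names to equal-length lists. It is a
    stand-in for a dataframe, not the RAIL data model. Uniform draws in
    [0, 1) are an assumption of this sketch.
    """
    n_rows = len(next(iter(table.values())))
    rng = random.Random(seed)  # seed=None gives a different draw on each call
    out = {name: list(values) for name, values in table.items()}
    out[col_name] = [rng.random() for _ in range(n_rows)]
    return out


catalog = {"object_id": [1, 2, 3], "mag_i_lsst": [22.1, 23.4, 24.0]}
first = add_random_column(catalog, seed=42)
second = add_random_column(catalog, seed=42)
```

Passing the same integer seed twice yields identical columns, which is the reproducibility behaviour the seed parameter describes.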

class rail.stages.BPZliteEstimator

Bases: CatEstimator

CatEstimator subclass to implement basic marginalized PDF for BPZ. In addition to the marginalized redshift PDF, we also compute several ancillary quantities that will be stored in the ensemble ancil data:

zmode: mode of the PDF
amean: mean of the PDF
tb: integer specifying the best-fit SED at the redshift mode
todds: fraction of marginalized posterior prob. of best template, so lower numbers mean other templates could be better fits, likely at other redshifts

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0)
zmax ([float] default=3.0)
nzbins ([int] default=301)
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift)
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
ref_band ([str] default=mag_i_lsst)
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])
dz ([float] default=0.01) – delta z in grid
unobserved_val ([float] default=-99.0) – value to be replaced with zero flux and given large errors for non-observed filters
data_path ([str] default=None) – file path to the SED, FILTER, and AB directories. If left to default None it will use the install directory for rail + ../examples_data/estimation_data/data
filter_list ([list] default=['DC2LSST_u', 'DC2LSST_g', 'DC2LSST_r', 'DC2LSST_i', 'DC2LSST_z', 'DC2LSST_y'])
spectra_file ([str] default=CWWSB4.list) – name of the file specifying the list of SEDs to use
madau_flag ([str] default=no) – set to ‘yes’ or ‘no’ to set whether to include intergalactic Madau reddening when constructing model fluxes
no_prior ([bool] default=False) – set to True if you want to run with no prior
p_min ([float] default=0.005) – BPZ sets all values of the PDF that are below p_min*peak_value to 0.0, p_min controls that fractional cutoff
gauss_kernel ([float] default=0.0) – BPZ convolves the PDF with a kernel if this is set to a non-zero number
zp_errors ([list] default=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
mag_err_min ([float] default=0.005) – a minimum floor for the magnitude errors to prevent a large chi^2 for very bright objects
model (ModelHandle (INPUT))
input (TableHandle (INPUT))
output (QPHandle (OUTPUT))
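
The p_min cutoff described above can be sketched in a few lines. This is a minimal illustration on a plain list of PDF values, not the BPZ code itself; the function name apply_p_min is hypothetical, and whether the PDF is renormalized afterwards is not stated in the parameter description, so no renormalization is done here.

```python
def apply_p_min(pdf, p_min=0.005):
    """Set every PDF value below p_min * peak_value to 0.0."""
    cutoff = p_min * max(pdf)
    return [p if p >= cutoff else 0.0 for p in pdf]


pdf = [0.001, 0.02, 1.0, 0.5, 0.004]
trimmed = apply_p_min(pdf)  # the cutoff here is 0.005 * 1.0
```

With the default p_min, only the two values below 0.005 are zeroed; the rest of the PDF is untouched.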

__init__(args, **kwargs): Constructor, build the CatEstimator, then do BPZ specific setup

entrypoint_function: str | None = 'estimate'

interactive_function: str | None = 'bpz_lite_estimator'

name = 'BPZliteEstimator'

open_model(**kwargs)

Load the model and/or attach it to this Stage

Parameters:

tag – Input tag associated to the model
**kwargs – Should include ‘model’, see notes

Notes

The keyword argument ‘model’ should be either

an object with a trained model,
a path pointing to a file that can be read to obtain the trained model,
or a ModelHandle providing access to the trained model.

Returns: The object encapsulating the trained model.
Return type: Any

stage_columns: list[str] | None

class rail.stages.BPZliteInformer

Bases: CatInformer

Inform stage for BPZliteEstimator. This stage assumes that you have a set of SED templates and that the training data has already been assigned a ‘best fit broad type’ (that is, something like elliptical, spiral, irregular, or starburst, similar to how the six SEDs in the CWW/SB set of Benitez (2000) are assigned 3 broad types). This informer will then fit parameters for the evolving type fraction as a function of apparent magnitude in a reference band, p(T|m), as well as the redshift prior of finding a galaxy of the broad type at a particular redshift, p(z|T,m), where z is redshift, m is apparent magnitude in the reference band, and T is the ‘broad type’. We will use the same forms for these functions as parameterized in Benitez (2000). For p(T|m) we have p(T|m) = exp(-kt(m-m0)), where m0 is a constant and we fit for values of kt. For p(z|T,m) we have

P(z|T,m) = f_x*z0_x^a * exp(-(z/zm_x)^a), where zm_x = z0_x*(km_x-m0)

where f_x is the type fraction from p(T|m), and we fit for values of z0, km, and a for each type. These parameters are then fed to the BPZ prior for use in the estimation stage.
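
To get a feel for the magnitude dependence of the type-fraction term, the sketch below evaluates the p(T|m) form exactly as quoted above, exp(-kt(m-m0)). The function name type_fraction is hypothetical, and the values kt=0.3 and m0=20.0 are simply the init_kt and m0 defaults from the parameter list, not fitted results.

```python
import math


def type_fraction(m, kt=0.3, m0=20.0):
    """Evaluate exp(-kt * (m - m0)), the p(T|m) form quoted in the text."""
    return math.exp(-kt * (m - m0))


# Print the term at a few apparent magnitudes in the reference band.
for mag in (20.0, 22.0, 24.0):
    print(mag, type_fraction(mag))
```

For positive kt the term equals 1 at m = m0 and decreases toward fainter magnitudes.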

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0)
zmax ([float] default=3.0)
nzbins ([int] default=301)
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])
ref_band ([str] default=mag_i_lsst)
redshift_col ([str] default=redshift)
data_path ([str] default=None) – file path to the SED, FILTER, and AB directories. If left to default None it will use the install directory for rail + rail/examples_data/estimation_data/data
spectra_file ([str] default=CWWSB4.list) – name of the file specifying the list of SEDs to use
m0 ([float] default=20.0) – reference apparent mag, used in prior param
nt_array ([list] default=[1, 2, 5]) – list of integer number of templates per ‘broad type’, must be in same order as the template set, and must sum to the same number as the # of templates in the spectra file
mmin ([float] default=18.0) – lowest apparent mag in ref band, lower values ignored
mmax ([float] default=29.0) – highest apparent mag in ref band, higher values ignored
init_kt ([float] default=0.3) – initial guess for kt in training
init_zo ([float] default=0.4) – initial guess for z0 in training
init_alpha ([float] default=1.8) – initial guess for alpha in training
init_km ([float] default=0.1) – initial guess for km in training
type_file ([str] default=) – name of file with the broad type fits for the training data
output_hdfn ([bool] default=True) – if True, just return the default HDFN prior params rather than fitting
input (TableHandle (INPUT))
model (ModelHandle (OUTPUT))
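
The nt_array constraint (entries in template order, summing to the number of templates) can be illustrated with a small check. This is a sketch under stated assumptions: split_by_broad_type is a hypothetical helper, and the template names are placeholders rather than the contents of any real spectra file.

```python
def split_by_broad_type(templates, nt_array):
    """Group an ordered template list into broad types of sizes nt_array."""
    if sum(nt_array) != len(templates):
        raise ValueError(
            f"nt_array sums to {sum(nt_array)} but there are "
            f"{len(templates)} templates"
        )
    groups, start = [], 0
    for n_templates in nt_array:
        groups.append(templates[start:start + n_templates])
        start += n_templates
    return groups


templates = [f"sed_{i}" for i in range(8)]  # placeholder template names
groups = split_by_broad_type(templates, [1, 2, 5])
```

Because the grouping is positional, reordering the template list without reordering nt_array would silently assign templates to the wrong broad type, which is why the order requirement matters.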

__init__(args, **kwargs): Init function, init config stuff

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'bpz_lite_informer'

name = 'BPZliteInformer'

run(): compute the best fit prior parameters

stage_columns: list[str] | None

class rail.stages.CMNNEstimator

Bases: CatEstimator

Color Matched Nearest Neighbor Estimator. Note that there are several modifications from the original CMNN, mainly that the original estimator dropped non-detections from the Mahalanobis distance calculation. However, there is information in a non-detection, so instead here I’ve replaced the non-detections with 1 sigma limit and a magnitude uncertainty of 1.0 and fixed the degrees of freedom to be the number of magnitude bands minus one.

Current implementation returns a single Gaussian for each galaxy with a width determined by the std deviation of all galaxies within the range set by the ppf value.

There are three options for how to choose the central value of the Gaussian, and that option is set using the selection_mode config parameter (integer):

option 0: randomly choose one of the neighbors within the PPF cutoff
option 1: choose the value with the smallest Mahalanobis distance
option 2: random choice as in option 0, but weighted by distance

If a test galaxy does not have enough training galaxies it is assigned a redshift bad_redshift_val and a width bad_redshift_err, both of which are config parameters that can be set by the user. Note that this should only happen if the number of training galaxies is smaller than min_n, which is unlikely, but is included here for completeness.
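
The selection logic described above can be sketched as follows. This is an illustration only, not the CMNN implementation: pick_redshift is a hypothetical helper that takes the redshifts and distances of neighbors already inside the PPF cutoff. Two details are assumptions of this sketch: the weighting in option 2 is taken to be inverse distance (with min_dist used as a floor to avoid division by zero), and the width uses the population standard deviation.

```python
import random
import statistics


def pick_redshift(neighbor_z, neighbor_dist, selection_mode=1, min_n=25,
                  min_dist=1e-4, bad_redshift_val=99.0, bad_redshift_err=10.0,
                  seed=66):
    """Return (center, width) of the Gaussian assigned to one test galaxy."""
    if len(neighbor_z) < min_n:
        # Too few training galaxies: fall back to the configured bad values.
        return bad_redshift_val, bad_redshift_err
    rng = random.Random(seed)
    if selection_mode == 0:
        center = rng.choice(neighbor_z)
    elif selection_mode == 1:
        center = min(zip(neighbor_dist, neighbor_z))[1]
    elif selection_mode == 2:
        # Assumption: closer neighbors get larger weights (inverse distance).
        weights = [1.0 / max(d, min_dist) for d in neighbor_dist]
        center = rng.choices(neighbor_z, weights=weights, k=1)[0]
    else:
        raise ValueError(f"unknown selection_mode {selection_mode}")
    width = statistics.pstdev(neighbor_z)
    return center, width


z = [0.5, 0.6, 0.7]
dist = [0.3, 0.1, 0.2]
center, width = pick_redshift(z, dist, selection_mode=1, min_n=3)
```

The example lowers min_n to 3 so that the toy neighbor list is accepted; with the default of 25 the same call would return the bad_redshift_val and bad_redshift_err pair.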

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0)
zmax ([float] default=3.0)
nzbins ([int] default=301)
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift)
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
seed ([int] default=66) – random seed used in selection mode
ppf_value ([float] default=0.68) – PPF value used in Mahalanobis distance
selection_mode ([int] default=1) – select which mode to choose the redshift estimate: 0: randomly choose, 1: nearest neighbor, 2: weighted random
min_n ([int] default=25) – minimum number of training galaxies to use
min_thresh ([float] default=0.0001) – minimum threshold cutoff
min_dist ([float] default=0.0001) – minimum Mahalanobis distance
bad_redshift_val ([float] default=99.0) – redshift to assign bad redshifts
bad_redshift_err ([float] default=10.0) – Gauss error width to assign to bad redshifts
model (ModelHandle (INPUT))
input (TableHandle (INPUT))
output (QPHandle (OUTPUT))

__init__(args, **kwargs): Constructor: Do Estimator specific initialization

entrypoint_function: str | None = 'estimate'

interactive_function: str | None = 'cmnn_estimator'

name = 'CMNNEstimator'

open_model(**kwargs)

Load the model and/or attach it to this Stage

Parameters:

tag – Input tag associated to the model
**kwargs – Should include ‘model’, see notes

Notes

The keyword argument ‘model’ should be either

an object with a trained model,
a path pointing to a file that can be read to obtain the trained model,
or a ModelHandle providing access to the trained model.

Returns: The object encapsulating the trained model.
Return type: Any

stage_columns: list[str] | None

class rail.stages.CMNNInformer

Bases: CatInformer

Compute colors and color errors for the CMNN training set and store them in a model file that will be used by the CMNNEstimator stage.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])
redshift_col ([str] default=redshift)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
nondetect_val ([float] default=99.0)
nondetect_replace ([bool] default=False) – set to True to replace non-detects, False to ignore in distance calculation
input (TableHandle (INPUT))
model (ModelHandle (OUTPUT))

__init__(args, **kwargs): Constructor Do CatInformer specific initialization, then check on bands

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'cmnn_informer'

name = 'CMNNInformer'

run()

Run the stage and return the execution status.

Subclasses must implement this method.

stage_columns: list[str] | None

class rail.stages.CatClassifier

Bases: RailStage

The base class for assigning classes to a catalogue-like table.

Classifier uses a generic “model”, the details of which depend on the sub-class.

CatClassifier takes as “input” a catalogue-like table, assigns each object to a tomographic bin, and provides as “output” tabular data which can be appended to the catalogue.

__init__(args, **kwargs)

Initialize Classifier

Parameters:

args (Any)
kwargs (Any)

Return type:

None

classify(input_data, **kwargs)

The main run method for the classifier, should be implemented in the specific subclass.

This will attach the input_data to this CatClassifier (for introspection and provenance tracking).

Then it will call the run() and finalize() methods, which need to be implemented by the sub-classes.

The run() method will need to register the data that it creates to this Classifier by using self.add_data(‘output’, output_data).

Finally, this will return a TableHandle providing access to that output data.

Parameters: input_data (TableLike) – A dictionary of all input data
Returns: Class assignment for each galaxy.
Return type: TableHandle

entrypoint_function: str | None = 'classify'

inputs = [('model', <class 'rail.core.data.ModelHandle'>), ('input', <class 'rail.core.data.TableHandle'>)]

model: ModelLike | None

name = 'CatClassifier'

outputs = [('output', <class 'rail.core.data.TableHandle'>)]

stage_columns: list[str] | None

class rail.stages.CatEstimator

Bases: RailStage, PointEstimationMixin

The base class for making photo-z posterior estimates from catalog-like inputs (i.e., tables with fluxes in photometric bands among the set of columns)

Estimators use a generic “model”, the details of which depend on the sub-class.

Estimators take as “input” tabular data, apply the photo-z estimation and provide as “output” a QPEnsemble, with per-object p(z).

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0) – The minimum redshift of the z grid or sample
zmax ([float] default=3.0) – The maximum redshift of the z grid or sample
nzbins ([int] default=301) – The number of gridpoints in the z grid
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift) – name of redshift column
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
model (ModelHandle (INPUT))
input (TableHandle (INPUT))
output (QPHandle (OUTPUT))
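
The zmin, zmax, and nzbins parameters together define a redshift grid. The sketch below builds such a grid in plain Python, assuming evenly spaced points that include both endpoints; that convention, and the helper name make_z_grid, are assumptions of this example rather than something stated in the parameter descriptions.

```python
def make_z_grid(zmin=0.0, zmax=3.0, nzbins=301):
    """Return nzbins evenly spaced redshifts from zmin to zmax inclusive."""
    step = (zmax - zmin) / (nzbins - 1)
    return [zmin + i * step for i in range(nzbins)]


zgrid = make_z_grid()
spacing = zgrid[1] - zgrid[0]
```

Under this assumption the defaults give a grid spacing of 0.01 in redshift.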

__init__(args, **kwargs)

Initialize Estimator

Parameters:

args (Any)
kwargs (Any)

Return type:

None

classmethod default_distribution_type()

Return the type of distribution that this estimator creates

By default this is DistributionType.ad_hoc, but this can be overridden by sub-classes to return DistributionType.posterior or DistributionType.likelihood if appropriate.

Return type: DistributionType

entrypoint_function: str | None = 'estimate'

estimate(input_data, **kwargs)

The main interface method for the photo-z estimation

This will attach the input data (defined in inputs as “input”) to this Estimator (for introspection and provenance tracking). It then calls the run(), validate(), and finalize() methods.

The run method will call _process_chunk(), which needs to be implemented in the subclass, to process input data in batches. See RandomGaussEstimator for a simple example.

Finally, this will return a QPHandle for access to that output data.

Parameters: input_data (TableLike) – A dictionary of all input data
Returns: Handle providing access to QP ensemble with output data
Return type: QPHandle
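
The batch processing that run() delegates to _process_chunk() can be pictured with the toy loop below. It is a sketch of the general chunking pattern only: iter_chunks and process_in_chunks are hypothetical helpers, the data is a plain list, and none of this reproduces the RAIL implementation or its handles.

```python
def iter_chunks(n_rows, chunk_size=10000):
    """Yield (start, end) row ranges that cover n_rows in steps of chunk_size."""
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


def process_in_chunks(rows, process_chunk, chunk_size=10000):
    """Apply process_chunk to each slice of rows and concatenate the results."""
    results = []
    for start, end in iter_chunks(len(rows), chunk_size):
        results.extend(process_chunk(rows[start:end]))
    return results


mags = [22.0 + 0.5 * i for i in range(10)]
doubled = process_in_chunks(mags, lambda chunk: [2 * m for m in chunk], chunk_size=4)
```

Here ten rows with a chunk size of four are handled as three chunks, the last one shorter than the others.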

inputs = [('model', <class 'rail.core.data.ModelHandle'>), ('input', <class 'rail.core.data.TableHandle'>)]

name = 'CatEstimator'

outputs = [('output', <class 'rail.core.data.QPHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type: None

stage_columns: list[str] | None

class rail.stages.CatSummarizer

Bases: RailStage

The base class for classes that go from catalog-like tables to ensemble NZ estimates.

CatSummarizer takes as “input” a catalog-like table, i.e., a table with fluxes in photometric bands among the set of columns, and provides as “output” a QPEnsemble, with per-ensemble n(z).

entrypoint_function: str | None = 'summarize'

inputs = [('input', <class 'rail.core.data.TableHandle'>)]

name = 'CatSummarizer'

outputs = [('output', <class 'rail.core.data.QPHandle'>)]

stage_columns: list[str] | None

summarize(input_data)

The main run method for the summarization, should be implemented in the specific subclass.

This will attach the input_data to this CatSummarizer (for introspection and provenance tracking).

Then it will call the run() and finalize() methods, which need to be implemented by the sub-classes.

The run() method will need to register the data that it creates to this CatSummarizer by using self.add_data(‘output’, output_data).

Finally, this will return a QPHandle providing access to that output data.

Parameters: input_data (TableLike) – Either a dictionary of all input data or a TableHandle providing access to the same
Returns: Ensemble with n(z), and any ancillary data
Return type: QPHandle

class rail.stages.ColumnMapper

Bases: RailStage

Utility stage that remaps the names of columns.

This operates on pandas dataframes in parquet files.

In short, this does: output_data = input_data.rename(columns=self.config.columns, inplace=self.config.in_place)

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
columns ([dict] (required)) – Map of columns to rename
in_place ([bool] default=False) – Update file in place
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))
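
The effect of the columns map can be shown without any dataframe library. The sketch below uses a plain dict of columns as a stand-in for a pandas dataframe; rename_columns is a hypothetical helper, and the short band names in the example are invented for illustration.

```python
def rename_columns(table, columns):
    """Return a new table with keys renamed according to the `columns` map.

    Columns that do not appear in the map keep their original names.
    """
    return {columns.get(name, name): values for name, values in table.items()}


table = {"u": [24.1, 25.0], "g": [23.5, 24.2], "id": [1, 2]}
mapping = {"u": "mag_u_lsst", "g": "mag_g_lsst"}
renamed = rename_columns(table, mapping)
```

This version always returns a new table and leaves the input untouched, which corresponds to the in_place=False default.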

entrypoint_function: str | None = '__call__'

inputs = [('input', <class 'rail.core.data.PqHandle'>)]

interactive_function: str | None = 'column_mapper'

name = 'ColumnMapper'

outputs = [('output', <class 'rail.core.data.PqHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type: None

class rail.stages.Creator

Bases: RailStage

Base class for Creators that generate synthetic photometric data from a model.

Creator will output a table of photometric data. The details will depend on the particular engine.

__init__(args, **kwargs)

Initialize Creator

Parameters:

args (Any)
kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'sample'

inputs = [('model', <class 'rail.core.data.ModelHandle'>)]

name = 'Creator'

outputs = [('output', <class 'rail.core.data.TableHandle'>)]

sample(n_samples=None, seed=None, **kwargs)

Draw samples from the model specified in the configuration.

This is a method for running a Creator in interactive mode. In pipeline mode, the subclass run method will be called by itself.

Parameters:

n_samples (int, optional) – The number of samples to draw, by default None
seed (int, optional) – The random seed to control sampling, by default None
**kwargs (Any) – Used to update the configuration

Returns:

TableHandle wrapping the newly created samples

Return type:

TableHandle

Notes

This method puts n_samples and seed into the stage configuration data, which makes them available to other methods.

It then calls the run method, which must be defined by a subclass.

Finally, the TableHandle associated to the output tag is returned.
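
The flow in these notes (store n_samples and seed in the configuration, call run, return the output) can be mimicked with a toy class. ToyCreator below is purely illustrative and is not a RAIL class: its config is a plain dict, its output is a list rather than a TableHandle, and drawing from a unit Gaussian is an arbitrary choice for the example.

```python
import random


class ToyCreator:
    """Stand-in showing the sample() -> run() flow; not a RAIL class."""

    def __init__(self, **config):
        self.config = {"n_samples": None, "seed": None, **config}
        self.output = None

    def run(self):
        # A real subclass would draw from its model; here we draw Gaussians.
        rng = random.Random(self.config["seed"])
        self.output = [rng.gauss(0.0, 1.0) for _ in range(self.config["n_samples"])]

    def sample(self, n_samples=None, seed=None, **kwargs):
        # Put the call arguments into the configuration, then run.
        self.config.update(n_samples=n_samples, seed=seed, **kwargs)
        self.run()
        return self.output


creator = ToyCreator()
draws = creator.sample(n_samples=5, seed=1)
```

Because run() reads the seed from the configuration rather than from an argument, any other method on the object sees the same values.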

stage_columns: list[str] | None

class rail.stages.DNFEstimator

Bases: CatEstimator

A class for estimating photometric redshifts using the DNF method.

This class extends CatEstimator and predicts redshifts based on photometric data. It supports multiple selection modes for redshift estimation, processes missing data, and generates probability density functions (PDFs) for photometric redshifts.

Metrics (selection_mode):

ENF (1): Euclidean neighbourhood. It’s a common distance metric used in kNN (k-Nearest Neighbors) for photometric redshift prediction.
ANF (2): uses normalized inner product for more accurate photo-z predictions. It is particularly recommended when working with datasets containing more than four filters.
DNF (3): combines Euclidean and angular metrics, improving accuracy, especially for larger neighborhoods, and maintaining proportionality in observable content.
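
To make the three metrics concrete, the sketch below computes a Euclidean distance, an angular term based on the normalized inner product, and one way of combining them. The function names are hypothetical, and the combined form (Euclidean distance multiplied by the sine of the angle between the two vectors) is an assumption of this example, not taken from the text above or from the DNF code.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two magnitude (or flux) vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sin2_angle(a, b):
    """Return sin^2 of the angle between a and b via the normalized inner product."""
    dot = sum(x * y for x, y in zip(a, b))
    norm2 = sum(x * x for x in a) * sum(y * y for y in b)
    return max(0.0, 1.0 - dot * dot / norm2)


def directional_distance(a, b):
    """Assumed combination: Euclidean distance scaled by sin(angle)."""
    return euclidean_distance(a, b) * math.sqrt(sin2_angle(a, b))


target = (1.0, 2.0, 3.0)
scaled = (2.0, 4.0, 6.0)  # same direction as target, twice the length
```

For these two vectors the Euclidean distance is non-zero while the angular term vanishes, which illustrates how an angular metric keeps objects with proportional observables close together.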

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0)
zmax ([float] default=3.0)
nzbins ([int] default=301)
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift)
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
selection_mode ([int] default=1) – select which mode to choose the redshift estimate: 0: ENF, 1: ANF, 2: DNF
model (ModelHandle (INPUT))
input (TableHandle (INPUT))
output (QPHandle (OUTPUT))

__init__(args, **kwargs): Constructor: Do Estimator specific initialization

entrypoint_function: str | None = 'estimate'

interactive_function: str | None = 'dnf_estimator'

name = 'DNFEstimator'

open_model(**kwargs)

Load the model and/or attach it to this Stage

Parameters:

tag – Input tag associated to the model
**kwargs – Should include ‘model’, see notes

Notes

The keyword argument ‘model’ should be either

an object with a trained model,
a path pointing to a file that can be read to obtain the trained model,
or a ModelHandle providing access to the trained model.

Returns: The object encapsulating the trained model.
Return type: Any

stage_columns: list[str] | None

class rail.stages.DNFInformer

Bases: CatInformer

A class for photometric redshift estimation.

This class extends CatInformer and processes photometric data to train for estimating redshifts. It handles missing data by replacing non-detections with predefined magnitude limits and assigns errors accordingly.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry)
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])
redshift_col ([str] default=redshift)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
nondetect_val ([float] default=99.0)
input (TableHandle (INPUT))
model (ModelHandle (OUTPUT))
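
The non-detection handling described for this class can be sketched on a dict-of-columns table. This is an illustration under stated assumptions: replace_nondetections is a hypothetical helper, and the error assigned to a replaced magnitude (nondetect_err=1.0) is an assumed value, since the description above only says that errors are assigned accordingly.

```python
def replace_nondetections(table, bands, err_bands, mag_limits,
                          nondetect_val=99.0, nondetect_err=1.0):
    """Return a copy of table with non-detections set to the band limit."""
    out = {name: list(values) for name, values in table.items()}
    for band, err_band in zip(bands, err_bands):
        for i, mag in enumerate(out[band]):
            if mag == nondetect_val:
                out[band][i] = mag_limits[band]
                out[err_band][i] = nondetect_err  # assumed replacement error
    return out


table = {"mag_u_lsst": [25.3, 99.0], "mag_err_u_lsst": [0.1, 99.0]}
cleaned = replace_nondetections(
    table, ["mag_u_lsst"], ["mag_err_u_lsst"], {"mag_u_lsst": 27.79}
)
```

Only the flagged entry is changed; detected magnitudes and their errors pass through unmodified.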

__init__(args, **kwargs): Constructor Do CatInformer specific initialization, then check on bands

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'dnf_informer'

name = 'DNFInformer'

run()

Run the stage and return the execution status.

Subclasses must implement this method.

stage_columns: list[str] | None

class rail.stages.DSPSPhotometryCreator

Bases: Creator

Derived class of Creator that generates synthetic absolute and apparent magnitudes from one or more SED models generated with the DSPSSingleSedModeler or DSPSPopulationSedModeler classes. It accepts as input Hdf5Handles containing the rest-frame SEDs in units of Lsun/Hz and outputs an Hdf5Handle containing sequential indices, absolute and apparent magnitudes for each galaxy. Photometric quantities are computed for the filters defined in the configuration file.

jax executes the computations serially on a single CPU core; for CPU parallelization you need MPI. If a GPU is used, jax natively and automatically parallelizes the execution.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
redshift_key ([str] default=redshifts) – Redshift keyword name of the hdf5 dataset containing rest-frame SEDs
restframe_sed_key ([str] default=restframe_seds) – Rest-frame SED keyword name of the hdf5 dataset containing rest-frame SEDs
absolute_mags_key ([str] default=rest_frame_absolute_mags) – Absolute magnitudes keyword name of the output hdf5 dataset
apparent_mags_key ([str] default=apparent_mags) – Apparent magnitudes keyword name of the output hdf5 dataset
filter_folder ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/dsps_default_data/filters) – Folder containing filter transmissions
instrument_name ([str] default=lsst) – Instrument name as prefix to filter transmission files
wavebands ([str] default=u,g,r,i,z,y) – Comma-separated list of wavebands
min_wavelength ([float] default=250) – Minimum input rest-frame wavelength SEDs
max_wavelength ([float] default=12000) – Maximum input rest-frame wavelength SEDs
ssp_templates_file ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/dsps_default_data/ssp_data_fsps_v3.2_lgmet_age.h5) – hdf5 file storing the SSP libraries used to create SEDs
default_cosmology ([bool] default=True) – True to use default DSPS cosmology. If False, Om0, w0, wa, h need to be supplied in the sample function
model (Hdf5Handle (INPUT))
output (Hdf5Handle (OUTPUT))

__init__(args, **kwargs)

Initialize DSPSPhotometryCreator class. If the SSP templates are not provided by the user, they are automatically downloaded from the public NERSC directory. These default templates are created with default FSPS values, with gas emission at fixed gas solar metallicity value. The _b and _c tuples for jax are composed of None or 0, depending on whether the array axis of each argument should be left unmapped (None) or mapped over (0).

Parameters:

args
comm

default_files_folder = '/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/dsps_default_data'

entrypoint_function: str | None = 'sample'

inputs = [('model', <class 'rail.core.data.Hdf5Handle'>)]

interactive_function: str | None = 'dsps_photometry_creator'

name = 'DSPSPhotometryCreator'

outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]

run(): This function computes rest-frame absolute magnitudes in the provided wavebands for all the galaxies in the population by calling _calc_rest_mag_vmap from DSPS. It does the same for the observed magnitudes in the AB system by calling _calc_obs_mag_vmap from DSPS. It then stores both kind of magnitudes and the galaxy indices into an Hdf5Handle.

sample(model, seed=None, Om0=0.3075, w0=-1.0, wa=0.0, h=0.6774, **kwargs)

Creates observed and absolute magnitudes for the population of galaxy rest-frame SEDs and stores them into an Hdf5Handle.

Parameters:

model (str) – Filepath to the hdf5 table containing the galaxy rest-frame SEDs.
seed (int) – The random seed to control sampling
Om0 (float) – Omega matter: density of non-relativistic matter in units of the critical density at z=0.
w0 (float) – Dark energy equation of state at z=0 (a=1). This is pressure/density for dark energy in units where c=1.
wa (float) – Negative derivative of the dark energy equation of state with respect to the scale factor. A cosmological constant has w0=-1.0 and wa=0.0.
h (float) – dimensionless Hubble constant at z=0.

Returns:

Hdf5Handle storing the absolute and apparent magnitudes.

Return type:

Hdf5Handle

Notes

This method puts seed into the stage configuration data, which makes it available to other methods. It then calls the run method. Finally, the Hdf5Handle associated to the output tag is returned.

stage_columns: list[str] | None

class rail.stages.DSPSPopulationSedModeler

Bases: Modeler

Derived class of Modeler for creating a population of galaxy rest-frame SED models using DSPS v3 (Hearin+21). SPS calculations are based on a set of template SEDs of simple stellar populations (SSPs). Supplying such templates is outside the planned scope of the DSPS package, and so they will need to be retrieved from some other library. For example, the FSPS library supplies such templates in a convenient form.

The input galaxy properties, such as star-formation histories and metallicities, need to be supplied via an hdf5 table.

The user-provided metallicity grid should be defined consistently with the metallicity of the template SEDs. Users should be cautious in their use of the cosmic time grid: the appropriate time resolution depends strongly on the user's scientific aim. jax executes the computations serially on a single CPU core; CPU parallelization requires MPI. If a GPU is used, jax natively and automatically parallelizes the execution.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
ssp_templates_file ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/dsps_default_data/ssp_data_fsps_v3.2_lgmet_age.h5) – hdf5 file storing the SSP libraries used to create SEDs
redshift_key ([str] default=redshift) – Redshift keyword name of the hdf5 dataset
cosmic_time_grid_key ([str] default=cosmic_time_grid) – Cosmic time grid keyword name of the hdf5 dataset, this is the grid of Universe age over which the stellar mass build-up takes place in units of Gyr
star_formation_history_key ([str] default=star_formation_history) – Star-formation history keyword name of the hdf5 dataset, this is the star-formation history of the galaxy in units of Msun/yr
stellar_metallicity_key ([str] default=stellar_metallicity) – Stellar metallicity keyword name of the hdf5 dataset, this is the stellar metallicity in units of log10(Z)
stellar_metallicity_scatter_key ([str] default=stellar_metallicity_scatter) – Stellar metallicity scatter keyword name of the hdf5 dataset, this is lognormal scatter in the metallicity distribution function
restframe_sed_key ([str] default=restframe_seds) – Rest-frame SED keyword name of the output hdf5 dataset
default_cosmology ([bool] default=True) – True to use the default DSPS cosmology. If False, Om0, w0, wa, h need to be supplied in the fit_model function
min_wavelength ([float] default=250) – Minimum output rest-frame wavelength
max_wavelength ([float] default=12000) – Maximum output rest-frame wavelength
input (Hdf5Handle (INPUT))
model (Hdf5Handle (OUTPUT))

__init__(args, **kwargs): Initialize SedModeler class. If the SSP templates are not provided by the user, they are automatically downloaded from the public NERSC directory. These default templates are created with default FSPS values, with gas emission at fixed gas solar metallicity value. The _a tuple for jax is composed of None or 0, depending on whether you don’t or do want the array axis to map over for all arguments.

default_files_folder = '/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/dsps_default_data'

entrypoint_function: str | None = 'fit_model'

fit_model(input_data='/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/dsps_default_data/input_galaxy_properties_dsps.hdf5', Om0=0.3075, w0=-1.0, wa=0.0, h=0.6774, **kwargs)

This function generates the rest-frame SEDs and stores them into the Hdf5Handle.

Parameters:

input_data (str) – Filepath to the hdf5 table containing galaxy properties.
Om0 (float) – Omega matter: density of non-relativistic matter in units of the critical density at z=0.
w0 (float) – Dark energy equation of state at z=0 (a=1). This is pressure/density for dark energy in units where c=1.
wa (float) – Negative derivative of the dark energy equation of state with respect to the scale factor. A cosmological constant has w0=-1.0 and wa=0.0.
h (float) – dimensionless Hubble constant at z=0.

Returns:

Hdf5 table storing the rest-frame SED model

Return type:

Hdf5Handle

inputs = [('input', <class 'rail.core.data.Hdf5Handle'>)]

interactive_function: str | None = 'dsps_population_sed_modeler'

name = 'DSPSPopulationSedModeler'

outputs = [('model', <class 'rail.core.data.Hdf5Handle'>)]

run()

Run method. It calls _get_rest_frame_seds from DSPS to create rest-frame SEDs for a population of galaxies. The load_ssp_templates function loads the SSP templates created with FSPS. The resulting NamedTuple has 4 entries:

ssp_lgmet : ndarray of shape (n_met, )
    Array of log10(Z) of the SSP templates where dimensionless Z is the mass fraction of elements heavier than He
ssp_lg_age_gyr : ndarray of shape (n_ages, )
    Array of log10(age/Gyr) of the SSP templates
ssp_wave : ndarray of shape (n_wave, )
ssp_flux : ndarray of shape (n_met, n_ages, n_wave)
    SED of the SSP in units of Lsun/Hz/Msun
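To make the shape relationships among the four entries concrete, here is a small sketch that mirrors the field names listed above. It uses nested Python lists in place of ndarrays so that it runs without extra dependencies; SSPTemplates and check_shapes are hypothetical stand-ins, not the object returned by load_ssp_templates.

```python
from typing import List, NamedTuple


class SSPTemplates(NamedTuple):
    """Hypothetical mock with the four entries described above."""
    ssp_lgmet: List[float]             # shape (n_met,)
    ssp_lg_age_gyr: List[float]        # shape (n_ages,)
    ssp_wave: List[float]              # shape (n_wave,)
    ssp_flux: List[List[List[float]]]  # shape (n_met, n_ages, n_wave)


def check_shapes(ssp):
    """Return (n_met, n_ages, n_wave) or raise if ssp_flux is inconsistent."""
    n_met = len(ssp.ssp_lgmet)
    n_ages = len(ssp.ssp_lg_age_gyr)
    n_wave = len(ssp.ssp_wave)
    consistent = len(ssp.ssp_flux) == n_met and all(
        len(per_met) == n_ages and all(len(sed) == n_wave for sed in per_met)
        for per_met in ssp.ssp_flux
    )
    if not consistent:
        raise ValueError("ssp_flux does not have shape (n_met, n_ages, n_wave)")
    return n_met, n_ages, n_wave


# Toy values only: 2 metallicities, 3 ages, 4 wavelengths, zero flux.
toy = SSPTemplates(
    ssp_lgmet=[-3.0, -2.0],
    ssp_lg_age_gyr=[-2.0, 0.0, 1.0],
    ssp_wave=[1000.0, 2000.0, 3000.0, 4000.0],
    ssp_flux=[[[0.0] * 4 for _ in range(3)] for _ in range(2)],
)
shape = check_shapes(toy)
```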

Notes

The initial stellar mass of the galaxy is 0. The stellar mass table is defined as a cumulative sum and therefore refers to the total stellar mass formed. DSPS conveniently provides IMF-dependent fitting functions to compute the surviving mass (see surviving_mstar.py). The units of the resulting rest-frame SED are solar luminosities per Hertz. The luminosity refers to that emitted by the formed mass at the time of observation.

stage_columns: list[str] | None

class rail.stages.DSPSSingleSedModeler

Bases: Modeler

Derived class of Modeler for creating a single galaxy rest-frame SED model using DSPS v3 (Hearin+21). SPS calculations are based on a set of template SEDs of simple stellar populations (SSPs). Supplying such templates is outside the planned scope of the DSPS package, and so they will need to be retrieved from some other library. For example, the FSPS library supplies such templates in a convenient form.

The input galaxy properties, such as star-formation histories and metallicities, need to be supplied via an hdf5 table.

The user-provided metallicity grid should be defined consistently with the metallicity of the template SEDs. Users should be cautious in their use of the cosmic time grid: the appropriate time resolution depends strongly on the user's scientific aim.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
ssp_templates_file ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/dsps_default_data/ssp_data_fsps_v3.2_lgmet_age.h5) – hdf5 file storing the SSP libraries used to create SEDs
redshift_key ([str] default=redshifts) – Redshift keyword name of the hdf5 dataset
cosmic_time_grid_key ([str] default=cosmic_time_grid) – Cosmic time grid keyword name of the hdf5 dataset, this is the grid of Universe age over which the stellar mass build-up takes place in units of Gyr
star_formation_history_key ([str] default=star_formation_history) – Star-formation history keyword name of the hdf5 dataset, this is the star-formation history of the galaxy in units of Msun/yr
stellar_metallicity_key ([str] default=stellar_metallicity) – Stellar metallicity keyword name of the hdf5 dataset, this is the stellar metallicity in units of log10(Z)
stellar_metallicity_scatter_key ([str] default=stellar_metallicity_scatter) – Stellar metallicity scatter keyword name of the hdf5 dataset, this is lognormal scatter in the metallicity distribution function
restframe_sed_key ([str] default=restframe_sed) – Rest-frame SED keyword name of the output hdf5 dataset
default_cosmology ([bool] default=True) – True to use the default DSPS cosmology. If False, Om0, w0, wa, h need to be supplied in the fit_model function
min_wavelength ([float] default=250) – Minimum output rest-frame wavelength
max_wavelength ([float] default=12000) – Maximum output rest-frame wavelength
input (Hdf5Handle (INPUT))
model (Hdf5Handle (OUTPUT))

__init__(args, **kwargs)

Initialize the SedModeler class. If the user does not provide the SSP templates, they are downloaded automatically from the public NERSC directory. These default templates are created with default FSPS values, with gas emission at a fixed, solar gas metallicity.

Parameters:

args
comm

default_files_folder = '/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/dsps_default_data'

entrypoint_function: str | None = 'fit_model'

fit_model(input_data='/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/dsps_default_data/input_galaxy_properties_dsps.hdf5', Om0=0.3075, w0=-1.0, wa=0.0, h=0.6774, **kwargs)

This function generates the rest-frame SEDs and stores them into the Hdf5Handle.

Parameters:

input_data (str) – Filepath to the hdf5 table containing galaxy properties.
Om0 (float) – Omega matter: density of non-relativistic matter in units of the critical density at z=0.
w0 (float) – Dark energy equation of state at z=0 (a=1). This is pressure/density for dark energy in units where c=1.
wa (float) – Negative derivative of the dark energy equation of state with respect to the scale factor. A cosmological constant has w0=-1.0 and wa=0.0.
h (float) – dimensionless Hubble constant at z=0.

Returns:

Hdf5 table storing the rest-frame SED model

Return type:

Hdf5Handle

inputs = [('input', <class 'rail.core.data.Hdf5Handle'>)]

interactive_function: str | None = 'dsps_single_sed_modeler'

name = 'DSPSSingleSedModeler'

outputs = [('model', <class 'rail.core.data.Hdf5Handle'>)]

run()

Run method. It calls _get_rest_frame_seds from DSPS to create a galaxy rest-frame SED. The load_ssp_templates function loads the SSP templates created with FSPS. The resulting NamedTuple has 4 entries:

ssp_lgmet : ndarray of shape (n_met, )
    Array of log10(Z) of the SSP templates where dimensionless Z is the mass fraction of elements heavier than He
ssp_lg_age_gyr : ndarray of shape (n_ages, )
    Array of log10(age/Gyr) of the SSP templates
ssp_wave : ndarray of shape (n_wave, )
ssp_flux : ndarray of shape (n_met, n_ages, n_wave)
    SED of the SSP in units of Lsun/Hz/Msun

Notes

The initial stellar mass of the galaxy is 0. The stellar mass table is defined as a cumulative sum and therefore refers to the total stellar mass formed. DSPS conveniently provides IMF-dependent fitting functions to compute the surviving mass (see surviving_mstar.py). The units of the resulting rest-frame SED are solar luminosities per Hertz. The luminosity refers to that emitted by the formed mass at the time of observation.
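The note on the cumulative stellar mass table can be illustrated with a short sketch. It assumes a trapezoid rule over the cosmic time grid (Gyr) and a star-formation history in Msun/yr, as in the configuration parameters; DSPS may use a different quadrature, and formed_stellar_mass is a hypothetical helper rather than a DSPS function.

```python
def formed_stellar_mass(cosmic_time_gyr, sfr_msun_per_yr):
    """Cumulative stellar mass formed (Msun) at each point of the time grid.

    The table starts at 0 and ignores mass loss, so it tracks the total
    mass formed rather than the surviving mass.
    """
    mass = [0.0]
    for i in range(1, len(cosmic_time_gyr)):
        dt_yr = (cosmic_time_gyr[i] - cosmic_time_gyr[i - 1]) * 1.0e9
        mean_sfr = 0.5 * (sfr_msun_per_yr[i] + sfr_msun_per_yr[i - 1])
        mass.append(mass[-1] + mean_sfr * dt_yr)
    return mass


# Constant 1 Msun/yr for 2 Gyr forms 2e9 Msun in total.
mass_table = formed_stellar_mass([0.0, 1.0, 2.0], [1.0, 1.0, 1.0])
```

A coarser cosmic time grid makes each trapezoid wider, which is one reason the time resolution should be chosen with the scientific aim in mind.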

stage_columns: list[str] | None

class rail.stages.Degrader

Bases: RailStage

Base class for Degraders, which apply various degradations to synthetic photometric data.

Degraders take “input” data in the form of pandas dataframes in Parquet files and provide as “output” another pandas dataframe written to a Parquet file.

entrypoint_function: str | None = '__call__'

inputs = [('input', <class 'rail.core.data.PqHandle'>)]

name = 'Degrader'

outputs = [('output', <class 'rail.core.data.PqHandle'>)]

stage_columns: list[str] | None

class rail.stages.Dereddener

Bases: DustMapBase

Utility stage that does dereddening

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
ra_name ([str] default=ra) – Name of the RA column
dec_name ([str] default=dec) – Name of the DEC column
mag_name ([str] default=mag_{band}_lsst) – Template for the magnitude columns
band_a_env ([dict] default={'mag_u_lsst': 4.81, 'mag_g_lsst': 3.64, 'mag_r_lsst': 2.7, 'mag_i_lsst': 2.06, 'mag_z_lsst': 1.58, 'mag_y_lsst': 1.31})
dustmap_name ([str] default=sfd) – Name of the dustmap in question
dustmap_dir ([str] (required)) – Directory with dustmaps
copy_cols ([list] default=[]) – Additional columns to copy
copy_all_cols ([bool] default=False) – Copy all the columns
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'dereddener'

name = 'Dereddener'
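As a rough illustration of how the mag_name template and the band_a_env coefficients could combine, the sketch below subtracts A_band * E(B-V) from each magnitude column. This is an assumption about the correction, not a copy of the stage's code: in the real stage E(B-V) would come from the configured dust map at each (ra, dec), whereas here it is passed in directly, and deredden is a hypothetical helper.

```python
# Default coefficients copied from the band_a_env parameter above.
BAND_A_ENV = {
    "mag_u_lsst": 4.81, "mag_g_lsst": 3.64, "mag_r_lsst": 2.7,
    "mag_i_lsst": 2.06, "mag_z_lsst": 1.58, "mag_y_lsst": 1.31,
}


def deredden(mags, ebv, mag_name="mag_{band}_lsst", bands="ugrizy",
             band_a_env=BAND_A_ENV):
    """Return dereddened magnitudes for the columns present in mags.

    mags maps column names to magnitudes; ebv is E(B-V) for the object.
    Assumed correction: mag_dereddened = mag - A_band * E(B-V).
    """
    out = {}
    for band in bands:
        column = mag_name.format(band=band)  # e.g. "mag_r_lsst"
        if column in mags:
            out[column] = mags[column] - band_a_env[column] * ebv
    return out


corrected = deredden({"mag_r_lsst": 20.0, "mag_i_lsst": 19.5}, ebv=0.1)
```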

class rail.stages.DistToDistEvaluator

Bases: Evaluator

Evaluate the performance of a photo-z estimator against reference PDFs

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
metrics ([list] default=[]) – The metrics you want to evaluate.
exclude_metrics ([list] default=[]) – List of metrics to exclude
metric_config ([dict] default={}) – configuration of individual_metrics
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
seed ([float] default=None) – Random seed value to use for reproducible results.
force_exact ([bool] default=False) – Force the exact calculation. This will not allow parallelization
metric_integration_limits ([list] default=[0.0, 3.0]) – The default end points for calculating metrics on a grid.
dx ([float] default=0.01) – The default step size when calculating metrics on a grid.
n_samples ([int] default=100) – The number of random samples to select for certain metrics.
input (QPHandle (INPUT))
truth (QPHandle (INPUT))
output (Hdf5Handle (OUTPUT))
summary (Hdf5Handle (OUTPUT))
single_distribution_summary (QPDictHandle (OUTPUT))

entrypoint_function: str | None = 'evaluate'

inputs: list[tuple[str, type[DataHandle]]] = [('input', <class 'rail.core.data.QPHandle'>), ('truth', <class 'rail.core.data.QPHandle'>)]

interactive_function: str | None = 'dist_to_dist_evaluator'

metric_base_class: alias of DistToDistMetric

name = 'DistToDistEvaluator'

stage_columns: list[str] | None
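The metric_integration_limits and dx parameters together describe an evaluation grid. The sketch below shows one plausible way such a grid could be built and used for a trapezoid-rule integral; how the evaluator actually constructs its grid is not stated here, so metric_grid and trapezoid are hypothetical helpers.

```python
def metric_grid(limits=(0.0, 3.0), dx=0.01):
    """Evenly spaced grid from limits[0] to limits[1] with step dx."""
    lo, hi = limits
    n_steps = round((hi - lo) / dx)
    return [lo + i * dx for i in range(n_steps + 1)]


def trapezoid(y, x):
    """Trapezoid-rule integral of samples y over grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))


grid = metric_grid()                     # defaults: [0.0, 3.0] with dx=0.01
uniform_pdf = [1.0 / 3.0] * len(grid)    # flat PDF over the default limits
norm = trapezoid(uniform_pdf, grid)      # should be close to 1
```

With the default limits and step, the grid has 301 points, so a smaller dx increases both the accuracy and the cost of any grid-based metric.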

class rail.stages.DistToPointEvaluator

Bases: Evaluator

Evaluate the performance of a photo-z estimator against reference point estimate

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
metrics ([list] default=[]) – The metrics you want to evaluate.
exclude_metrics ([list] default=[]) – List of metrics to exclude
metric_config ([dict] default={}) – configuration of individual_metrics
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
seed ([float] default=None) – Random seed value to use for reproducible results.
force_exact ([bool] default=False) – Force the exact calculation. This will not allow parallelization
metric_integration_limits ([list] default=[0.0, 3.0]) – The default end points for calculating metrics on a grid.
dx ([float] default=0.01) – The default step size when calculating metrics on a grid.
quantile_grid ([list] (default=[...])) – The quantile value grid on which to evaluate the CDF values. (0, 1)
x_grid ([list] (default=[...])) – The x-value grid at which to evaluate the pdf values.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
reference_dictionary_key ([str] default=redshift) – The key in the truth dictionary where the redshift data is stored.
input (QPHandle (INPUT))
truth (TableHandle (INPUT))
output (Hdf5Handle (OUTPUT))
summary (Hdf5Handle (OUTPUT))
single_distribution_summary (QPDictHandle (OUTPUT))

entrypoint_function: str | None = 'evaluate'

inputs: list[tuple[str, type[DataHandle]]] = [('input', <class 'rail.core.data.QPHandle'>), ('truth', <class 'rail.core.data.TableHandle'>)]

interactive_function: str | None = 'dist_to_point_evaluator'

metric_base_class: alias of DistToPointMetric

name = 'DistToPointEvaluator'

stage_columns: list[str] | None
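A common distribution-to-point quantity is the probability integral transform (PIT), the CDF of an estimated PDF evaluated at the true redshift. The sketch below computes it for a PDF tabulated on an x_grid, using a trapezoid rule with linear interpolation inside the cell containing the truth value. It is an illustration only and does not claim to reproduce any metric implemented by this evaluator; pit_value is a hypothetical name.

```python
def pit_value(x_grid, pdf_vals, z_true):
    """CDF of a gridded PDF at z_true, normalized by the total integral."""
    total = 0.0
    below = 0.0
    for i in range(len(x_grid) - 1):
        x0, x1 = x_grid[i], x_grid[i + 1]
        p0, p1 = pdf_vals[i], pdf_vals[i + 1]
        cell = 0.5 * (p0 + p1) * (x1 - x0)
        total += cell
        if z_true >= x1:
            # The whole cell lies below the truth value.
            below += cell
        elif z_true > x0:
            # Partial cell: interpolate the PDF linearly up to z_true.
            frac = (z_true - x0) / (x1 - x0)
            p_at_truth = p0 + frac * (p1 - p0)
            below += 0.5 * (p0 + p_at_truth) * (z_true - x0)
    return below / total


x_grid = [0.0, 1.0, 2.0, 3.0]
flat_pdf = [1.0 / 3.0] * 4
pit_mid = pit_value(x_grid, flat_pdf, 1.5)
```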

class rail.stages.DustMapBase

Bases: RailStage

Utility stage that does dereddening

Note: set copy_all_cols=True to copy all columns in the data; copy_cols is then ignored

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
ra_name ([str] default=ra) – Name of the RA column
dec_name ([str] default=dec) – Name of the DEC column
mag_name ([str] default=mag_{band}_lsst) – Template for the magnitude columns
band_a_env ([dict] default={'mag_u_lsst': 4.81, 'mag_g_lsst': 3.64, 'mag_r_lsst': 2.7, 'mag_i_lsst': 2.06, 'mag_z_lsst': 1.58, 'mag_y_lsst': 1.31})
dustmap_name ([str] default=sfd) – Name of the dustmap in question
dustmap_dir ([str] (required)) – Directory with dustmaps
copy_cols ([list] default=[]) – Additional columns to copy
copy_all_cols ([bool] default=False) – Copy all the columns
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'

fetch_map()

inputs = [('input', <class 'rail.core.data.PqHandle'>)]

interactive_function: str | None = 'dust_map_base'

name = 'DustMapBase'

outputs = [('output', <class 'rail.core.data.PqHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

class rail.stages.EqualCountClassifier

Bases: PZClassifier

Classifier that simply assigns tomographic bins based on a point estimate according to SRD

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
object_id_col ([str] default=) – name of object id column
point_estimate_key ([str] default=zmode) – Which point estimate to use
zmin ([float] default=0.0) – The minimum redshift of the z grid or sample
zmax ([float] default=3.0) – The maximum redshift of the z grid or sample
n_tom_bins ([int] default=5) – Number of tomographic bins
no_assign ([int] default=-99) – Value for no assignment flag
input (QPHandle (INPUT))
output (Hdf5Handle (OUTPUT))

entrypoint_function: str | None = 'classify'

interactive_function: str | None = 'equal_count_classifier'

name = 'EqualCountClassifier'

outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]

run()

Processes the input data in chunks and performs classification.

This method iterates over chunks of the input data, calling the _process_chunk method for each chunk to perform the actual classification.

The _process_chunk method should be implemented by subclasses to define the specific classification logic.

Return type:

None

stage_columns: list[str] | None
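The idea of equal-count tomographic binning can be sketched as follows: objects whose point estimate lies within [zmin, zmax] are ranked and split into n_tom_bins groups of nearly equal size, and everything else receives the no_assign flag. This is a minimal sketch under stated assumptions; the zero-based bin labels and the rank-based split are illustrative choices and may differ from the stage's actual implementation, and equal_count_bins is a hypothetical helper.

```python
def equal_count_bins(z_point, n_tom_bins=5, zmin=0.0, zmax=3.0, no_assign=-99):
    """Assign each object a tomographic bin index with near-equal counts."""
    in_range = [i for i, z in enumerate(z_point) if zmin <= z <= zmax]
    # Rank the in-range objects by their point estimate.
    order = sorted(in_range, key=lambda i: z_point[i])
    bins = [no_assign] * len(z_point)
    for rank, i in enumerate(order):
        # Integer arithmetic spreads ranks evenly over 0 .. n_tom_bins - 1.
        bins[i] = rank * n_tom_bins // len(order)
    return bins


z_mode = [0.1, 2.9, 1.5, 0.7, 3.5, 2.2]
labels = equal_count_bins(z_mode, n_tom_bins=2)
```

In this toy input the object at z = 3.5 falls outside the default zmax and is flagged with -99, while the remaining five objects are split three and two between the bins.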

class rail.stages.EuclidDeepErrorModel

Bases: PhotoErrorModel

The Euclid Deep Error model, defined by peEuclidDeepErrorParams and peEuclidDeepErrorModel

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
seed ([type not specified] default=None) – Set to an int to force reproducible results.
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

__init__(args, **kwargs): Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'euclid_deep_error_model'

name = 'EuclidDeepErrorModel'

stage_columns: list[str] | None

class rail.stages.EuclidErrorModel

Bases: PhotoErrorModel

The Euclid Error model, defined by peEuclidErrorParams and peEuclidErrorModel

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
seed ([type not specified] default=None) – Set to an int to force reproducible results.
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

__init__(args, **kwargs): Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'euclid_error_model'

name = 'EuclidErrorModel'

stage_columns: list[str] | None

class rail.stages.EuclidWideErrorModel

Bases: PhotoErrorModel

The Euclid Wide Error model, defined by peEuclidWideErrorParams and peEuclidWideErrorModel

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
seed ([type not specified] default=None) – Set to an int to force reproducible results.
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

__init__(args, **kwargs): Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'euclid_wide_error_model'

name = 'EuclidWideErrorModel'

stage_columns: list[str] | None

class rail.stages.Evaluator

Bases: RailStage

Evaluate the performance of a photo-z estimator against reference point estimate

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
metrics ([list] default=[]) – The metrics you want to evaluate.
exclude_metrics ([list] default=[]) – List of metrics to exclude
metric_config ([dict] default={}) – configuration of individual_metrics
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
seed ([float] default=None) – Random seed value to use for reproducible results.
force_exact ([bool] default=False) – Force the exact calculation. This will not allow parallelization
output (Hdf5Handle (OUTPUT))
summary (Hdf5Handle (OUTPUT))
single_distribution_summary (QPDictHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

Parameters:

args (Any)
kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'evaluate'

evaluate(data, truth, **kwargs)

Evaluate the performance of an estimator

This will attach the input data and truth to this Evaluator (for introspection and provenance tracking).

Then it will call the run() and finalize() methods, which need to be implemented by the sub-classes.

The run() method will need to register the data that it creates to this Evaluator by using self.add_data(‘output’, output_data).

Parameters:

data (qp.Ensemble) – The sample to evaluate
truth (Any) – Table with the truth information

Returns:

The evaluation metrics

Return type:

dict[str, DataHandle]
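The control flow described for evaluate (attach inputs, call run(), call finalize(), return the registered outputs) can be sketched with a toy class. ToyEvaluator and MeanBiasEvaluator are hypothetical stand-ins written for illustration; they do not use RAIL's data handles or its real add_data signature.

```python
class ToyEvaluator:
    """Hypothetical stand-in for the evaluate/run/finalize flow; not RAIL code."""

    def __init__(self):
        self._data = {}
        self.finalized = False

    def add_data(self, tag, data):
        """Register data under a tag so later steps can retrieve it."""
        self._data[tag] = data
        return data

    def evaluate(self, data, truth):
        # Attach the inputs, then delegate the work to the subclass hooks.
        self.add_data("input", data)
        self.add_data("truth", truth)
        self.run()
        self.finalize()
        return {"output": self._data["output"]}

    def run(self):
        raise NotImplementedError("subclasses must implement run()")

    def finalize(self):
        self.finalized = True


class MeanBiasEvaluator(ToyEvaluator):
    def run(self):
        data, truth = self._data["input"], self._data["truth"]
        bias = sum(d - t for d, t in zip(data, truth)) / len(data)
        # Register the result under the 'output' tag, as run() is expected to.
        self.add_data("output", {"mean_bias": bias})


evaluator = MeanBiasEvaluator()
metrics = evaluator.evaluate([1.0, 2.0], [0.5, 1.5])
```

Keeping evaluate in the base class and the metric logic in run() means each subclass only supplies the computation, while the bookkeeping stays in one place.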

finalize()

Finalize the stage, moving all its outputs to their final locations.

Return type:

None

inputs: list[tuple[str, type[DataHandle]]] = []

metric_base_class: type[BaseMetric] | None = None

name = 'Evaluator'

outputs = [('output', <class 'rail.core.data.Hdf5Handle'>), ('summary', <class 'rail.core.data.Hdf5Handle'>), ('single_distribution_summary', <class 'rail.core.data.QPDictHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

run_single_node()

Return type:

None

stage_columns: list[str] | None

class rail.stages.FSPSPhotometryCreator

Bases: Creator

Derived class of Creator that generates synthetic photometric data from the rest-frame SED model generated with the FSPSSedModeler class. The user is required to provide galaxy redshifts and filter information in .npy format for the code to run. The rest-frame SEDs are stored in a pickle file or passed as a ModelHandle. Details of what each file should contain are given in config_options. The output is a Fits table containing magnitudes.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
redshift_key ([str] default=redshifts) – Redshift keyword name of the hdf5 dataset containing rest-frame SEDs
restframe_sed_key ([str] default=restframe_seds) – Rest-frame SED keyword name of the hdf5 dataset containing rest-frame SEDs
restframe_wave_key ([str] default=wavelength) – Rest-frame wavelengths keyword name of the hdf5 dataset containing rest-frame SEDs
apparent_mags_key ([str] default=apparent_mags) – Apparent magnitudes keyword name of the output hdf5 dataset
filter_folder ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/fsps_default_data/filters) – Folder containing filter transmissions
instrument_name ([str] default=lsst) – Instrument name as prefix to filter transmission files
wavebands ([str] default=u,g,r,i,z,y) – Comma-separated list of wavebands
filter_wave_key ([str] default=wave)
filter_transm_key ([str] default=transmission)
Om0 ([float] default=0.3) – Omega matter at current time
Ode0 ([float] default=0.7) – Omega dark energy at current time
w0 ([float] default=-1) – Dark energy equation-of-state parameter at current time
wa ([float] default=0.0) – Slope of the dark energy equation-of-state evolution with scale factor
h ([float] default=0.7) – Dimensionless Hubble constant
use_planck_cosmology ([bool] default=False) – True to overwrite the cosmological parameters with their Planck2015 values
physical_units ([bool] default=False) – False (True) for rest-frame spectra in units of Lsun/Hz (erg/s/Hz)
model (Hdf5Handle (INPUT))
output (Hdf5Handle (OUTPUT))

__init__(args, **kwargs): Initialize class. The _b and _c tuples for jax are composed of None or 0, depending on whether you don’t or do want the array axis to map over for all arguments. :param args: :param comm:

default_files_folder = '/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/fsps_default_data'

entrypoint_function: str | None = 'sample'

inputs = [('model', <class 'rail.core.data.Hdf5Handle'>)]

interactive_function: str | None = 'fsps_photometry_creator'

name = 'FSPSPhotometryCreator'

outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]

run(): This function computes apparent AB magnitudes in the provided wavebands for all the galaxies in the population having rest-frame SEDs computed by FSPS. It then stores apparent magnitudes, redshifts and running indices into an Hdf5Handle.

sample(input_data, seed=None, **kwargs)

Creates observed magnitudes for the population of galaxies and stores them into an Hdf5Handle.

Parameters:

input_data (Hdf5Handle) – Hdf5Handle containing the rest-frame SED models.
seed (int | None, optional) – The random seed to control sampling, by default None

Returns:

Hdf5Handle storing the apparent magnitudes and redshifts of galaxies.

Return type:

Hdf5Handle

Notes

This method puts seed into the stage configuration data, which makes it available to other methods. It then calls the run method. Finally, the Hdf5Handle associated with the output tag is returned.

stage_columns: list[str] | None

class rail.stages.FSPSSedModeler

Bases: Modeler

Derived class of Modeler for creating a single galaxy rest-frame SED model using FSPS (Conroy08).

Only the most important parameters are provided via config_options. The remaining ones from FSPS can be provided when creating the rest-frame SED model.

Install FSPS with the following commands:

pip uninstall fsps
git clone --recursive https://github.com/dfm/python-fsps.git
cd python-fsps
python -m pip install .
export SPS_HOME=$(pwd)/src/fsps/libfsps
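After running the commands above, it can be useful to confirm from Python that SPS_HOME is set and points at an existing directory before creating the stage. The helper below is a hypothetical convenience check, not part of RAIL or python-fsps; it accepts any mapping so that it can be exercised without touching the real environment.

```python
import os


def sps_home_configured(environ=os.environ):
    """Return True if SPS_HOME is set and names an existing directory."""
    path = environ.get("SPS_HOME", "")
    return bool(path) and os.path.isdir(path)


# Check the current process environment.
ready = sps_home_configured()
```

If the check returns False in a new shell, the export line probably needs to be added to the shell start-up file, since it only affects the session in which it was run.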

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000)
hdf5_groupname ([str] (required))
compute_vega_mags ([bool] default=False) – True uses Vega magnitudes versus AB magnitudes
vactoair_flag ([bool] default=False) – If True, output wavelengths in air (rather than vac)
zcontinuous ([int] default=1) – Flag for interpolation in metallicity of SSP before CSP
add_agb_dust_model ([bool] default=True) – Turn on/off adding AGB circumstellar dust contribution to SED
add_dust_emission ([bool] default=True) – Turn on/off adding dust emission contribution to SED
add_igm_absorption ([bool] default=False) – Turn on/off adding IGM absorption contribution to SED
add_neb_emission ([bool] default=False) – Turn on/off nebular emission model based on Cloudy
add_neb_continuum ([bool] default=False) – Turn on/off nebular continuum component
add_stellar_remnants ([bool] default=True) – Turn on/off adding stellar remnants contribution to stellar mass
compute_light_ages ([bool] default=False) – If True then the returned spectra are actually light-weighted ages (in Gyr)
nebemlineinspec ([bool] default=False) – True to include emission line fluxes in spectrum
smooth_velocity ([bool] default=True) – True/False for smoothing in velocity/wavelength space
smooth_lsf ([bool] default=False) – True/False for smoothing SSPs by a wavelength dependent line spread function
cloudy_dust ([bool] default=False) – Switch to include dust in the Cloudy tables
agb_dust ([float] default=1.0) – Scales the circumstellar AGB dust emission
tpagb_norm_type ([int] default=2) – Flag for TP-AGB normalization scheme, default Villaume, Conroy, Johnson 2015 normalization
dell ([float] default=0.0) – Shift in log(L_bol) of the TP-AGB isochrones
delt ([float] default=0.0) – Shift in log(T_eff) of the TP-AGB isochrones
redgb ([float] default=1.0) – Modify weight given to RGB. Only available with BaSTI isochrone set
agb ([float] default=1.0) – Modify weight given to TP-AGB
fcstar ([float] default=1.0) – Fraction of stars that the Padova isochrones identify as Carbon stars
sbss ([float] default=0.0) – Specific frequency of blue straggler stars
fbhb ([float] default=0.0) – Fraction of horizontal branch stars that are blue
pagb ([float] default=1.0) – Weight given to the post–AGB phase
redshifts_key ([str] default=redshifts) – galaxy redshift, dataset keyword name
zmet_key ([str] default=zmet) – The metallicity is specified as an integer ranging between 1 and nz. If zcontinuous > 0 then this parameter is ignored, dataset keyword name
stellar_metallicities_key ([str] default=stellar_metallicity) – galaxy stellar metallicities (log10(Z / Zsun)), to be used with zcontinuous > 0, dataset keyword name
pmetals_key ([str] default=pmetals) – The power for the metallicity distribution function, only used if zcontinuous=2, dataset keyword name
imf_type ([int] default=1) – IMF type, see FSPS manual, default Chabrier IMF
imf_upper_limit ([float] default=120.0) – The upper limit of the IMF in solar masses
imf_lower_limit ([float] default=0.08) – The lower limit of the IMF in solar masses
imf1 ([float] default=1.3) – log slope of IMF in 0.08<M/Msun<0.5, if imf_type=2
imf2 ([float] default=2.3) – log slope of IMF in 0.5<M/Msun<1, if imf_type=2
imf3 ([float] default=2.3) – log slope of IMF in M/Msun>1, if imf_type=2
vdmc ([float] default=0.08) – IMF parameter defined in van Dokkum (2008). Only used if imf_type=3
mdave ([float] default=0.5) – IMF parameter defined in Dave (2008). Only used if imf_type=4.
evtype ([int] default=-1) – Compute SSPs for only the given evolutionary type. All phases used when set to -1.
use_wr_spectra ([int] default=1) – Turn on/off the WR spectral library
logt_wmb_hot ([float] default=0.0) – Use the Eldridge (2017) WMBasic hot star library above this value of log(T_eff) or 25,000 K, whichever is larger
masscut ([float] default=150.0) – Truncate the IMF above this value
velocity_dispersions_key ([str] default=stellar_velocity_dispersion) – stellar velocity dispersions (km/s), dataset keyword name
min_wavelength ([float] default=3000) – minimum rest-frame wavelength
max_wavelength ([float] default=10000) – maximum rest-frame wavelength
gas_ionizations_key ([str] default=gas_ionization) – gas ionization values, dataset keyword name
gas_metallicities_key ([str] default=gas_metallicity) – gas metallicities (log10(Zgas / Zsun)), dataset keyword name
igm_factor ([float] default=1.0) – Factor used to scale the IGM optical depth
sfh_type ([int] default=0) – star-formation history type, see FSPS manual, default SSP
tau_key ([str] default=tau) – Defines e-folding time for the SFH, in Gyr. Only used if sfh=1 or sfh=4, dataset keyword name
const_key ([str] default=const) – Defines the constant component of the SFH. Only used if sfh=1 or sfh=4, dataset keyword name
sf_start_key ([str] default=sf_start) – Start time of the SFH, in Gyr. Only used if sfh=1 or sfh=4 or sfh=5, dataset keyword name
sf_trunc_key ([str] default=sf_trunc) – Truncation time of the SFH, in Gyr. Only used if sfh=1 or sfh=4 or sfh=5, dataset keyword name
stellar_ages_key ([str] default=stellar_age) – galaxy stellar ages (Gyr), dataset keyword name
fburst_key ([str] default=fburst) – Defines the fraction of mass formed in an instantaneous burst of star formation. Only used if sfh=1 or sfh=4, dataset keyword name
tburst_key ([str] default=tburst) – Defines the age of the Universe when the burst occurs. If tburst > tage then there is no burst. Only used if sfh=1 or sfh=4, dataset keyword name
sf_slope_key ([str] default=sf_slope) – For sfh=5, this is the slope of the SFR after time sf_trunc, dataset keyword name
dust_type ([int] default=2) – attenuation curve for dust type, see FSPS manual, default Calzetti
dust_tesc ([float] default=7.0) – Stars younger than dust_tesc are attenuated by both dust1 and dust2, while stars older are attenuated by dust2 only. Units are log(yrs)
dust_birth_cloud_key ([str] default=dust1_birth_cloud) – dust parameter describing young stellar light attenuation (dust1 in FSPS), dataset keyword name
dust_diffuse_key ([str] default=dust2_diffuse) – dust parameter describing old stellar light attenuation (dust2 in FSPS), dataset keyword name
dust_clumps ([int] default=-99) – Dust parameter describing the dispersion of a Gaussian PDF density distribution for the old dust. Setting this value to -99.0 sets the distribution to a uniform screen, values other than -99 are no longer supported
frac_nodust ([float] default=0.0) – Fraction of starlight that is not attenuated by the diffuse dust component
frac_obrun ([float] default=0.0) – Fraction of the young stars (age < dust_tesc) that are not attenuated by dust1 and that do not contribute to any nebular emission, representing runaway OB stars or escaping ionizing radiation. These stars are still attenuated by dust2.
dust_index_key ([str] default=dust_index) – Power law index of the attenuation curve. Only used when dust_type=0, dataset keyword name
dust_powerlaw_modifier_key ([str] default=dust_calzetti_modifier) – power-law modifiers to the shape of the Calzetti et al. (2000) attenuation curve (dust1_index), dataset keyword name
mwr_key ([str] default=mwr) – The ratio of total to selective absorption which characterizes the MW extinction curve: RV=AV/E(B-V), used when dust_type=1, dataset keyword name
uvb_key ([str] default=uvb) – Parameter characterizing the strength of the 2175A extinction feature with respect to the standard Cardelli et al. determination for the MW. Only used when dust_type=1, dataset keyword name
wgp1_key ([str] default=wgp1) – Integer specifying the optical depth in the Witt & Gordon (2000) models. Values range from 1-18, used only when dust_type=3, dataset keyword name
wgp2 ([int] default=1) – Integer specifying the type of large-scale geometry and extinction curve. Values range from 1-6, used only when dust_type=3
wgp3 ([int] default=1) – Integer specifying the local geometry for the Witt & Gordon (2000) dust models, used only when dust_type=3
dust_emission_gamma_key ([str] default=dust_gamma) – Relative contributions of dust heated at Umin, parameter of Draine and Li (2007) dust emission model, dataset keyword name
dust_emission_umin_key ([str] default=dust_umin) – Minimum radiation field strengths, parameter of Draine and Li (2007) dust emission model, dataset keyword name
dust_emission_qpah_key ([str] default=dust_qpah) – Grain size distributions in mass in PAHs, parameter of Draine and Li (2007) dust emission model, dataset keyword name
fraction_agn_bol_lum_key ([str] default=f_agn) – Fractional contributions of AGN wrt stellar bolometric luminosity, dataset keyword name
agn_torus_opt_depth_key ([str] default=tau_agn) – Optical depths of the AGN dust tori, dataset keyword name
tabulated_sfh_key ([str] default=tabulated_sfh) – tabulated SFH dataset keyword name
tabulated_lsf_key ([str] default=tabulated_lsf) – tabulated LSF dataset keyword name
physical_units ([bool] default=False) – False (True) for rest-frame spectra in units of Lsun/Hz (erg/s/Hz)
restframe_wave_key ([str] default=restframe_wavelengths) – Rest-frame wavelength keyword name of the output hdf5 dataset
restframe_sed_key ([str] default=restframe_seds) – Rest-frame SED keyword name of the output hdf5 dataset
input (Hdf5Handle (INPUT))
model (Hdf5Handle (OUTPUT))

__init__(args, **kwargs)

This function initializes the FSPSSedModeler class and checks that the provided parameters are within the allowed ranges.

Parameters:

args
comm
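
The range checks mentioned for `__init__` could look roughly like the sketch below. The helper name `check_fsps_config` and the specific bounds are illustrative assumptions, not the checks FSPSSedModeler actually performs; only the parameter names and default values are taken from the parameter list above.

```python
def check_fsps_config(config):
    """Return a list of human-readable problems found in `config`.

    Hypothetical helper: the bounds below are illustrative assumptions,
    not the ranges enforced by FSPSSedModeler.
    """
    problems = []
    if not config["min_wavelength"] < config["max_wavelength"]:
        problems.append("min_wavelength must be smaller than max_wavelength")
    if not config["imf_lower_limit"] < config["imf_upper_limit"]:
        problems.append("imf_lower_limit must be smaller than imf_upper_limit")
    for name in ("frac_nodust", "frac_obrun"):
        if not 0.0 <= config[name] <= 1.0:
            problems.append(f"{name} must lie in [0, 1]")
    return problems


# Default values copied from the parameter list above.
defaults = {
    "min_wavelength": 3000.0,
    "max_wavelength": 10000.0,
    "imf_lower_limit": 0.08,
    "imf_upper_limit": 120.0,
    "frac_nodust": 0.0,
    "frac_obrun": 0.0,
}
print(check_fsps_config(defaults))
```

Collecting every problem in a list, rather than raising on the first one, lets a user fix a whole configuration in one pass.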

entrypoint_function: str | None = 'fit_model'

fit_model(input_data, **kwargs)

This function creates rest-frame SED models from an input galaxy population catalog.

Parameters: input_data (Hdf5Handle) – This is the input catalog in the form of an Hdf5Handle.
Returns: ModelHandle storing the rest-frame SED models
Return type: Hdf5Handle

inputs = [('input', <class 'rail.core.data.Hdf5Handle'>)]

interactive_function: str | None = 'fsps_sed_modeler'

name = 'FSPSSedModeler'

outputs = [('model', <class 'rail.core.data.Hdf5Handle'>)]

run(): Run method. It Calls StellarPopulation from FSPS to create a galaxy rest-frame SED. Thanks to Josue de Santiago, this function is able to run in parallel via mpi by splitting the full sample in chunks of user-defined size.

stage_columns: list[str] | None

class rail.stages.FlexZBoostEstimator

Bases: CatEstimator

FlexZBoost-based CatEstimator

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0) – The minimum redshift of the z grid or sample
zmax ([float] default=3.0) – The maximum redshift of the z grid or sample
nzbins ([int] default=301)
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift) – name of redshift column
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])
ref_band ([str] default=mag_i_lsst)
qp_representation ([str] default=interp) – qp generator to use. [interp|flexzboost]
include_mag_err ([bool] default=False) – Include magnitude error in the training and estimation process
model (ModelHandle (INPUT))
input (TableHandle (INPUT))
output (QPHandle (OUTPUT))
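
The `nondetect_val` and `mag_limits` parameters are listed without descriptions. One plausible reading, stated here as an assumption rather than documented behavior, is that a magnitude equal to `nondetect_val` (or NaN) marks a non-detection and is replaced by the limiting magnitude of that band. The helper `replace_nondetections` below is hypothetical and only sketches that idea on a single row stored as a dict.

```python
import math


def replace_nondetections(row, bands, mag_limits, nondetect_val=99.0):
    """Return a copy of `row` with non-detections set to the band's limit.

    A value counts as a non-detection if it is NaN or equal to
    `nondetect_val` (this matching rule is an assumption of the sketch).
    """
    cleaned = dict(row)
    for band in bands:
        value = cleaned[band]
        if math.isnan(value) or math.isclose(value, nondetect_val):
            cleaned[band] = mag_limits[band]
    return cleaned


# Two of the default limiting magnitudes from the parameter list above.
mag_limits = {"mag_u_lsst": 27.79, "mag_g_lsst": 29.04}
row = {"mag_u_lsst": 99.0, "mag_g_lsst": 24.3}
print(replace_nondetections(row, list(mag_limits), mag_limits))
```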

__init__(args, **kwargs): Constructor: Do CatEstimator specific initialization

entrypoint_function: str | None = 'estimate'

interactive_function: str | None = 'flex_z_boost_estimator'

name = 'FlexZBoostEstimator'

stage_columns: list[str] | None

class rail.stages.FlexZBoostInformer

Bases: CatInformer

Train a FlexZBoost CatInformer

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0)
zmax ([float] default=3.0)
nzbins ([int] default=301)
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])
ref_band ([str] default=mag_i_lsst)
redshift_col ([str] default=redshift)
retrain_full ([bool] default=True) – if True, re-run the fit with the full training set, including data set aside for bump/sharpen validation. If False, only use the subset defined via trainfrac fraction
trainfrac ([float] default=0.75) – fraction of training data to use for training (rest used for bump thresh and sharpening determination)
seed ([int] default=1138) – Random number seed
bumpmin ([float] default=0.02) – minimum value in grid of thresholds checked to optimize removal of spurious small bumps
bumpmax ([float] default=0.35) – max value in grid checked for removal of small bumps
nbump ([int] default=20) – number of grid points in bumpthresh grid search
sharpmin ([float] default=0.7) – min value in grid checked in optimal sharpening parameter fit
sharpmax ([float] default=2.1) – max value in grid checked in optimal sharpening parameter fit
nsharp ([int] default=15) – number of search points in sharpening fit
max_basis ([int] default=35) – maximum number of basis functions to use in density estimate
basis_system ([str] default=cosine) – type of basis system to use with flexcode
regression_params ([dict] default={'max_depth': 8, 'objective': 'reg:squarederror'}) – dictionary of options passed to flexcode, includes max_depth (int), and objective, which should be set to reg:squarederror
include_mag_err ([bool] default=False) – Include magnitude error in the training and estimation process
input (TableHandle (INPUT))
model (ModelHandle (OUTPUT))
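
The `bumpmin`/`bumpmax`/`nbump` and `sharpmin`/`sharpmax`/`nsharp` parameters each describe a search grid. The sketch below builds such grids from the default values, assuming evenly spaced points with both endpoints included; the spacing rule and the helper name `linear_grid` are assumptions, since this reference does not state how the grids are constructed.

```python
def linear_grid(lo, hi, n):
    """Return n evenly spaced values from lo to hi, endpoints included."""
    if n == 1:
        return [lo]
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


# Default values from the parameter list above.
bump_grid = linear_grid(0.02, 0.35, 20)
sharpen_grid = linear_grid(0.7, 2.1, 15)
print(len(bump_grid), bump_grid[0], round(bump_grid[-1], 6))
print(len(sharpen_grid), sharpen_grid[0], round(sharpen_grid[-1], 6))
```

With the defaults, the sharpening grid steps by 0.1 between 0.7 and 2.1.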

__init__(args, **kwargs): Constructor Do CatInformer specific initialization, then check on bands

divide_array(grid)

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'flex_z_boost_informer'

name = 'FlexZBoostInformer'

run(): Train flexzboost model model

static split_data(fz_data, sz_data, trainfrac, seed): Make a random partition of the training data into training and validation sets. The validation data are used to determine the bump threshold and sharpening parameters.
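
A seeded random partition of the kind `split_data` describes can be sketched as follows. This is not the method's implementation: the function `split_indices` is hypothetical, works on row indices only, and uses the standard-library random generator purely for illustration.

```python
import random


def split_indices(n_rows, trainfrac, seed):
    """Randomly partition range(n_rows) into training and validation indices."""
    rng = random.Random(seed)  # a fixed seed makes the partition reproducible
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_train = round(n_rows * trainfrac)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# Default trainfrac and seed from the parameter list above.
train_idx, val_idx = split_indices(n_rows=100, trainfrac=0.75, seed=1138)
print(len(train_idx), len(val_idx))
```

The same indices would then be applied to both the photometry and the redshift arrays so that the two stay aligned.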

stage_columns: list[str] | None

class rail.stages.FlowCreator

Bases: Creator

Creator wrapper for a PZFlow Flow object.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
n_samples ([int] (required)) – Number of samples to create
seed ([int] default=12345) – Random number seed
model (FlowHandle (INPUT))
output (PqHandle (OUTPUT))

__init__(args, **kwargs)

Constructor

Does standard Creator initialization and also gets the Flow object

entrypoint_function: str | None = 'sample'

inputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>)]

interactive_function: str | None = 'flow_creator'

name = 'FlowCreator'

outputs = [('output', <class 'rail.core.data.PqHandle'>)]

run()

Run method

Calls Flow.sample to use the Flow object to generate photometric data

Notes

Puts the data into the data store under this stage's ‘output’ tag

stage_columns: list[str] | None

class rail.stages.FlowModeler

Bases: Modeler

Modeler wrapper for a PZFlow Flow object.

This class trains the flow.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
seed ([int] default=0) – The random seed for training.
phys_cols ([dict] default={'redshift': [0, 3]}) – Names of non-photometry columns and their corresponding [min, max] values.
phot_cols ([dict] default={'mag_u_lsst': [17, 35], 'mag_g_lsst': [16, 32], 'mag_r_lsst': [15, 30], 'mag_i_lsst': [15, 30], 'mag_z_lsst': [14, 29], 'mag_y_lsst': [14, 28]}) – Names of photometry columns and their corresponding [min, max] values.
calc_colors ([dict] default={'ref_column_name': 'mag_i_lsst'}) – Whether to internally calculate colors (if phot_cols are magnitudes). Assumes that you want to calculate colors from adjacent columns in phot_cols. If you do not want to calculate colors, set False. Else, provide a dictionary {‘ref_column_name’: band}, where band is a string corresponding to the column in phot_cols you want to save as the overall galaxy magnitude.
spline_knots ([int] default=16) – The number of spline knots in the normalizing flow.
num_training_epochs ([int] default=30) – The number of training epochs.
input (TableHandle (INPUT))
model (FlowHandle (OUTPUT))
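
The `calc_colors` description says colors are computed from adjacent columns in `phot_cols`, with one reference column kept as the overall magnitude. A minimal sketch of that idea for a single row is shown below. The function name and the naming of the output keys are hypothetical, and the sketch does not claim to match how the stage stores colors internally.

```python
def magnitudes_to_colors(row, phot_cols, ref_column_name):
    """Return the reference magnitude plus colors from adjacent columns.

    The output key naming (e.g. "mag_u_lsst-mag_g_lsst") is a hypothetical
    convention chosen for this sketch.
    """
    out = {ref_column_name: row[ref_column_name]}
    for blue, red in zip(phot_cols[:-1], phot_cols[1:]):
        out[f"{blue}-{red}"] = row[blue] - row[red]
    return out


phot_cols = ["mag_u_lsst", "mag_g_lsst", "mag_r_lsst", "mag_i_lsst"]
row = {"mag_u_lsst": 25.0, "mag_g_lsst": 24.5, "mag_r_lsst": 24.0, "mag_i_lsst": 23.75}
print(magnitudes_to_colors(row, phot_cols, "mag_i_lsst"))
```

N magnitude columns give N-1 colors, so keeping one reference magnitude preserves the overall brightness information.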

__init__(args, **kwargs)

Constructor

Does standard Modeler initialization.

entrypoint_function: str | None = 'fit_model'

inputs = [('input', <class 'rail.core.data.TableHandle'>)]

interactive_function: str | None = 'flow_modeler'

name = 'FlowModeler'

outputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>)]

run()

Run method

Calls Flow.train to train a normalizing flow using PZFlow.

Notes

Puts the data into the data store under this stage's ‘output’ tag

stage_columns: list[str] | None

validate(): Check that the inputs actually have the data needed for execution, This is called before the run method. It is an optional stage, meant for checking that the input to the stage is actual in the form and shape needed before an expensive run is executed.

class rail.stages.FlowPosterior

Bases: PosteriorCalculator

PosteriorCalculator wrapper for a PZFlow Flow object

data : pd.DataFrame
    Pandas dataframe of the data on which the posteriors are conditioned.
    Must have all columns in self.flow.data_columns, *except*
    for the column specified for the posterior (see below).

column : str
    Name of the column for which the posterior is calculated.
    Must be one of the columns in self.flow.data_columns. However,
    whether or not this column is present in `data` is irrelevant.

grid : np.ndarray
    Grid over which the posterior is calculated.

err_samples : int, optional
    Number of samples from the error distribution to average over for
    the posterior calculation. If provided, Gaussian errors are assumed,
    and method will look for error columns in `inputs`. Error columns
    must end in `_err`. E.g. the error column for the variable `u` must
    be `u_err`. Zero error assumed for any missing error columns.

seed: int, optional
    Random seed for drawing samples from the error distribution.

marg_rules : dict, optional
    Dictionary with rules for marginalizing over missing variables.
    The dictionary must contain the key "flag", which gives the flag
    that indicates a missing value. E.g. if missing values are given
    the value 99, the dictionary should contain {"flag": 99}.
    The dictionary must also contain {"name": callable} for any
    variables that will need to be marginalized over, where name is
    the name of the variable, and callable is a callable that takes
    the row of variables and returns a grid over which to marginalize
    the variable. E.g. {"y": lambda row: np.linspace(0, row["x"], 10)}.
    Note: the callable for a given name must *always* return an array
    of the same length, regardless of the input row.
    DEFAULT: the default marg_rules dict is
    {"flag": np.nan,
    "u": np.linspace(25, 31, 10),}

batch_size: int, default=None
    Size of batches in which to calculate posteriors. If None, all
    posteriors are calculated simultaneously. This is faster, but
    requires more memory.

nan_to_zero : bool, default=True
    Whether to convert NaN's to zero probability in the final pdfs.
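
A `marg_rules` dictionary of the shape described above can be sketched in plain Python. The helper `columns_to_marginalize` is hypothetical and only shows how the "flag" entry and the per-column callables fit together; it is not part of PZFlow or RAIL, and a list is used in place of a NumPy array to keep the example dependency-free.

```python
import math

# "flag" marks missing values; every other key maps a column name to a
# callable that takes a row and returns the marginalization grid.
# The grid always has 10 points, as required by the note above.
marg_rules = {
    "flag": math.nan,
    "mag_u_lsst": lambda row: [25.0 + 6.0 * i / 9 for i in range(10)],
}


def columns_to_marginalize(row, rules):
    """Names of rule columns whose value in `row` equals the missing flag."""
    flag = rules["flag"]
    missing = []
    for name in rules:
        if name == "flag":
            continue
        value = row[name]
        # NaN never compares equal to itself, so it needs a dedicated test.
        is_flagged = math.isnan(value) if math.isnan(flag) else value == flag
        if is_flagged:
            missing.append(name)
    return missing


row = {"mag_u_lsst": math.nan, "mag_g_lsst": 24.1}
for name in columns_to_marginalize(row, marg_rules):
    print(name, len(marg_rules[name](row)))
```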

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
column ([str] (required)) – Column to compute posterior for
grid ([list] default=[]) – Grid over which the posterior is calculated
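err_samples ([int] default=10) – Number of samples from the error distribution to average over for the posterior calculation
seed ([int] default=12345) – Random seed for drawing samples from the error distribution
marg_rules ([dict] default={'flag': nan, 'mag_u_lsst': <function FlowPosterior.<lambda> at 0x7ab094c5f110>}) – Dictionary with rules for marginalizing over missing variables
batch_size ([int] default=10000) – Size of batches in which to calculate posteriors
nan_to_zero ([bool] default=True) – Whether to convert NaNs to zero probability in the final PDFs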
model (FlowHandle (INPUT))
input (PqHandle (INPUT))
output (QPHandle (OUTPUT))

__init__(args, **kwargs)

Constructor

Does standard PosteriorCalculator initialization

entrypoint_function: str | None = 'get_posterior'

inputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>), ('input', <class 'rail.core.data.PqHandle'>)]

interactive_function: str | None = 'flow_posterior'

name = 'FlowPosterior'

outputs = [('output', <class 'rail.core.data.QPHandle'>)]

run()

Run method

Calls Flow.posterior to use the Flow object to get the posterior distribution.

Notes

Gets the input data from the data store under this stage's ‘input’ tag. Puts the data into the data store under this stage's ‘output’ tag.

stage_columns: list[str] | None

rail.stages.GCRLoader: alias of GCRCreator

class rail.stages.GPzEstimator

Bases: CatEstimator

Estimate stage for GPz_v1

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0)
zmax ([float] default=3.0)
nzbins ([int] default=301)
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift) – name of redshift column
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])
ref_band ([str] default=mag_i_lsst)
log_errors ([bool] default=True) – if true, take log of magnitude errors
replace_error_vals ([list] default=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
model (ModelHandle (INPUT))
input (TableHandle (INPUT))
output (QPHandle (OUTPUT))

__init__(args, **kwargs): Constructor: Do CatEstimator specific initialization

entrypoint_function: str | None = 'estimate'

interactive_function: str | None = 'gpz_estimator'

name = 'GPzEstimator'

stage_columns: list[str] | None

class rail.stages.GPzInformer

Bases: CatInformer

Inform stage for GPz_v1

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
trainfrac ([float] default=0.75) – fraction of training data used to make tree, rest used to set best sigma
seed ([int] default=87) – random seed
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])
redshift_col ([str] default=redshift)
gpz_method ([str] default=VC) – method to be used in GPz, options are ‘GL’, ‘VL’, ‘GD’, ‘VD’, ‘GC’, and ‘VC’
n_basis ([int] default=50) – number of basis functions used
learn_jointly ([bool] default=True) – if True, jointly learns prior linear mean function
hetero_noise ([bool] default=True) – if True, learns heteroscedastic noise process, set False for point est.
csl_method ([str] default=normal) – cost sensitive learning type, ‘balanced’, ‘normalized’, or ‘normal’
csl_binwidth ([float] default=0.1) – width of bin for ‘balanced’ cost sensitive learning
pca_decorrelate ([bool] default=True) – if True, decorrelate data using PCA as preprocessing stage
max_iter ([int] default=200) – max number of iterations
max_attempt ([int] default=100) – max iterations if no progress on validation
log_errors ([bool] default=True) – if true, take log of magnitude errors
replace_error_vals ([list] default=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
input (TableHandle (INPUT))
model (ModelHandle (OUTPUT))

__init__(args, **kwargs): Constructor Do CatInformer specific initialization

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'gpz_informer'

name = 'GPzInformer'

run(): train the GPz model after splitting train data into train/validation

stage_columns: list[str] | None

class rail.stages.GaussianPzEstimator

Bases: PzEstimator

Estimator which converts to Gaussian reps

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
model (ModelHandle (INPUT))
input (QPHandle (INPUT))
output (QPHandle (OUTPUT))

entrypoint_function: str | None = 'estimate'

interactive_function: str | None = 'gaussian_pz_estimator'

name = 'GaussianPzEstimator'

stage_columns: list[str] | None

class rail.stages.GaussianPzInformer

Bases: PzInformer

Placeholder Informer

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
input (QPHandle (INPUT))
truth (TableHandle (INPUT))
model (ModelHandle (OUTPUT))

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'gaussian_pz_informer'

model_handle: ModelHandle | None

name = 'GaussianPzInformer'

stage_columns: list[str] | None

class rail.stages.GridSelection

Bases: Selector

Uses the ratio of HSC spectroscopic galaxies to photometric galaxies to partition a sample into training and application samples. Optionally applies a color-based redshift cutoff in each pixel. The training sample can be degraded further by limiting it to galaxies below a redshift cutoff, set with redshift_cut.

color_redshift_cut: True or false, implements color-based redshift cut. Default is True.
    If True, ratio_file must include second key called 'data' with magnitudes, colors and spec-z from the spectroscopic sample.
percentile_cut: If using color-based redshift cut, percentile in spec-z above which redshifts will be cut from training sample. Default is 99.0
scaling_factor: Enables the user to adjust the ratios by this factor to change the overall number of galaxies kept.  For example, if you wish
    to generate 100,000 galaxies but only 50,000 are selected by default, then you can adjust the factor up by a factor of 2 to return more galaxies.
redshift_cut: redshift above which all galaxies will be removed from training sample. Default is 100
ratio_file: hdf5 file containing an array of spectroscopic vs. photometric galaxies in each pixel. Default is hsc_ratios_and_specz.hdf5 for an HSC based selection
settings_file: pickled dictionary containing information about colors and magnitudes used in defining the pixels. Dictionary must include the following keys:
    'x_band_1': string, this is the band used for the magnitude in the color magnitude diagram. Default for HSC is 'i'.
    'x_band_2': string, this is the redder band used for the color in the color magnitude diagram.
    if x_band_2 string is not set to '' then the grid is assumed to be over color and x axis color is set to x_band_1 - x_band_2, default is ''.
    'y_band_1': string, this is the bluer band used for the color in the color magnitude grid. Default for HSC is 'g'.
    'y_band_2': string, this is the redder band used for the color in the color magnitude diagram.
    if y_band_2 is not set to '' then the y-band is assumed to be over color and is set to y_band_1 - y_band_2.
    'x_limits': 2-element list, this is a list of the lower and upper limits of the magnitude. Default for HSC is [13, 16].
    'y_limits': 2-element list, this is a list of the lower and upper limits of the color. Default for HSC is [-2, 6].
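
The rule for the two grid axes (a plain magnitude when the second band is '', otherwise the color band_1 - band_2) can be sketched as below. The function `grid_coordinates` is hypothetical, and the example settings dictionary is illustrative rather than a copy of the shipped settings file; in particular, the choice of 'z' for y_band_2 is an assumption.

```python
def grid_coordinates(mags, settings):
    """Return the (x, y) position of one galaxy on the selection grid.

    An axis is a plain magnitude when its second band is '' and a color
    (band_1 - band_2) otherwise.
    """
    def axis(band_1, band_2):
        if band_2 == "":
            return mags[band_1]
        return mags[band_1] - mags[band_2]

    x = axis(settings["x_band_1"], settings["x_band_2"])
    y = axis(settings["y_band_1"], settings["y_band_2"])
    return x, y


# Illustrative settings using the keys listed above.
settings = {
    "x_band_1": "i", "x_band_2": "",
    "y_band_1": "g", "y_band_2": "z",
    "x_limits": [13, 16], "y_limits": [-2, 6],
}
print(grid_coordinates({"g": 16.5, "i": 15.0, "z": 14.5}, settings))
```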

NOTE: the default ‘HSC’ grid file, located in rail/examples_data/creation_data/data/hsc_ratios_and_specz.hdf5, is based on data from the Second HSC Data Release, details of which can be found here: Aihara, H., AlSayyad, Y., Ando, M., et al. 2019, PASJ, 71, 114 doi: 10.1093/pasj/psz103

Update (Apr 16 2024): Now inherits from Selector and implements _select() instead of run()

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
drop_rows ([bool] default=True) – Drop selected rows from output table
seed ([type not specified] default=None) – Set to an int to force reproducible results.
color_redshift_cut ([bool] default=True) – using color-based redshift cut
percentile_cut ([float] default=99.0) – percentile cut-off for each pixel in color-based redshift cut off
redshift_cut ([float] default=100.0) – cut redshifts above this value
ratio_file ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/hsc_ratios_and_specz.hdf5) – path to ratio file
settings_file ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/HSC_grid_settings.pkl) – path to pickled parameters file
random_seed ([int] default=12345) – random seed for reproducibility
scaling_factor ([float] default=1.588) – multiplicative factor for ratios to adjust number of galaxies kept
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))
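
To see how `scaling_factor` changes the number of galaxies kept, the sketch below estimates the expected size of the selected sample from per-pixel ratios. It assumes each pixel keeps a fraction ratio * scaling_factor of its galaxies, capped at 1; both the cap and the function name `expected_kept` are assumptions made for this example, and the pixel counts are invented for illustration.

```python
def expected_kept(n_phot_per_pixel, ratios, scaling_factor=1.0):
    """Expected number of galaxies kept, summed over grid pixels.

    Each pixel keeps a fraction ratio * scaling_factor of its galaxies,
    capped at 1 (the cap is an assumption of this sketch).
    """
    total = 0.0
    for n_phot, ratio in zip(n_phot_per_pixel, ratios):
        total += n_phot * min(1.0, ratio * scaling_factor)
    return total


# Made-up pixel populations and ratios, for illustration only.
n_phot = [1000, 2000, 500]
ratios = [0.25, 0.125, 0.5]
print(expected_kept(n_phot, ratios))
print(expected_kept(n_phot, ratios, scaling_factor=2.0))
```

In this toy case doubling the factor doubles the expected sample, matching the 50,000 to 100,000 example in the class description; once a pixel reaches the cap, further increases no longer add galaxies from it.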

__init__(args, **kwargs): Constructor: Do RailStage specific initialization

def_ratio_file = '/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/hsc_ratios_and_specz.hdf5'

def_set_file = '/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/HSC_grid_settings.pkl'

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'grid_selection'

name = 'GridSelection'

stage_columns: list[str] | None

class rail.stages.HyperbolicMagnitudes

Bases: PhotometryManipulator

Convert a set of classical magnitudes to hyperbolic magnitudes (Lupton et al. 1999). Requires input from the initial stage (HyperbolicSmoothing) to supply optimal values for the smoothing parameters (b).

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
value_columns ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – list of columns that provide photometric measurements (fluxes or magnitudes)
error_columns ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst']) – list of columns with errors corresponding to value_columns (assuming same ordering)
zeropoints ([list] default=[]) – optional list of magnitude zeropoints for value_columns (assuming same ordering, defaults to 0.0)
is_flux ([bool] default=False) – whether the provided quantities are fluxes or magnitudes
input (PqHandle (INPUT))
parameters (PqHandle (INPUT))
output (PqHandle (OUTPUT))

compute(data, parameters, **kwargs)

Main method to call. Outputs hyperbolic magnitudes computed from a set of smoothing parameters and an input catalogue with classical magnitudes and their respective errors.

Parameters:

data (PqHandle) – Input table with photometry (magnitudes or flux columns and their respective uncertainties) as defined by the configuration.
parameters (PqHandle) – Table with smoothing parameters per photometric band, determined by HyperbolicSmoothing.

Returns:

Output table containing hyperbolic magnitudes and their uncertainties. If the columns in the input table contain a prefix mag_, this output table will replace the prefix with hyp_mag_, otherwise the column names will be identical to the input table.

Return type:

PqHandle
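
The column renaming rule described under Returns can be illustrated with a minimal sketch. The helper rename_hyperbolic is hypothetical and not part of RAIL; it only mirrors the documented rule (a leading mag_ becomes hyp_mag_, all other names are kept) and may differ from the actual implementation in edge cases.

```python
def rename_hyperbolic(column: str) -> str:
    """Hypothetical helper: replace a leading 'mag_' with 'hyp_mag_'."""
    prefix = "mag_"
    if column.startswith(prefix):
        return "hyp_mag_" + column[len(prefix):]
    # Columns without the prefix keep their original name.
    return column


# Illustrative column names; 'flux_u' is made up to show the unchanged case.
columns = ["mag_u_lsst", "mag_err_u_lsst", "flux_u"]
renamed = [rename_hyperbolic(c) for c in columns]
```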

entrypoint_function: str | None = 'compute'

inputs = [('input', <class 'rail.core.data.PqHandle'>), ('parameters', <class 'rail.core.data.PqHandle'>)]

interactive_function: str | None = 'hyperbolic_magnitudes'

name = 'HyperbolicMagnitudes'

outputs = [('output', <class 'rail.core.data.PqHandle'>)]

run(): Compute hyperbolic magnitudes and their error based on the parameters determined by HyperbolicSmoothing.

class rail.stages.HyperbolicSmoothing

Bases: PhotometryManipulator

Initial stage to compute hyperbolic magnitudes (Lupton et al. 1999). Estimates the smoothing parameter b that is used by the second stage (HyperbolicMagnitudes) to convert classical to hyperbolic magnitudes.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
value_columns ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – list of columns that provide photometric measurements (fluxes or magnitudes)
error_columns ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst']) – list of columns with errors corresponding to value_columns (assuming same ordering)
zeropoints ([list] default=[]) – optional list of magnitude zeropoints for value_columns (assuming same ordering, defaults to 0.0)
is_flux ([bool] default=False) – whether the provided quantities are fluxes or magnitudes
input (PqHandle (INPUT))
parameters (PqHandle (OUTPUT))

compute(data, **kwargs)

Main method to call. Computes the set of smoothing parameters (b) for an input catalogue with classical photometry and their respective errors. These parameters are required by the follow-up stage HyperbolicMagnitudes and are parsed as tabular data.

Parameters:: data (PqHandle) – Input table with magnitude and magnitude error columns as defined in the configuration.
Returns:: Table with smoothing parameters per photometric band and additional meta data.
Return type:: PqHandle

entrypoint_function: str | None = 'compute'

inputs = [('input', <class 'rail.core.data.PqHandle'>)]

interactive_function: str | None = 'hyperbolic_smoothing'

name = 'HyperbolicSmoothing'

outputs = [('parameters', <class 'rail.core.data.PqHandle'>)]

run(): Computes the smoothing parameter b (see Lupton et al. 1999) per photometric band.

class rail.stages.IGMExtinctionModel

Bases: Noisifier

Degrader that simulates IGM extinction.

Note that the extinction is only applied to u and g bands, assuming that the maximum redshift of the sample is < ~3.

Note also that the code assumes the first two input bands are always u and g. These bands are also needed to compute the UV slope.

An initial UV slope of -2 is assumed. There is an option to update the UV slope through one iteration, based on the u and g fluxes.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
seed ([type not specified] default=None) – Set to an int to force reproducible results.
data_path ([str] default=None) – file path to the FILTER directories. If left at the default None, it will use the install directory for rail + rail/examples_data/estimation_data/data
filter_list ([list] default=['DC2LSST_u', 'DC2LSST_g', 'DC2LSST_r', 'DC2LSST_i', 'DC2LSST_z', 'DC2LSST_y'])
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
redshift_col ([str] default=redshift)
compute_uv_slope ([bool] default=True) – whether to compute the UV slope. If not, the initial value of -2 will be used
optical_depth_interpolator ([bool] default=True) – whether to precompute optical depth as a function of wavelength and redshift, and interpolate the grid. Notice that if False, the computation loops over all objects, hence can take a very long time!
redshift_grid ([list] default=[1.5, 4, 100]) – the redshift grid to interpolate on, enter a list containing: z_min, z_max, number_of_grid. The default values should have a precision that suffices for most purposes
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))
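
The precompute-and-interpolate pattern behind optical_depth_interpolator and redshift_grid can be sketched in one dimension as follows. This is only an illustration: toy_optical_depth is a made-up stand-in and not the IGM model, the real stage interpolates in both wavelength and redshift, and the use of an evenly spaced grid is an assumption.

```python
import numpy as np


def toy_optical_depth(z):
    """Placeholder smooth function of redshift; NOT the IGM model."""
    return (1.0 + z) ** 3


# redshift_grid is documented as [z_min, z_max, number_of_grid].
redshift_grid = [1.5, 4, 100]
z_min, z_max, n_grid = redshift_grid
z_grid = np.linspace(z_min, z_max, int(n_grid))

# Evaluate the expensive function once on the grid ...
tau_grid = toy_optical_depth(z_grid)

# ... then interpolate for every object instead of looping over them.
rng = np.random.default_rng(0)
z_objects = rng.uniform(z_min, z_max, size=1000)
tau_interp = np.interp(z_objects, z_grid, tau_grid)

# Compare against direct evaluation to gauge the grid precision.
tau_direct = toy_optical_depth(z_objects)
max_rel_err = np.max(np.abs(tau_interp - tau_direct) / tau_direct)
```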

__init__(args, **kwargs)

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'igm_extinction_model'

name = 'IGMExtinctionModel'

stage_columns: list[str] | None

class rail.stages.InvRedshiftIncompleteness

Bases: Selector

Degrader that simulates incompleteness with a selection function inversely proportional to redshift.

The survival probability of this selection function is p(z) = min(1, z_p/z), where z_p is the pivot redshift.
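
The survival probability above can be written out as a short sketch. The function name survival_probability and the random draw used to build the keep mask are illustrative assumptions, not RAIL's implementation.

```python
import numpy as np


def survival_probability(z, pivot_redshift):
    """p(z) = min(1, z_p / z), evaluated element-wise."""
    z = np.asarray(z, dtype=float)
    # z = 0 gives an infinite ratio, which min() then clips to 1.
    with np.errstate(divide="ignore"):
        ratio = pivot_redshift / z
    return np.minimum(1.0, ratio)


z = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
p = survival_probability(z, pivot_redshift=1.0)

# Keep each object with probability p(z); a fixed seed makes this reproducible.
rng = np.random.default_rng(42)
keep = rng.random(z.size) < p
```

Objects at or below the pivot redshift always survive, while the survival probability falls off as 1/z above it.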

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
drop_rows ([bool] default=True) – Drop selected rows from output table
seed ([type not specified] default=None) – Set to an int to force reproducible results.
pivot_redshift ([float] (required)) – redshift at which the incompleteness begins
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

__init__(args, **kwargs)

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'inv_redshift_incompleteness'

name = 'InvRedshiftIncompleteness'

stage_columns: list[str] | None

class rail.stages.KDEBinOverlap

Bases: RailStage

Stage KDEBinOverlap

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
redshift_col ([str] default=redshift) – name of redshift column
bin_name ([str] default=class_id) – Groupname for the tomographic bin index in the hdf5 handle
truth (TableHandle (INPUT))
bin_index (Hdf5Handle (INPUT))
output (Hdf5Handle (OUTPUT))

entrypoint_function: str | None = 'evaluate'

evaluate(bin_index, truth, **kwargs)

Evaluate function for KDEBinOverlap

Parameters:

bin_index (TableLike) – bin_index
truth (TableLike) – truth

Returns:

Output data

Return type:

Hdf5Handle

inputs = [('truth', <class 'rail.core.data.TableHandle'>), ('bin_index', <class 'rail.core.data.Hdf5Handle'>)]

interactive_function: str | None = 'kde_bin_overlap'

name = 'KDEBinOverlap'

outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:: None

stage_columns: list[str] | None

class rail.stages.KNearNeighEstimator

Bases: CatEstimator

KNN-based estimator

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0)
zmax ([float] default=3.0)
nzbins ([int] default=301)
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift)
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, and ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
ref_band ([str] default=mag_i_lsst)
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
model (ModelHandle (INPUT))
input (TableHandle (INPUT))
output (QPHandle (OUTPUT))
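
To make the zmin, zmax, nzbins and calculated_point_estimates parameters concrete, the sketch below builds the default redshift grid and computes the three point estimates for a single gridded PDF using NumPy only. In RAIL these estimates are computed through qp.Ensemble; the Gaussian-shaped PDF and the simple rectangle-rule integration here are assumptions made for illustration.

```python
import numpy as np

zmin, zmax, nzbins = 0.0, 3.0, 301
z_grid = np.linspace(zmin, zmax, nzbins)  # spacing of 0.01 in redshift
dz = z_grid[1] - z_grid[0]

# Hypothetical PDF: a Gaussian shape centred at z = 1.0 with width 0.1.
pdf = np.exp(-0.5 * ((z_grid - 1.0) / 0.1) ** 2)
pdf /= pdf.sum() * dz  # normalize so the PDF integrates to 1

# Mode: grid point where the PDF peaks.
z_mode = z_grid[np.argmax(pdf)]
# Mean: first moment of the PDF.
z_mean = np.sum(z_grid * pdf) * dz
# Median: first grid point where the cumulative distribution reaches 0.5.
cdf = np.cumsum(pdf) * dz
z_median = z_grid[np.searchsorted(cdf, 0.5)]
```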

__init__(args, **kwargs): Constructor: Do Estimator specific initialization

entrypoint_function: str | None = 'estimate'

interactive_function: str | None = 'k_near_neigh_estimator'

name = 'KNearNeighEstimator'

open_model(**kwargs)

Load the model and/or attach it to this Stage

Parameters:

tag – Input tag associated to the model
**kwargs – Should include ‘model’, see notes

Notes

The keyword argument ‘model’ should be either

an object with a trained model,
a path pointing to a file that can be read to obtain the trained model,
or a ModelHandle providing access to the trained model.

Returns:: The object encapsulating the trained model.
Return type:: Any

stage_columns: list[str] | None

class rail.stages.KNearNeighInformer

Bases: CatInformer

Train a KNN-based estimator

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry)
zmin ([float] default=0.0)
zmax ([float] default=3.0)
nzbins ([int] default=301)
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
ref_band ([str] default=mag_i_lsst)
redshift_col ([str] default=redshift)
trainfrac ([float] default=0.75) – fraction of training data used to make tree, rest used to set best sigma
seed ([int] default=0) – Random number seed for NN training
sigma_grid_min ([float] default=0.01) – minimum value of sigma for grid check
sigma_grid_max ([float] default=0.075) – maximum value of sigma for grid check
ngrid_sigma ([int] default=10) – number of grid points in sigma check
leaf_size ([int] default=15) – min leaf size for KDTree
nneigh_min ([int] default=3) – int, min number of near neighbors to use for PDF fit
nneigh_max ([int] default=7) – int, max number of near neighbors to use for PDF fit
only_colors ([bool] default=False) – if only_colors True, then do not use ref_band mag, only use colors
input (TableHandle (INPUT))
model (ModelHandle (OUTPUT))

__init__(args, **kwargs): Constructor Do CatInformer specific initialization, then check on bands

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'k_near_neigh_informer'

name = 'KNearNeighInformer'

run(): train a KDTree on a fraction of the training data

stage_columns: list[str] | None

class rail.stages.LSSTErrorModel

Bases: PhotoErrorModel

The LSST Error model, defined by peLsstErrorParams and peLsstErrorModel

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
seed ([type not specified] default=None) – Set to an int to force reproducible results.
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

__init__(args, **kwargs): Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'lsst_error_model'

name = 'LSSTErrorModel'

stage_columns: list[str] | None

class rail.stages.LSSTFluxToMagConverter

Bases: RailStage

Utility stage that converts from fluxes to magnitudes

Note, this is hardwired to take parquet files as input and provide hdf5 files as output

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
bands ([list] default=['u', 'g', 'r', 'i', 'z', 'y']) – Names of the bands
flux_name ([str] default={band}_gaap1p0Flux) – Template for band names
flux_err_name ([str] default={band}_gaap1p0FluxErr) – Template for band error column names
mag_name ([str] default=mag_{band}_lsst) – Template for magnitude column names
mag_err_name ([str] default=mag_err_{band}_lsst) – Template for magnitude error column names
copy_cols ([dict] default={}) – Map of other columns to copy
mag_offset ([float] default=31.4) – Magnitude offset value
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'

inputs = [('input', <class 'rail.core.data.PqHandle'>)]

interactive_function: str | None = 'lsst_flux_to_mag_converter'

mag_conv = np.float64(0.9210340371976184)

name = 'LSSTFluxToMagConverter'

outputs = [('output', <class 'rail.core.data.PqHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.
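
The documented constants suggest the standard flux-to-magnitude conversion, sketched below. The value of mag_conv equals 0.4 * ln(10), and an offset of 31.4 is the AB zero point for fluxes in nanojansky. Both the assumption that input fluxes are in nanojansky and the helper function names are illustrative, not confirmed details of the stage.

```python
import numpy as np

MAG_CONV = np.log(10) * 0.4  # matches the documented mag_conv value
MAG_OFFSET = 31.4            # documented default of mag_offset


def flux_to_mag(flux, mag_offset=MAG_OFFSET):
    """Hypothetical helper: magnitude from flux (assumed to be in nJy)."""
    return -2.5 * np.log10(flux) + mag_offset


def flux_err_to_mag_err(flux, flux_err):
    """Hypothetical helper: first-order propagation of the flux error."""
    return flux_err / (flux * MAG_CONV)


# Column names are built from the documented templates.
band = "r"
flux_col = "{band}_gaap1p0Flux".format(band=band)
mag_col = "mag_{band}_lsst".format(band=band)

# A flux chosen so that the magnitude is 25, with a 10% flux error.
flux = np.array([10 ** ((MAG_OFFSET - 25.0) / 2.5)])
mag = flux_to_mag(flux)
mag_err = flux_err_to_mag_err(flux, 0.1 * flux)
```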

class rail.stages.LephareEstimator

Bases: CatEstimator

LePhare-based CatEstimator

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0) – The minimum redshift of the z grid or sample
zmax ([float] default=3.0) – The maximum redshift of the z grid or sample
nzbins ([int] default=301) – The number of gridpoints in the z grid
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift)
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, and ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
ref_band ([str] default=mag_i_lsst)
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])
lephare_config ([dict] default={}) – The lephare config keymap. If unset we load it from the model.
use_inform_offsets ([bool] default=True) – Use the zero point offsets computed in the inform stage.
posterior_output ([int] default=11) – Which posterior distribution to output. MASS: 0, SFR: 1, SSFR: 2, LDUST: 3, LIR: 4, AGE: 5, COL1: 6, COL2: 7, MREF: 8, MIN_ZG: 9, MIN_ZQ: 10, BAY_ZG: 11, BAY_ZQ: 12
output_keys ([list] default=['Z_BEST', 'CHI_BEST', 'ZQ_BEST', 'CHI_QSO', 'MOD_STAR', 'CHI_STAR']) – The output keys to add to ancil. These must be in the output para file. By default we include the best galaxy and QSO redshift and best star alongside their respective chi squared.
run_dir ([str] default=None) – Override for the LEPHAREWORK directory. If None we load it from the model which is set during the inform stage. This is to facilitate manually moving intermediate files.
model (ModelHandle (INPUT))
input (TableHandle (INPUT))
output (QPHandle (OUTPUT))

__init__(args, **kwargs): Initialize Estimator

entrypoint_function: str | None = 'estimate'

interactive_function: str | None = 'lephare_estimator'

lephare_config: dict

name = 'LephareEstimator'

nzbins: int | None

open_model(**kwargs)

Load the model and/or attach it to this Stage

Parameters:

tag – Input tag associated to the model
**kwargs – Should include ‘model’, see notes

Notes

The keyword argument ‘model’ should be either

an object with a trained model,
a path pointing to a file that can be read to obtain the trained model,
or a ModelHandle providing access to the trained model.

Returns:: The object encapsulating the trained model.
Return type:: Any

stage_columns: list[str] | None

zmax: float | None

zmin: float | None

class rail.stages.LephareInformer

Bases: CatInformer

Inform stage for LephareEstimator

This class will set templates and filters required for photoz estimation.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0)
zmax ([float] default=3.0)
nzbins ([int] default=301)
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])
ref_band ([str] default=mag_i_lsst)
redshift_col ([str] default=redshift)
lephare_config ([dict] default={...}) – The lephare config keymap.
star_config ([dict] default={'LIB_ASCII': 'YES'}) – Star config overrides.
gal_config ([dict] default={'LIB_ASCII': 'YES', 'MOD_EXTINC': '18,26,26,33,26,33,26,33', 'EXTINC_LAW': 'SMC_prevot.dat,SB_calzetti.dat,SB_calzetti_bump1.dat,SB_calzetti_bump2.dat', 'EM_LINES': 'EMP_UV', 'EM_DISPERSION': '0.5,0.75,1.,1.5,2.'}) – Galaxy config overrides.
qso_config ([dict] default={'LIB_ASCII': 'YES', 'MOD_EXTINC': '0,1000', 'EB_V': '0.,0.1,0.2,0.3', 'EXTINC_LAW': 'SB_calzetti.dat'}) – QSO config overrides.
input (TableHandle (INPUT))
model (ModelHandle (OUTPUT))

__init__(args, **kwargs): Init function, init config stuff (COPIED from rail_bpz)

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'lephare_informer'

name = 'LephareInformer'

run()

Run rail_lephare inform stage.

This informer takes the config and templates and makes the inputs required for the run.

In addition to the three lephare stages making the filter, sed, and magnitude libraries we also do some tasks required by all rail inform stages.

stage_columns: list[str] | None

validate(): Check that the inputs actually have the data needed for execution, This is called before the run method. It is an optional stage, meant for checking that the input to the stage is actual in the form and shape needed before an expensive run is executed.

class rail.stages.LineConfusion

Bases: Noisifier

Degrader that simulates emission line confusion.

degrader = LineConfusion(true_wavelen=3727,
                         wrong_wavelen=5007,
                         frac_wrong=0.05)

is a degrader that misidentifies 5% of OII lines (at 3727 angstroms) as OIII lines (at 5007 angstroms), which results in a smaller spectroscopic redshift, since the inferred redshift is (1 + z) * true_wavelen / wrong_wavelen - 1.

Note that when selecting the galaxies for which the lines are confused, the degrader ignores galaxies for which this line confusion would result in a negative redshift, which can occur for low redshift galaxies when wrong_wavelen > true_wavelen.
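
The redshift assigned to a confused line follows from requiring the observed wavelength to be the same under both identifications. The sketch below works this out for the OII/OIII example; the function name confuse_redshift is hypothetical and the snippet is not taken from RAIL.

```python
import numpy as np


def confuse_redshift(z, true_wavelen, wrong_wavelen):
    """Redshift inferred when a line is assigned the wrong rest wavelength."""
    # Observed wavelength of the line that is really present.
    observed = true_wavelen * (1.0 + z)
    # Redshift implied if that wavelength is attributed to the wrong line.
    return observed / wrong_wavelen - 1.0


z_true = np.array([0.1, 0.5, 1.0])
z_wrong = confuse_redshift(z_true, true_wavelen=3727.0, wrong_wavelen=5007.0)

# Galaxies whose confused redshift would be negative must be excluded.
valid = z_wrong >= 0
```

In this example the lowest-redshift galaxy would end up with a negative redshift, which is the case the degrader is documented to skip.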

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
seed ([type not specified] default=None) – Set to an int to force reproducible results.
true_wavelen ([float] (required)) – wavelength of the true emission line
wrong_wavelen ([float] (required)) – wavelength of the wrong emission line
frac_wrong ([float] (required)) – fraction of galaxies with confused emission lines
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

__init__(args, **kwargs)

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'line_confusion'

name = 'LineConfusion'

stage_columns: list[str] | None

class rail.stages.MiniSOMInformer

Bases: CatInformer

Summarizer that uses a SOM to construct a weighted sum of spec-z objects in the same SOM cell as each photometric galaxy in order to estimate the overall N(z). This is closely related to the NZDir estimator, though that estimator actually reverses this process and looks for photometric neighbors around each spectroscopic galaxy, which can lead to problems if there are photometric galaxies with no nearby spec-z objects (NZDir is not aware that such objects exist and thus can hide biases). Part of the SimpleSOM estimator will be a check for cells which contain photometric objects but do not contain any corresponding training/spec-z objects; those unmatched objects will be flagged for possible removal from the input sample. The inform stage will simply construct a 2D grid SOM using minisom from a large sample of input photometric data and save this as an output. This may be a computationally intensive stage, though it will hopefully be run once and used by the estimate/summarize stage many times without needing to be re-run.

We can make the SOM either with all colors, or one magnitude and N colors, or an arbitrary set of columns. The code includes a flag column_usage to set usage. If set to “colors” it will take the difference of each adjacent pair of columns in bands as the colors. If set to magandcolors it will use these colors plus one magnitude as specified by ref_band. If set to columns then it will take as inputs all of the columns specified by bands (they can be magnitudes, colors, or any other input specified by the user). NOTE: any custom bands parameters must have an accompanying nondetect_val dictionary that will replace nondetections with the nondetect_val values!
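
The three column_usage modes can be sketched with NumPy as follows. The function build_features is hypothetical, the magnitudes are made-up values for a single object, and the placement of the reference magnitude before the colors is an assumption about ordering.

```python
import numpy as np

bands = ["mag_u_lsst", "mag_g_lsst", "mag_r_lsst",
         "mag_i_lsst", "mag_z_lsst", "mag_y_lsst"]
ref_band = "mag_i_lsst"

# One hypothetical object, with columns ordered as in bands.
mags = np.array([[24.0, 23.5, 23.1, 22.9, 22.8, 22.7]])


def build_features(mags, bands, ref_band, column_usage):
    """Hypothetical helper assembling the SOM inputs for each mode."""
    if column_usage == "columns":
        # Use the listed columns as they are.
        return mags
    # Colors are differences of each adjacent pair of columns.
    colors = mags[:, :-1] - mags[:, 1:]
    if column_usage == "colors":
        return colors
    if column_usage == "magandcolors":
        # One reference magnitude plus all colors.
        ref = mags[:, [bands.index(ref_band)]]
        return np.hstack([ref, colors])
    raise ValueError(f"unknown column_usage: {column_usage}")


colors_only = build_features(mags, bands, ref_band, "colors")
mag_and_colors = build_features(mags, bands, ref_band, "magandcolors")
raw_columns = build_features(mags, bands, ref_band, "columns")
```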

This will make a pickle file containing the minisom SOM object that will be used by the estimation/summarization stage

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry)
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
ref_band ([str] default=mag_i_lsst)
column_usage ([str] default=magandcolors) – switch for how SOM uses columns, valid values are ‘colors’, ‘magandcolors’, and ‘columns’
seed ([int] default=0) – Random number seed
m_dim ([int] default=31) – number of cells in SOM y dimension
n_dim ([int] default=31) – number of cells in SOM x dimension
som_sigma ([float] default=1.5) – sigma param in SOM training
som_learning_rate ([float] default=0.5) – SOM learning rate
som_iterations ([int] default=10000) – number of iterations in SOM training
input (TableHandle (INPUT))
model (ModelHandle (OUTPUT))

__init__(args, **kwargs): Constructor: Do Informer specific initialization

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'mini_som_informer'

name = 'MiniSOMInformer'

run(): Build a SOM from photometric data NOT spectroscopic data!

stage_columns: list[str] | None

class rail.stages.MiniSOMSummarizer

Bases: SZPZSummarizer

Quick implementation of a SOM-based summarizer that constructs an N(z) estimate via a weighted sum of the empirical N(z) consisting of the normalized histogram of spec-z values contained in the same SOM cell as each photometric galaxy.

There are some general guidelines to choosing the geometry and number of total cells in the SOM. This paper: http://www.giscience2010.org/pdfs/paper_230.pdf recommends 5*sqrt(num rows * num data columns) as a rough guideline. Some authors state that a SOM with one dimension roughly twice as long as the other is better, while others find that square SOMs with equal X and Y dimensions are best. The user can set the dimensions using the n_dim and m_dim parameters. For more discussion on SOMs and photo-z calibration, see the KiDS paper on the topic: http://arxiv.org/abs/1909.09632 particularly the appendices.

Note that several parameters are stored in the model file, e.g. the columns used. This ensures that the same columns used in constructing the SOM are used when finding the winning SOM cell with the test data. Two additional files are also written out: cellid_output outputs the ‘winning’ SOM cell for each photometric galaxy, in both raveled and 2D SOM cell coordinates. If the objectID or galaxy_id is present they will also be included in this file; if not, the coordinates will be written in the same order in which the data is read in. uncovered_cell_file outputs the raveled cell IDs of cells that contain photometric galaxies but no corresponding spectroscopic objects. These objects should be removed from the sample as they cannot be accounted for properly in the summarizer. Some iteration on data cuts may be necessary to remove/mitigate these ‘uncovered’ objects.
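
Two of the points above lend themselves to a short sketch: the rough guideline for the total number of SOM cells, and the relation between raveled and 2D cell coordinates. The helper suggested_num_cells and the example cell positions are illustrative, and the row-major ordering used by NumPy is an assumption about how the cell IDs are raveled.

```python
import numpy as np


def suggested_num_cells(n_rows, n_columns):
    """Rough guideline quoted above: 5 * sqrt(rows * data columns)."""
    return int(round(5 * np.sqrt(n_rows * n_columns)))


# Example: 10000 objects described by 6 columns.
n_cells = suggested_num_cells(10000, 6)

# Convert between 2D SOM cell coordinates and raveled cell IDs.
n_dim, m_dim = 31, 31
cells_2d = np.array([[0, 0], [3, 7], [30, 30]])
raveled = np.ravel_multi_index((cells_2d[:, 0], cells_2d[:, 1]),
                               (n_dim, m_dim))
recovered = np.column_stack(np.unravel_index(raveled, (n_dim, m_dim)))
```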

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
zmin ([float] default=0.0)
zmax ([float] default=3.0)
nzbins ([int] default=301)
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
hdf5_groupname ([str] default=photometry)
redshift_col ([str] default=redshift)
objid_name ([str] default=) – A parameter
spec_groupname ([str] default=photometry) – name of hdf5 group for spec data, if None, then set to ‘’
seed ([int] default=12345) – random seed
phot_weightcol ([str] default=) – name of photometry weight, if present
spec_weightcol ([str] default=) – name of specz weight col, if present
nsamples ([int] default=20) – number of bootstrap samples to generate
input (TableHandle (INPUT))
spec_input (TableHandle (INPUT))
model (ModelHandle (INPUT))
output (QPHandle (OUTPUT))
single_NZ (QPHandle (OUTPUT))
cellid_output (TableHandle (OUTPUT))
uncovered_cell_file (TableHandle (OUTPUT))

__init__(args, **kwargs): Initialize Estimator that can sample galaxy data.

entrypoint_function: str | None = 'summarize'

interactive_function: str | None = 'mini_som_summarizer'

name = 'MiniSOMSummarizer'

outputs = [('output', <class 'rail.core.data.QPHandle'>), ('single_NZ', <class 'rail.core.data.QPHandle'>), ('cellid_output', <class 'rail.core.data.TableHandle'>), ('uncovered_cell_file', <class 'rail.core.data.TableHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

stage_columns: list[str] | None

class rail.stages.Modeler

Bases: RailStage

Base class for creating a model of redshift and photometry.

__init__(args, **kwargs)

Initialize Modeler

Parameters:

args (Any)
kwargs (Any)

entrypoint_function: str | None = 'fit_model'

fit_model(input_data, **kwargs)

Produce a creation model from which photometry and redshifts can be generated.

Parameters:

input_data (DataHandle) –

???

Returns:

This will definitely be a wrapper around a File, but the filetype and format depend entirely on the modeling approach

Return type:

ModelHandle

inputs = [('input', <class 'rail.core.data.DataHandle'>)]

name = 'Modeler'

outputs = [('model', <class 'rail.core.data.ModelHandle'>)]

stage_columns: list[str] | None

class rail.stages.NZDirInformer

Bases: CatInformer

Quick implementation of an NZ Estimator that creates weights for each input object using sklearn’s NearestNeighbors. Very basic, we can probably create a more sophisticated SOM-based DIR method in the future. This inform stage just creates a nearneigh model of the spec-z data and some distances to N-th neighbor that will be used in the estimate stage.

This will create a model: a dictionary containing the nearest neighbor model and the parameters used by estimate

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
usecols ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – columns from sz_data for Neighbor calculation
n_neigh ([int] default=10) – number of neighbors to use
kalgo ([str] default=kd_tree) – Neighbor algorithm to use
kmetric ([str] default=euclidean) – Knn metric to use
szname ([str] default=redshift) – name of specz column in sz_data
szweightcol ([str] default=) – name of sz weight column
distance_delta ([float] default=1e-06) – padding for distance calculation
input (TableHandle (INPUT))
model (ModelHandle (OUTPUT))

bands = ['u', 'g', 'r', 'i', 'z', 'y']

default_usecols = ['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'nz_dir_informer'

name = 'NZDirInformer'

run()

Run the stage and return the execution status.

Subclasses must implement this method.

stage_columns: list[str] | None

class rail.stages.NZDirSummarizer

Bases: CatEstimator

Quick implementation of a summarizer that creates weights for each input object using sklearn’s NearestNeighbors. Very basic, we can probably create a more sophisticated SOM-based DIR method in the future

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0) – The minimum redshift of the z grid
zmax ([float] default=3.0) – The maximum redshift of the z grid
nzbins ([int] default=301) – The number of gridpoints in the z grid
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift) – name of redshift column
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, and ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
seed ([int] default=87) – random seed
usecols ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – columns from sz_data for Neighbor calculation
leafsize ([int] default=40) – leaf size for testdata KDTree
phot_weightcol ([str] default=) – name of photometry weight, if present
nsamples ([int] default=20) – number of bootstrap samples to generate
model (ModelHandle (INPUT))
input (TableHandle (INPUT))
output (QPHandle (OUTPUT))
single_NZ (QPHandle (OUTPUT))
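As a rough illustration of how zmin, zmax and nzbins relate, the sketch below builds a grid of nzbins evenly spaced points that includes both endpoints. This follows the parameter descriptions above literally; the helper name make_zgrid is hypothetical, and the stage itself may construct its grid differently.

```python
def make_zgrid(zmin=0.0, zmax=3.0, nzbins=301):
    """Return `nzbins` evenly spaced grid points from zmin to zmax inclusive."""
    step = (zmax - zmin) / (nzbins - 1)
    return [zmin + i * step for i in range(nzbins)]

zgrid = make_zgrid()
```

With the defaults, this assumed construction gives a grid spacing of 0.01 in redshift.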

__init__(args, **kwargs): Initialize Estimator

bands = ['u', 'g', 'r', 'i', 'z', 'y']

default_usecols = ['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']

entrypoint_function: str | None = 'estimate'

initialize_handle(tag, data, npdf)

interactive_function: str | None = 'nz_dir_summarizer'

join_histograms()

name = 'NZDirSummarizer'

open_model(**kwargs)

Load the model and/or attach it to this Stage

Parameters:

tag – Input tag associated to the model
**kwargs – Should include ‘model’, see notes

Notes

The keyword argument ‘model’ should be either

an object with a trained model,
a path pointing to a file that can be read to obtain the trained model,
or a ModelHandle providing access to the trained model.

Returns:: The object encapsulating the trained model.
Return type:: Any
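The three accepted forms of ‘model’ can be pictured as a small dispatch. The sketch below is not the RAIL implementation: FakeModelHandle and resolve_model are hypothetical stand-ins, and reading the file with pickle is an assumption about the on-disk format.

```python
import pickle
import tempfile
from pathlib import Path

class FakeModelHandle:
    """Hypothetical stand-in for a handle; it just wraps an in-memory model."""
    def __init__(self, data):
        self.data = data

def resolve_model(model):
    """Return the trained model from an object, a file path, or a handle."""
    if isinstance(model, FakeModelHandle):
        return model.data                   # handle: unwrap it
    if isinstance(model, (str, Path)):
        with open(model, "rb") as fin:      # path: read the file (pickle assumed)
            return pickle.load(fin)
    return model                            # already a trained model object

trained = {"weights": [0.2, 0.8]}
with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "model.pkl"
    path.write_bytes(pickle.dumps(trained))
    from_path = resolve_model(path)
from_object = resolve_model(trained)
from_handle = resolve_model(FakeModelHandle(trained))
```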

outputs = [('output', <class 'rail.core.data.QPHandle'>), ('single_NZ', <class 'rail.core.data.QPHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

stage_columns: list[str] | None

class rail.stages.NaiveStackInformer

Bases: PzInformer

Placeholder Informer

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
input (QPHandle (INPUT))
truth (TableHandle (INPUT))
model (ModelHandle (OUTPUT))

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'naive_stack_informer'

model_handle: ModelHandle | None

name = 'NaiveStackInformer'

stage_columns: list[str] | None

class rail.stages.NaiveStackMaskedSummarizer

Bases: NaiveStackSummarizer

Stage NaiveStackMaskedSummarizer

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
zmin ([float] default=0.0) – The minimum redshift of the z grid or sample
zmax ([float] default=3.0) – The maximum redshift of the z grid or sample
nzbins ([int] default=301) – The number of gridpoints in the z grid
seed ([int] default=87) – random seed
n_samples ([int] default=1000) – Number of sample distributions to create
selected_bin ([int] default=-1) – bin to use
input (QPHandle (INPUT))
tomography_bins (TableHandle (INPUT))
output (QPHandle (OUTPUT))
single_NZ (QPHandle (OUTPUT))

entrypoint_function: str | None = 'summarize'

inputs = [('input', <class 'rail.core.data.QPHandle'>), ('tomography_bins', <class 'rail.core.data.TableHandle'>)]

interactive_function: str | None = 'naive_stack_masked_summarizer'

name = 'NaiveStackMaskedSummarizer'

outputs = [('output', <class 'rail.core.data.QPHandle'>), ('single_NZ', <class 'rail.core.data.QPHandle'>)]

stage_columns: list[str] | None

summarize(input_data, tomo_bins=None, **kwargs)

Override the Summarizer.summarize() method to take tomo bins as an additional input

Parameters:

input_data (qp.Ensemble) – Per-galaxy p(z), and any ancillary data associated with it
tomo_bins (TableLike | None, optional) – Tomographic bins file, by default None

Returns:

Ensemble with n(z), and any ancillary data

Return type:

QPHandle
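A minimal sketch of what masking by tomographic bin could look like, using plain lists in place of a qp.Ensemble and a table of bin ids. The function name masked_stack is hypothetical, and treating a negative selected_bin as "use every galaxy" is an assumption suggested by the default of -1, not something the docstring states.

```python
def masked_stack(pdfs, bin_ids, selected_bin=-1):
    """Sum the gridded p(z) rows whose bin id matches `selected_bin`.

    Assumption: a negative `selected_bin` means "use every galaxy".
    """
    keep = [selected_bin < 0 or b == selected_bin for b in bin_ids]
    stacked = [0.0] * len(pdfs[0])
    for row, use in zip(pdfs, keep):
        if use:
            for j, value in enumerate(row):
                stacked[j] += value
    return stacked

# Three galaxies on a three-point grid, with hypothetical bin assignments.
pdfs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.5, 0.5]]
bin_ids = [0, 1, 1]

all_bins = masked_stack(pdfs, bin_ids)                  # selected_bin=-1
bin_one = masked_stack(pdfs, bin_ids, selected_bin=1)   # only galaxies in bin 1
```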

zgrid: ndarray | None

class rail.stages.NaiveStackSummarizer

Bases: PZSummarizer

Summarizer which stacks individual P(z)

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
zmin ([float] default=0.0) – The minimum redshift of the z grid or sample
zmax ([float] default=3.0) – The maximum redshift of the z grid or sample
nzbins ([int] default=301) – The number of gridpoints in the z grid
seed ([int] default=87) – random seed
n_samples ([int] default=1000) – Number of sample distributions to create
input (QPHandle (INPUT))
output (QPHandle (OUTPUT))
single_NZ (QPHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

Parameters:

args (Any)
kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'summarize'

inputs = [('input', <class 'rail.core.data.QPHandle'>)]

interactive_function: str | None = 'naive_stack_summarizer'

name = 'NaiveStackSummarizer'

outputs = [('output', <class 'rail.core.data.QPHandle'>), ('single_NZ', <class 'rail.core.data.QPHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:: None

stage_columns: list[str] | None

summarize(input_data, **kwargs)

Summarizer for NaiveStack which returns multiple items

Parameters:: input_data (qp.Ensemble) – Per-galaxy p(z), and any ancillary data associated with it
Returns:: Ensemble with n(z), and any ancillary data. The return type depends on output_mode.
Return type:: QPHandle | dict[str, QPHandle]
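Stacking individual p(z) amounts to summing the per-galaxy curves on a shared grid and normalizing the result. The sketch below shows that idea with plain lists and a trapezoid-rule normalization; naive_stack is a hypothetical helper, and the real stage works on qp.Ensemble objects and also produces bootstrap samples, which are omitted here.

```python
def naive_stack(pdfs, zgrid):
    """Sum per-galaxy p(z) on a shared grid and normalize to unit area."""
    summed = [sum(col) for col in zip(*pdfs)]
    # Trapezoid-rule integral of the summed curve over the grid.
    area = sum(
        0.5 * (summed[i] + summed[i + 1]) * (zgrid[i + 1] - zgrid[i])
        for i in range(len(zgrid) - 1)
    )
    return [value / area for value in summed]

zgrid = [0.0, 1.0, 2.0]
pdfs = [[0.0, 1.0, 0.0], [0.0, 0.5, 1.0]]
nz = naive_stack(pdfs, zgrid)
```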

zgrid: ndarray | None

class rail.stages.Noisifier

Bases: RailStage

Base class Noisifier, which adds noise to the input catalog

Noisifiers take “input” data in the form of pandas dataframes in Parquet files and provide as “output” other pandas dataframes written to Parquet files.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
seed ([type not specified] default=None) – Set to an int to force reproducible results.
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'

inputs = [('input', <class 'rail.core.data.PqHandle'>)]

name = 'Noisifier'

outputs = [('output', <class 'rail.core.data.PqHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:: None

stage_columns: list[str] | None

class rail.stages.ObsCondition

Bases: Noisifier

Photometric errors based on observation conditions

This degrader calculates spatially-varying photometric errors using input survey condition maps. The error is based on the LSSTErrorModel from the PhotErr python package.

mask: str, optional
    Path to the mask covering the survey
    footprint in HEALPIX format. Notice that
    all negative values will be set to zero.
weight: str, optional
    Path to the weights HEALPIX format, used
    to assign sample galaxies to pixels. Default
    is weight="", which uses uniform weighting.
tot_nVis_flag: bool, optional
    If any map for nVisYr are provided, this flag
    indicates whether the map shows the total number of
    visits in nYrObs (tot_nVis_flag=True), or the average
    number of visits per year (tot_nVis_flag=False). The
    default is set to True.
map_dict: dict, optional
    A dictionary that contains the paths to the
    survey condition maps in HEALPIX format. This dictionary
    uses the same arguments as LSSTErrorModel (from PhotErr).
    The following arguments, if supplied, may contain either
    a single number (as in the case of LSSTErrorModel), or a path:
    [m5, nVisYr, airmass, gamma, msky, theta, km, tvis, EBV]
    For the following keys:
    [m5, nVisYr, gamma, msky, theta, km]
    numbers/paths for specific bands must be passed.
    Example:
    {"m5": {"u": path, ...}, "theta": {"u": path, ...},}
    Other LSSTErrorModel parameters can also be passed
    in this dictionary (e.g. a necessary one may be [nYrObs]
    or the survey condition maps).
    If any argument is not passed, the default value in
    PhotErr's LsstErrorModel is adopted.
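To make the map_dict layout and the meaning of tot_nVis_flag concrete, here is a small sketch. The values are plain numbers rather than paths to HEALPIX maps, the helper visits_per_year is hypothetical, and dividing the total by nYrObs is an assumption about how the flag is applied.

```python
def visits_per_year(n_vis, n_yr_obs, tot_nvis_flag=True):
    """Return the average number of visits per year for one band.

    If `tot_nvis_flag` is True, `n_vis` is the total over `n_yr_obs` years
    and is divided by the survey length; otherwise it is already per year.
    """
    return n_vis / n_yr_obs if tot_nvis_flag else n_vis

# Hypothetical map_dict: numbers here, where real use could pass map paths.
map_dict = {
    "m5": {"i": 25.0},         # per-band key
    "nVisYr": {"i": 100.0},    # per-band key
    "nYrObs": 5.0,             # survey length in years
}

per_year = visits_per_year(map_dict["nVisYr"]["i"], map_dict["nYrObs"], tot_nvis_flag=True)
already_per_year = visits_per_year(20.0, map_dict["nYrObs"], tot_nvis_flag=False)
```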

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
seed ([type not specified] default=None) – Set to an int to force reproducible results.
nside ([int] default=128) – nside for the input maps in HEALPIX format.
mask ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/creation/degraders/../../examples_data/creation_data/data/survey_conditions/DC2-mask-neg-nside-128.fits) – mask for the input maps in HEALPIX format.
weight ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/creation/degraders/../../examples_data/creation_data/data/survey_conditions/DC2-dr6-galcounts-i20-i25.3-nside-128.fits) – weight for assigning pixels to galaxies in HEALPIX format.
tot_nVis_flag ([bool] default=True) – flag indicating whether nVisYr is the total or average per year if supplied.
random_seed ([int] default=42) – random seed for reproducibility
map_dict ([dict] default={'m5': {'i': '/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/creation/degraders/../../examples_data/creation_data/data/survey_conditions/minion_1016_dc2_Median_fiveSigmaDepth_i_and_nightlt1825_HEAL.fits'}, 'nYrObs': 5.0}) – dictionary containing the paths to the survey condition maps and/or additional LSSTErrorModel parameters.
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

BAND_A_EBV = {'g': 3.64, 'i': 2.06, 'r': 2.7, 'u': 4.81, 'y': 1.31, 'z': 1.58}

STANDARD_BANDS = ['u', 'g', 'r', 'i', 'z', 'y']

__init__(args, **kwargs): Constructor: Do RailStage specific initialization

apply_galactic_extinction(pixel, pixel_cat)

MW extinction reddening of the magnitudes

Parameters:

pixel (int)
pixel_cat (DataFrame)

Return type:

DataFrame
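The usual way to redden magnitudes is to add a band-dependent coefficient times E(B-V). The sketch below applies that relation with the BAND_A_EBV values listed for this class; the function name redden and the plain-dict catalogue are hypothetical, and whether the stage uses exactly this form is an assumption.

```python
# Coefficients copied from the BAND_A_EBV class attribute.
BAND_A_EBV = {"u": 4.81, "g": 3.64, "r": 2.7, "i": 2.06, "z": 1.58, "y": 1.31}

def redden(mags, ebv):
    """Add A_band * E(B-V) to each band magnitude (dust makes objects fainter)."""
    return {band: mag + BAND_A_EBV[band] * ebv for band, mag in mags.items()}

reddened = redden({"u": 24.0, "i": 23.0}, ebv=0.05)
```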

assign_pixels(catalog)

Assign pixels to the input catalog. Check whether the catalogue contains position information; if so, assign according to ra, dec; otherwise, assign randomly.

Parameters:: catalog (DataFrame)
Return type:: DataFrame
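For the random branch, one plausible reading is a draw of one pixel per galaxy with probability proportional to the weight map. The sketch below illustrates that reading only: assign_random_pixels and the pixel ids are hypothetical, and the position-based branch (which needs a HEALPIX library) is not shown.

```python
import random

def assign_random_pixels(n_gal, pixel_weights, seed=42):
    """Draw one pixel per galaxy with probability proportional to its weight."""
    rng = random.Random(seed)
    pixels = list(pixel_weights)
    weights = [pixel_weights[p] for p in pixels]
    return rng.choices(pixels, weights=weights, k=n_gal)

# Hypothetical pixel ids and weights; pixel 12 has zero weight and is never drawn.
pixel_weights = {10: 1.0, 11: 3.0, 12: 0.0}
assigned = assign_random_pixels(5, pixel_weights)
```

Fixing the seed makes the assignment reproducible across runs.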

entrypoint_function: str | None = '__call__'

get_pixel_conditions(pixel)

Get the map values at the given pixel. The output is a dictionary that contains only the LSSTErrorModel keys.

Parameters:: pixel (int)
Return type:: dict

interactive_function: str | None = 'obs_condition'

name = 'ObsCondition'

stage_columns: list[str] | None

class rail.stages.OldEvaluator

Bases: RailStage

Evaluate the performance of a photo-Z estimator

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
zmin ([float] default=0.0) – The minimum redshift of the z grid or sample
zmax ([float] default=3.0) – The maximum redshift of the z grid or sample
nzbins ([int] default=301) – The number of gridpoints in the z grid
pit_metrics ([str] default=all) – PIT-based metrics to include
point_metrics ([str] default=all) – Point-estimate metrics to include
hdf5_groupname ([str] default=) – name of hdf5 group for data, if None, then set to ‘’
do_cde ([bool] default=True) – Evaluate CDE Metric
redshift_col ([str] default=redshift) – name of redshift column
input (QPHandle (INPUT))
truth (Hdf5Handle (INPUT))
output (Hdf5Handle (OUTPUT))
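PIT-based metrics start from the probability integral transform: the CDF of each galaxy's p(z) evaluated at its true redshift. The sketch below computes one PIT value from a gridded PDF with the trapezoid rule; pit_value is a hypothetical helper and is not how the stage or qp computes it internally.

```python
def pit_value(zgrid, pdf, z_true):
    """Integrate a gridded p(z) from zgrid[0] up to z_true (trapezoid rule)."""
    total = 0.0
    for i in range(len(zgrid) - 1):
        z0, z1 = zgrid[i], zgrid[i + 1]
        if z_true <= z0:
            break
        upper = min(z1, z_true)
        # Linearly interpolate the PDF at the upper edge of this slice.
        p_upper = pdf[i] + (pdf[i + 1] - pdf[i]) * (upper - z0) / (z1 - z0)
        total += 0.5 * (pdf[i] + p_upper) * (upper - z0)
    return total

zgrid = [0.0, 1.0, 2.0]
flat_pdf = [0.5, 0.5, 0.5]      # uniform on [0, 2], integrates to 1
pit = pit_value(zgrid, flat_pdf, z_true=0.5)
```

For a well-calibrated estimator, PIT values collected over many galaxies should be close to uniformly distributed on [0, 1].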

entrypoint_function: str | None = 'evaluate'

evaluate(data, truth, **kwargs)

Evaluate the performance of an estimator

This will attach the input data and truth to this Evaluator (for introspection and provenance tracking). Then it will call the run() and finalize() methods, which need to be implemented by the sub-classes. The run() method will need to register the data that it creates to this Estimator by using self.add_data(‘output’, output_data).

Parameters:

data (qp.Ensemble) – The sample to evaluate
truth (Any) – Table with the truth information

Returns:

The evaluation metrics

Return type:

Hdf5Handle

inputs = [('input', <class 'rail.core.data.QPHandle'>), ('truth', <class 'rail.core.data.Hdf5Handle'>)]

interactive_function: str | None = 'old_evaluator'

name = 'OldEvaluator'

outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]

run()

Run method. Evaluate all the metrics and put them into a table.

Notes

Get the input data from the data store under this stage’s ‘input’ tag. Get the truth data from the data store under this stage’s ‘truth’ tag. Put the data into the data store under this stage’s ‘output’ tag.

Return type:: None

stage_columns: list[str] | None

class rail.stages.PZClassifier

Bases: RailStage

The base class for assigning classes (tomographic bins) to per-galaxy PZ estimates.

PZClassifier takes as “input” a qp.Ensemble with per-galaxy PDFs, and provides as “output” tabular data which can be appended to the catalogue.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
input (QPHandle (INPUT))
output (Hdf5Handle (OUTPUT))

__init__(args, **kwargs)

Initialize the PZClassifier.

Parameters:

args (Any)
kwargs (Any)

Return type:

None

classify(input_data, **kwargs)

The main run method for the classifier, should be implemented in the specific subclass.

This will attach the input_data to this PZClassifier (for introspection and provenance tracking).

Then it will call the run() and finalize() methods, which need to be implemented by the sub-classes.

The run() method will need to register the data that it creates to this Classifier by using self.add_data(‘output’, output_data).

The run() method relies on the _process_chunk() method, which should be implemented by subclasses to perform the actual classification on each chunk of data. The results from each chunk are then combined in the _finalize_run() method. (Alternatively, override run() in a subclass to perform the classification without parallelization.)

Finally, this will return a TableHandle providing access to that output data.

Parameters:: input_data (qp.Ensemble) – Per-galaxy p(z), and any ancillary data associated with it
Returns:: Class assignment for each galaxy, typically in the form of a dictionary with IDs and class labels.
Return type:: TableHandle

entrypoint_function: str | None = 'classify'

inputs = [('input', <class 'rail.core.data.QPHandle'>)]

name = 'PZClassifier'

outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]

run()

Processes the input data in chunks and performs classification.

This method iterates over chunks of the input data, calling the _process_chunk method for each chunk to perform the actual classification.

The _process_chunk method should be implemented by subclasses to define the specific classification logic.

Return type:: None
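The chunked pattern described above can be sketched as a loop over index ranges, with a per-chunk function doing the classification and the results concatenated at the end. Here iter_chunks and process_chunk are hypothetical stand-ins (the latter plays the role of _process_chunk), and binning by a point estimate is just one possible classification rule.

```python
def iter_chunks(n_rows, chunk_size=10000):
    """Yield (start, end) index pairs that cover range(n_rows)."""
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)

def process_chunk(zmodes, edges):
    """Hypothetical per-chunk classifier: bin index of each point estimate."""
    return [sum(z >= edge for edge in edges) - 1 for z in zmodes]

zmodes = [0.1, 0.7, 1.4, 2.2, 0.4]
edges = [0.0, 1.0, 2.0]          # bins [0, 1), [1, 2), [2, inf)

labels = []
for start, end in iter_chunks(len(zmodes), chunk_size=2):
    labels.extend(process_chunk(zmodes[start:end], edges))
```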

stage_columns: list[str] | None

class rail.stages.PZFlowEstimator

Bases: CatEstimator

CatEstimator which uses PZFlow

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0) – The minimum redshift of the z grid
zmax ([float] default=3.0) – The maximum redshift of the z grid
nzbins ([int] default=301) – The number of gridpoints in the z grid
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift) – name of redshift column
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
flow_seed ([int] default=0) – seed for flow
ref_column_name ([str] default=mag_i_lsst) – name for reference column
column_names ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – column names to be used in flow
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05}) – 1 sigma mag limits
include_mag_errors ([bool] default=False) – Boolean flag on whether to marginalize over mag errors (NOTE: much slower on CPU!)
error_names_dict ([dict] default={'mag_err_u_lsst': 'mag_u_lsst_err', 'mag_err_g_lsst': 'mag_g_lsst_err', 'mag_err_r_lsst': 'mag_r_lsst_err', 'mag_err_i_lsst': 'mag_i_lsst_err', 'mag_err_z_lsst': 'mag_z_lsst_err', 'mag_err_y_lsst': 'mag_y_lsst_err'}) – dictionary to rename error columns
n_error_samples ([int] default=1000) – number of error samples in marginalization
redshift_column_name ([str] default=redshift) – name of redshift column
model (FlowHandle (INPUT))
input (TableHandle (INPUT))
output (QPHandle (OUTPUT))

__init__(args, **kwargs): Initialize Estimator

entrypoint_function: str | None = 'estimate'

inputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>), ('input', <class 'rail.core.data.TableHandle'>)]

interactive_function: str | None = 'pz_flow_estimator'

name = 'PZFlowEstimator'

stage_columns: list[str] | None

class rail.stages.PZFlowInformer

Bases: CatInformer

Subclass to train a pzflow-based estimator

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0) – min z
zmax ([float] default=3.0) – max z
nzbins ([int] default=301) – num z bins
flow_seed ([int] default=0) – seed for flow
ref_column_name ([str] default=mag_i_lsst) – name for reference column
column_names ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – column names to be used in flow
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05}) – 1 sigma mag limits
include_mag_errors ([bool] default=False) – Boolean flag on whether to marginalize over mag errors (NOTE: much slower on CPU!)
error_names_dict ([dict] default={'mag_err_u_lsst': 'mag_u_lsst_err', 'mag_err_g_lsst': 'mag_g_lsst_err', 'mag_err_r_lsst': 'mag_r_lsst_err', 'mag_err_i_lsst': 'mag_i_lsst_err', 'mag_err_z_lsst': 'mag_z_lsst_err', 'mag_err_y_lsst': 'mag_y_lsst_err'}) – dictionary to rename error columns
n_error_samples ([int] default=1000) – number of error samples in marginalization
soft_sharpness ([int] default=10) – sharpening parameter for SoftPlus
soft_idx_col ([int] default=0) – index column for SoftPlus
redshift_column_name ([str] default=redshift) – name of redshift column
num_training_epochs ([int] default=50) – number of flow training epochs
input (TableHandle (INPUT))
model (FlowHandle (OUTPUT))

__init__(args, **kwargs): Constructor, build the CatInformer, then do PZFlow specific setup

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'pz_flow_informer'

name = 'PZFlowInformer'

outputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>)]

run(): train a flow based on the training data This is mostly based off of the pzflow example notebook

stage_columns: list[str] | None

class rail.stages.PZSummarizer

Bases: RailStage

The base class for classes that go from per-galaxy PZ estimates to ensemble NZ estimates

PZSummarizer takes as “input” a qp.Ensemble with per-galaxy PDFs, and provides as “output” a QPEnsemble with per-ensemble n(z).

entrypoint_function: str | None = 'summarize'

inputs = [('model', <class 'rail.core.data.ModelHandle'>), ('input', <class 'rail.core.data.QPHandle'>)]

name = 'PZtoNZSummarizer'

outputs = [('output', <class 'rail.core.data.QPHandle'>)]

stage_columns: list[str] | None

summarize(input_data, **kwargs)

The main run method for the summarization, should be implemented in the specific subclass.

This will attach the input_data to this PZtoNZSummarizer (for introspection and provenance tracking).

Then it will call the run() and finalize() methods, which need to be implemented by the sub-classes.

The run() method will need to register the data that it creates to this Estimator by using self.add_data(‘output’, output_data).

Finally, this will return a QPHandle providing access to that output data.

Parameters:: input_data (qp.Ensemble) – Per-galaxy p(z), and any ancillary data associated with it
Returns:: Ensemble with n(z), and any ancillary data
Return type:: QPHandle

class rail.stages.PhotoErrorModel

Bases: Noisifier

The Base Model for photometric errors.

This is a wrapper around the error model from PhotErr. The parameter docstring below is dynamically added by the installed version of PhotErr:

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
seed ([type not specified] default=None) – Set to an int to force reproducible results.
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'photo_error_model'

name = 'PhotoErrorModel'

reload_pars(args): This is needed b/c the parameters are dynamically defined, so we have to reload them _after_ then have been defined

set_params(peparams): Set the photometric error parameters from photerr to the ceci config

stage_columns: list[str] | None

class rail.stages.PointEstHistInformer

Bases: PzInformer

Placeholder Informer

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
input (QPHandle (INPUT))
truth (TableHandle (INPUT))
model (ModelHandle (OUTPUT))

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'point_est_hist_informer'

model_handle: ModelHandle | None

name = 'PointEstHistInformer'

stage_columns: list[str] | None

class rail.stages.PointEstHistMaskedSummarizer

Bases: PointEstHistSummarizer

Summarizer which simply histograms a point estimate

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
zmin ([float] default=0.0) – The minimum redshift of the z grid or sample
zmax ([float] default=3.0) – The maximum redshift of the z grid or sample
nzbins ([int] default=301) – The number of gridpoints in the z grid
seed ([int] default=87) – random seed
point_estimate_key ([str] default=zmode) – Which point estimate to use
n_samples ([int] default=1000) – Number of sample distributions to return
selected_bin ([int] default=-1) – bin to use
input (QPHandle (INPUT))
tomography_bins (TableHandle (INPUT))
output (QPHandle (OUTPUT))
single_NZ (QPHandle (OUTPUT))

bincents: ndarray | None

entrypoint_function: str | None = 'summarize'

inputs = [('input', <class 'rail.core.data.QPHandle'>), ('tomography_bins', <class 'rail.core.data.TableHandle'>)]

interactive_function: str | None = 'point_est_hist_masked_summarizer'

name = 'PointEstHistMaskedSummarizer'

outputs = [('output', <class 'rail.core.data.QPHandle'>), ('single_NZ', <class 'rail.core.data.QPHandle'>)]

stage_columns: list[str] | None

summarize(input_data, tomo_bins=None, **kwargs)

Override the Summarizer.summarize() method to take tomo bins as an additional input

Parameters:

input_data (qp.Ensemble) – Per-galaxy p(z), and any ancillary data associated with it
tomo_bins (TableLike | None, optional) – Tomographic bins file, by default None

Returns:

Ensemble with n(z), and any ancillary data

Return type:

QPHandle

zgrid: ndarray | None

class rail.stages.PointEstHistSummarizer

Bases: PZSummarizer

Summarizer which simply histograms a point estimate

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
zmin ([float] default=0.0) – The minimum redshift of the z grid or sample
zmax ([float] default=3.0) – The maximum redshift of the z grid or sample
nzbins ([int] default=301) – The number of gridpoints in the z grid
seed ([int] default=87) – random seed
point_estimate_key ([str] default=zmode) – Which point estimate to use
n_samples ([int] default=1000) – Number of sample distributions to return
input (QPHandle (INPUT))
output (QPHandle (OUTPUT))
single_NZ (QPHandle (OUTPUT))
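Histogramming a point estimate is simple enough to show directly. The sketch below counts point estimates (for example the values stored under zmode) in equal-width bins and returns the bin centres alongside the counts. The function name and the three-bin grid are hypothetical, and the bootstrap samples controlled by n_samples are not reproduced.

```python
def histogram_point_estimates(zmodes, zmin=0.0, zmax=3.0, nbins=3):
    """Count point estimates in `nbins` equal-width bins on [zmin, zmax)."""
    width = (zmax - zmin) / nbins
    counts = [0] * nbins
    for z in zmodes:
        if zmin <= z < zmax:
            # min() guards against floating-point round-up at the top edge.
            counts[min(int((z - zmin) / width), nbins - 1)] += 1
    centers = [zmin + (i + 0.5) * width for i in range(nbins)]
    return centers, counts

# The last value lies outside [zmin, zmax) and is dropped.
centers, counts = histogram_point_estimates([0.2, 0.9, 1.5, 2.9, 3.5])
```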

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

Parameters:

args (Any)
kwargs (Any)

Return type:

None

bincents: ndarray | None

entrypoint_function: str | None = 'summarize'

inputs = [('input', <class 'rail.core.data.QPHandle'>)]

interactive_function: str | None = 'point_est_hist_summarizer'

name = 'PointEstHistSummarizer'

outputs = [('output', <class 'rail.core.data.QPHandle'>), ('single_NZ', <class 'rail.core.data.QPHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:: None

stage_columns: list[str] | None

zgrid: ndarray | None

class rail.stages.PointToPointBinnedEvaluator

Bases: Evaluator

Evaluate the performance of a photo-z estimator against reference point estimate

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
metrics ([list] default=[]) – The metrics you want to evaluate.
exclude_metrics ([list] default=[]) – List of metrics to exclude
metric_config ([dict] default={}) – configuration of individual_metrics
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
seed ([float] default=None) – Random seed value to use for reproducible results.
force_exact ([bool] default=True) – Force the exact calculation. This will not allow parallelization
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
reference_dictionary_key ([str] default=redshift) – The key in the truth dictionary where the redshift data is stored.
point_estimate_key ([str] default=zmode) – The key in the point estimate table.
bin_col ([str] default=redshift) – The column metrics are binned by
bin_min ([float] default=0.0) – The minimum value of the binning edge
bin_max ([float] default=3.0) – The maximum value of the binning edge
nbin ([int] default=10) – The number of bins
input (QPHandle (INPUT))
truth (TableHandle (INPUT))
output (Hdf5Handle (OUTPUT))
summary (Hdf5Handle (OUTPUT))
single_distribution_summary (QPDictHandle (OUTPUT))
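To show how bin_col, bin_min, bin_max and nbin could work together, the sketch below evaluates one simple point-to-point statistic, the mean of (z_phot - z_true) / (1 + z_true), in equal-width bins of the true redshift. The helper binned_mean_bias and the choice of statistic are illustrative assumptions; the stage takes its metrics from the configured metrics list.

```python
def binned_mean_bias(z_phot, z_true, bin_min=0.0, bin_max=3.0, nbin=3):
    """Mean of (z_phot - z_true) / (1 + z_true) in equal-width bins of z_true."""
    width = (bin_max - bin_min) / nbin
    sums = [0.0] * nbin
    counts = [0] * nbin
    for zp, zt in zip(z_phot, z_true):
        if bin_min <= zt < bin_max:
            idx = min(int((zt - bin_min) / width), nbin - 1)
            sums[idx] += (zp - zt) / (1.0 + zt)
            counts[idx] += 1
    # Empty bins report None rather than dividing by zero.
    return [s / c if c else None for s, c in zip(sums, counts)]

z_true = [0.5, 0.5, 1.0]
z_phot = [0.8, 0.2, 1.2]
bias = binned_mean_bias(z_phot, z_true)
```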

entrypoint_function: str | None = 'evaluate'

inputs: list[tuple[str, type[DataHandle]]] = [('input', <class 'rail.core.data.QPHandle'>), ('truth', <class 'rail.core.data.TableHandle'>)]

interactive_function: str | None = 'point_to_point_binned_evaluator'

metric_base_class: alias of PointToPointMetric

name = 'PointToPointBinnedEvaluator'

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:: None

stage_columns: list[str] | None

class rail.stages.PointToPointEvaluator

Bases: Evaluator

Evaluate the performance of a photo-z estimator against reference point estimate

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
metrics ([list] default=[]) – The metrics you want to evaluate.
exclude_metrics ([list] default=[]) – List of metrics to exclude
metric_config ([dict] default={}) – configuration of individual_metrics
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
seed ([float] default=None) – Random seed value to use for reproducible results.
force_exact ([bool] default=False) – Force the exact calculation. This will not allow parallelization
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
reference_dictionary_key ([str] default=redshift) – The key in the truth dictionary where the redshift data is stored.
point_estimate_key ([str] default=zmode) – The key in the point estimate table.
input (QPHandle (INPUT))
truth (TableHandle (INPUT))
output (Hdf5Handle (OUTPUT))
summary (Hdf5Handle (OUTPUT))
single_distribution_summary (QPDictHandle (OUTPUT))

entrypoint_function: str | None = 'evaluate'

inputs: list[tuple[str, type[DataHandle]]] = [('input', <class 'rail.core.data.QPHandle'>), ('truth', <class 'rail.core.data.TableHandle'>)]

interactive_function: str | None = 'point_to_point_evaluator'

metric_base_class: alias of PointToPointMetric

name = 'PointToPointEvaluator'

stage_columns: list[str] | None

class rail.stages.PosteriorCalculator

Bases: RailStage

Base class for object that calculates the posterior distribution of a particular field in a table of photometric data (typically the redshift).

The posteriors will be contained in a qp Ensemble.

__init__(args, **kwargs)

Initialize PosteriorCalculator

Parameters:

args (Any)
kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'get_posterior'

get_posterior(input_data, **kwargs)

Return posteriors for the given column.

This is a method for running a Creator in interactive mode. In pipeline mode, the subclass run method will be called by itself.

Parameters:

input_data (TableLike) – A table of the galaxies for which posteriors are calculated
**kwargs (Any) – Used to update configuration

Returns:

Posterior Estimate

Return type:

QPHandle

Notes

This will put the data argument into this stage’s DataStore using this stage’s input tag.

This will put the additional functional arguments into this stage’s configuration data.

It will then call self.run() and return the QPHandle associated to the output tag.

inputs = [('model', <class 'rail.core.data.ModelHandle'>), ('input', <class 'rail.core.data.TableHandle'>)]

name = 'PosteriorCalculator'

outputs = [('output', <class 'rail.core.data.QPHandle'>)]

stage_columns: list[str] | None

class rail.stages.PzEstimator

Bases: RailStage, PointEstimationMixin

The base class for making photo-z posterior estimates from other pz inputs

Estimators use a generic “model”, the details of which depend on the sub-class.

Estimators take as “input” a QPEnsemble with other estimates and provide as “output” a QPEnsemble with per-object p(z).

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
model (ModelHandle (INPUT))
input (QPHandle (INPUT))
output (QPHandle (OUTPUT))

__init__(args, **kwargs)

Initialize Estimator

Parameters:

args (Any)
kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'estimate'

estimate(input_data, **kwargs)

The main interface method for the photo-z estimation

This will attach the input data (defined in inputs as “input”) to this Estimator (for introspection and provenance tracking). Then call the run(), validate(), and finalize() methods.

The run method will call _process_chunk(), which needs to be implemented in the subclass, to process input data in batches. See RandomGaussEstimator for a simple example.

Finally, this will return a QPHandle for access to that output data.

Parameters:

input_data (QPHandle) – A dictionary of all input data

Returns:

Handle providing access to QP ensemble with output data

Return type:

QPHandle
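The batching described for run() and _process_chunk() can be pictured with a small stand-alone sketch. It is not the RAIL implementation: ToyChunkedStage, DoublingStage, iter_chunks, and the (start, end, chunk) signature are hypothetical names chosen for illustration, and a plain list stands in for the input data.

```python
def iter_chunks(n_objects, chunk_size):
    """Yield (start, end) index pairs covering n_objects in steps of chunk_size."""
    for start in range(0, n_objects, chunk_size):
        yield start, min(start + chunk_size, n_objects)


class ToyChunkedStage:
    """Hypothetical base class: run() loops over chunks, subclasses fill in the work."""

    def __init__(self, chunk_size=10000):
        self.chunk_size = chunk_size
        self.results = []

    def run(self, data):
        # Hand each batch to the subclass hook, one chunk at a time.
        for start, end in iter_chunks(len(data), self.chunk_size):
            self._process_chunk(start, end, data[start:end])
        return self.results

    def _process_chunk(self, start, end, chunk):
        raise NotImplementedError("subclasses must implement _process_chunk")


class DoublingStage(ToyChunkedStage):
    """Toy subclass: the per-chunk work is just doubling each value."""

    def _process_chunk(self, start, end, chunk):
        self.results.extend(2 * value for value in chunk)


chunks = list(iter_chunks(10, 4))
stage = DoublingStage(chunk_size=4)
output = stage.run(list(range(10)))
print(chunks)
print(output)
```

The last chunk is shorter than chunk_size when the number of objects is not an exact multiple, which is why the end index is clipped.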

inputs = [('model', <class 'rail.core.data.ModelHandle'>), ('input', <class 'rail.core.data.QPHandle'>)]

name = 'PzEstimator'

outputs = [('output', <class 'rail.core.data.QPHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None

class rail.stages.PzInformer

Bases: RailStage

The base class for informing models used to make photo-z data products from existing ensembles of p(z) distributions.

PzInformer can use a generic “model”, the details of which depend on the sub-class. Some summarizers will have associated PzInformer classes, which can be used to inform those models.

(Note, “Inform” is more generic than “Train” as it also applies to algorithms that are template-based rather than machine learning-based.)

PzInformer will produce as output a generic “model”, the details of which depend on the sub-class.

They take as “input” a qp.Ensemble of per-galaxy p(z) data, which is used to “inform” the model.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
input (QPHandle (INPUT))
truth (TableHandle (INPUT))
model (ModelHandle (OUTPUT))

__init__(args, **kwargs)

Initialize Informer that can inform models for redshift estimation

Parameters:

args (Any)
kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'inform'

inform(training_data='None', truth_data='None', **kwargs)

The main interface method for Informers

This will attach the input_data to this Informer (for introspection and provenance tracking).

Then it will call the run(), validate() and finalize() methods, which need to be implemented by the sub-classes.

The run() method will need to register the model that it creates to this Estimator by using self.add_data(‘model’, model).

Finally, this will return a ModelHandle providing access to the trained model.

Parameters:

training_data (qp.Ensemble | str, optional) – Per-galaxy p(z), and any ancillary data associated with it, by default “None”
truth_data (TableLike | str, optional) – Table with the true redshifts, by default “None”

Returns:

Handle providing access to trained model

Return type:

dict[str, ModelHandle]

inputs = [('input', <class 'rail.core.data.QPHandle'>), ('truth', <class 'rail.core.data.TableHandle'>)]

model_handle: ModelHandle | None

name = 'PzInformer'

outputs = [('model', <class 'rail.core.data.ModelHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None

class rail.stages.QuantityCut

Bases: Selector

Degrader that applies a cut to the given columns.

Note that if a galaxy fails any of the cuts on any one of its columns, that galaxy is removed from the sample.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
drop_rows ([bool] default=True) – Drop selected rows from output table
seed ([type not specified] default=None) – Set to an int to force reproducible results.
cuts ([dict] (required)) – Cuts to apply
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

__init__(args, **kwargs)

Constructor.

Performs standard Degrader initialization as well as defining the cuts to be applied.

Parameters:

args (Any)
kwargs (Any)

Return type:

None

cuts: dict | None

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'quantity_cut'

name = 'QuantityCut'

set_cuts(cuts)

Defines the cuts to be applied.

Parameters:

cuts (dict) – A dictionary of cuts to make on the data

Return type:

None

Notes

The cut keys should be the names of columns you wish to make cuts on.

The cut values should be either:

a number, which is the maximum value. E.g. if the dictionary contains “i”: 25, then values of i > 25 are cut from the sample.
an iterable, which is the range of acceptable values. E.g. if the dictionary contains “redshift”: (1.5, 2.3), then redshifts outside that range are cut from the sample.
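The two kinds of cut value can be illustrated with a minimal sketch. This is not the QuantityCut code: passes_cuts is a hypothetical helper, a list of dicts stands in for the dataframe, and treating the range endpoints as inclusive is an assumption, since the notes only say that values outside the range are cut.

```python
from numbers import Number


def passes_cuts(row, cuts):
    """Return True if the row survives every cut (hypothetical helper)."""
    for column, cut in cuts.items():
        value = row[column]
        if isinstance(cut, Number):
            # A single number is a maximum: anything above it is cut.
            if value > cut:
                return False
        else:
            # An iterable is a (low, high) range; endpoints assumed inclusive.
            low, high = cut
            if value < low or value > high:
                return False
    return True


catalog = [
    {"id": 0, "i": 24.0, "redshift": 1.6},
    {"id": 1, "i": 25.5, "redshift": 2.0},  # fails the i cut
    {"id": 2, "i": 23.0, "redshift": 0.4},  # fails the redshift cut
    {"id": 3, "i": 24.5, "redshift": 2.2},
]
cuts = {"i": 25, "redshift": (1.5, 2.3)}

selected = [row for row in catalog if passes_cuts(row, cuts)]
print([row["id"] for row in selected])
```

A galaxy is dropped as soon as it fails one cut, matching the note that failing any cut on any column removes it from the sample.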

stage_columns: list[str] | None

class rail.stages.RandomForestClassifier

Bases: CatClassifier

Classifier that assigns tomographic bins based on random forest method

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
id_name ([str] default=) – Column name for the object ID in the input data, if empty the row index is used as the ID.
class_bands ([list] default=['r', 'i', 'z']) – Which bands to use for classification
bands ([dict] default={'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst'}) – column names for the bands
model (ModelHandle (INPUT))
input (TableHandle (INPUT))
output (Hdf5Handle (OUTPUT))

entrypoint_function: str | None = 'classify'

interactive_function: str | None = 'random_forest_classifier'

model: ModelLike | None

name = 'RandomForestClassifier'

open_model(**kwargs)

Load the model and/or attach it to this Stage

Parameters:

tag – Input tag associated to the model
**kwargs – Should include ‘model’, see notes

Notes

The keyword argument ‘model’ should be either

an object with a trained model,
a path pointing to a file that can be read to obtain the trained model,
or a ModelHandle providing access to the trained model.

Returns:

The object encapsulating the trained model.

Return type:

Any

outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]

run(): Apply the classifier to the measured magnitudes

stage_columns: list[str] | None

class rail.stages.RandomForestInformer

Bases: CatInformer

Train the random forest classifier

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
class_bands ([list] default=['r', 'i', 'z']) – Which bands to use for classification
bands ([dict] default={'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst'}) – column names for the bands
redshift_col ([str] default=sz) – Redshift column names
bin_edges ([list] default=[0, 0.5, 1.0]) – Binning for training data
random_seed ([int] (required)) – random seed
no_assign ([int] default=-99) – Value for no assignment flag
input (TableHandle (INPUT))
model (ModelHandle (OUTPUT))
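As a rough illustration of how bin_edges and no_assign could fit together, the sketch below labels redshifts with a bin index and falls back to the no-assignment flag outside the edges. It is an assumption-laden sketch, not the RandomForestInformer code: assign_bin is a hypothetical helper, and the zero-based labels and half-open [low, high) intervals are choices made for this example.

```python
from bisect import bisect_right


def assign_bin(redshift, bin_edges, no_assign=-99):
    """Return the index of the bin containing redshift, or no_assign if outside."""
    index = bisect_right(bin_edges, redshift) - 1
    if 0 <= index < len(bin_edges) - 1:
        return index
    return no_assign


bin_edges = [0, 0.5, 1.0]
redshifts = [0.2, 0.5, 0.75, 1.0, -0.1]
labels = [assign_bin(z, bin_edges) for z in redshifts]
print(labels)
```

With the default edges there are two bins, so any object at or above the last edge, or below the first, receives the flag value.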

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'random_forest_informer'

name = 'RandomForestInformer'

outputs = [('model', <class 'rail.core.data.ModelHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

stage_columns: list[str] | None

class rail.stages.RandomGaussEstimator

Bases: CatEstimator

Random CatEstimator

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0) – The minimum redshift of the z grid or sample
zmax ([float] default=3.0) – The maximum redshift of the z grid or sample
nzbins ([int] default=301) – The number of gridpoints in the z grid
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift) – name of redshift column
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
rand_width ([float] default=0.025) – ad hoc width of PDF
seed ([int] default=87) – random seed
column_name ([str] default=mag_i_lsst) – name of a column that has the correct number of galaxies to find length of
input (TableHandle (INPUT))
model (ModelHandle (INPUT))
output (QPHandle (OUTPUT))
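The zmin, zmax, and nzbins parameters together describe a redshift grid. A minimal sketch, assuming the grid is evenly spaced and includes both endpoints (make_z_grid is a hypothetical helper, not part of the stage):

```python
def make_z_grid(zmin=0.0, zmax=3.0, nzbins=301):
    """Evenly spaced grid of nzbins points from zmin to zmax, endpoints included."""
    return [zmin + (zmax - zmin) * i / (nzbins - 1) for i in range(nzbins)]


z_grid = make_z_grid()
spacing = z_grid[1] - z_grid[0]
print(len(z_grid), z_grid[0], z_grid[-1], spacing)
```

Under that assumption the defaults give 301 grid points separated by 0.01 in redshift.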

__init__(args, **kwargs)

Constructor: Do CatEstimator specific initialization

Parameters:

args (Any)
kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'estimate'

inputs = [('input', <class 'rail.core.data.TableHandle'>), ('model', <class 'rail.core.data.ModelHandle'>)]

interactive_function: str | None = 'random_gauss_estimator'

name = 'RandomGaussEstimator'

stage_columns: list[str] | None

validate()

Validation that checks whether the column names required by the stage exist in the data

Return type:

None

class rail.stages.RandomGaussInformer

Bases: CatInformer

Placeholder Informer

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
input (TableHandle (INPUT))
model (ModelHandle (OUTPUT))

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'random_gauss_informer'

name = 'RandomGaussInformer'

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None

class rail.stages.Reddener

Bases: DustMapBase

Utility stage that does reddening

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
ra_name ([str] default=ra) – Name of the RA column
dec_name ([str] default=dec) – Name of the DEC column
mag_name ([str] default=mag_{band}_lsst) – Template for the magnitude columns
band_a_env ([dict] default={'mag_u_lsst': 4.81, 'mag_g_lsst': 3.64, 'mag_r_lsst': 2.7, 'mag_i_lsst': 2.06, 'mag_z_lsst': 1.58, 'mag_y_lsst': 1.31})
dustmap_name ([str] default=sfd) – Name of the dustmap in question
dustmap_dir ([str] (required)) – Directory with dustmaps
copy_cols ([list] default=[]) – Additional columns to copy
copy_all_cols ([bool] default=False) – Copy all the columns
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'reddener'

name = 'Reddener'

class rail.stages.RomanDeepErrorModel

Bases: PhotoErrorModel

The Roman Deep Error model, defined by peRomanDeepErrorParams and peRomanDeepErrorModel

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
seed ([type not specified] default=None) – Set to an int to force reproducible results.
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

__init__(args, **kwargs): Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'roman_deep_error_model'

name = 'RomanDeepErrorModel'

stage_columns: list[str] | None

class rail.stages.RomanErrorModel

Bases: PhotoErrorModel

The Roman Error model, defined by peRomanErrorParams and peRomanErrorModel

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
seed ([type not specified] default=None) – Set to an int to force reproducible results.
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

__init__(args, **kwargs): Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'roman_error_model'

name = 'RomanErrorModel'

stage_columns: list[str] | None

class rail.stages.RomanMediumErrorModel

Bases: PhotoErrorModel

The Roman Medium Error model, defined by peRomanMediumErrorParams and peRomanMediumErrorModel

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
seed ([type not specified] default=None) – Set to an int to force reproducible results.
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

__init__(args, **kwargs): Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'roman_medium_error_model'

name = 'RomanMediumErrorModel'

stage_columns: list[str] | None

class rail.stages.RomanUltraDeepErrorModel

Bases: PhotoErrorModel

The Roman UltraDeep Error model, defined by peRomanUltraDeepErrorParams and peRomanUltraDeepErrorModel

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
seed ([type not specified] default=None) – Set to an int to force reproducible results.
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

__init__(args, **kwargs): Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'roman_ultra_deep_error_model'

name = 'RomanUltraDeepErrorModel'

stage_columns: list[str] | None

class rail.stages.RomanWideErrorModel

Bases: PhotoErrorModel

The Roman Wide Error model, defined by peRomanWideErrorParams and peRomanWideErrorModel

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
seed ([type not specified] default=None) – Set to an int to force reproducible results.
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

__init__(args, **kwargs): Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'roman_wide_error_model'

name = 'RomanWideErrorModel'

stage_columns: list[str] | None

class rail.stages.RowSelector

Bases: RailStage

Utility Stage that sub-selects rows from a table by index

This operates on pandas dataframes in parquet files.

In short, this does: output_data = input_data[self.config.start_row:self.config.stop_row]
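The slice expression above follows ordinary Python slicing rules, so start_row is included and stop_row is excluded. A small sketch using a plain list in place of the dataframe (the row labels are made up for illustration):

```python
# Six stand-in rows; a plain list is used here instead of a pandas dataframe.
rows = [f"row_{i}" for i in range(6)]

start_row, stop_row = 1, 4
output_rows = rows[start_row:stop_row]

print(output_rows)
```

The result holds stop_row - start_row rows, which is worth keeping in mind when choosing stop_row.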

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
start_row ([int] (required)) – starting row number
stop_row ([int] (required)) – Stopping row number
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'

inputs = [('input', <class 'rail.core.data.PqHandle'>)]

interactive_function: str | None = 'row_selector'

name = 'RowSelector'

outputs = [('output', <class 'rail.core.data.PqHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

class rail.stages.SOMSpecSelector

Bases: Selector

Class that creates a specz sample by training a SOM on data with spec-z, classifying all galaxies from a larger sample via the SOM, then selecting the same number of galaxies in each SOM cell as there are in the specz sample. If fewer galaxies are available in the large sample for a cell, it just takes as many as possible, so the selected numbers can still differ from the reference distribution. For example, if you have a lot of bright galaxies with spec-zs from a really wide survey like SDSS and the second dataset does not have the same areal coverage, then there may not be enough bright objects in the second dataset to select, so you will end up with fewer.

The columns used to construct the SOM come in two sets. noncolor_cols is a config option where you supply a list of columns that will be used directly in the SOM, e.g. redshift, i-magnitude, etc. color_cols, on the other hand, is a config parameter where the user supplies an ordered list of columns that will be differenced before being used as SOM inputs, e.g. if you supply [‘u’, ‘g’, ‘r’] then a function in the code will compute u-g and g-r and use those in SOM construction. The code combines the noncolor_cols and color_cols features and all are used in construction of the SOM.
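The differencing of color_cols can be sketched as follows. This is an illustration rather than the stage's own function: build_som_features is a hypothetical helper, a dict stands in for one table row, and placing the non-color features before the colors is an arbitrary choice for this example.

```python
def build_som_features(row, noncolor_cols, color_cols):
    """Combine directly used columns with adjacent differences of color_cols."""
    features = [row[name] for name in noncolor_cols]
    # Difference each adjacent pair in the order given: u-g, g-r, ...
    for first, second in zip(color_cols[:-1], color_cols[1:]):
        features.append(row[first] - row[second])
    return features


galaxy = {"u": 25.0, "g": 24.0, "r": 23.5, "i": 23.0, "redshift": 0.8}
features = build_som_features(galaxy, ["i", "redshift"], ["u", "g", "r"])
print(features)
```

Supplying N color columns yields N-1 colors, so the SOM input here has two non-color features plus two colors.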

As this degrader inherits from Selector, it simply computes a mask; the Selector parent class code performs the masking and returns the final dataset that mimics the input reference sample.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
drop_rows ([bool] default=True) – Drop selected rows from output table
seed ([type not specified] default=None) – Set to an int to force reproducible results.
nondetect_val ([float] default=99.0)
noncolor_cols ([list] default=['i', 'redshift']) – data columns used for SOM, can be a single band if you will also be using color data in ‘color_cols’, or can be as many as you want
noncolor_nondet ([list] default=[28.62, -1.0]) – list of nondetect replacement values for the non-color cols
color_cols ([list] default=['u', 'g', 'r', 'i', 'z', 'y']) – columns that will be differenced to make colors. This will be done in order, so put in increasing WL order
color_nondet ([list] default=[27.79, 29.04, 29.06, 28.62, 27.98, 27.05]) – list of nondetect replacement vals for color columns
som_size ([list] default=[32, 32]) – tuple containing the size (x, y) of the SOM
n_epochs ([int] default=10) – number of training epochs.
spec_data (TableHandle (INPUT))
input (TableHandle (INPUT))
output (PqHandle (OUTPUT))

__init__(args, **kwargs): Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'

inputs = [('spec_data', <class 'rail.core.data.TableHandle'>), ('input', <class 'rail.core.data.TableHandle'>)]

interactive_function: str | None = 'som_spec_selector'

make_data_selection(df): make the data to train the som or input to som

name = 'SOMSpecSelector'

stage_columns: list[str] | None

class rail.stages.SOMocluInformer

Bases: CatInformer

Summarizer that uses a SOM to construct a weighted sum of spec-z objects in the same SOM cell as each photometric galaxy in order to estimate the overall N(z). This is closely related to the NZDir estimator, though that estimator actually reverses this process and looks for photometric neighbors around each spectroscopic galaxy, which can lead to problems if there are photometric galaxies with no nearby spec-z objects (NZDir is not aware that such objects exist and thus can hide biases).

We apply somoclu package (https://somoclu.readthedocs.io/) to train the SOM.

Part of the SOM estimator will be a check for cells which contain photometric objects but do not contain any corresponding training/spec-z objects; those unmatched objects will be flagged for possible removal from the input sample. The inform stage will simply construct a 2D grid SOM using somoclu from a large sample of input photometric data and save this as an output. This may be a computationally intensive stage, though it will hopefully be run once and used by the estimate/summarize stage many times without needing to be re-run.

We can make the SOM either with all colors, or one magnitude and N colors, or an arbitrary set of columns. The code includes a flag column_usage to set usage. If set to “colors” it will take the difference of each adjacent pair of columns in bands as the colors. If set to magandcolors it will use these colors plus one magnitude as specified by ref_band. If set to columns then it will take as inputs all of the columns specified by bands (they can be magnitudes, colors, or any other input specified by the user). NOTE: any custom bands parameters must have an accompanying nondetect_val dictionary that will replace nondetections with the nondetect_val values!
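The three column_usage modes described above can be compared side by side in a short sketch. It is not the SOMocluInformer code: som_inputs is a hypothetical helper, a dict stands in for one table row, and putting the reference magnitude ahead of the colors in the magandcolors case is an assumption made for this example.

```python
def som_inputs(row, bands, column_usage, ref_band=None):
    """Return the SOM input vector for one row under a given column_usage mode."""
    colors = [row[first] - row[second] for first, second in zip(bands[:-1], bands[1:])]
    if column_usage == "colors":
        return colors
    if column_usage == "magandcolors":
        # One magnitude (ref_band) plus the adjacent-band colors.
        return [row[ref_band]] + colors
    if column_usage == "columns":
        # Use the listed columns as they are, whatever they contain.
        return [row[name] for name in bands]
    raise ValueError(f"unknown column_usage: {column_usage!r}")


bands = ["mag_g_lsst", "mag_r_lsst", "mag_i_lsst"]
galaxy = {"mag_g_lsst": 24.5, "mag_r_lsst": 24.0, "mag_i_lsst": 23.0}

colors_only = som_inputs(galaxy, bands, "colors")
mag_and_colors = som_inputs(galaxy, bands, "magandcolors", ref_band="mag_i_lsst")
raw_columns = som_inputs(galaxy, bands, "columns")
print(colors_only, mag_and_colors, raw_columns)
```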

This creates a pickle file containing the somoclu SOM object that will be used by the estimation/summarization stage

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry)
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])
ref_band ([str] default=mag_i_lsst)
redshift_col ([str] default=redshift)
column_usage ([str] default=magandcolors) – switch for how SOM uses columns, valid values are ‘colors’, ‘magandcolors’, and ‘mags’
seed ([int] default=0) – Random number seed
n_rows ([int] default=31) – number of cells in SOM y dimension
n_columns ([int] default=31) – number of cells in SOM x dimension
gridtype ([str] default=rectangular) – Optional parameter to specify the grid form of the nodes: ‘rectangular’: rectangular neurons (default); ‘hexagonal’: hexagonal neurons
n_epochs ([int] default=10) – number of training epochs.
initialization ([str] default=pca) – method of initializing the SOM: ‘pca’: principal component analysis (default); ‘random’: randomly initialize the SOM
maptype ([str] default=planar) – Optional parameter to specify the map topology: ‘planar’: Planar map (default); ‘toroid’: Toroid map
std_coeff ([float] default=1.5) – Optional parameter to set the coefficient in the Gaussian neighborhood function exp(-||x-y||^2/(2*(coeff*radius)^2)). Default: 1.5
som_learning_rate ([float] default=0.5) – Initial SOM learning rate (scale0 param in Somoclu)
input (TableHandle (INPUT))
model (ModelHandle (OUTPUT))

__init__(args, **kwargs): Constructor: Do Informer specific initialization

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'somoclu_informer'

name = 'SOMocluInformer'

run(): Build a SOM from photometric data NOT spectroscopic data!

stage_columns: list[str] | None

class rail.stages.SOMocluSummarizer

Bases: SZPZSummarizer

Quick implementation of a SOM-based summarizer. It will group a pre-trained SOM into hierarchical clusters and assign a galaxy sample into SOM cells and clusters. Then it constructs an N(z) estimation via a weighted sum of the empirical N(z) consisting of the normalized histogram of spec-z values contained in the same SOM cluster as each photometric galaxy.

There are some general guidelines to choosing the geometry and number of total cells in the SOM. This paper: http://www.giscience2010.org/pdfs/paper_230.pdf recommends 5*sqrt(num rows * num data columns) as a rough guideline. Some authors state that a SOM with one dimension roughly twice as long as the other is better, while others find that square SOMs with equal X and Y dimensions are best. The user can set the dimensions using the n_columns and n_rows parameters. For more discussion on SOMs and photo-z calibration, see the KiDS paper on the topic: http://arxiv.org/abs/1909.09632, particularly the appendices.

Note that several parameters are stored in the model file, e.g. the columns used. This ensures that the same columns used in constructing the SOM are used when finding the winning SOM cell with the test data.

Two additional files are also written out:

cellid_output outputs the ‘winning’ SOM cell for each photometric galaxy, in both raveled and 2D SOM cell coordinates. If the objectID or galaxy_id is present they will also be included in this file; if not, the coordinates will be written in the same order in which the data is read in.
uncovered_cell_file outputs the raveled cell IDs of cells that contain photometric galaxies but no corresponding spectroscopic objects. These objects should be removed from the sample as they cannot be accounted for properly in the summarizer. Some iteration on data cuts may be necessary to remove/mitigate these ‘uncovered’ objects.
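The sizing guideline quoted in the description, 5*sqrt(num rows * num data columns), can be worked through numerically. The sketch below reads the guideline as a suggested total number of SOM cells and then picks a square grid of about that size; both helper names and the square-grid choice are assumptions for illustration, not part of the summarizer.

```python
import math


def suggested_som_cells(n_data_rows, n_data_columns):
    """Rough total cell count from the 5*sqrt(rows * columns) guideline."""
    return 5.0 * math.sqrt(n_data_rows * n_data_columns)


def square_grid_side(n_cells):
    """Side length of a square SOM with approximately n_cells cells."""
    return max(1, round(math.sqrt(n_cells)))


# Example: 100,000 galaxies described by 6 input columns.
n_cells = suggested_som_cells(100_000, 6)
side = square_grid_side(n_cells)
print(round(n_cells, 2), side)
```

For this example the guideline suggests roughly 3,900 cells, close to a 62 by 62 grid; a rectangular grid with the same cell count would serve the "one side twice as long" preference equally well.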

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
zmin ([float] default=0.0)
zmax ([float] default=3.0)
nzbins ([int] default=301)
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
hdf5_groupname ([str] default=photometry)
redshift_col ([str] default=redshift)
spec_groupname ([str] default=photometry) – name of hdf5 group for spec data, if None, then set to ‘’
n_clusters ([int] default=-1) – The number of hierarchical clusters of SOM cells. If not provided, the SOM cells will not be clustered.
objid_name ([str] default=) – A parameter
seed ([int] default=12345) – random seed
redshift_colname ([str] default=redshift) – name of redshift column in specz file
phot_weightcol ([str] default=) – name of photometry weight, if present
spec_weightcol ([str] default=) – name of specz weight col, if present
split ([int] default=200) – the size of data chunks when calculating the distances between the codebook and data
nsamples ([int] default=20) – number of bootstrap samples to generate
useful_clusters ([list] default=[]) – the cluster indices that are used for calibration. If not given, then all the clusters containing spec sample are used.
input (TableHandle (INPUT))
spec_input (TableHandle (INPUT))
model (ModelHandle (INPUT))
output (QPHandle (OUTPUT))
single_NZ (QPHandle (OUTPUT))
cellid_output (Hdf5Handle (OUTPUT))
uncovered_cluster_file (TableHandle (OUTPUT))

__init__(args, **kwargs): Initialize Estimator that can sample galaxy data.

entrypoint_function: str | None = 'summarize'

get_som_coordinates(data, weight_col): Find the bmus coordinate of each item in the data.

interactive_function: str | None = 'somoclu_summarizer'

name = 'SOMocluSummarizer'

outputs = [('output', <class 'rail.core.data.QPHandle'>), ('single_NZ', <class 'rail.core.data.QPHandle'>), ('cellid_output', <class 'rail.core.data.Hdf5Handle'>), ('uncovered_cluster_file', <class 'rail.core.data.TableHandle'>)]

replace_non_detections(data): Replace non-detected data with magnitude limits.
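A minimal sketch of what replacing non-detections with magnitude limits could look like, using the nondetect_val and mag_limits parameters listed above. This is not the method's actual body: the helper below works on a single dict standing in for a table row, and matching the placeholder with math.isclose is an assumption.

```python
import math


def replace_non_detections_sketch(row, mag_limits, nondetect_val=99.0):
    """Return a copy of row with placeholder magnitudes swapped for the band limit."""
    cleaned = dict(row)
    for band, limit in mag_limits.items():
        if math.isclose(cleaned[band], nondetect_val):
            cleaned[band] = limit
    return cleaned


mag_limits = {"mag_r_lsst": 29.06, "mag_i_lsst": 28.62}
galaxy = {"mag_r_lsst": 99.0, "mag_i_lsst": 24.3}

cleaned = replace_non_detections_sketch(galaxy, mag_limits)
print(cleaned)
```

Detected magnitudes pass through unchanged, and the original row is left untouched because the helper works on a copy.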

run()

Run the stage and return the execution status.

Subclasses must implement this method.

set_weight_column(data, weight_col): Assign weight vecs if present, else set all to 1.0. weight_col: column name of weights.

stage_columns: list[str] | None

class rail.stages.SZPZSummarizer

Bases: RailStage

The base class for classes (e.g. minisom_som) that use two sets of data, a photometry sample with spec-z values and a photometry sample with unknown redshifts, and that output a QP Ensemble with bootstrap realizations of the N(z) distribution

__init__(args, **kwargs)

Initialize Estimator that can sample galaxy data.

Parameters:

args (Any)
kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'summarize'

inputs = [('input', <class 'rail.core.data.TableHandle'>), ('spec_input', <class 'rail.core.data.TableHandle'>), ('model', <class 'rail.core.data.ModelHandle'>)]

name = 'SZPZtoNZSummarizer'

outputs = [('output', <class 'rail.core.data.QPHandle'>)]

stage_columns: list[str] | None

summarize(input_data, spec_data, **kwargs)

The main run method for the summarization, should be implemented in the specific subclass.

This will attach the input_data to this SZandPhottoNZSummarizer (for introspection and provenance tracking).

Then it will call the run() and finalize() methods, which need to be implemented by the sub-classes.

The run() method will need to register the data that it creates to this Estimator by using self.add_data(‘output’, output_data).

Finally, this will return a QPHandle providing access to that output data.

Parameters:

input_data (qp.Ensemble) – Per-galaxy p(z), and any ancillary data associated with it
spec_data (np.ndarray) – Spectroscopic data

Returns:

Ensemble with n(z), and any ancillary data

Return type:

qp.Ensemble

class rail.stages.Selector

Bases: RailStage

Base class Selector, which applies a selection to the catalog.

Selectors take “input” data in the form of pandas dataframes in Parquet files and provide as “output” other pandas dataframes written to Parquet files.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
drop_rows ([bool] default=True) – Drop selected rows from output table
seed ([type not specified] default=None) – Set to an int to force reproducible results.
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'

inputs = [('input', <class 'rail.core.data.PqHandle'>)]

name = 'Selector'

outputs = [('output', <class 'rail.core.data.PqHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type: None

stage_columns: list[str] | None

class rail.stages.SingleEvaluator

Bases: Evaluator

Evaluate the performance of a photo-Z estimator

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
metrics ([list] default=[]) – The metrics you want to evaluate.
exclude_metrics ([list] default=[]) – List of metrics to exclude
metric_config ([dict] default={}) – configuration of individual_metrics
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
seed ([float] default=None) – Random seed value to use for reproducible results.
force_exact ([bool] default=False) – Force the exact calculation. This will not allow parallelization
point_estimates ([list] default=[]) – List of point estimates to use
truth_point_estimates ([list] default=[]) – List of true point values to use
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
input (QPOrTableHandle (INPUT))
truth (QPOrTableHandle (INPUT))
output (Hdf5Handle (OUTPUT))
summary (Hdf5Handle (OUTPUT))
single_distribution_summary (QPDictHandle (OUTPUT))

__init__(args, **kwargs)

Initialize Evaluator

Parameters:

args (Any)
kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'evaluate'

inputs: list[tuple[str, type[DataHandle]]] = [('input', <class 'rail.core.data.QPOrTableHandle'>), ('truth', <class 'rail.core.data.QPOrTableHandle'>)]

interactive_function: str | None = 'single_evaluator'

metric_base_class: alias of BaseMetric

name = 'SingleEvaluator'

run()

Run method

Evaluate all the metrics and put them into a table

Notes

Get the input data from the data store under this stage’s ‘input’ tag. Get the truth data from the data store under this stage’s ‘truth’ tag. Put the data into the data store under this stage’s ‘output’ tag.

Return type: None

stage_columns: list[str] | None

class rail.stages.SklNeurNetEstimator

Bases: CatEstimator

Subclass to implement a simple point-estimate neural net photo-z. Rather than actually predicting a PDF, for now it just predicts a point estimate zb and then assigns an error of width*(1+zb). We’ll do a “real” NN photo-z later.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0) – The minimum redshift of the z grid or sample
zmax ([float] default=3.0) – The maximum redshift of the z grid or sample
nzbins ([int] default=301) – The number of gridpoints in the z grid
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift) – name of redshift column
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
width ([float] default=0.05) – The ad hoc base width of the PDFs
ref_band ([str] default=mag_i_lsst)
nondetect_val ([float] default=99.0)
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
model (ModelHandle (INPUT))
input (TableHandle (INPUT))
output (QPHandle (OUTPUT))
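As an illustration of the width*(1+zb) error model described above, the sketch below turns a point estimate zb into a PDF on the redshift grid defined by zmin, zmax and nzbins. The Gaussian shape and the function name gaussian_pdf_on_grid are assumptions made for the example; the documentation only states the width of the error.

```python
import math


def gaussian_pdf_on_grid(zb, width=0.05, zmin=0.0, zmax=3.0, nzbins=301):
    """Evaluate a Gaussian centred on zb with sigma = width * (1 + zb).

    ASSUMPTION: the Gaussian shape is chosen for illustration only.
    """
    step = (zmax - zmin) / (nzbins - 1)
    zgrid = [zmin + i * step for i in range(nzbins)]
    sigma = width * (1.0 + zb)
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    pdf = [norm * math.exp(-0.5 * ((z - zb) / sigma) ** 2) for z in zgrid]
    return zgrid, pdf, sigma


zgrid, pdf, sigma = gaussian_pdf_on_grid(zb=1.0)
z_peak = zgrid[pdf.index(max(pdf))]  # grid point where the PDF is largest
```

With the default width of 0.05, a point estimate of zb = 1.0 gives sigma = 0.1, so the width of the PDF grows linearly with (1 + zb).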

__init__(args, **kwargs): Constructor: Do CatEstimator specific initialization

entrypoint_function: str | None = 'estimate'

interactive_function: str | None = 'skl_neur_net_estimator'

name = 'SklNeurNetEstimator'

stage_columns: list[str] | None

class rail.stages.SklNeurNetInformer

Bases: CatInformer

Subclass to train a simple point-estimate neural net photo-z. Rather than actually predicting a PDF, for now it just predicts a point estimate zb and then assigns an error of width*(1+zb). We’ll do a “real” NN photo-z later.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry)
zmin ([float] default=0.0)
zmax ([float] default=3.0)
nzbins ([int] default=301)
nondetect_val ([float] default=99.0)
mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
ref_band ([str] default=mag_i_lsst)
redshift_col ([str] default=redshift)
width ([float] default=0.05) – The ad hoc base width of the PDFs
max_iter ([int] default=500) – max number of iterations while training the neural net. Too low a value will cause an error to be printed (though the code will still work, just not optimally)
input (TableHandle (INPUT))
model (ModelHandle (OUTPUT))

__init__(args, **kwargs): Constructor: Do CatInformer specific initialization

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'skl_neur_net_informer'

name = 'SklNeurNetInformer'

run(): Train the NN model

stage_columns: list[str] | None

class rail.stages.SpecSelection

Bases: Selector

The super class of spectroscopic selections.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
drop_rows ([bool] default=True) – Drop selected rows from output table
seed ([type not specified] default=None) – Set to an int to force reproducible results.
N_tot ([int] default=10000) – Number of selected sources
nondetect_val ([float] default=99.0) – value to be removed for non detects
downsample ([bool] default=True) – If true, downsample the selected sources into a total number of N_tot
success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.
percentile_cut ([int] default=100) – cut redshifts above this percentile
colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’
random_seed ([int] default=42) – random seed for reproducibility
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

__init__(args, **kwargs): Constructor: Do RailStage specific initialization

downsampling_N_tot(): Randomly sample down the objects to a given number of data objects.

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'spec_selection'

invalid_cut(data): Removes entries in the data that have invalid magnitude values (NaN or nondetect_val).

name = 'SpecSelection'

selection(data)

Selection functions.

This should be overwritten by the subclasses corresponding to different spec selections.

stage_columns: list[str] | None

validate_colnames(data)

Validate the column names of the data table to make sure they have the necessary information for each selection.

Parameters: colnames (list of str) – A list of column names
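A minimal sketch of this kind of check, assuming the colnames mapping shown in the parameter list: given the band keys a selection needs, report which ones do not resolve to a column in the table. The helper name missing_columns and the list of required keys are hypothetical.

```python
def missing_columns(colnames, required_keys, available_columns):
    """Return the keys whose mapped column is absent from the table."""
    missing = []
    for key in required_keys:
        column = colnames.get(key)
        if column is None or column not in available_columns:
            missing.append(key)
    return missing


colnames = {
    "u": "mag_u_lsst", "g": "mag_g_lsst", "r": "mag_r_lsst",
    "i": "mag_i_lsst", "z": "mag_z_lsst", "y": "mag_y_lsst",
    "redshift": "redshift",
}
# HYPOTHETICAL table that lacks the i-band column.
table_columns = ["mag_g_lsst", "mag_r_lsst", "redshift"]
missing = missing_columns(colnames, ["g", "r", "i"], table_columns)
```

A selection could raise an error when the returned list is non-empty, so that a missing band is reported before any cut is applied.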

class rail.stages.SpecSelection_BOSS

Bases: SpecSelection

The class of spectroscopic selections with BOSS.

BOSS selection function is based on http://www.sdss3.org/dr9/algorithms/boss_galaxy_ts.php

The selection has changed slightly compared to Dawson+13.

BOSS covers an area of 9100 deg^2 with 893,319 galaxies.

For BOSS selection, the data should at least include gri bands.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
drop_rows ([bool] default=True) – Drop selected rows from output table
seed ([type not specified] default=None) – Set to an int to force reproducible results.
N_tot ([int] default=10000) – Number of selected sources
nondetect_val ([float] default=99.0) – value to be removed for non detects
downsample ([bool] default=True) – If true, downsample the selected sources into a total number of N_tot
success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.
percentile_cut ([int] default=100) – cut redshifts above this percentile
colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’
random_seed ([int] default=42) – random seed for reproducibility
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'spec_selection_BOSS'

name = 'SpecSelection_BOSS'

selection(data): The BOSS selection function.

stage_columns: list[str] | None

class rail.stages.SpecSelection_DEEP2

Bases: SpecSelection

The class of spectroscopic selections with DEEP2.

DEEP2 has a sky coverage of 2.8 deg^2 with ~53000 spectra.

For DEEP2, one needs the R band magnitude and the B-R/R-I colors, which are not available for the time being, so we use LSST gri bands for now. When the conversion degrader is ready, this subclass will be updated accordingly.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
drop_rows ([bool] default=True) – Drop selected rows from output table
seed ([type not specified] default=None) – Set to an int to force reproducible results.
N_tot ([int] default=10000) – Number of selected sources
nondetect_val ([float] default=99.0) – value to be removed for non detects
downsample ([bool] default=True) – If true, downsample the selected sources into a total number of N_tot
success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.
percentile_cut ([int] default=100) – cut redshifts above this percentile
colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’
random_seed ([int] default=42) – random seed for reproducibility
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'spec_selection_DEEP2'

name = 'SpecSelection_DEEP2'

photometryCut(data)

Applies DEEP2 photometric cut based on Newman+13.

This modified selection gives the best match to the data n(z) with its cut at z~0.75 and the B-R/R-I distribution (Newman+13, Fig. 12).

Notes

We cannot apply the surface brightness cut and do not apply the Gaussian weighted sampling near the original colour cuts.

selection(data): DEEP2 selection function.

speczSuccess(data): Spec-z success rate as function of r_AB for Q>=3 read of Figure 13 in Newman+13 for DEEP2 fields 2-4. Values are binned in steps of 0.2 mag with the first and last bin centered on 19 and 24.

stage_columns: list[str] | None

class rail.stages.SpecSelection_DEEP2_LSST

Bases: SpecSelection

The class of spectroscopic selections with DEEP2.

Approximate Rubin->CFHT12K transforms, based on CWWSB SED colors:

B = g + 0.35 * (g-r)
R = r - 0.3 * (r-i)
I = i - 0.5 * (r-i)

The cuts are transformed accordingly.

Also, the original has B-R < 0.5; this is modified to B-R < 0.33 to exclude a few more low-z galaxies. speczSuccess is left unchanged from the original implementation.
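The approximate transforms listed above translate directly into code. The sketch below only evaluates the three stated formulas for a single object; the function name lsst_to_cfht12k and the example magnitudes are hypothetical.

```python
def lsst_to_cfht12k(g, r, i):
    """Apply the approximate Rubin -> CFHT12K transforms quoted above."""
    B = g + 0.35 * (g - r)
    R = r - 0.3 * (r - i)
    I = i - 0.5 * (r - i)
    return B, R, I


# HYPOTHETICAL LSST magnitudes for one galaxy.
B, R, I = lsst_to_cfht12k(g=24.0, r=23.0, i=22.5)
b_minus_r = B - R
r_minus_i = R - I
```

The resulting B-R and R-I colors are the quantities on which the DEEP2 color cuts are defined.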

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
drop_rows ([bool] default=True) – Drop selected rows from output table
seed ([type not specified] default=None) – Set to an int to force reproducible results.
N_tot ([int] default=10000) – Number of selected sources
nondetect_val ([float] default=99.0) – value to be removed for non detects
downsample ([bool] default=True) – If true, downsample the selected sources into a total number of N_tot
success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.
percentile_cut ([int] default=100) – cut redshifts above this percentile
colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’
random_seed ([int] default=42) – random seed for reproducibility
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'spec_selection_DEEP2_LSST'

name = 'SpecSelection_DEEP2_LSST'

photometryCut(data)

Applies DEEP2 photometric cut based on Newman+13.

This modified selection gives the best match to the data n(z) with its cut at z~0.75 and the B-R/R-I distribution (Newman+13, Fig. 12).

Notes

We cannot apply the surface brightness cut and do not apply the Gaussian weighted sampling near the original colour cuts.

selection(data): DEEP2 selection function.

speczSuccess(data): Spec-z success rate as function of r_AB for Q>=3 read of Figure 13 in Newman+13 for DEEP2 fields 2-4. Values are binned in steps of 0.2 mag with the first and last bin centered on 19 and 24.

stage_columns: list[str] | None

class rail.stages.SpecSelection_DESI_BGS

Bases: SpecSelection

The class of spectroscopic selections with DESI BGS.

Implements a minimal DESI Bright Galaxy Survey (BGS) selection using:

r < 19.5

Required bands in data (via config.colnames): r

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
drop_rows ([bool] default=True) – Drop selected rows from output table
seed ([type not specified] default=None) – Set to an int to force reproducible results.
N_tot ([int] default=10000) – Number of selected sources
nondetect_val ([float] default=99.0) – value to be removed for non detects
downsample ([bool] default=True) – If true, downsample the selected sources into a total number of N_tot
success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.
percentile_cut ([int] default=100) – cut redshifts above this percentile
colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’
random_seed ([int] default=42) – random seed for reproducibility
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'spec_selection_DESI_BGS'

name = 'SpecSelection_DESI_BGS'

selection(data): The DESI BGS selection function (simplified cut).

stage_columns: list[str] | None

class rail.stages.SpecSelection_DESI_ELG_LOP

Bases: SpecSelection

The class of spectroscopic selections with DESI ELG LOP.

Implements the simplified DESI ELG_LOP photometric selection using:

(g > 20) AND (gfib < 24.1)
0.15 < (r − z)
(g − r) < 0.5 × (r − z) + 0.1
(g − r) < −1.2 × (r − z) + 1.3

All of the above are combined with AND.

Required bands in data (via config.colnames): g, r, z, gfib
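The four ELG_LOP conditions listed above, combined with AND, can be written as a single boolean test per object. This is a sketch of the stated cuts only; the function name is_elg_lop and the example magnitudes are hypothetical, and the real stage operates on whole tables rather than scalars.

```python
def is_elg_lop(g, r, z, gfib):
    """Return True if one object passes all simplified ELG_LOP cuts."""
    rz = r - z
    gr = g - r
    return (
        g > 20.0
        and gfib < 24.1
        and rz > 0.15
        and gr < 0.5 * rz + 0.1
        and gr < -1.2 * rz + 1.3
    )


# HYPOTHETICAL magnitudes chosen to exercise the cuts.
passes = is_elg_lop(g=23.0, r=22.8, z=22.2, gfib=23.8)
too_bright = is_elg_lop(g=19.5, r=19.3, z=18.7, gfib=20.0)
too_red_in_gr = is_elg_lop(g=23.0, r=22.0, z=21.8, gfib=23.5)
```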

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
drop_rows ([bool] default=True) – Drop selected rows from output table
seed ([type not specified] default=None) – Set to an int to force reproducible results.
N_tot ([int] default=10000) – Number of selected sources
nondetect_val ([float] default=99.0) – value to be removed for non detects
downsample ([bool] default=True) – If true, downsample the selected sources into a total number of N_tot
success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.
percentile_cut ([int] default=100) – cut redshifts above this percentile
colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’
random_seed ([int] default=42) – random seed for reproducibility
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'spec_selection_ELG_LOP'

name = 'SpecSelection_DESI_ELG_LOP'

selection(data): The DESI ELG_LOP selection function.

stage_columns: list[str] | None

class rail.stages.SpecSelection_DESI_LRG

Bases: SpecSelection

The class of spectroscopic selections with DESI LRG (simplified).

This implements a simplified DESI LRG photometric selection using:

zfiber < 21.60 (here approximated with z)
z − W1 > 0.8 × (r − z) − 0.6
(g − W1 > 2.9) OR (r − W1 > 1.8)
[ ((r − W1 > 1.8 × (W1 − 17.14)) AND (r − W1 > W1 − 16.33)) OR (r − W1 > 3.3) ]

All of the above are combined with AND.

Required bands in data (via config.colnames): g, r, z, W1
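The simplified LRG cuts above can likewise be sketched as one boolean test per object, using z in place of zfiber as the text describes. The function name is_lrg and the example magnitudes are hypothetical.

```python
def is_lrg(g, r, z, w1):
    """Return True if one object passes all simplified DESI LRG cuts."""
    cut_1 = z < 21.60  # zfiber approximated with z
    cut_2 = (z - w1) > 0.8 * (r - z) - 0.6
    cut_3 = (g - w1 > 2.9) or (r - w1 > 1.8)
    cut_4 = (
        (r - w1 > 1.8 * (w1 - 17.14)) and (r - w1 > w1 - 16.33)
    ) or (r - w1 > 3.3)
    return cut_1 and cut_2 and cut_3 and cut_4


# HYPOTHETICAL magnitudes chosen to exercise the cuts.
passes = is_lrg(g=22.5, r=21.0, z=19.8, w1=18.0)
too_faint = is_lrg(g=24.0, r=23.5, z=22.0, w1=20.0)
too_blue = is_lrg(g=21.0, r=20.5, z=20.0, w1=19.5)
```

Keeping each cut in its own variable makes it easy to check which condition rejects a given object.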

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
drop_rows ([bool] default=True) – Drop selected rows from output table
seed ([type not specified] default=None) – Set to an int to force reproducible results.
N_tot ([int] default=10000) – Number of selected sources
nondetect_val ([float] default=99.0) – value to be removed for non detects
downsample ([bool] default=True) – If true, downsample the selected sources into a total number of N_tot
success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.
percentile_cut ([int] default=100) – cut redshifts above this percentile
colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'W1': 'W1', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’
random_seed ([int] default=42) – random seed for reproducibility
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'spec_selection_DESI_LRG'

name = 'SpecSelection_DESI_LRG'

selection(data): The DESI LRG selection function (simplified).

stage_columns: list[str] | None

class rail.stages.SpecSelection_GAMA

Bases: SpecSelection

The class of spectroscopic selections with GAMA.

The GAMA survey covers an area of 286 deg^2, with ~238000 objects.

The necessary column is r band.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
drop_rows ([bool] default=True) – Drop selected rows from output table
seed ([type not specified] default=None) – Set to an int to force reproducible results.
N_tot ([int] default=10000) – Number of selected sources
nondetect_val ([float] default=99.0) – value to be removed for non detects
downsample ([bool] default=True) – If true, downsample the selected sources into a total number of N_tot
success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.
percentile_cut ([int] default=100) – cut redshifts above this percentile
colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’
random_seed ([int] default=42) – random seed for reproducibility
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'spec_selection_GAMA'

name = 'SpecSelection_GAMA'

selection(data): GAMA selection function.

stage_columns: list[str] | None

class rail.stages.SpecSelection_HSC

Bases: SpecSelection

The class of spectroscopic selections with HSC.

For HSC, the data should at least include giz bands and redshift.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
drop_rows ([bool] default=True) – Drop selected rows from output table
seed ([type not specified] default=None) – Set to an int to force reproducible results.
N_tot ([int] default=10000) – Number of selected sources
nondetect_val ([float] default=99.0) – value to be removed for non detects
downsample ([bool] default=True) – If true, downsample the selected sources into a total number of N_tot
success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.
percentile_cut ([int] default=100) – cut redshifts above this percentile
colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’
random_seed ([int] default=42) – random seed for reproducibility
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'spec_selection_HSC'

name = 'SpecSelection_HSC'

photometryCut(data): HSC galaxies were binned in color magnitude space with i-band mag from -2 to 6 and g-z color from 13 to 26.

selection(data)

Selection functions.

This should be overwritten by the subclasses corresponding to different spec selections.

speczSuccess(data): HSC galaxies were binned in color magnitude space with i-band mag from -2 to 6 and g-z color from 13 to 26 (200 bins in each direction). The ratio of galaxies with spectroscopic redshifts (training galaxies) to galaxies with only photometry in HSC wide field (application galaxies) was computed for each pixel. We divide the data into the same pixels and randomly select galaxies into the training sample based on the HSC ratios.

stage_columns: list[str] | None

class rail.stages.SpecSelection_VVDSf02

Bases: SpecSelection

The class of spectroscopic selections with VVDSf02.

It covers an area of 0.5 deg^2 with ~10000 sources.

Necessary columns are i band magnitude and redshift.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
drop_rows ([bool] default=True) – Drop selected rows from output table
seed ([type not specified] default=None) – Set to an int to force reproducible results.
N_tot ([int] default=10000) – Number of selected sources
nondetect_val ([float] default=99.0) – value to be removed for non detects
downsample ([bool] default=True) – If true, downsample the selected sources into a total number of N_tot
success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.
percentile_cut ([int] default=100) – cut redshifts above this percentile
colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’
random_seed ([int] default=42) – random seed for reproducibility
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'spec_selection_VVDSf02'

name = 'SpecSelection_VVDSf02'

photometryCut(data)

Photometric cut of VVDS 2h-field based on LeFèvre+05.

Notes

The oversight of 1.0 magnitudes on the bright end misses 0.2% of galaxies.

selection(data)

Selection functions.

This should be overwritten by the subclasses corresponding to different spec selections.

speczSuccess(data)

Success rate of VVDS 2h-field.

Notes

We use a redshift-based and I-band based success rate independently here since we do not know their correlation, which makes the success rate worse than in reality.

Spec-z success rate as a function of i_AB, read off Figure 16 in LeFèvre+05 for the VVDS 2h field. Values are binned in steps of 0.5 mag, with the first bin starting at 17 and the last bin ending at 24.
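To illustrate why treating the two success rates independently lowers the overall rate, the sketch below looks up a magnitude-binned rate (0.5 mag bins from 17 to 24) and multiplies it by a redshift-based rate. All rate values are hypothetical placeholders, not the figures read from the paper, and returning 0.0 outside the binned range is an assumption of this sketch.

```python
def binned_rate(i_mag, rates, first_edge=17.0, last_edge=24.0, step=0.5):
    """Look up the success rate of the magnitude bin containing i_mag."""
    if not (first_edge <= i_mag < last_edge):
        return 0.0  # ASSUMPTION: no successes outside the binned range
    return rates[int((i_mag - first_edge) / step)]


n_bins = int(round((24.0 - 17.0) / 0.5))
# HYPOTHETICAL rates that fall linearly toward the faint end.
mag_rates = [1.0 - 0.05 * k for k in range(n_bins)]

p_mag = binned_rate(22.3, mag_rates)
p_z = 0.8  # HYPOTHETICAL redshift-based success rate
# Independent draws: the joint probability is the product of the two rates.
p_joint = p_mag * p_z
```

Because the product never exceeds the smaller of the two rates, applying them independently gives a lower success rate than a correlated treatment would.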

stage_columns: list[str] | None

class rail.stages.SpecSelection_zCOSMOS

Bases: SpecSelection

The class of spectroscopic selections with zCOSMOS.

It covers an area of 1.7 deg^2 with ~20000 galaxies.

For zCOSMOS, the data should at least include i band and redshift.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
drop_rows ([bool] default=True) – Drop selected rows from output table
seed ([type not specified] default=None) – Set to an int to force reproducible results.
N_tot ([int] default=10000) – Number of selected sources
nondetect_val ([float] default=99.0) – value to be removed for non detects
downsample ([bool] default=True) – If true, downsample the selected sources into a total number of N_tot
success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/stable/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.
percentile_cut ([int] default=100) – cut redshifts above this percentile
colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’
random_seed ([int] default=42) – random seed for reproducibility
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'spec_selection_zCOSMOS'

name = 'SpecSelection_zCOSMOS'

photometryCut(data)

Photometry cut for zCOSMOS based on Lilly+09.

Updates the internal state.

NOTE: This only includes zCOSMOS bright.

selection(data)

Selection functions.

This should be overwritten by the subclasses corresponding to different spec selections.

speczSuccess(data): Spec-z success rate as function of redshift (x) and I_AB (y) read of Figure 3 in Lilly+09 for zCOSMOS bright sample.

stage_columns: list[str] | None

class rail.stages.TableConverter

Bases: RailStage

Utility stage that converts tables from one format to another

FIXME, this is hardwired to convert parquet tables to Hdf5Tables. It would be nice to have more options here.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
output_format ([str] (required)) – Format of output table
input (PqHandle (INPUT))
output (Hdf5Handle (OUTPUT))

entrypoint_function: str | None = '__call__'

inputs = [('input', <class 'rail.core.data.PqHandle'>)]

interactive_function: str | None = 'table_converter'

name = 'TableConverter'

outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type: None

class rail.stages.TrainZEstimator

Bases: CatEstimator

CatEstimator which returns a global PDF for all galaxies

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0) – The minimum redshift of the z grid or sample
zmax ([float] default=3.0) – The maximum redshift of the z grid or sample
nzbins ([int] default=301) – The number of gridpoints in the z grid
id_col ([str] default=object_id) – name of the object ID column
redshift_col ([str] default=redshift) – name of redshift column
calc_summary_stats ([bool] default=False) – Compute summary statistics
calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.
recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates
model (ModelHandle (INPUT))
input (TableHandle (INPUT))
output (QPHandle (OUTPUT))

__init__(args, **kwargs)

Initialize Estimator

Parameters:

args (Any)
kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'estimate'

interactive_function: str | None = 'train_z_estimator'

name = 'TrainZEstimator'

open_model(**kwargs)

Load the model and/or attach it to this Stage

Parameters:

tag – Input tag associated to the model
**kwargs (Any) – Should include ‘model’, see notes

Return type:

None

Notes

The keyword argument ‘model’ should be either

an object with a trained model,
a path pointing to a file that can be read to obtain the trained model,
or a ModelHandle providing access to the trained model.

Returns: The object encapsulating the trained model.
Return type: Any
Parameters: kwargs (Any)
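
The three accepted forms of ‘model’ listed in the notes can be pictured with the sketch below. FakeModelHandle is a hypothetical stand-in rather than the RAIL ModelHandle class, resolve_model is an illustrative name, and reading the file with pickle is an assumption about the on-disk format.

```python
import os
import pickle
import tempfile
from pathlib import Path


class FakeModelHandle:
    """Hypothetical stand-in for a handle; not the RAIL ModelHandle class."""

    def __init__(self, data):
        self.data = data


def resolve_model(model):
    """Return the trained-model object for any of the three input kinds."""
    if isinstance(model, FakeModelHandle):
        # A handle already gives access to the trained model.
        return model.data
    if isinstance(model, (str, Path)):
        # A path: read the file to obtain the model (pickle is an assumption).
        with open(model, "rb") as fh:
            return pickle.load(fh)
    # Anything else is treated as the trained model object itself.
    return model


trained = {"zmode": 0.5}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    with open(path, "wb") as fh:
        pickle.dump(trained, fh)
    from_path = resolve_model(path)
from_handle = resolve_model(FakeModelHandle(trained))
from_object = resolve_model(trained)
```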

stage_columns: list[str] | None

train_pdf: np.ndarray | None

zgrid: np.ndarray | None

zmode: np.ndarray | None

class rail.stages.TrainZInformer

Bases: CatInformer

Train an Estimator which returns a global PDF for all galaxies

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
zmin ([float] default=0.0) – The minimum redshift of the z grid or sample
zmax ([float] default=3.0) – The maximum redshift of the z grid or sample
nzbins ([int] default=301) – The number of gridpoints in the z grid
redshift_col ([str] default=redshift) – name of redshift column
input (TableHandle (INPUT))
model (ModelHandle (OUTPUT))
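
To make the idea of a single global PDF concrete, the sketch below histograms a list of training redshifts on a grid defined by zmin, zmax and nzbins and normalizes the result. The function name train_global_pdf, the use of nzbins equal-width bins, and the bin-center convention are assumptions for illustration; the stage may build its grid differently.

```python
def train_global_pdf(redshifts, zmin=0.0, zmax=3.0, nzbins=301):
    """Hypothetical sketch: normalized histogram of the training redshifts."""
    width = (zmax - zmin) / nzbins
    counts = [0] * nzbins
    for z in redshifts:
        if zmin <= z < zmax:
            # Guard against floating-point round-up at the upper edge.
            counts[min(int((z - zmin) / width), nzbins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no training redshifts fall inside [zmin, zmax)")
    # Divide by total * width so that the histogram integrates to 1.
    pdf = [c / (total * width) for c in counts]
    centers = [zmin + (i + 0.5) * width for i in range(nzbins)]
    return centers, pdf


centers, pdf = train_global_pdf([0.2, 0.25, 0.5, 1.1, 1.15, 1.17], nzbins=30)
# Mode of the global PDF: the center of the most populated bin.
zmode = centers[pdf.index(max(pdf))]
```

An estimator built on such a model would return this same PDF for every galaxy, regardless of its photometry.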

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'train_z_informer'

name = 'TrainZInformer'

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type: None

stage_columns: list[str] | None

validate()

Validation which checks that the column names required by the stage exist in the data

Return type: None

class rail.stages.TrueNZHistogrammer

Bases: RailStage

Summarizer-like stage which simply histograms the true redshift

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
zmin ([float] default=0.0) – The minimum redshift of the z grid or sample
zmax ([float] default=3.0) – The maximum redshift of the z grid or sample
nzbins ([int] default=301) – The number of gridpoints in the z grid
redshift_col ([str] default=redshift) – name of redshift column
selected_bin ([int] default=-1) – Which tomography bin to consider
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
input (TableHandle (INPUT))
tomography_bins (TableHandle (INPUT))
true_NZ (QPHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

Parameters:

args (Any)
kwargs (Any)

Return type:

None

bincents: ndarray | None

entrypoint_function: str | None = 'histogram'

histogram(catalog, tomo_bins, **kwargs)

The main interface method for TrueNZHistogrammer.

Creates histogram of N of Z_true.

This will attach the sample to this Stage (for introspection and provenance tracking).

Then it will call the run() and finalize() methods, which need to be implemented by the sub-classes.

The run() method will need to register the data that it creates to this Estimator by using self.add_data('output', output_data).

Finally, this will return a PqHandle providing access to that output data.

Parameters:

catalog (TableLike) – The sample with the true NZ column
tomo_bins (TableLike) – Tomographic bin assignments

Returns:

A handle giving access to the histogram in QP format

Return type:

PqHandle
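
A minimal sketch of the histogramming step is given below. It assumes that a negative selected_bin (the default is -1) means "use every object", which this reference does not state explicitly; the function name histogram_true_nz and the equal-width binning are likewise illustrative assumptions.

```python
def histogram_true_nz(redshifts, bin_ids, zmin=0.0, zmax=3.0, nzbins=301, selected_bin=-1):
    """Hypothetical sketch: count true redshifts for one tomographic bin."""
    width = (zmax - zmin) / nzbins
    counts = [0] * nzbins
    for z, b in zip(redshifts, bin_ids):
        # Assumption: a negative selected_bin selects all objects.
        in_bin = selected_bin < 0 or b == selected_bin
        if in_bin and zmin <= z < zmax:
            counts[min(int((z - zmin) / width), nzbins - 1)] += 1
    return counts


z_true = [0.1, 0.4, 0.4, 2.0]
tomo = [0, 1, 1, 2]
all_counts = histogram_true_nz(z_true, tomo, nzbins=3)
bin1_counts = histogram_true_nz(z_true, tomo, nzbins=3, selected_bin=1)
```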

inputs = [('input', <class 'rail.core.data.TableHandle'>), ('tomography_bins', <class 'rail.core.data.TableHandle'>)]

interactive_function: str | None = 'true_nz_histogrammer'

name = 'TrueNZHistogrammer'

outputs = [('true_NZ', <class 'rail.core.data.QPHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type: None

stage_columns: list[str] | None

zgrid: ndarray | None

class rail.stages.UniformBinningClassifier

Bases: PZClassifier

Classifier that simply assigns tomographic bins based on a point estimate according to SRD.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
object_id_col ([str] default=) – name of object id column
point_estimate_key ([str] default=zmode) – Which point estimate to use
zbin_edges ([list] default=[]) – The tomographic redshift bin edges. If this is given (contains two or more entries), all settings below will be ignored.
zmin ([float] default=0.0) – The minimum redshift of the z grid or sample
zmax ([float] default=3.0) – The maximum redshift of the z grid or sample
n_tom_bins ([int] default=5) – Number of tomographic bins
no_assign ([int] default=-99) – Value for no assignment flag
input (QPHandle (INPUT))
output (Hdf5Handle (OUTPUT))
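
The interplay of zbin_edges, zmin, zmax, n_tom_bins and no_assign can be sketched as follows. The helper name assign_tomo_bins, the 1-based bin labels, and the half-open interval convention are assumptions made for illustration and may differ from the stage's behaviour.

```python
from bisect import bisect_right


def assign_tomo_bins(z_point, zbin_edges=(), zmin=0.0, zmax=3.0, n_tom_bins=5, no_assign=-99):
    """Hypothetical sketch: label each point estimate with a tomographic bin."""
    edges = list(zbin_edges)
    if len(edges) < 2:
        # No usable custom edges: fall back to uniform bins between zmin and zmax.
        step = (zmax - zmin) / n_tom_bins
        edges = [zmin + i * step for i in range(n_tom_bins + 1)]
    labels = []
    for z in z_point:
        if edges[0] <= z < edges[-1]:
            # bisect_right gives i with edges[i-1] <= z < edges[i], i.e. a 1-based bin.
            labels.append(bisect_right(edges, z))
        else:
            labels.append(no_assign)
    return labels


uniform_labels = assign_tomo_bins([0.1, 0.7, 2.9, 3.5, -0.2])
custom_labels = assign_tomo_bins([0.25, 0.75, 1.5], zbin_edges=[0.0, 0.5, 1.0])
```

In the second call the custom edges take over and zmin, zmax and n_tom_bins are ignored, mirroring the zbin_edges description above.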

entrypoint_function: str | None = 'classify'

interactive_function: str | None = 'uniform_binning_classifier'

name = 'UniformBinningClassifier'

outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]

stage_columns: list[str] | None

class rail.stages.UnrecBlModel

Bases: Degrader

Model for Creating Unrecognized Blends.

Finds objects near each other and merges them into one blended source. Uses Friends of Friends for matching. May implement shape matching in the future. Takes the averaged RA and Dec for the blended source, and sums up the fluxes in each band. May implement merged shapes in the future.

Requires gcc, which, depending on your installation, may be difficult for the caller (FoFCatalogMatching dependency fast3tree) to find. Conda-installed gcc seems to fix this.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
seed ([int] default=12345) – Random number seed
ra_label ([str] default=ra) – ra column name
dec_label ([str] default=dec) – dec column name
linking_lengths ([float] default=1.0) – linking_lengths for FoF matching
bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])
zp_dict ([dict] default={'u': 12.65, 'g': 14.69, 'r': 14.56, 'i': 14.38, 'z': 13.99, 'y': 13.02}) – magnitude zeropoints dictionary
ref_band ([str] default=mag_i_lsst)
redshift_col ([str] default=redshift)
match_size ([bool] default=False) – consider object size for finding blends
match_shape ([bool] default=False) – consider object shape for finding blends
obj_size ([str] default=obj_size) – object size column name
a ([str] default=semi_major) – semi major axis column name
b ([str] default=semi_minor) – semi minor axis column name
theta ([str] default=orientation) – orientation angle column name
input (PqHandle (INPUT))
output (PqHandle (OUTPUT))
compInd (PqHandle (OUTPUT))
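
The merging step described above (average the positions, sum the fluxes in each band) can be sketched for a single matched group as follows. The helper name blend_group, the band-to-column mapping passed as a dict, and the magnitude-flux conversion using the zeropoints are assumptions for illustration; the simple mean of RA also ignores wrap-around at 0/360 degrees.

```python
import math


def blend_group(sources, bands, zp_dict, ra_label="ra", dec_label="dec"):
    """Hypothetical merge of one friends-of-friends group into a blended source."""
    blended = {
        ra_label: sum(s[ra_label] for s in sources) / len(sources),
        dec_label: sum(s[dec_label] for s in sources) / len(sources),
    }
    for band, column in bands.items():
        zp = zp_dict[band]
        # Convert each magnitude to a flux, add the fluxes, convert back.
        total_flux = sum(10 ** (-0.4 * (s[column] - zp)) for s in sources)
        blended[column] = zp - 2.5 * math.log10(total_flux)
    return blended


group = [
    {"ra": 10.0, "dec": -5.0, "mag_i_lsst": 24.0},
    {"ra": 10.002, "dec": -5.002, "mag_i_lsst": 24.0},
]
blended = blend_group(group, bands={"i": "mag_i_lsst"}, zp_dict={"i": 14.38})
```

Two equally bright components give a blended source that is brighter by 2.5 log10(2), about 0.75 mag, independent of the zeropoint.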

blend_info_cols = ['group_id', 'n_obj', 'brightest_flux', 'total_flux', 'z_brightest', 'z_weighted', 'z_mean', 'z_stdev']

entrypoint_function: str | None = '__call__'

interactive_function: str | None = 'unrec_bl_model'

name = 'UnrecBlModel'

outputs = [('output', <class 'rail.core.data.PqHandle'>), ('compInd', <class 'rail.core.data.PqHandle'>)]

run(): Return pandas DataFrame with blending errors.

stage_columns: list[str] | None

class rail.stages.VarInfStackInformer

Bases: PzInformer

Placeholder Informer

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
input (QPHandle (INPUT))
truth (TableHandle (INPUT))
model (ModelHandle (OUTPUT))

entrypoint_function: str | None = 'inform'

interactive_function: str | None = 'var_inf_stack_informer'

model_handle: ModelHandle | None

name = 'VarInfStackInformer'

stage_columns: list[str] | None

class rail.stages.VarInfStackSummarizer

Bases: PZSummarizer

Variational inference summarizer based on a notebook created by Markus Rau. The summarizer is appropriate for the likelihoods returned by template-based codes, for which the NaiveSummarizer is not appropriate.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing
zmin ([float] default=0.0) – The minimum redshift of the z grid or sample
zmax ([float] default=3.0) – The maximum redshift of the z grid or sample
nzbins ([int] default=301) – The number of gridpoints in the z grid
seed ([int] default=87) – random seed
n_iter ([int] default=100) – The number of iterations in the variational inference
n_samples ([int] default=500) – The number of samples used in the Dirichlet uncertainty
input (QPHandle (INPUT))
output (QPHandle (OUTPUT))
single_NZ (QPHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

Parameters:

args (Any)
kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'summarize'

inputs = [('input', <class 'rail.core.data.QPHandle'>)]

interactive_function: str | None = 'var_inf_stack_summarizer'

name = 'VarInfStackSummarizer'

outputs = [('output', <class 'rail.core.data.QPHandle'>), ('single_NZ', <class 'rail.core.data.QPHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type: None

stage_columns: list[str] | None

summarize(input_data, **kwargs)

Summarizer for VarInfStack which returns multiple items

Parameters: input_data (qp.Ensemble) – Per-galaxy p(z), and any ancillary data associated with it
Returns: Ensemble with n(z), and any ancillary data. Return type depends on output_mode.
Return type: QPHandle | dict[str, QPHandle]

zgrid: ndarray | None

class rail.stages.YawAutoCorrelate

Bases: YawRailStage

Wrapper stage for yaw.autocorrelate to compute a sample’s angular autocorrelation amplitude.

Generally used for the reference sample to compute an estimate for its galaxy sample bias as a function of redshift. Data is provided as a single cache directory that must have redshifts and randoms with redshift attached.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
rmin ([float] (required)) – Single or sequence of lower scale limits in given ‘unit’.
rmax ([float] (required)) – Single or sequence of upper scale limits in given ‘unit’.
unit ([str] default=kpc) – The unit of the lower and upper scale limits.
rweight ([float] default=None) – Power-law exponent used to weight pairs by their separation.
resolution ([int] default=None) – Number of radial logarithmic bins used to approximate the weighting by separation.
zmin ([float] default=None) – Lowest redshift bin edge to generate (alternatively use ‘edges’).
zmax ([float] default=None) – Highest redshift bin edge to generate (alternatively use ‘edges’).
num_bins ([int] default=30) – Number of redshift bins to generate between ‘zmin’ and ‘zmax’.
method ([str] default=linear) – Method used to compute the spacing of bin edges.
edges ([float] default=None) – Use these custom bin edges instead of generating them.
closed ([str] default=right) – String indicating the side of the bin intervals that are closed.
max_workers ([int] default=None) – configure a custom maximum number of parallel workers to use
verbose ([str] default=info) – lowest log level emitted by yet_another_wizz
sample (YawCacheHandle (INPUT))
output (YawCorrFuncHandle (OUTPUT))
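
The redshift binning parameters (zmin, zmax, num_bins, method, closed) can be illustrated with the sketch below, which covers only the ‘linear’ method. The helper names linear_bin_edges and bin_index are hypothetical, and the reading of ‘closed’ as the closed side of each interval follows the parameter description rather than the yet_another_wizz source.

```python
def linear_bin_edges(zmin, zmax, num_bins=30):
    """Hypothetical sketch: num_bins + 1 equally spaced edges from zmin to zmax."""
    step = (zmax - zmin) / num_bins
    # Append zmax explicitly so that the last edge is exact.
    return [zmin + i * step for i in range(num_bins)] + [zmax]


def bin_index(z, edges, closed="right"):
    """Return the index of the bin containing z, or None if z is outside."""
    for i in range(len(edges) - 1):
        lo, hi = edges[i], edges[i + 1]
        # closed="right" means (lo, hi]; closed="left" means [lo, hi).
        inside = (lo < z <= hi) if closed == "right" else (lo <= z < hi)
        if inside:
            return i
    return None


edges = linear_bin_edges(0.0, 1.0, num_bins=4)
```

An object sitting exactly on an edge lands in different bins depending on ‘closed’: with these edges, z = 0.25 falls in bin 0 for closed="right" and in bin 1 for closed="left".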

algo_parameters: set[str] = {'closed', 'edges', 'max_workers', 'method', 'num_bins', 'resolution', 'rmax', 'rmin', 'rweight', 'unit', 'zmax', 'zmin'}

Lists the names of all algorithm-specific parameters that were added when subclassing.

correlate(sample, **kwargs)

Measure the angular autocorrelation amplitude in bins of redshift.

Parameters: sample (YawCache) – Input cache which must have randoms attached and redshifts for both data set and randoms.
Returns: A handle for the yaw.CorrFunc instance that holds the pair counts.
Return type: YawCorrFuncHandle

entrypoint_function: str | None = 'correlate'

inputs = [('sample', <class 'rail.yaw_rail.handles.YawCacheHandle'>)]

interactive_function: str | None = 'yaw_auto_correlate'

name = 'YawAutoCorrelate'

outputs = [('output', <class 'rail.yaw_rail.handles.YawCorrFuncHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type: None

stage_columns: list[str] | None

class rail.stages.YawCacheCreate

Bases: YawRailStage

Create a new cache directory to hold a data set and optionally its matching random catalog.

Both input data sets are split into consistent spatial patches that are required by yet_another_wizz for correlation function covariance estimates. Each patch is stored separately for efficient access.

The cache can be constructed from input files or tabular data in memory. Column names for sky coordinates are required; redshifts and per-object weights are optional. One of three patch creation methods must be specified:

Splitting the data into predefined patches (from ASCII file or an existing cache instance, linked as optional stage input).
Splitting the data based on a column with patch indices.
Generating approximately equal size patches using k-means clustering of object positions (preferably randoms if provided).

Note: The cache directory must be deleted manually when it is no longer needed. (The reference sample cache may be reused when operating on tomographic bins.)
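
The choice between the three patch creation methods can be sketched as a small selection function. The name choose_patch_method is hypothetical, and treating "none or more than one method configured" as an error is an assumption of this sketch; the precedence of patch_source follows the description of the create() method.

```python
def choose_patch_method(patch_file=None, patch_name=None, patch_num=None, patch_source=None):
    """Hypothetical sketch: decide which patch creation method applies."""
    if patch_source is not None:
        # An existing cache supplying patch centers takes precedence.
        return "patch_source"
    configured = [
        name
        for name, value in (
            ("patch_file", patch_file),
            ("patch_name", patch_name),
            ("patch_num", patch_num),
        )
        if value is not None
    ]
    if len(configured) != 1:
        raise ValueError(
            "specify exactly one of patch_file, patch_name or patch_num, "
            f"got {configured or 'none'}"
        )
    return configured[0]


method_kmeans = choose_patch_method(patch_num=20)
method_linked = choose_patch_method(patch_num=20, patch_source="existing_cache")
```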

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
path ([str] (required)) – path to cache directory, must not exist
overwrite ([bool] default=None) – overwrite the path if it is an existing cache directory
ra_name ([str] default=ra) – column name of right ascension (in degrees)
dec_name ([str] default=dec) – column name of declination (in degrees)
weight_name ([str] default=None) – column name of weight
redshift_name ([str] default=None) – column name of redshift
degrees ([bool] default=True) – Whether the input coordinates are in degrees or radians.
patch_file ([str] default=None) – path to ASCII file that lists patch centers (one per line) as pair of R.A./Dec. in radians, separated by a single space or tab
patch_name ([str] default=None) – column name of patch index (starting from 0)
patch_num ([int] default=None) – number of spatial patches to create using k-means clustering on coordinates of randoms
probe_size ([int] default=-1) – The approximate number of objects to sample from the input file when generating patch centers.
max_workers ([int] default=None) – configure a custom maximum number of parallel workers to use
verbose ([str] default=info) – lowest log level emitted by yet_another_wizz
data (TableHandle (INPUT))
rand (TableHandle (INPUT))
patch_source (YawCacheHandle (INPUT))
output (YawCacheHandle (OUTPUT))

algo_parameters: set[str] = {'dec_name', 'degrees', 'max_workers', 'overwrite', 'patch_file', 'patch_name', 'patch_num', 'path', 'probe_size', 'ra_name', 'redshift_name', 'weight_name'}

Lists the names of all algorithm-specific parameters that were added when subclassing.

create(data, rand=None, patch_source=None, **kwargs)

Create the new cache directory and split the input data into spatial patches.

Parameters:

data (DataFrame) – The data set to split into patches and cache.
rand (DataFrame, optional) – The randoms to split into patches and cache, positions used to automatically generate patch centers if provided and stage is configured with patch_num. For interactive mode RAIL, set to the string “none” if not desired.
patch_source (YawCache, optional) – An existing cache instance that provides the patch centers. Use to ensure consistent patch centers when running cross-correlations. Takes precedence over any configuration parameters. For interactive mode RAIL, set to the string “none” if not desired.

Returns:

A handle for the newly created cache directory.

Return type:

YawCacheHandle

entrypoint_function: str | None = 'create'

inputs = [('data', <class 'rail.core.data.TableHandle'>), ('rand', <class 'rail.core.data.TableHandle'>), ('patch_source', <class 'rail.yaw_rail.handles.YawCacheHandle'>)]

interactive_function: str | None = 'yaw_cache_create'

name = 'YawCacheCreate'

outputs = [('output', <class 'rail.yaw_rail.handles.YawCacheHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type: None

stage_columns: list[str] | None

class rail.stages.YawCrossCorrelate

Bases: YawRailStage

Wrapper stage for yaw.crosscorrelate to compute the angular cross-correlation amplitude between the reference and the unknown sample.

Generally used for the reference sample to compute an estimate for its galaxy sample as a function of redshift. Data sets are provided as cache directories. The reference sample must have redshifts and at least one cache must have randoms attached.

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
rmin ([float] (required)) – Single or sequence of lower scale limits in given ‘unit’.
rmax ([float] (required)) – Single or sequence of upper scale limits in given ‘unit’.
unit ([str] default=kpc) – The unit of the lower and upper scale limits.
rweight ([float] default=None) – Power-law exponent used to weight pairs by their separation.
resolution ([int] default=None) – Number of radial logarithmic bins used to approximate the weighting by separation.
zmin ([float] default=None) – Lowest redshift bin edge to generate (alternatively use ‘edges’).
zmax ([float] default=None) – Highest redshift bin edge to generate (alternatively use ‘edges’).
num_bins ([int] default=30) – Number of redshift bins to generate between ‘zmin’ and ‘zmax’.
method ([str] default=linear) – Method used to compute the spacing of bin edges.
edges ([float] default=None) – Use these custom bin edges instead of generating them.
closed ([str] default=right) – String indicating the side of the bin intervals that are closed.
max_workers ([int] default=None) – configure a custom maximum number of parallel workers to use
verbose ([str] default=info) – lowest log level emitted by yet_another_wizz
reference (YawCacheHandle (INPUT))
unknown (YawCacheHandle (INPUT))
output (YawCorrFuncHandle (OUTPUT))

algo_parameters: set[str] = {'closed', 'edges', 'max_workers', 'method', 'num_bins', 'resolution', 'rmax', 'rmin', 'rweight', 'unit', 'zmax', 'zmin'}

Lists the names of all algorithm-specific parameters that were added when subclassing.

correlate(reference, unknown, **kwargs)

Measure the angular cross-correlation amplitude in bins of redshift.

Parameters:

reference (YawCache) – Cache for the reference data, must have redshifts. If no randoms are attached, the unknown data cache must provide them.
unknown (YawCache) – Cache for the unknown data. If no randoms are attached, the reference data cache must provide them.

Returns:

A handle for the yaw.CorrFunc instance that holds the pair counts.

Return type:

YawCorrFuncHandle

entrypoint_function: str | None = 'correlate'

inputs = [('reference', <class 'rail.yaw_rail.handles.YawCacheHandle'>), ('unknown', <class 'rail.yaw_rail.handles.YawCacheHandle'>)]

interactive_function: str | None = 'yaw_cross_correlate'

name = 'YawCrossCorrelate'

outputs = [('output', <class 'rail.yaw_rail.handles.YawCorrFuncHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type: None

stage_columns: list[str] | None

class rail.stages.YawSummarize

Bases: YawRailStage

A summarizer that computes a clustering redshift estimate from the measured correlation amplitudes.

Evaluates the cross-correlation pair counts with the provided estimator. Additionally corrects for galaxy sample bias if autocorrelation measurements are provided as stage inputs.

Note: This summarizer does not produce a PDF, but a ratio of correlation functions, which may result in negative values. Further modelling of the output is required.
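
The note above can be illustrated with one commonly used form of the clustering redshift estimator, in which the cross-correlation amplitude in each redshift bin is divided by the square root of the available autocorrelation amplitudes. Whether the stage uses exactly this form is not stated in this reference; the function name and the omission of any normalization are assumptions of the sketch.

```python
import math


def clustering_redshift_estimate(w_cross, w_auto_ref=None, w_auto_unk=None):
    """Hypothetical sketch: bias-corrected ratio of correlation amplitudes per bin."""
    estimate = []
    for i, cross in enumerate(w_cross):
        bias_term = 1.0
        # Each autocorrelation, if provided, corrects for one sample's bias.
        # Assumes the autocorrelation amplitudes are positive.
        if w_auto_ref is not None:
            bias_term *= math.sqrt(w_auto_ref[i])
        if w_auto_unk is not None:
            bias_term *= math.sqrt(w_auto_unk[i])
        estimate.append(cross / bias_term)
    return estimate


nz = clustering_redshift_estimate(
    w_cross=[0.02, -0.01, 0.04],
    w_auto_ref=[0.04, 0.04, 0.16],
)
```

The second bin comes out negative because its measured cross-correlation amplitude is negative, which is why the output is not a PDF without further modelling.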

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
verbose ([str] default=info) – lowest log level emitted by yet_another_wizz
cross_corr (YawCorrFuncHandle (INPUT))
auto_corr_ref (YawCorrFuncHandle (INPUT))
auto_corr_unk (YawCorrFuncHandle (INPUT))
output (ModelHandle (OUTPUT))

algo_parameters: set[str] = {}

Lists the names of all algorithm-specific parameters that were added when subclassing.

entrypoint_function: str | None = 'summarize'

inputs = [('cross_corr', <class 'rail.yaw_rail.handles.YawCorrFuncHandle'>), ('auto_corr_ref', <class 'rail.yaw_rail.handles.YawCorrFuncHandle'>), ('auto_corr_unk', <class 'rail.yaw_rail.handles.YawCorrFuncHandle'>)]

interactive_function: str | None = 'yaw_summarize'

name = 'YawSummarize'

outputs = [('output', <class 'rail.core.data.ModelHandle'>)]

run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type: None

stage_columns: list[str] | None

summarize(cross_corr, auto_corr_ref=None, auto_corr_unk=None, **kwargs)

Compute a clustering redshift estimate and convert it to a PDF.

Parameters:

cross_corr (CorrFunc) – Pair counts from the cross-correlation measurement, basis for the clustering redshift estimate.
auto_corr_ref (CorrFunc, optional) – Pair counts from the reference sample autocorrelation measurement, used to correct for the reference sample galaxy bias.
auto_corr_unk (CorrFunc, optional) – Pair counts from the unknown sample autocorrelation measurement, used to correct for the unknown sample galaxy bias. Typically only available when using simulated data sets. For interactive mode RAIL, set to the string “none” if not desired.

Returns:

The clustering redshift estimate, spatial (jackknife) samples thereof, and its covariance matrix.

Return type:

YawRedshiftDataHandle