rail.stages package

class rail.stages.AddColumnOfRandom

Bases: Noisifier

Add a column of random numbers to a dataframe

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • col_name ([str] default=chaos_bunny) – Name of the column with random numbers

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))
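To illustrate what this stage does, the sketch below adds a column of random numbers to a column-oriented table. Only col_name and seed come from the parameter list above. The function name add_column_of_random is hypothetical. Plain Python dicts stand in for the dataframe the stage actually handles, and the uniform [0, 1) distribution is an assumption.

```python
import random


def add_column_of_random(table, col_name="chaos_bunny", seed=None):
    """Return a copy of `table` with an extra column of uniform random numbers.

    `table` is a dict mapping column names to equal-length lists; it stands in
    for the dataframe the real stage works on (an assumption of this sketch).
    """
    n_rows = len(next(iter(table.values()))) if table else 0
    rng = random.Random(seed)  # an int seed makes the draws reproducible
    out = dict(table)  # shallow copy, so the caller's table is not modified
    out[col_name] = [rng.random() for _ in range(n_rows)]  # floats in [0, 1)
    return out
```

Calling it twice with the same integer seed yields identical columns, which is the reproducibility the seed parameter is meant to provide.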

__init__(args, **kwargs)

Constructor

Does standard Noisifier initialization

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'add_column_of_random'
name = 'AddColumnOfRandom'
stage_columns: list[str] | None
class rail.stages.BPZliteEstimator

Bases: CatEstimator

CatEstimator subclass that implements the basic marginalized PDF for BPZ. In addition to the marginalized redshift PDF, it also computes several ancillary quantities that are stored in the ensemble ancil data:

  • zmode: mode of the PDF

  • amean: mean of the PDF

  • tb: integer specifying the best-fit SED at the redshift mode

  • todds: fraction of the marginalized posterior probability in the best template, so lower numbers mean other templates could be better fits, likely at other redshifts
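The ancillary quantities listed above can be sketched from a posterior sampled on a (redshift, template) grid. This is not the BPZ code. The function name bpz_ancillary and the grid layout pzt[i][t] are hypothetical. This sketch assumes tb is a 0-based template index chosen at the redshift mode, and that todds is the best template's share of the posterior after marginalizing over redshift.

```python
def bpz_ancillary(zgrid, pzt):
    """Return (zmode, amean, tb, todds) from a posterior grid.

    pzt[i][t] is the (unnormalized) posterior at redshift zgrid[i] for template t.
    """
    pz = [sum(row) for row in pzt]  # marginalize over templates -> p(z)
    norm = sum(pz)
    i_mode = max(range(len(pz)), key=pz.__getitem__)
    zmode = zgrid[i_mode]
    amean = sum(z * p for z, p in zip(zgrid, pz)) / norm
    # best-fit template at the redshift mode (0-based index, an assumption)
    tb = max(range(len(pzt[i_mode])), key=pzt[i_mode].__getitem__)
    # posterior mass of each template, marginalized over redshift
    pt = [sum(row[t] for row in pzt) for t in range(len(pzt[0]))]
    todds = pt[tb] / norm
    return zmode, amean, tb, todds
```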

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

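  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • id_col ([str] default=object_id) – name of the object ID column

  • redshift_col ([str] default=redshift) – name of redshift column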

  • calc_summary_stats ([bool] default=False) – Compute summary statistics

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, and ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • ref_band ([str] default=mag_i_lsst)

  • err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])

  • dz ([float] default=0.01) – delta z in grid

  • unobserved_val ([float] default=-99.0) – value to be replaced with zero flux and given large errors for non-observed filters

  • bpz_ref_data_path ([str] default=None) – file path to the SED, FILTER, and AB directories. If left at the default None, the install directory for rail + ../examples_data/estimation_data/data is used

  • filter_list ([list] default=['DC2LSST_u', 'DC2LSST_g', 'DC2LSST_r', 'DC2LSST_i', 'DC2LSST_z', 'DC2LSST_y'])

  • spectra_file ([str] default=CWWSB4.list) – name of the file specifying the list of SEDs to use

  • madau_flag ([str] default=no) – set to ‘yes’ or ‘no’ to set whether to include intergalactic Madau reddening when constructing model fluxes

  • no_prior ([bool] default=False) – set to True if you want to run with no prior

  • p_min ([float] default=0.005) – BPZ sets all values of the PDF that are below p_min*peak_value to 0.0; p_min controls that fractional cutoff

  • gauss_kernel ([float] default=0.0) – BPZ convolves the PDF with a Gaussian kernel if this is set to a non-zero number

  • zp_errors ([list] default=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1])

  • mag_err_min ([float] default=0.005) – a minimum floor for the magnitude errors, to prevent a large chi^2 for very bright objects

  • model (ModelHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))
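The p_min and gauss_kernel options can be illustrated on a PDF sampled on a uniform grid with spacing dz. This is only a sketch, and the helper names apply_p_min and smooth_pdf are hypothetical. It assumes gauss_kernel is a Gaussian sigma in redshift units and that the PDF is a plain list. The order in which BPZ applies the two steps, and any renormalization it performs afterwards, are not specified here.

```python
import math


def apply_p_min(pdf, p_min=0.005):
    """Set PDF values below p_min * peak to 0.0 (the p_min cutoff described above)."""
    threshold = p_min * max(pdf)
    return [p if p >= threshold else 0.0 for p in pdf]


def smooth_pdf(pdf, dz, sigma):
    """Convolve a gridded PDF with a Gaussian of width sigma (assumed redshift units)."""
    if sigma <= 0:
        return list(pdf)  # gauss_kernel == 0 means no smoothing
    half = int(math.ceil(4 * sigma / dz))  # truncate the kernel at about 4 sigma
    kernel = [math.exp(-0.5 * (k * dz / sigma) ** 2) for k in range(-half, half + 1)]
    ksum = sum(kernel)
    kernel = [k / ksum for k in kernel]  # unit-sum kernel preserves total probability
    n = len(pdf)
    out = [0.0] * n
    for i in range(n):
        for j, w in enumerate(kernel):
            src = i + j - half  # symmetric kernel, so correlation equals convolution
            if 0 <= src < n:
                out[i] += w * pdf[src]
    return out
```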

__init__(args, **kwargs)

Constructor: build the CatEstimator, then do BPZ-specific setup

entrypoint_function: str | None = 'estimate'
interactive_function: str | None = 'bpz_lite_estimator'
name = 'BPZliteEstimator'
open_model(**kwargs)

Load the model and/or attach it to this Stage

Parameters:
  • tag – Input tag associated to the model

  • **kwargs – Should include ‘model’, see notes

Notes

The keyword argument ‘model’ should be one of

  1. an object with a trained model,

  2. a path pointing to a file that can be read to obtain the trained model,

  3. or a ModelHandle providing access to the trained model.
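The three accepted forms of ‘model’ can be handled with a simple dispatch. This is an illustration, not RAIL's implementation. FakeModelHandle and resolve_model are hypothetical names, the handle's read() method is a stand-in, and loading a path with pickle is an assumption about the model file format.

```python
import os
import pickle


class FakeModelHandle:
    """Hypothetical stand-in for a ModelHandle that wraps an already-loaded model."""

    def __init__(self, model):
        self._model = model

    def read(self):
        return self._model


def resolve_model(model):
    """Return the trained model given an object, a file path, or a handle."""
    if isinstance(model, FakeModelHandle):  # case 3: a handle
        return model.read()
    if isinstance(model, (str, os.PathLike)):  # case 2: a path to a model file
        with open(model, "rb") as fin:
            return pickle.load(fin)  # assumes the model was pickled
    return model  # case 1: already a trained-model object
```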

Returns:

The object encapsulating the trained model.

Return type:

Any

stage_columns: list[str] | None
class rail.stages.BPZliteInformer

Bases: CatInformer

Inform stage for BPZliteEstimator. This stage assumes that you have a set of SED templates and that the training data has already been assigned a ‘best fit broad type’ (that is, something like elliptical, spiral, irregular, or starburst, similar to how the six SEDs in the CWW/SB set of Benitez (2000) are assigned 3 broad types). This informer then fits parameters for the evolving type fraction as a function of apparent magnitude in a reference band, p(T|m), as well as for the redshift prior of finding a galaxy of a given broad type at a particular redshift, p(z|T,m), where z is redshift, m is apparent magnitude in the reference band, and T is the ‘broad type’. We use the same forms for these functions as parameterized in Benitez (2000). For p(T|m) we have

$$ p(T \mid m) = \exp\left[-k_t\,(m - m_0)\right] $$

where m0 is a constant and we fit for values of kt. For p(z|T,m) we have

$$ p(z \mid T_x, m) \propto f_x\, z^{a_x} \exp\left[-\left(\frac{z}{z_{m,x}}\right)^{a_x}\right], \qquad z_{m,x} = z_{0,x} + k_{m,x}\,(m - m_0) $$

where f_x is the type fraction from p(T|m), and we fit for values of z0, km, and a for each type. These parameters are then fed to the BPZ prior for use in the estimation stage.
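The redshift part of this prior can be evaluated on a grid as follows. This is a minimal sketch of the Benitez (2000) functional form, normalized to sum to 1 over the grid. It is not the rail_bpz code, and the function name bpz_prior_z_given_type is hypothetical. The distribution peaks at z = zm, which grows with magnitude when km > 0.

```python
import math


def bpz_prior_z_given_type(zgrid, m, z0, km, alpha, m0=20.0):
    """Grid-normalized p(z | T, m) ~ z^alpha * exp(-(z / zm)^alpha), zm = z0 + km (m - m0)."""
    zm = z0 + km * (m - m0)  # characteristic redshift for this type and magnitude
    if zm <= 0:
        raise ValueError("z0 + km * (m - m0) must be positive")
    weights = [z ** alpha * math.exp(-((z / zm) ** alpha)) for z in zgrid]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, with the default initial guesses z0 = 0.4 and km = 0.1 from the parameter list below, an m = 24 galaxy gives zm = 0.4 + 0.1 × 4 = 0.8.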

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • zmin ([float] default=0.0)

  • zmax ([float] default=3.0)

  • nzbins ([int] default=301)

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])

  • ref_band ([str] default=mag_i_lsst)

  • redshift_col ([str] default=redshift)

  • bpz_ref_data_path ([str] default=None) – file path to the SED, FILTER, and AB directories. If left at the default None, the install directory for rail + rail/examples_data/estimation_data/data is used

  • spectra_file ([str] default=CWWSB4.list) – name of the file specifying the list of SEDs to use

  • m0 ([float] default=20.0) – reference apparent mag, used in prior param

  • nt_array ([list] default=[1, 2, 5]) – list of the integer number of templates per ‘broad type’; must be in the same order as the template set and must sum to the number of templates in the spectra file

  • mmin ([float] default=18.0) – lowest apparent mag in ref band, lower values ignored

  • mmax ([float] default=29.0) – highest apparent mag in ref band, higher values ignored

  • init_kt ([float] default=0.3) – initial guess for kt in training

  • init_zo ([float] default=0.4) – initial guess for z0 in training

  • init_alpha ([float] default=1.8) – initial guess for alpha in training

  • init_km ([float] default=0.1) – initial guess for km in training

  • type_file ([str] default=) – name of file with the broad type fits for the training data

  • output_hdfn ([bool] default=True) – if True, just return the default HDFN prior params rather than fitting

  • input (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))
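The nt_array constraint can be checked, and nt_array expanded into a per-template broad-type label, with a few lines of code. The function name expand_broad_types is hypothetical. With the default [1, 2, 5], the spectra file must list 1 + 2 + 5 = 8 templates.

```python
def expand_broad_types(nt_array, n_templates):
    """Map each template index to its 0-based broad-type index using nt_array."""
    if sum(nt_array) != n_templates:
        raise ValueError(
            f"nt_array sums to {sum(nt_array)}, but the spectra file lists {n_templates} templates"
        )
    types = []
    for broad_type, count in enumerate(nt_array):
        types.extend([broad_type] * count)  # consecutive templates share a broad type
    return types
```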

__init__(args, **kwargs)

Initialize the stage and its configuration

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'bpz_lite_informer'
name = 'BPZliteInformer'
run()

Compute the best-fit prior parameters

stage_columns: list[str] | None
class rail.stages.CMNNEstimator

Bases: CatEstimator

Color Matched Nearest Neighbor (CMNN) Estimator. Note that there are several modifications from the original CMNN, mainly that the original estimator dropped non-detections from the Mahalanobis distance calculation. However, there is information in a non-detection, so here I’ve instead replaced the non-detections with the 1-sigma limit and a magnitude uncertainty of 1.0, and fixed the degrees of freedom to be the number of magnitude bands minus one.
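The non-detection handling and the distance can be sketched for a single galaxy. Both function names are hypothetical. The distance is written as a squared Mahalanobis distance with a diagonal covariance built from the test galaxy's color errors only. That choice, and treating NaN like nondetect_val, are assumptions of this sketch. A cut at a chi-squared quantile, for example scipy.stats.chi2.ppf(ppf_value, n_bands - 1), could then select neighbours, but that step is omitted here.

```python
import math


def replace_nondetections(mags, errs, limits, nondetect_val=99.0, nondetect_err=1.0):
    """Replace non-detected magnitudes with the band's limit and give them a 1.0 mag error."""
    new_mags, new_errs = [], []
    for m, e, lim in zip(mags, errs, limits):
        if m == nondetect_val or math.isnan(m):
            new_mags.append(lim)  # 1-sigma limiting magnitude for this band
            new_errs.append(nondetect_err)
        else:
            new_mags.append(m)
            new_errs.append(e)
    return new_mags, new_errs


def mahalanobis_distance(test_colors, test_color_errs, train_colors):
    """Squared Mahalanobis distance assuming uncorrelated color errors of the test galaxy."""
    return sum(
        ((train - test) / err) ** 2
        for test, err, train in zip(test_colors, test_color_errs, train_colors)
    )
```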

The current implementation returns a single Gaussian for each galaxy, with a width given by the standard deviation of all galaxies within the range set by the ppf value.

There are three options for how to choose the central value of the Gaussian, set using the selection_mode config parameter (integer):

  • option 0: randomly choose one of the neighbors within the PPF cutoff

  • option 1: choose the value with the smallest Mahalanobis distance

  • option 2: random choice as in option 0, but weighted by distance

If a test galaxy does not have enough training galaxies it is assigned a redshift bad_redshift_val and a width bad_redshift_err, both of which are config parameters that can be set by the user. Note that this should only happen if the number of training galaxies is smaller than min_n, which is unlikely, but is included here for completeness.
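Given the redshifts and Mahalanobis distances of the neighbours that pass the PPF cut, the three selection modes and the bad-redshift fallback can be sketched as below. The function name cmnn_point_estimate is hypothetical. Weighting mode 2 by inverse distance (floored at min_dist) is an assumption, since the text above only says "weighted by distance". The sketch also simplifies the fallback by returning the bad-redshift values whenever fewer than min_n neighbours are passed in.

```python
import random
import statistics


def cmnn_point_estimate(neighbor_z, neighbor_dist, selection_mode=1, seed=66,
                        min_n=25, min_dist=0.0001,
                        bad_redshift_val=99.0, bad_redshift_err=10.0):
    """Return (center, width) of the Gaussian assigned to one test galaxy."""
    if len(neighbor_z) < min_n:
        return bad_redshift_val, bad_redshift_err
    rng = random.Random(seed)
    width = statistics.pstdev(neighbor_z)  # spread of all selected neighbours
    if selection_mode == 0:  # uniform random neighbour
        center = rng.choice(neighbor_z)
    elif selection_mode == 1:  # neighbour with the smallest Mahalanobis distance
        best = min(range(len(neighbor_dist)), key=neighbor_dist.__getitem__)
        center = neighbor_z[best]
    elif selection_mode == 2:  # random, weighted by inverse distance (assumption)
        weights = [1.0 / max(d, min_dist) for d in neighbor_dist]
        center = rng.choices(neighbor_z, weights=weights, k=1)[0]
    else:
        raise ValueError(f"unknown selection_mode {selection_mode}")
    return center, width
```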

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

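  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • id_col ([str] default=object_id) – name of the object ID column

  • redshift_col ([str] default=redshift) – name of redshift column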

  • calc_summary_stats ([bool] default=False) – Compute summary statistics

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, and ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • seed ([int] default=66) – random seed used in selection mode

  • ppf_value ([float] default=0.68) – PPF value used in Mahalanobis distance

  • selection_mode ([int] default=1) – selects how the redshift estimate is chosen: 0: random choice, 1: nearest neighbor, 2: weighted random

  • min_n ([int] default=25) – minimum number of training galaxies to use

  • min_thresh ([float] default=0.0001) – minimum threshold cutoff

  • min_dist ([float] default=0.0001) – minimum Mahalanobis distance

  • bad_redshift_val ([float] default=99.0) – redshift to assign to bad redshifts

  • bad_redshift_err ([float] default=10.0) – Gauss error width to assign to bad redshifts

  • model (ModelHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do Estimator specific initialization

entrypoint_function: str | None = 'estimate'
interactive_function: str | None = 'cmnn_estimator'
name = 'CMNNEstimator'
open_model(**kwargs)

Load the model and/or attach it to this Stage

Parameters:
  • tag – Input tag associated to the model

  • **kwargs – Should include ‘model’, see notes

Notes

The keyword argument ‘model’ should be one of

  1. an object with a trained model,

  2. a path pointing to a file that can be read to obtain the trained model,

  3. or a ModelHandle providing access to the trained model.

Returns:

The object encapsulating the trained model.

Return type:

Any

stage_columns: list[str] | None
class rail.stages.CMNNInformer

Bases: CatInformer

Compute colors and color errors for the CMNN training set and store them in a model file that will be used by the CMNNEstimator stage

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])

  • redshift_col ([str] default=redshift)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • nondetect_val ([float] default=99.0)

  • nondetect_replace ([bool] default=False) – set to True to replace non-detects, False to ignore in distance calculation

  • input (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))
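One way to compute "colors and color errors" for one object is sketched below. It assumes colors are differences of adjacent bands in the order given by bands, and that magnitude errors are independent and therefore add in quadrature. The function name colors_and_errors is hypothetical. With six bands this gives five colors, which matches the "number of bands minus one" degrees of freedom used by the estimator.

```python
import math


def colors_and_errors(mags, errs):
    """Adjacent-band colors (u-g, g-r, ...) and their quadrature-summed errors."""
    if len(mags) != len(errs):
        raise ValueError("mags and errs must have the same length")
    colors = [blue - red for blue, red in zip(mags[:-1], mags[1:])]
    color_errs = [math.hypot(e1, e2) for e1, e2 in zip(errs[:-1], errs[1:])]
    return colors, color_errs
```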

__init__(args, **kwargs)

Constructor: do CatInformer-specific initialization, then check the bands

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'cmnn_informer'
name = 'CMNNInformer'
run()

Run the stage and return the execution status.

Subclasses must implement this method.

stage_columns: list[str] | None
class rail.stages.CatClassifier

Bases: RailStage

The base class for assigning classes to catalogue-like tables.

Classifiers use a generic “model”, the details of which depend on the subclass.

CatClassifier takes as “input” a catalogue-like table, assigns each object to a tomographic bin, and provides as “output” tabular data that can be appended to the catalogue.
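Assigning objects to tomographic bins can be sketched as a binary search over bin edges. This sketch assumes a point redshift estimate per object and half-open bins [edge_i, edge_{i+1}). Objects outside the edges get the label -1, which is also an assumption. The function name assign_tomo_bins is hypothetical.

```python
import bisect


def assign_tomo_bins(z_point, bin_edges):
    """Return a 0-based bin index per object, or -1 if z is outside [edges[0], edges[-1])."""
    labels = []
    for z in z_point:
        if z < bin_edges[0] or z >= bin_edges[-1]:
            labels.append(-1)
        else:
            labels.append(bisect.bisect_right(bin_edges, z) - 1)
    return labels
```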

__init__(args, **kwargs)

Initialize Classifier

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

classify(input_data, **kwargs)

The main run method for the classifier, should be implemented in the specific subclass.

This will attach the input_data to this CatClassifier (for introspection and provenance tracking).

Then it will call the run() and finalize() methods, which need to be implemented by the sub-classes.

The run() method will need to register the data that it creates to this Classifier by using self.add_data(‘output’, output_data).

Finally, this will return a TableHandle providing access to that output data.

Parameters:

input_data (TableLike) – A dictionary of all input data

Returns:

Class assignment for each galaxy.

Return type:

TableHandle

entrypoint_function: str | None = 'classify'
inputs = [('model', <class 'rail.core.data.ModelHandle'>), ('input', <class 'rail.core.data.TableHandle'>)]
model: ModelLike | None
name = 'CatClassifier'
outputs = [('output', <class 'rail.core.data.TableHandle'>)]
stage_columns: list[str] | None
class rail.stages.CatEstimator

Bases: RailStage, PointEstimationMixin

The base class for making photo-z posterior estimates from catalog-like inputs (i.e., tables with fluxes in photometric bands among the set of columns)

Estimators use a generic “model”, the details of which depend on the subclass.

Estimators take as “input” tabular data, apply the photo-z estimation and provide as “output” a QPEnsemble, with per-object p(z).

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • id_col ([str] default=object_id) – name of the object ID column

  • redshift_col ([str] default=redshift) – name of redshift column

  • calc_summary_stats ([bool] default=False) – Compute summary statistics

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, and ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • model (ModelHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))

__init__(args, **kwargs)

Initialize Estimator

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

classmethod default_distribution_type()

Return the type of distribution that this estimator creates

By default this is DistributionType.ad_hoc, but subclasses can override it to return DistributionType.posterior or DistributionType.likelihood where appropriate.

Return type:

DistributionType

entrypoint_function: str | None = 'estimate'
estimate(input_data, **kwargs)

The main interface method for the photo-z estimation

This will attach the input data (defined in inputs as “input”) to this Estimator (for introspection and provenance tracking). It then calls the run(), validate(), and finalize() methods.

The run method will call _process_chunk(), which needs to be implemented in the subclass, to process input data in batches. See RandomGaussEstimator for a simple example.

Finally, this will return a QPHandle for access to that output data.

Parameters:

input_data (TableLike) – A dictionary of all input data

Returns:

Handle providing access to QP ensemble with output data

Return type:

QPHandle
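The batching described above, where run() hands the input to _process_chunk() in blocks of chunk_size rows, can be sketched independently of RAIL. The names iterate_chunks, estimate_in_chunks and the process_chunk(start, end, data) callback are hypothetical. The real _process_chunk() signature is not given here.

```python
def iterate_chunks(n_rows, chunk_size=10000):
    """Yield (start, end) row ranges that cover n_rows in blocks of chunk_size."""
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


def estimate_in_chunks(table, process_chunk, chunk_size=10000):
    """Apply process_chunk to each slice of a column-oriented table and concatenate."""
    n_rows = len(next(iter(table.values())))
    results = []
    for start, end in iterate_chunks(n_rows, chunk_size):
        chunk = {name: col[start:end] for name, col in table.items()}
        results.extend(process_chunk(start, end, chunk))
    return results
```

Processing in fixed-size chunks bounds memory use, and the same (start, end) ranges could be split across parallel workers.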

inputs = [('model', <class 'rail.core.data.ModelHandle'>), ('input', <class 'rail.core.data.TableHandle'>)]
name = 'CatEstimator'
outputs = [('output', <class 'rail.core.data.QPHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
class rail.stages.CatSummarizer

Bases: RailStage

The base class for classes that go from catalog-like tables to ensemble NZ estimates.

CatSummarizer takes as “input” a catalog-like table, i.e., a table with fluxes in photometric bands among the set of columns, and provides as “output” a QPEnsemble with per-ensemble n(z).
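A very simple summarizer is a normalized histogram of per-object point estimates. It is shown here only to make "catalog in, n(z) out" concrete. It is not any specific RAIL summarizer, and the function name histogram_nz is hypothetical. The result is a density per unit redshift that integrates to 1 over [zmin, zmax).

```python
def histogram_nz(z_values, zmin=0.0, zmax=3.0, nbins=30):
    """Normalized histogram n(z) (density per unit redshift) from point estimates."""
    width = (zmax - zmin) / nbins
    counts = [0] * nbins
    for z in z_values:
        if zmin <= z < zmax:  # objects outside the range are ignored
            idx = min(int((z - zmin) / width), nbins - 1)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * nbins
    return [c / (total * width) for c in counts]
```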

entrypoint_function: str | None = 'summarize'
inputs = [('input', <class 'rail.core.data.TableHandle'>)]
name = 'CatSummarizer'
outputs = [('output', <class 'rail.core.data.QPHandle'>)]
stage_columns: list[str] | None
summarize(input_data)

The main run method for the summarization, should be implemented in the specific subclass.

This will attach the input_data to this CatSummarizer (for introspection and provenance tracking).

Then it will call the run() and finalize() methods, which need to be implemented by the sub-classes.

The run() method will need to register the data that it creates to this CatSummarizer by using self.add_data(‘output’, output_data).

Finally, this will return a QPHandle providing access to that output data.

Parameters:

input_data (TableLike) – Either a dictionary of all input data or a TableHandle providing access to the same

Returns:

Ensemble with n(z), and any ancillary data

Return type:

QPHandle

class rail.stages.ColumnMapper

Bases: RailStage

Utility stage that remaps the names of columns.

  1. This operates on pandas dataframes in parquet files.

  2. In short, this does: output_data = input_data.rename(columns=self.config.columns, inplace=self.config.in_place)
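The renaming rule can be shown without pandas on a dict-of-lists table. As with DataFrame.rename(columns=...), names missing from the mapping are kept unchanged. The function name rename_columns is hypothetical, and this sketch always returns a new table, so it does not model the in_place option.

```python
def rename_columns(table, columns):
    """Return a new column-oriented table with keys renamed according to `columns`."""
    # With pandas, the equivalent call would be df.rename(columns=columns).
    return {columns.get(name, name): values for name, values in table.items()}
```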

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • columns ([dict] (required)) – Map of columns to rename

  • in_place ([bool] default=False) – Update file in place

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'
inputs = [('input', <class 'rail.core.data.PqHandle'>)]
interactive_function: str | None = 'column_mapper'
name = 'ColumnMapper'
outputs = [('output', <class 'rail.core.data.PqHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

class rail.stages.Creator

Bases: RailStage

Base class for Creators that generate synthetic photometric data from a model.

Creator will output a table of photometric data. The details will depend on the particular engine.

__init__(args, **kwargs)

Initialize Creator

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'sample'
inputs = [('model', <class 'rail.core.data.ModelHandle'>)]
name = 'Creator'
outputs = [('output', <class 'rail.core.data.TableHandle'>)]
sample(n_samples=None, seed=None, **kwargs)

Draw samples from the model specified in the configuration.

This is a method for running a Creator in interactive mode. In pipeline mode, the subclass run method will be called by itself.

Parameters:
  • n_samples (int, optional) – The number of samples to draw, by default None

  • seed (int, optional) – The random seed to control sampling, by default None

  • **kwargs (Any) – Used to update the configuration

Returns:

TableHandle wrapping the newly created samples

Return type:

TableHandle

Notes

This method puts n_samples and seed into the stage configuration data, which makes them available to other methods.

It then calls the run method, which must be defined by a subclass.

Finally, the TableHandle associated to the output tag is returned.
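The sample() flow in these notes, which stores n_samples and seed in the configuration, calls run(), and returns the output, can be mimicked by a toy class. ToyCreator is entirely hypothetical. It draws Gaussian i-band magnitudes so that there is something to return, and a dict stands in for the TableHandle.

```python
import random


class ToyCreator:
    """Hypothetical creator illustrating the sample() -> config -> run() flow."""

    def __init__(self, mean_mag=24.0, sigma=1.0):
        self.config = {"mean_mag": mean_mag, "sigma": sigma}
        self.output = None

    def sample(self, n_samples=None, seed=None, **kwargs):
        # Put the sampling controls into the config so that run() can read them.
        self.config.update(n_samples=n_samples, seed=seed, **kwargs)
        self.run()
        return self.output  # stands in for the TableHandle of the output tag

    def run(self):
        rng = random.Random(self.config["seed"])
        n = self.config["n_samples"]
        mean, sigma = self.config["mean_mag"], self.config["sigma"]
        self.output = {"mag_i_lsst": [rng.gauss(mean, sigma) for _ in range(n)]}
```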

stage_columns: list[str] | None
class rail.stages.DNFEstimator

Bases: CatEstimator

A class for estimating photometric redshifts using the DNF method.

This class extends CatEstimator and predicts redshifts from photometric data. It supports multiple selection modes for redshift estimation, handles missing data, and generates probability density functions (PDFs) for photometric redshifts.

Metrics (selection_mode):

  • ENF (1): Euclidean neighbourhood, a common distance metric used in kNN (k-nearest neighbors) photometric redshift prediction.

  • ANF (2): uses the normalized inner product for more accurate photo-z predictions. It is particularly recommended for datasets with more than four filters.

  • DNF (3): combines Euclidean and angular metrics, improving accuracy, especially for larger neighborhoods, while maintaining proportionality in observable content.
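The difference between the Euclidean and angular metrics can be seen in a few lines of code. ENF is shown as the plain Euclidean distance. For ANF, this sketch uses 1 minus the normalized inner product as the distance. That convention is an assumption, as are the function names. The combined DNF metric is not reproduced here. Because the angular distance ignores the overall scale of the vectors, a uniformly brighter copy of the same spectrum counts as identical.

```python
import math


def euclidean_distance(a, b):
    """ENF-style distance: ordinary Euclidean distance between photometry vectors."""
    return math.dist(a, b)


def angular_distance(a, b):
    """ANF-style distance: 1 minus the normalized inner product (cosine of the angle)."""
    dot = sum(x * y for x, y in zip(a, b))
    nip = dot / (math.hypot(*a) * math.hypot(*b))
    return 1.0 - nip
```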

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

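  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • id_col ([str] default=object_id) – name of the object ID column

  • redshift_col ([str] default=redshift) – name of redshift column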

  • calc_summary_stats ([bool] default=False) – Compute summary statistics

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, and ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • selection_mode ([int] default=1) – selects the mode used to choose the redshift estimate: 0: ENF, 1: ANF, 2: DNF

  • model (ModelHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do Estimator specific initialization

entrypoint_function: str | None = 'estimate'
interactive_function: str | None = 'dnf_estimator'
name = 'DNFEstimator'
open_model(**kwargs)

Load the model and/or attach it to this Stage

Parameters:
  • tag – Input tag associated to the model

  • **kwargs – Should include ‘model’, see notes

Notes

The keyword argument ‘model’ should be one of

  1. an object with a trained model,

  2. a path pointing to a file that can be read to obtain the trained model,

  3. or a ModelHandle providing access to the trained model.

Returns:

The object encapsulating the trained model.

Return type:

Any

stage_columns: list[str] | None
class rail.stages.DNFInformer

Bases: CatInformer

A class for photometric redshift estimation.

This class extends CatInformer and processes photometric training data for redshift estimation. It handles missing data by replacing non-detections with predefined magnitude limits and assigning errors accordingly.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry)

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])

  • redshift_col ([str] default=redshift)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • nondetect_val ([float] default=99.0)

  • input (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: do CatInformer-specific initialization, then check the bands

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'dnf_informer'
name = 'DNFInformer'
run()

Run the stage and return the execution status.

Subclasses must implement this method.

stage_columns: list[str] | None
class rail.stages.DSPSPhotometryCreator

Bases: Creator

Derived class of Creator that generates synthetic absolute and apparent magnitudes from one or more SED models generated with the DSPSSingleSedModeler or DSPSPopulationSedModeler classes. It accepts as input Hdf5Handles containing the rest-frame SEDs in units of Lsun/Hz and outputs an Hdf5Handle containing sequential indices and the absolute and apparent magnitudes for each galaxy. Photometric quantities are computed for the filters defined in the configuration file.

On CPU, jax executes the computations serially on a single core; CPU parallelization requires MPI. If a GPU is used, jax parallelizes the execution natively and automatically.
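The core of synthetic photometry is the AB magnitude of an f_nu spectrum seen through a filter. This is the standard definition, not DSPS's JAX code, and the names below are hypothetical. Because |dν/ν| = dλ/λ, both integrals can be taken over wavelength, in any single consistent unit. Here f_nu is in erg s^-1 cm^-2 Hz^-1, and the trapezoid rule stands in for whatever integration DSPS actually uses.

```python
import math

AB_ZEROPOINT_FNU = 3631e-23  # 3631 Jy in erg s^-1 cm^-2 Hz^-1


def _trapz(y, x):
    """Trapezoid-rule integral of samples y over grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def ab_magnitude(wavelength, fnu, transmission):
    """m_AB = -2.5 log10( int f_nu T dlam/lam / int f_AB T dlam/lam )."""
    num = _trapz([f * t / w for f, t, w in zip(fnu, transmission, wavelength)], wavelength)
    den = _trapz([AB_ZEROPOINT_FNU * t / w for t, w in zip(transmission, wavelength)], wavelength)
    return -2.5 * math.log10(num / den)
```

A flat 3631 Jy spectrum has m_AB = 0 in every filter, and dimming it by a factor of 100 gives 5 mag.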

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • redshift_key ([str] default=redshifts)

  • restframe_sed_key ([str] default=restframe_seds) – Name of the hdf5 dataset containing the rest-frame SEDs

  • absolute_mags_key ([str] default=rest_frame_absolute_mags) – Name of the output hdf5 dataset for the absolute magnitudes

  • apparent_mags_key ([str] default=apparent_mags) – Name of the output hdf5 dataset for the apparent magnitudes

  • filter_folder ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/dsps_default_data/filters) – Folder containing filter transmissions

  • instrument_name ([str] default=lsst) – Instrument name as prefix to filter transmission files

  • wavebands ([list] default=['u', 'g', 'r', 'i', 'z', 'y']) – List of wavebands

  • min_wavelength ([float] default=250.0)

  • max_wavelength ([float] default=12000.0)

  • ssp_templates_file ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/dsps_default_data/ssp_data_fsps_v3.2_lgmet_age.h5) – hdf5 file storing the SSP libraries used to create SEDs

  • default_cosmology ([bool] default=True) – True to use the default DSPS cosmology. If False, Om0, w0, wa, and h need to be supplied to the sample function

  • model (Hdf5Handle (INPUT))

  • output (Hdf5Handle (OUTPUT))

__init__(args, **kwargs)

Initialize the DSPSPhotometryCreator class. If the SSP templates are not provided by the user, they are automatically downloaded from the public NERSC directory. These default templates are created with default FSPS values, with gas emission at a fixed solar gas metallicity. The _b and _c tuples for jax contain None or 0 for each argument, depending on whether that argument should not (None) or should (0) be mapped over its array axis.

Parameters:
  • args

  • comm

default_files_folder = '/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/dsps_default_data'
entrypoint_function: str | None = 'sample'
inputs = [('model', <class 'rail.core.data.Hdf5Handle'>)]
interactive_function: str | None = 'dsps_photometry_creator'
name = 'DSPSPhotometryCreator'
outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]
run()

This function computes rest-frame absolute magnitudes in the provided wavebands for all the galaxies in the population by calling _calc_rest_mag_vmap from DSPS. It does the same for the observed magnitudes in the AB system by calling _calc_obs_mag_vmap from DSPS. It then stores both kinds of magnitudes and the galaxy indices in an Hdf5Handle.

sample(model, seed=None, Om0=0.3075, w0=-1.0, wa=0.0, h=0.6774, **kwargs)

Creates observed and absolute magnitudes for the population of galaxy rest-frame SEDs and stores them into an Hdf5Handle.

Parameters:
  • model (str) – Filepath to the hdf5 table containing the galaxy rest-frame SEDs.

  • seed (int) – The random seed to control sampling

  • Om0 (float) – Omega matter: density of non-relativistic matter in units of the critical density at z=0.

  • w0 (float) – Dark energy equation of state at z=0 (a=1). This is pressure/density for dark energy in units where c=1.

  • wa (float) – Negative derivative of the dark energy equation of state with respect to the scale factor. A cosmological constant has w0=-1.0 and wa=0.0.

  • h (float) – dimensionless Hubble constant at z=0.

Returns:

Hdf5Handle storing the absolute and apparent magnitudes.

Return type:

Hdf5Handle

Notes

This method puts seed into the stage configuration data, which makes it available to other methods. It then calls the run method. Finally, the Hdf5Handle associated with the output tag is returned.
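The w0 and wa descriptions of sample() correspond to the common Chevallier-Polarski-Linder form w(a) = w0 + wa(1 - a). That form is an assumption here (the docstring only describes the parameters); the sketch checks the two properties stated above: a cosmological constant gives w = -1 everywhere, and wa is the negative derivative of w with respect to a.

```python
def w_cpl(a, w0=-1.0, wa=0.0):
    """Dark energy equation of state w(a) = w0 + wa * (1 - a)."""
    return w0 + wa * (1.0 - a)


# A cosmological constant (w0=-1, wa=0) gives w = -1 at every scale factor.
lambda_values = [w_cpl(a) for a in (0.25, 0.5, 1.0)]

# Finite-difference slope: dw/da = -wa.
step = 1e-6
slope = (w_cpl(0.5 + step, w0=-0.9, wa=0.3) - w_cpl(0.5, w0=-0.9, wa=0.3)) / step
```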

stage_columns: list[str] | None
class rail.stages.DSPSPopulationSedModeler

Bases: Modeler

Derived class of Modeler for creating a population of galaxy rest-frame SED models using DSPS v3 (Hearin+21). SPS calculations are based on a set of template SEDs of simple stellar populations (SSPs). Supplying such templates is outside the planned scope of the DSPS package, so they need to be retrieved from another library. For example, the FSPS library supplies such templates in a convenient form.

The input galaxy properties, such as star-formation histories and metallicities, need to be supplied via an hdf5 table.

The user-provided metallicity grid should be defined consistently with the metallicities of the template SEDs. Users should choose the cosmic time grid with care, since the required time resolution strongly depends on their scientific aim. On CPU, jax executes the computations serially on a single core; CPU parallelization requires MPI. On GPU, jax parallelizes the execution natively and automatically.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • ssp_templates_file ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/dsps_default_data/ssp_data_fsps_v3.2_lgmet_age.h5) – hdf5 file storing the SSP libraries used to create SEDs

  • redshift_key ([str] default=redshifts)

  • cosmic_time_grid_key ([str] default=cosmic_time_grid) – Cosmic time grid keyword name of the hdf5 dataset; this is the grid of ages of the Universe, in Gyr, over which the stellar mass build-up takes place

  • star_formation_history_key ([str] default=star_formation_history) – Star-formation history keyword name of the hdf5 dataset, this is the star-formation history of the galaxy in units of Msun/yr

  • stellar_metallicity_key ([str] default=stellar_metallicity) – Stellar metallicity keyword name of the hdf5 dataset, this is the stellar metallicity in units of log10(Z)

  • stellar_metallicity_scatter_key ([str] default=stellar_metallicity_scatter) – Stellar metallicity scatter keyword name of the hdf5 dataset; this is the lognormal scatter in the metallicity distribution function

  • restframe_sed_key ([str] default=restframe_seds) – Rest-frame SED keyword name of the output hdf5 dataset

  • default_cosmology ([bool] default=True) – True to use the default DSPS cosmology. If False, Om0, w0, wa and h need to be supplied to the fit_model function

  • min_wavelength ([float] default=250.0)

  • max_wavelength ([float] default=12000.0)

  • input (Hdf5Handle (INPUT))

  • model (Hdf5Handle (OUTPUT))
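A minimal sketch of writing an input table with h5py, using the default dataset names from the configuration above. The array shapes (one redshift, metallicity and scatter value per galaxy, a shared cosmic time grid, one SFH row per galaxy), the values and the file name are illustrative assumptions, not taken from the RAIL documentation; check them against the bundled input_galaxy_properties_dsps.hdf5 example.

```python
import os
import tempfile

import h5py
import numpy as np

n_gal, n_t = 5, 100
rng = np.random.default_rng(42)

# Assumed layout, keyed by the default *_key configuration values.
properties = {
    "redshifts": rng.uniform(0.1, 2.0, n_gal),
    "cosmic_time_grid": np.linspace(0.05, 13.8, n_t),                # Gyr
    "star_formation_history": np.full((n_gal, n_t), 1.0),            # Msun/yr
    "stellar_metallicity": np.full(n_gal, np.log10(0.012)),          # log10(Z)
    "stellar_metallicity_scatter": np.full(n_gal, 0.2),
}

# Hypothetical output file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "my_galaxy_properties.hdf5")
with h5py.File(path, "w") as f:
    for key, values in properties.items():
        f.create_dataset(key, data=values)
```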

__init__(args, **kwargs)

Initialize the SedModeler class. If the SSP templates are not provided by the user, they are automatically downloaded from the public NERSC directory. These default templates are created with default FSPS values, with gas emission at a fixed, solar gas metallicity. The _a tuple for jax contains None or 0 for each argument, depending on whether that argument's array axis should not or should be mapped over.

default_files_folder = '/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/dsps_default_data'
entrypoint_function: str | None = 'fit_model'
fit_model(input_data='/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/dsps_default_data/input_galaxy_properties_dsps.hdf5', Om0=0.3075, w0=-1.0, wa=0.0, h=0.6774, **kwargs)

This function generates the rest-frame SEDs and stores them into the Hdf5Handle.

Parameters:
  • input_data (str) – Filepath to the hdf5 table containing galaxy properties.

  • Om0 (float) – Omega matter: density of non-relativistic matter in units of the critical density at z=0.

  • w0 (float) – Dark energy equation of state at z=0 (a=1). This is pressure/density for dark energy in units where c=1.

  • wa (float) – Negative derivative of the dark energy equation of state with respect to the scale factor. A cosmological constant has w0=-1.0 and wa=0.0.

  • h (float) – dimensionless Hubble constant at z=0.

Returns:

Hdf5 table storing the rest-frame SED model

Return type:

Hdf5Handle

inputs = [('input', <class 'rail.core.data.Hdf5Handle'>)]
interactive_function: str | None = 'dsps_population_sed_modeler'
name = 'DSPSPopulationSedModeler'
outputs = [('model', <class 'rail.core.data.Hdf5Handle'>)]
run()

Run method. It calls _get_rest_frame_seds from DSPS to create rest-frame SEDs for a population of galaxies. The load_ssp_templates function loads the SSP templates created with FSPS. The resulting NamedTuple has 4 entries:

ssp_lgmet : ndarray of shape (n_met, )
    Array of log10(Z) of the SSP templates, where the dimensionless Z is the mass fraction of elements heavier than He
ssp_lg_age_gyr : ndarray of shape (n_ages, )
    Array of log10(age/Gyr) of the SSP templates
ssp_wave : ndarray of shape (n_wave, )
ssp_flux : ndarray of shape (n_met, n_ages, n_wave)
    SED of the SSP in units of Lsun/Hz/Msun

Notes

The initial stellar mass of the galaxy is 0. The stellar mass table, defined as a cumulative sum, gives the total stellar mass formed. DSPS conveniently provides IMF-dependent fitting functions to compute the surviving mass (see surviving_mstar.py). The resulting rest-frame SED is in units of solar luminosity per hertz, and the luminosity is that emitted by the formed mass at the time of observation.
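The cumulative-sum definition can be illustrated with numpy: integrating an SFH in Msun/yr over a cosmic time grid in Gyr with the trapezoid rule gives the stellar mass formed, starting from 0 at the first grid point. This is a sketch of the idea under those unit assumptions, not the DSPS routine, and it does not apply the surviving-mass correction.

```python
import numpy as np

YEARS_PER_GYR = 1e9


def cumulative_mass_formed(cosmic_time_gyr, sfh_msun_per_yr):
    """Total stellar mass formed (Msun) at each grid time, starting at 0."""
    dt_yr = np.diff(cosmic_time_gyr) * YEARS_PER_GYR
    # Mass formed in each interval (trapezoid rule on the SFH).
    dm = 0.5 * (sfh_msun_per_yr[1:] + sfh_msun_per_yr[:-1]) * dt_yr
    return np.concatenate(([0.0], np.cumsum(dm)))


t_grid = np.linspace(1.0, 2.0, 11)   # Gyr
sfh = np.full_like(t_grid, 1.0)      # constant 1 Msun/yr for 1 Gyr
mstar_formed = cumulative_mass_formed(t_grid, sfh)  # ends at about 1e9 Msun
```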

stage_columns: list[str] | None
class rail.stages.DSPSSingleSedModeler

Bases: Modeler

Derived class of Modeler for creating a single galaxy rest-frame SED model using DSPS v3 (Hearin+21). SPS calculations are based on a set of template SEDs of simple stellar populations (SSPs). Supplying such templates is outside the planned scope of the DSPS package, so they need to be retrieved from another library. For example, the FSPS library supplies such templates in a convenient form.

The input galaxy properties, such as star-formation histories and metallicities, need to be supplied via an hdf5 table.

The user-provided metallicity grid should be defined consistently with the metallicities of the template SEDs. Users should choose the cosmic time grid with care, since the required time resolution strongly depends on their scientific aim.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • ssp_templates_file ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/dsps_default_data/ssp_data_fsps_v3.2_lgmet_age.h5) – hdf5 file storing the SSP libraries used to create SEDs

  • redshift_key ([str] default=redshifts)

  • cosmic_time_grid_key ([str] default=cosmic_time_grid) – Cosmic time grid keyword name of the hdf5 dataset; this is the grid of ages of the Universe, in Gyr, over which the stellar mass build-up takes place

  • star_formation_history_key ([str] default=star_formation_history) – Star-formation history keyword name of the hdf5 dataset, this is the star-formation history of the galaxy in units of Msun/yr

  • stellar_metallicity_key ([str] default=stellar_metallicity) – Stellar metallicity keyword name of the hdf5 dataset, this is the stellar metallicity in units of log10(Z)

  • stellar_metallicity_scatter_key ([str] default=stellar_metallicity_scatter) – Stellar metallicity scatter keyword name of the hdf5 dataset; this is the lognormal scatter in the metallicity distribution function

  • restframe_sed_key ([str] default=restframe_sed) – Rest-frame SED keyword name of the output hdf5 dataset

  • default_cosmology ([bool] default=True) – True to use the default DSPS cosmology. If False, Om0, w0, wa and h need to be supplied to the fit_model function

  • min_wavelength ([float] default=250.0)

  • max_wavelength ([float] default=12000.0)

  • input (Hdf5Handle (INPUT))

  • model (Hdf5Handle (OUTPUT))

__init__(args, **kwargs)

Initialize the SedModeler class. If the SSP templates are not provided by the user, they are automatically downloaded from the public NERSC directory. These default templates are created with default FSPS values, with gas emission at a fixed, solar gas metallicity.

Parameters:
  • args

  • comm

default_files_folder = '/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/dsps_default_data'
entrypoint_function: str | None = 'fit_model'
fit_model(input_data='/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/dsps_default_data/input_galaxy_properties_dsps.hdf5', Om0=0.3075, w0=-1.0, wa=0.0, h=0.6774, **kwargs)

This function generates the rest-frame SEDs and stores them into the Hdf5Handle.

Parameters:
  • input_data (str) – Filepath to the hdf5 table containing galaxy properties.

  • Om0 (float) – Omega matter: density of non-relativistic matter in units of the critical density at z=0.

  • w0 (float) – Dark energy equation of state at z=0 (a=1). This is pressure/density for dark energy in units where c=1.

  • wa (float) – Negative derivative of the dark energy equation of state with respect to the scale factor. A cosmological constant has w0=-1.0 and wa=0.0.

  • h (float) – dimensionless Hubble constant at z=0.

Returns:

Hdf5 table storing the rest-frame SED model

Return type:

Hdf5Handle

inputs = [('input', <class 'rail.core.data.Hdf5Handle'>)]
interactive_function: str | None = 'dsps_single_sed_modeler'
name = 'DSPSSingleSedModeler'
outputs = [('model', <class 'rail.core.data.Hdf5Handle'>)]
run()

Run method. It calls _get_rest_frame_seds from DSPS to create a galaxy rest-frame SED. The load_ssp_templates function loads the SSP templates created with FSPS. The resulting NamedTuple has 4 entries:

ssp_lgmet : ndarray of shape (n_met, )
    Array of log10(Z) of the SSP templates, where the dimensionless Z is the mass fraction of elements heavier than He
ssp_lg_age_gyr : ndarray of shape (n_ages, )
    Array of log10(age/Gyr) of the SSP templates
ssp_wave : ndarray of shape (n_wave, )
ssp_flux : ndarray of shape (n_met, n_ages, n_wave)
    SED of the SSP in units of Lsun/Hz/Msun

Notes

The initial stellar mass of the galaxy is 0. The stellar mass table, defined as a cumulative sum, gives the total stellar mass formed. DSPS conveniently provides IMF-dependent fitting functions to compute the surviving mass (see surviving_mstar.py). The resulting rest-frame SED is in units of solar luminosity per hertz, and the luminosity is that emitted by the formed mass at the time of observation.

stage_columns: list[str] | None
class rail.stages.Degrader

Bases: RailStage

Base class for Degraders, which apply various degradations to synthetic photometric data.

Degraders take “input” data in the form of pandas dataframes in Parquet files and provide as “output” another pandas dataframe written to a Parquet file.

entrypoint_function: str | None = '__call__'
inputs = [('input', <class 'rail.core.data.PqHandle'>)]
name = 'Degrader'
outputs = [('output', <class 'rail.core.data.PqHandle'>)]
stage_columns: list[str] | None
class rail.stages.Dereddener

Bases: DustMapBase

Utility stage that does dereddening

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • ra_name ([str] default=ra) – Name of the RA column

  • dec_name ([str] default=dec) – Name of the DEC column

  • mag_name ([str] default=mag_{band}_lsst) – Template for the magnitude columns

  • band_a_env ([dict] default={'mag_u_lsst': 4.81, 'mag_g_lsst': 3.64, 'mag_r_lsst': 2.7, 'mag_i_lsst': 2.06, 'mag_z_lsst': 1.58, 'mag_y_lsst': 1.31})

  • dustmap_name ([str] default=sfd) – Name of the dustmap in question

  • dustmap_dir ([str] (required)) – Directory with dustmaps

  • copy_cols ([list] default=[]) – Additional columns to copy

  • copy_all_cols ([bool] default=False) – Copy all the columns

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))
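Assuming the band_a_env values are extinction coefficients R_b = A_b / E(B-V) (an interpretation; this docstring does not state it), dereddening subtracts R_b times the dust map's E(B-V) from each magnitude column. In the sketch the E(B-V) values are supplied directly instead of being looked up in the dust map, and deredden is a hypothetical helper, not the Dereddener API.

```python
import numpy as np
import pandas as pd

# Coefficients copied from the band_a_env default above.
band_a_env = {
    "mag_u_lsst": 4.81, "mag_g_lsst": 3.64, "mag_r_lsst": 2.7,
    "mag_i_lsst": 2.06, "mag_z_lsst": 1.58, "mag_y_lsst": 1.31,
}


def deredden(df, ebv, coefficients):
    """Return a copy of df with m_dered = m - R_band * E(B-V) per band."""
    out = df.copy()
    for col, r_band in coefficients.items():
        if col in out:  # skip bands missing from the catalog
            out[col] = out[col] - r_band * np.asarray(ebv)
    return out


catalog = pd.DataFrame({"ra": [10.0, 20.0], "dec": [-30.0, -40.0],
                        "mag_u_lsst": [24.0, 23.0], "mag_r_lsst": [22.0, 21.5]})
ebv = [0.1, 0.0]  # stand-in for E(B-V) values at each (ra, dec)
dered = deredden(catalog, ebv, band_a_env)
```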

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'dereddener'
name = 'Dereddener'
class rail.stages.DistToDistEvaluator

Bases: Evaluator

Evaluate the performance of a photo-z estimator against reference PDFs

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • metrics ([list] default=[]) – The metrics you want to evaluate.

  • exclude_metrics ([list] default=[]) – List of metrics to exclude

  • metric_config ([dict] default={}) – Configuration of individual metrics

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single-node processing

  • seed ([float] default=None) – Random seed value to use for reproducible results.

  • force_exact ([bool] default=False) – Force the exact calculation. This disables parallelization.

  • metric_integration_limits ([list] default=[0.0, 3.0]) – The default end points for calculating metrics on a grid.

  • dx ([float] default=0.01) – The default step size when calculating metrics on a grid.

  • n_samples ([int] default=100) – The number of random samples to select for certain metrics.

  • input (QPHandle (INPUT))

  • truth (QPHandle (INPUT))

  • output (Hdf5Handle (OUTPUT))

  • summary (Hdf5Handle (OUTPUT))

  • single_distribution_summary (QPDictHandle (OUTPUT))
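To illustrate how metric_integration_limits and dx define an evaluation grid, the sketch computes the Kullback-Leibler divergence between two Gaussian PDFs on the default grid [0, 3] with step 0.01. It is a standalone numpy example with made-up inputs, not the qp or RAIL metric implementation.

```python
import numpy as np

limits, dx = (0.0, 3.0), 0.01
grid = np.arange(limits[0], limits[1] + dx / 2, dx)  # include the end point


def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


def kl_divergence_on_grid(p, q, dx):
    """Riemann-sum approximation of KL(p || q) on a uniform grid."""
    mask = p > 0  # terms with p == 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])) * dx)


estimate = gaussian_pdf(grid, 1.0, 0.1)
reference = gaussian_pdf(grid, 1.2, 0.1)
kl = kl_divergence_on_grid(estimate, reference, dx)  # analytic value: 2.0
```

For two Gaussians with equal widths the analytic value is (Δμ)²/(2σ²) = 0.04/0.02 = 2.0, which the grid sum reproduces closely at this dx.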

entrypoint_function: str | None = 'evaluate'
inputs: list[tuple[str, type[DataHandle]]] = [('input', <class 'rail.core.data.QPHandle'>), ('truth', <class 'rail.core.data.QPHandle'>)]
interactive_function: str | None = 'dist_to_dist_evaluator'
metric_base_class

alias of DistToDistMetric

name = 'DistToDistEvaluator'
stage_columns: list[str] | None
class rail.stages.DistToPointEvaluator

Bases: Evaluator

Evaluate the performance of a photo-z estimator against a reference point estimate

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • metrics ([list] default=[]) – The metrics you want to evaluate.

  • exclude_metrics ([list] default=[]) – List of metrics to exclude

  • metric_config ([dict] default={}) – Configuration of individual metrics

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single-node processing

  • seed ([float] default=None) – Random seed value to use for reproducible results.

  • force_exact ([bool] default=False) – Force the exact calculation. This disables parallelization.

  • metric_integration_limits ([list] default=[0.0, 3.0]) – The default end points for calculating metrics on a grid.

  • dx ([float] default=0.01) – The default step size when calculating metrics on a grid.

  • quantile_grid ([list] default=[...]) – The grid of quantile values, in the interval (0, 1), at which to evaluate the CDF.

  • x_grid ([list] default=[...]) – The x-value grid at which to evaluate the PDF values.

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • reference_dictionary_key ([str] default=redshift) – The key in the truth dictionary where the redshift data is stored.

  • input (QPHandle (INPUT))

  • truth (TableHandle (INPUT))

  • output (Hdf5Handle (OUTPUT))

  • summary (Hdf5Handle (OUTPUT))

  • single_distribution_summary (QPDictHandle (OUTPUT))
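One standard distribution-to-point metric is the probability integral transform (PIT): the CDF of each estimated PDF evaluated at its reference redshift. The sketch assumes PDFs tabulated on an x grid (in the spirit of x_grid) and builds each CDF by cumulative trapezoid integration; pit_values is a hypothetical helper, not the RAIL or qp implementation.

```python
import numpy as np


def pit_values(x_grid, pdfs, true_z):
    """CDF of each gridded PDF evaluated at the matching true redshift."""
    # Cumulative trapezoid integral along the grid, one row per object.
    steps = 0.5 * (pdfs[:, 1:] + pdfs[:, :-1]) * np.diff(x_grid)
    cdfs = np.concatenate([np.zeros((len(pdfs), 1)), np.cumsum(steps, axis=1)], axis=1)
    cdfs /= cdfs[:, -1:]  # normalise so each CDF ends at 1
    return np.array([np.interp(z, x_grid, cdf) for z, cdf in zip(true_z, cdfs)])


x_grid = np.linspace(0.0, 3.0, 301)
centres = np.array([0.5, 1.0, 2.0])
# Unnormalised Gaussian PDFs centred on the true redshifts.
pdfs = np.exp(-0.5 * ((x_grid[None, :] - centres[:, None]) / 0.1) ** 2)
pit = pit_values(x_grid, pdfs, true_z=centres)  # each value close to 0.5
```

A well-calibrated estimator produces PIT values that are uniformly distributed between 0 and 1 across the sample.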

entrypoint_function: str | None = 'evaluate'
inputs: list[tuple[str, type[DataHandle]]] = [('input', <class 'rail.core.data.QPHandle'>), ('truth', <class 'rail.core.data.TableHandle'>)]
interactive_function: str | None = 'dist_to_point_evaluator'
metric_base_class

alias of DistToPointMetric

name = 'DistToPointEvaluator'
stage_columns: list[str] | None
class rail.stages.DustMapBase

Bases: RailStage

Utility stage that does dereddening

Note: set copy_all_cols=True to copy all columns in the data; in that case copy_cols is ignored.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • ra_name ([str] default=ra) – Name of the RA column

  • dec_name ([str] default=dec) – Name of the DEC column

  • mag_name ([str] default=mag_{band}_lsst) – Template for the magnitude columns

  • band_a_env ([dict] default={'mag_u_lsst': 4.81, 'mag_g_lsst': 3.64, 'mag_r_lsst': 2.7, 'mag_i_lsst': 2.06, 'mag_z_lsst': 1.58, 'mag_y_lsst': 1.31})

  • dustmap_name ([str] default=sfd) – Name of the dustmap in question

  • dustmap_dir ([str] (required)) – Directory with dustmaps

  • copy_cols ([list] default=[]) – Additional columns to copy

  • copy_all_cols ([bool] default=False) – Copy all the columns

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'
fetch_map()
inputs = [('input', <class 'rail.core.data.PqHandle'>)]
interactive_function: str | None = 'dust_map_base'
name = 'DustMapBase'
outputs = [('output', <class 'rail.core.data.PqHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

class rail.stages.EqualCountClassifier

Bases: PZClassifier

Classifier that simply assigns tomographic bins based on a point estimate, according to the SRD

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single-node processing

  • object_id_col ([str] default=) – name of object id column

  • point_estimate_key ([str] default=zmode) – Which point estimate to use

  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • n_tom_bins ([int] default=5) – Number of tomographic bins

  • no_assign ([int] default=-99) – Value for no assignment flag

  • input (QPHandle (INPUT))

  • output (Hdf5Handle (OUTPUT))
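A sketch of equal-count tomographic assignment: bin edges are quantiles of the point estimates inside [zmin, zmax], so each of the n_tom_bins bins holds roughly the same number of objects. Treating objects outside [zmin, zmax] as unassigned (no_assign) and numbering bins from 0 are assumptions of this sketch, not documented behaviour of EqualCountClassifier.

```python
import numpy as np


def equal_count_bins(z_point, zmin=0.0, zmax=3.0, n_tom_bins=5, no_assign=-99):
    """Assign each object to one of n_tom_bins equal-count bins."""
    z_point = np.asarray(z_point, dtype=float)
    in_range = (z_point >= zmin) & (z_point <= zmax)
    # Quantile edges computed from the in-range objects only.
    edges = np.quantile(z_point[in_range], np.linspace(0.0, 1.0, n_tom_bins + 1))
    labels = np.full(z_point.shape, no_assign, dtype=int)
    # Digitizing against the interior edges yields labels 0 .. n_tom_bins - 1.
    labels[in_range] = np.digitize(z_point[in_range], edges[1:-1])
    return labels


zmode = np.append(np.linspace(0.01, 2.99, 100), 5.0)  # last object out of range
tomo_bin = equal_count_bins(zmode)
```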

entrypoint_function: str | None = 'classify'
interactive_function: str | None = 'equal_count_classifier'
name = 'EqualCountClassifier'
outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]
run()

Processes the input data in chunks and performs classification.

This method iterates over chunks of the input data, calling the _process_chunk method for each chunk to perform the actual classification.

The _process_chunk method should be implemented by subclasses to define the specific classification logic.

Return type:

None
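The chunked loop driven by chunk_size can be pictured as iterating over half-open index ranges and applying a per-chunk function. iter_chunk_ranges and classify_all are hypothetical helpers for illustration, and process_chunk merely stands in for _process_chunk; this is not RAIL's own chunk iterator.

```python
def iter_chunk_ranges(n_rows, chunk_size=10000):
    """Yield (start, end) index pairs covering n_rows in chunk_size pieces."""
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


def classify_all(point_estimates, process_chunk, chunk_size=10000):
    """Apply a per-chunk function and concatenate the per-chunk results."""
    results = []
    for start, end in iter_chunk_ranges(len(point_estimates), chunk_size):
        results.extend(process_chunk(point_estimates[start:end]))
    return results


ranges = list(iter_chunk_ranges(25000))
# -> [(0, 10000), (10000, 20000), (20000, 25000)]
```

The last chunk is simply shorter, so per-chunk logic should not assume exactly chunk_size rows.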

stage_columns: list[str] | None
class rail.stages.EuclidDeepErrorModel

Bases: PhotoErrorModel

The Euclid Deep Error model, defined by peEuclidDeepErrorParams and peEuclidDeepErrorModel

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'euclid_deep_error_model'
name = 'EuclidDeepErrorModel'
stage_columns: list[str] | None
class rail.stages.EuclidErrorModel

Bases: PhotoErrorModel

The Euclid Error model, defined by peEuclidErrorParams and peEuclidErrorModel

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'euclid_error_model'
name = 'EuclidErrorModel'
stage_columns: list[str] | None
class rail.stages.EuclidWideErrorModel

Bases: PhotoErrorModel

The Euclid Wide Error model, defined by peEuclidWideErrorParams and peEuclidWideErrorModel

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'euclid_wide_error_model'
name = 'EuclidWideErrorModel'
stage_columns: list[str] | None
class rail.stages.Evaluator

Bases: RailStage

Evaluate the performance of a photo-z estimator against a reference point estimate

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • metrics ([list] default=[]) – The metrics you want to evaluate.

  • exclude_metrics ([list] default=[]) – List of metrics to exclude

  • metric_config ([dict] default={}) – Configuration of individual metrics

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single-node processing

  • seed ([float] default=None) – Random seed value to use for reproducible results.

  • force_exact ([bool] default=False) – Force the exact calculation. This disables parallelization.

  • output (Hdf5Handle (OUTPUT))

  • summary (Hdf5Handle (OUTPUT))

  • single_distribution_summary (QPDictHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'evaluate'
evaluate(data, truth, **kwargs)

Evaluate the performance of an estimator

This will attach the input data and truth to this Evaluator (for introspection and provenance tracking).

Then it will call the run() and finalize() methods, which need to be implemented by the sub-classes.

The run() method needs to register the data that it creates with this Evaluator by using self.add_data(‘output’, output_data).

Parameters:
  • data (qp.Ensemble) – The sample to evaluate

  • truth (Any) – Table with the truth information

Returns:

The evaluation metrics

Return type:

dict[str, DataHandle]
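The evaluate() sequence described above (attach data and truth, call run(), then finalize()) follows a template-method pattern. The toy class below mimics that order of calls in plain Python; ToyEvaluator, its dictionary-based add_data/get_data and the metric it computes are hypothetical and are not the RailStage API.

```python
class ToyEvaluator:
    """Stand-in showing the order of calls in an evaluate() method."""

    def __init__(self):
        self._data = {}
        self.finalized = False

    def add_data(self, tag, value):
        self._data[tag] = value

    def get_data(self, tag):
        return self._data[tag]

    def evaluate(self, data, truth):
        # 1. Attach inputs for introspection and provenance.
        self.add_data("input", data)
        self.add_data("truth", truth)
        # 2. Subclass-specific work, which registers the "output" tag.
        self.run()
        # 3. Wrap up and hand back everything that was registered.
        self.finalize()
        return dict(self._data)

    def run(self):
        # Toy metric: mean absolute difference between estimates and truth.
        est, ref = self.get_data("input"), self.get_data("truth")
        mad = sum(abs(e - r) for e, r in zip(est, ref)) / len(ref)
        self.add_data("output", {"mean_abs_diff": mad})

    def finalize(self):
        self.finalized = True


results = ToyEvaluator().evaluate([0.5, 1.1, 2.0], [0.5, 1.0, 2.2])
```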

finalize()

Finalize the stage, moving all its outputs to their final locations.

Return type:

None

inputs: list[tuple[str, type[DataHandle]]] = []
metric_base_class: type[BaseMetric] | None = None
name = 'Evaluator'
outputs = [('output', <class 'rail.core.data.Hdf5Handle'>), ('summary', <class 'rail.core.data.Hdf5Handle'>), ('single_distribution_summary', <class 'rail.core.data.QPDictHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

run_single_node()

Return type:

None

stage_columns: list[str] | None
class rail.stages.FSPSPhotometryCreator

Bases: Creator

Derived class of Creator that generates synthetic photometric data from the rest-frame SED model produced by the FSPSSedModeler class. The user is required to provide galaxy redshifts and filter information in .npy format for the code to run. The rest-frame SEDs are stored in a pickle file or passed as a ModelHandle. What each file should contain is specified in config_options. The output is a FITS table containing magnitudes.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • redshift_key (str] (default=redshifts))

  • restframe_sed_key ([str] default=restframe_seds) – Rest-frame SED keyword name of the hdf5 dataset containing rest-frame SEDs

  • restframe_wave_key ([str] default=wavelength) – Rest-frame wavelengths keyword name of the hdf5 dataset containing rest-frame SEDs

  • apparent_mags_key ([str] default=apparent_mags) – Apparent magnitudes keyword name of the output hdf5 dataset

  • filter_folder ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/fsps_default_data/filters) – Folder containing filter transmissions

  • instrument_name ([str] default=lsst) – Instrument name as prefix to filter transmission files

  • wavebands ([list] default=['u', 'g', 'r', 'i', 'z', 'y']) – Comma-separated list of wavebands

  • filter_wave_key ([str] default=wave)

  • filter_transm_key ([str] default=transmission)

  • Om0 ([float] default=0.3) – Omega matter at current time

  • Ode0 ([float] default=0.7) – Omega dark energy at current time

  • w0 ([float] default=-1) – Dark energy equation-of-state parameter at current time

  • wa ([float] default=0.0) – Slope of the dark energy equation-of-state evolution with scale factor

  • h ([float] default=0.7) – Dimensionless Hubble constant

  • use_planck_cosmology ([bool] default=False) – True to overwrite the cosmological parameters with their Planck 2015 values

  • physical_units ([bool] default=False) – False (True) for rest-frame spectra in units of Lsun/Hz (erg/s/Hz)

  • model (Hdf5Handle (INPUT))

  • output (Hdf5Handle (OUTPUT))
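The cosmological parameters above are what a luminosity distance calculation needs when turning rest-frame SEDs into apparent magnitudes. The sketch integrates 1/E(z) numerically for a w0-wa dark energy model with curvature Ok = 1 - Om0 - Ode0, neglecting radiation. It only illustrates the parameters; the class may rely on a different cosmology implementation internally.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s


def luminosity_distance_mpc(z, Om0=0.3, Ode0=0.7, w0=-1.0, wa=0.0, h=0.7,
                            n_steps=10000):
    """Luminosity distance in Mpc for a w0-wa cosmology, radiation neglected."""
    ok = 1.0 - Om0 - Ode0  # curvature density parameter
    zz = np.linspace(0.0, z, n_steps + 1)
    # Dark energy density relative to today for w(a) = w0 + wa * (1 - a).
    de = (1 + zz) ** (3 * (1 + w0 + wa)) * np.exp(-3 * wa * zz / (1 + zz))
    inv_e = 1.0 / np.sqrt(Om0 * (1 + zz) ** 3 + ok * (1 + zz) ** 2 + Ode0 * de)
    d_h = C_KM_S / (100.0 * h)  # Hubble distance in Mpc
    d_c = d_h * np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zz))  # comoving
    if abs(ok) < 1e-12:  # flat
        d_m = d_c
    elif ok > 0:         # open
        d_m = d_h / np.sqrt(ok) * np.sinh(np.sqrt(ok) * d_c / d_h)
    else:                # closed
        d_m = d_h / np.sqrt(-ok) * np.sin(np.sqrt(-ok) * d_c / d_h)
    return (1 + z) * d_m


d_l = luminosity_distance_mpc(1.0)  # defaults match the config values above
```

The distance modulus 5 log10(d_l / 10 pc) then links absolute and apparent magnitudes, before any K-correction.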

__init__(args, **kwargs)

Initialize the class. The _b and _c tuples for jax contain None or 0 for each argument, depending on whether that argument's array axis should not or should be mapped over.

Parameters:
  • args

  • comm

default_files_folder = '/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/fsps_default_data'
entrypoint_function: str | None = 'sample'
inputs = [('model', <class 'rail.core.data.Hdf5Handle'>)]
interactive_function: str | None = 'fsps_photometry_creator'
name = 'FSPSPhotometryCreator'
outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]
run()

This function computes apparent AB magnitudes in the provided wavebands for all the galaxies in the population having rest-frame SEDs computed by FSPS. It then stores apparent magnitudes, redshifts and running indices into an Hdf5Handle.

sample(input_data, seed=None, **kwargs)

Creates observed magnitudes for the population of galaxies and stores them into an Hdf5Handle.

Parameters:
  • input_data (Hdf5Handle) – Hdf5Handle containing the rest-frame SED models.

  • seed (int | None, optional) – The random seed to control sampling, by default None

Returns:

Hdf5Handle storing the apparent magnitudes and redshifts of galaxies.

Return type:

Hdf5Handle

Notes

This method puts seed into the stage configuration data, which makes it available to other methods. It then calls the run method. Finally, the Hdf5Handle associated with the output tag is returned.

stage_columns: list[str] | None
class rail.stages.FSPSSedModeler

Bases: Modeler

Derived class of Modeler for creating a single galaxy rest-frame SED model using FSPS (Conroy08).

Only the most important parameters are provided via config_options. The remaining ones from FSPS can be provided when creating the rest-frame SED model.

Install FSPS with the following commands:

pip uninstall fsps
git clone --recursive https://github.com/dfm/python-fsps.git
cd python-fsps
python -m pip install .
export SPS_HOME=$(pwd)/src/fsps/libfsps

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000)

  • hdf5_groupname ([str] default=photometry)

  • compute_vega_mags ([bool] default=False) – True to use Vega magnitudes instead of AB magnitudes

  • vactoair_flag ([bool] default=False) – If True, output wavelengths in air (rather than in vacuum)

  • zcontinuous ([int] default=1) – Flag for interpolation in metallicity of SSP before CSP

  • add_agb_dust_model ([bool] default=True) – Turn on/off adding AGB circumstellar dust contribution to SED

  • add_dust_emission ([bool] default=True) – Turn on/off adding dust emission contribution to SED

  • add_igm_absorption ([bool] default=False) – Turn on/off adding IGM absorption contribution to SED

  • add_neb_emission ([bool] default=False) – Turn on/off nebular emission model based on Cloudy

  • add_neb_continuum ([bool] default=False) – Turn on/off nebular continuum component

  • add_stellar_remnants ([bool] default=True) – Turn on/off adding stellar remnants contribution to stellar mass

  • compute_light_ages ([bool] default=False) – If True then the returned spectra are actually light-weighted ages (in Gyr)

  • nebemlineinspec ([bool] default=False) – True to include emission line fluxes in spectrum

  • smooth_velocity ([bool] default=True) – True/False for smoothing in velocity/wavelength space

  • smooth_lsf ([bool] default=False) – True/False for smoothing SSPs by a wavelength dependent line spread function

  • cloudy_dust ([bool] default=False) – Switch to include dust in the Cloudy tables

  • agb_dust ([float] default=1.0) – Scales the circumstellar AGB dust emission

  • tpagb_norm_type ([int] default=2) – Flag for the TP-AGB normalization scheme; the default is the Villaume, Conroy & Johnson (2015) normalization

  • dell ([float] default=0.0) – Shift in log(L_bol) of the TP-AGB isochrones

  • delt ([float] default=0.0) – Shift in log(T_eff) of the TP-AGB isochrones

  • redgb ([float] default=1.0) – Modify weight given to RGB. Only available with BaSTI isochrone set

  • agb ([float] default=1.0) – Modify weight given to TP-AGB

  • fcstar ([float] default=1.0) – Fraction of stars that the Padova isochrones identify as Carbon stars

  • sbss ([float] default=0.0) – Specific frequency of blue straggler stars

  • fbhb ([float] default=0.0) – Fraction of horizontal branch stars that are blue

  • pagb ([float] default=1.0) – Weight given to the post–AGB phase

  • redshifts_key ([str] default=redshifts) – galaxy redshift, dataset keyword name

  • Z_met_key ([str] default=zmet) – The metallicity is specified as an integer ranging between 1 and nz. If zcontinuous > 0 then this parameter is ignored, dataset keyword name

  • stellar_metallicities_key ([str] default=stellar_metallicity) – galaxy stellar metallicities (log10(Z / Zsun)), used when zcontinuous > 0, dataset keyword name

  • pmetals_key ([str] default=pmetals) – The power for the metallicity distribution function, only used if zcontinuous=2, dataset keyword name

  • imf_type ([int] default=1) – IMF type, see FSPS manual, default Chabrier IMF

  • imf_upper_limit ([float] default=120.0) – The upper limit of the IMF in solar masses

  • imf_lower_limit ([float] default=0.08) – The lower limit of the IMF in solar masses

  • imf1 ([float] default=1.3) – log slope of IMF in 0.08<M/Msun<0.5, if imf_type=2

  • imf2 ([float] default=2.3) – log slope of IMF in 0.5<M/Msun<1, if imf_type=2

  • imf3 ([float] default=2.3) – log slope of IMF in M/Msun>1, if imf_type=2

  • vdmc ([float] default=0.08) – IMF parameter defined in van Dokkum (2008). Only used if imf_type=3

  • mdave ([float] default=0.5) – IMF parameter defined in Dave (2008). Only used if imf_type=4.

  • evtype ([int] default=-1) – Compute SSPs for only the given evolutionary type. All phases used when set to -1.

  • use_wr_spectra ([int] default=1) – Turn on/off the WR spectral library

  • logt_wmb_hot ([float] default=0.0) – Use the Eldridge (2017) WMBasic hot star library above this value of log(T_eff) or 25,000 K, whichever is larger

  • masscut ([float] default=150.0) – Truncate the IMF above this value

  • velocity_dispersions_key ([str] default=stellar_velocity_dispersion) – stellar velocity dispersions (km/s), dataset keyword name

  • min_wavelength ([float] default=250.0)

  • max_wavelength ([float] default=12000.0)

  • gas_ionizations_key ([str] default=gas_ionization) – gas ionization values dataset keyword name

  • gas_metallicities_key ([str] default=gas_metallicity) – gas metallicities (log10(Zgas / Zsun)) dataset keyword name

  • igm_factor ([float] default=1.0) – Factor used to scale the IGM optical depth

  • sfh_type ([int] default=0) – star-formation history type, see FSPS manual, default SSP

  • tau_key ([str] default=tau) – Defines e-folding time for the SFH, in Gyr. Only used if sfh=1 or sfh=4, dataset keyword name

  • const_key ([str] default=const) – Defines the constant component of the SFH. Only used if sfh=1 or sfh=4, dataset keyword name

  • sf_start_key ([str] default=sf_start) – Start time of the SFH, in Gyr. Only used if sfh=1 or sfh=4 or sfh=5, dataset keyword name

  • sf_trunc_key ([str] default=sf_trunc) – Truncation time of the SFH, in Gyr. Only used if sfh=1 or sfh=4 or sfh=5, dataset keyword name

  • stellar_ages_key ([str] default=stellar_age) – galaxy stellar ages (Gyr), dataset keyword name

  • fburst_key ([str] default=fburst) – Defines the fraction of mass formed in an instantaneous burst of star formation. Only used if sfh=1 or sfh=4, dataset keyword name

  • tburst_key ([str] default=tburst) – Defines the age of the Universe when the burst occurs. If tburst > tage then there is no burst. Only used if sfh=1 or sfh=4, dataset keyword name

  • sf_slope_key ([str] default=sf_slope) – For sfh=5, this is the slope of the SFR after time sf_trunc, dataset keyword name

  • dust_type ([int] default=2) – attenuation curve for dust type, see FSPS manual, default Calzetti

  • dust_tesc ([float] default=7.0) – Stars younger than dust_tesc are attenuated by both dust1 and dust2, while stars older are attenuated by dust2 only. Units are log(yrs)

  • dust_birth_cloud_key ([str] default=dust1_birth_cloud) – dust parameter describing young stellar light attenuation (dust1 in FSPS), dataset keyword name

  • dust_diffuse_key ([str] default=dust2_diffuse) – dust parameters describing old stellar light attenuation (dust2 in FSPS) dataset keyword name

  • dust_clumps ([int] default=-99) – Dust parameter describing the dispersion of a Gaussian PDF density distribution for the old dust. Setting this value to -99.0 sets the distribution to a uniform screen, values other than -99 are no longer supported

  • frac_nodust ([float] default=0.0) – Fraction of starlight that is not attenuated by the diffuse dust component

  • frac_obrun ([float] default=0.0) – Fraction of the young stars (age < dust_tesc) that are not attenuated by dust1 and that do not contribute to any nebular emission, representing runaway OB stars or escaping ionizing radiation. These stars are still attenuated by dust2.

  • dust_index_key ([str] default=dust_index) – Power law index of the attenuation curve. Only used when dust_type=0, dataset keyword name

  • dust_powerlaw_modifier_key ([str] default=dust_calzetti_modifier) – power-law modifiers to the shape of the Calzetti et al. (2000) attenuation curve (dust1_index), dataset keyword name

  • mwr_key ([str] default=mwr) – The ratio of total to selective absorption which characterizes the MW extinction curve: RV=AV/E(B-V), used when dust_type=1, dataset keyword name

  • uvb_key ([str] default=uvb) – Parameter characterizing the strength of the 2175A extinction feature with respect to the standard Cardelli et al. determination for the MW. Only used when dust_type=1, dataset keyword name

  • wgp1_key ([str] default=wgp1) – Integer specifying the optical depth in the Witt & Gordon (2000) models. Values range from 1 to 18, used only when dust_type=3, dataset keyword name

  • wgp2 ([int] default=1) – Integer specifying the type of large-scale geometry and extinction curve. Values range from 1-6, used only when dust_type=3

  • wgp3 ([int] default=1) – Integer specifying the local geometry for the Witt & Gordon (2000) dust models, used only when dust_type=3

  • dust_emission_gamma_key ([str] default=dust_gamma) – Relative contributions of dust heated at Umin, parameter of Draine and Li (2007) dust emission model, dataset keyword name

  • dust_emission_umin_key ([str] default=dust_umin) – Minimum radiation field strengths, parameter of Draine and Li (2007) dust emission model, dataset keyword name

  • dust_emission_qpah_key ([str] default=dust_qpah) – Grain size distributions in mass in PAHs, parameter of Draine and Li (2007) dust emission model, dataset keyword name

  • fraction_agn_bol_lum_key ([str] default=f_agn) – Fractional contributions of AGN wrt stellar bolometric luminosity, dataset keyword name

  • agn_torus_opt_depth_key ([str] default=tau_agn) – Optical depths of the AGN dust tori, dataset keyword name

  • tabulated_sfh_key ([str] default=tabulated_sfh) – tabulated SFH dataset keyword name

  • tabulated_lsf_key ([str] default=tabulated_lsf) – tabulated LSF dataset keyword name

  • physical_units ([bool] default=False) – False (True) for rest-frame spectra in units of Lsun/Hz (erg/s/Hz)

  • restframe_wave_key ([str] default=restframe_wavelengths) – Rest-frame wavelength keyword name of the output hdf5 dataset

  • restframe_sed_key ([str] default=restframe_seds) – Rest-frame SED keyword name of the output hdf5 dataset

  • input (Hdf5Handle (INPUT))

  • model (Hdf5Handle (OUTPUT))
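The dust_tesc, frac_obrun and frac_nodust descriptions above define an age-dependent, two-component attenuation scheme. The toy sketch below illustrates only that bookkeeping: it treats dust1 and dust2 as optical depths at a single wavelength and ignores the attenuation-curve shape selected by dust_type. The function name and its arguments are hypothetical; this is not how FSPS computes attenuation internally.

```python
import math


def transmitted_fraction(age_yr: float, dust1: float, dust2: float,
                         dust_tesc: float = 7.0, frac_obrun: float = 0.0,
                         frac_nodust: float = 0.0) -> float:
    """Fraction of a stellar population's light that escapes the dust (toy model).

    Hypothetical helper: dust1 (birth cloud) and dust2 (diffuse) are treated as
    single-wavelength optical depths; no wavelength dependence is modelled.
    """
    # dust_tesc is given in log10(years), so compare against log10 of the age.
    is_young = math.log10(age_yr) < dust_tesc
    # Diffuse dust (dust2) acts on all starlight except the frac_nodust share.
    diffuse = frac_nodust + (1.0 - frac_nodust) * math.exp(-dust2)
    if not is_young:
        return diffuse
    # Young stars are also behind the birth cloud (dust1), except runaway OB stars.
    birth_cloud = frac_obrun + (1.0 - frac_obrun) * math.exp(-dust1)
    return birth_cloud * diffuse
```

For example, a 1 Myr population (log age 6 < 7) sees both components, while a 100 Myr population sees only dust2.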

__init__(args, **kwargs)

This function initializes the FSPSSedModeler class and checks that the provided parameters are within the allowed ranges.

Parameters:
  • args

  • comm

entrypoint_function: str | None = 'fit_model'
fit_model(input_data, **kwargs)

This function creates rest-frame SED models from an input galaxy population catalog.

Parameters:

input_data (Hdf5Handle) – This is the input catalog in the form of an Hdf5Handle.

Returns:

Hdf5Handle storing the rest-frame SED models

Return type:

Hdf5Handle

inputs = [('input', <class 'rail.core.data.Hdf5Handle'>)]
interactive_function: str | None = 'fsps_sed_modeler'
name = 'FSPSSedModeler'
outputs = [('model', <class 'rail.core.data.Hdf5Handle'>)]
run()

Run method. It calls StellarPopulation from FSPS to create a galaxy rest-frame SED. Thanks to Josue de Santiago, this function can run in parallel via MPI by splitting the full sample into chunks of user-defined size.
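The chunked MPI parallelism described for run() can be sketched as below, assuming a simple round-robin assignment of fixed-size chunks to ranks. The helper names are hypothetical and the stage's actual scheduling may differ; in a real MPI job, rank and size would come from mpi4py, e.g. `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`.

```python
def chunk_bounds(n_objects: int, chunk_size: int) -> list[tuple[int, int]]:
    """Half-open [start, end) index ranges that together cover n_objects rows."""
    return [(start, min(start + chunk_size, n_objects))
            for start in range(0, n_objects, chunk_size)]


def chunks_for_rank(n_objects: int, chunk_size: int,
                    rank: int, size: int) -> list[tuple[int, int]]:
    """Round-robin: rank r processes chunks r, r + size, r + 2 * size, ..."""
    return chunk_bounds(n_objects, chunk_size)[rank::size]
```

With 25 objects, chunks of 10 and two ranks, rank 0 gets rows 0 to 9 and 20 to 24, and rank 1 gets rows 10 to 19; the last chunk is simply shorter.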

stage_columns: list[str] | None
class rail.stages.FlexZBoostEstimator

Bases: CatEstimator

FlexZBoost-based CatEstimator

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301)

  • id_col ([str] default=object_id) – name of the object ID column

  • redshift_col ([str] default=redshift) – name of redshift column

  • calc_summary_stats ([bool] default=False) – Compute summary statistics

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])

  • ref_band ([str] default=mag_i_lsst)

  • qp_representation ([str] default=interp) – qp generator to use. [interp|flexzboost]

  • include_mag_err ([bool] default=False) – Include magnitude error in the training and estimation process

  • model (ModelHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))
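With the default zmin=0.0, zmax=3.0 and nzbins=301, PDFs live on a redshift grid with spacing 0.01. The sketch below shows, in plain Python, what the ‘mean’, ‘mode’ and ‘median’ options of calculated_point_estimates mean for a PDF tabulated on such a grid. It is an illustration only, not qp's implementation; the helper names are hypothetical, and the median uses linear interpolation inside one grid interval as an approximation.

```python
def z_grid(zmin: float, zmax: float, nzbins: int) -> list[float]:
    """Evenly spaced grid including both endpoints (numpy.linspace-style)."""
    step = (zmax - zmin) / (nzbins - 1)
    return [zmin + i * step for i in range(nzbins)]


def point_estimates(z: list[float], pdf: list[float]) -> dict[str, float]:
    """Mean, mode and median of a PDF tabulated on the grid z."""
    n = len(z)
    # Trapezoid-rule areas of p(z) over each grid interval.
    areas = [0.5 * (pdf[i] + pdf[i + 1]) * (z[i + 1] - z[i]) for i in range(n - 1)]
    norm = sum(areas)
    # Mean: trapezoid integral of z * p(z), divided by the normalization.
    first_moment = sum(0.5 * (z[i] * pdf[i] + z[i + 1] * pdf[i + 1]) * (z[i + 1] - z[i])
                       for i in range(n - 1))
    mean = first_moment / norm
    # Mode: grid point with the largest PDF value.
    mode = z[max(range(n), key=pdf.__getitem__)]
    # Median: accumulate area until half the total is reached, then
    # interpolate linearly within that interval (an approximation).
    target, cdf, median = 0.5 * norm, 0.0, z[-1]
    for i, area in enumerate(areas):
        if cdf + area >= target:
            frac = (target - cdf) / area if area > 0 else 0.0
            median = z[i] + frac * (z[i + 1] - z[i])
            break
        cdf += area
    return {"mean": mean, "mode": mode, "median": median}
```

For a symmetric, single-peaked PDF all three estimates coincide; for skewed or multimodal photo-z PDFs they can differ substantially, which is why the stage lets you choose.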

__init__(args, **kwargs)

Constructor: Do CatEstimator specific initialization

entrypoint_function: str | None = 'estimate'
interactive_function: str | None = 'flex_z_boost_estimator'
name = 'FlexZBoostEstimator'
stage_columns: list[str] | None
class rail.stages.FlexZBoostInformer

Bases: CatInformer

Train a FlexZBoost CatInformer

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • zmin ([float] default=0.0)

  • zmax ([float] default=3.0)

  • nzbins ([int] default=301)

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])

  • ref_band ([str] default=mag_i_lsst)

  • redshift_col ([str] default=redshift)

  • retrain_full ([bool] default=True) – if True, re-run the fit with the full training set, including data set aside for bump/sharpen validation. If False, only use the subset defined via trainfrac fraction

  • trainfrac ([float] default=0.75) – fraction of training data to use for training (rest used for bump thresh and sharpening determination)

  • seed ([int] default=1138) – Random number seed

  • bumpmin ([float] default=0.02) – minimum value in grid of thresholds checked to optimize removal of spurious small bumps

  • bumpmax ([float] default=0.35) – max value in grid checked for removal of small bumps

  • nbump ([int] default=20) – number of grid points in bumpthresh grid search

  • sharpmin ([float] default=0.7) – min value in grid checked in optimal sharpening parameter fit

  • sharpmax ([float] default=2.1) – max value in grid checked in optimal sharpening parameter fit

  • nsharp ([int] default=15) – number of search points in sharpening fit

  • max_basis ([int] default=35) – maximum number of basis functions to use in density estimate

  • basis_system ([str] default=cosine) – type of basis system to use with flexcode

  • regression_params ([dict] default={'max_depth': 8, 'objective': 'reg:squarederror'}) – dictionary of options passed to flexcode, includes max_depth (int), and objective, which should be set to reg:squarederror

  • include_mag_err ([bool] default=False) – Include magnitude error in the training and estimation process

  • input (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))
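The bumpmin/bumpmax/nbump and sharpmin/sharpmax/nsharp parameters define grids of candidate values that are searched on the validation data. The sketch below assumes evenly spaced grids, and it assumes sharpening means raising each PDF to a power alpha and renormalizing, as commonly described for FlexCode. Both are assumptions to check against the rail_flexzboost and flexcode sources; the helper names are hypothetical.

```python
def linear_grid(lo: float, hi: float, n: int) -> list[float]:
    """n evenly spaced values from lo to hi inclusive (numpy.linspace-style)."""
    return [lo + i * (hi - lo) / (n - 1) for i in range(n)]


def sharpen(z: list[float], pdf: list[float], alpha: float) -> list[float]:
    """Raise a gridded PDF to the power alpha and renormalize it to unit area."""
    powered = [p ** alpha for p in pdf]
    # Trapezoid-rule area of the powered PDF, used as the new normalization.
    area = sum(0.5 * (powered[i] + powered[i + 1]) * (z[i + 1] - z[i])
               for i in range(len(z) - 1))
    return [p / area for p in powered]


# Candidate values built from the documented defaults.
bump_grid = linear_grid(0.02, 0.35, 20)   # bumpmin, bumpmax, nbump
sharp_grid = linear_grid(0.7, 2.1, 15)    # sharpmin, sharpmax, nsharp
```

Values of alpha above 1 narrow the peaks and values below 1 broaden them, so the search can correct PDFs that are systematically too wide or too narrow.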

__init__(args, **kwargs)

Constructor: do CatInformer-specific initialization, then check the bands

divide_array(grid)
entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'flex_z_boost_informer'
name = 'FlexZBoostInformer'
run()

Train the FlexZBoost model

static split_data(fz_data, sz_data, trainfrac, seed)

Make a random partition of the training data into training and validation sets; the validation data are used to determine the bump threshold and sharpening parameters.
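The random partition performed by split_data can be sketched on row indices as follows. The real static method takes fz_data and sz_data directly; this index-based helper, its name, and the choice to truncate the training count with int() are assumptions made for illustration.

```python
import random


def split_indices(n_objects: int, trainfrac: float,
                  seed: int) -> tuple[list[int], list[int]]:
    """Randomly partition row indices into training and validation sets."""
    rng = random.Random(seed)  # private generator, reproducible for a given seed
    indices = list(range(n_objects))
    rng.shuffle(indices)
    n_train = int(n_objects * trainfrac)
    return sorted(indices[:n_train]), sorted(indices[n_train:])
```

With the defaults trainfrac=0.75 and seed=1138, 100 objects split into 75 training and 25 validation rows, and the same seed always reproduces the same split.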

stage_columns: list[str] | None
class rail.stages.FlowCreator

Bases: Creator

Creator wrapper for a PZFlow Flow object.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • n_samples ([int] (required)) – Number of samples to create

  • seed ([int] default=12345) – Random number seed

  • model (FlowHandle (INPUT))

  • output (PqHandle (OUTPUT))

__init__(args, **kwargs)

Constructor

Does standard Creator initialization and also gets the Flow object

entrypoint_function: str | None = 'sample'
inputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>)]
interactive_function: str | None = 'flow_creator'
name = 'FlowCreator'
outputs = [('output', <class 'rail.core.data.PqHandle'>)]
run()

Run method

Calls Flow.sample to use the Flow object to generate photometric data

Notes

Puts the data into the data store under this stage's ‘output’ tag

stage_columns: list[str] | None
class rail.stages.FlowModeler

Bases: Modeler

Modeler wrapper for a PZFlow Flow object.

This class trains the flow.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • seed ([int] default=0) – The random seed for training.

  • phys_cols ([dict] default={'redshift': [0, 3]}) – Names of non-photometry columns and their corresponding [min, max] values.

  • phot_cols ([dict] default={'mag_u_lsst': [17, 35], 'mag_g_lsst': [16, 32], 'mag_r_lsst': [15, 30], 'mag_i_lsst': [15, 30], 'mag_z_lsst': [14, 29], 'mag_y_lsst': [14, 28]}) – Names of photometry columns and their corresponding [min, max] values.

  • calc_colors ([dict] default={'ref_column_name': 'mag_i_lsst'}) – Whether to internally calculate colors (if phot_cols are magnitudes). Assumes that you want to calculate colors from adjacent columns in phot_cols. If you do not want to calculate colors, set False. Else, provide a dictionary {‘ref_column_name’: band}, where band is a string corresponding to the column in phot_cols you want to save as the overall galaxy magnitude.

  • spline_knots ([int] default=16) – The number of spline knots in the normalizing flow.

  • n_training_epochs ([int] default=30) – The number of training epochs.

  • input (TableHandle (INPUT))

  • model (FlowHandle (OUTPUT))
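The calc_colors option above replaces the photometry columns with one reference magnitude plus colors of adjacent bands. A minimal sketch of that transformation for a single row follows. The "blue-red" naming of the color keys is hypothetical, and the helper is not the PZFlow or RAIL implementation. Because phot_cols is a dict, its insertion order defines adjacency, so pass `list(phot_cols)`.

```python
def mags_to_colors(row: dict[str, float], phot_cols: list[str],
                   ref_column_name: str) -> dict[str, float]:
    """Return one reference magnitude plus colors of adjacent bands."""
    out = {ref_column_name: row[ref_column_name]}
    for blue, red in zip(phot_cols[:-1], phot_cols[1:]):
        # Color of each adjacent pair, e.g. mag_u_lsst - mag_g_lsst.
        out[f"{blue}-{red}"] = row[blue] - row[red]
    return out
```

Six bands therefore become one magnitude and five colors, so no information is lost. Colors are often easier for a flow to model because they vary less with overall brightness.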

__init__(args, **kwargs)

Constructor

Does standard Modeler initialization.

entrypoint_function: str | None = 'fit_model'
inputs = [('input', <class 'rail.core.data.TableHandle'>)]
interactive_function: str | None = 'flow_modeler'
name = 'FlowModeler'
outputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>)]
run()

Run method

Calls Flow.train to train a normalizing flow using PZFlow.

Notes

Puts the trained flow into the data store under this stage's ‘model’ tag

stage_columns: list[str] | None
validate()

Check that the inputs actually have the data needed for execution. This is called before the run method. It is an optional step, meant for checking that the input to the stage is actually in the form and shape needed before an expensive run is executed.

class rail.stages.FlowPosterior

Bases: PosteriorCalculator

PosteriorCalculator wrapper for a PZFlow Flow object

data : pd.DataFrame
    Pandas dataframe of the data on which the posteriors are conditioned.
    Must have all columns in self.flow.data_columns, *except*
    for the column specified for the posterior (see below).

column : str
    Name of the column for which the posterior is calculated.
    Must be one of the columns in self.flow.data_columns. However,
    whether or not this column is present in `data` is irrelevant.

grid : np.ndarray
    Grid over which the posterior is calculated.

err_samples : int, optional
    Number of samples from the error distribution to average over for
    the posterior calculation. If provided, Gaussian errors are assumed,
    and method will look for error columns in `data`. Error columns
    must end in `_err`. E.g. the error column for the variable `u` must
    be `u_err`. Zero error assumed for any missing error columns.

seed: int, optional
    Random seed for drawing samples from the error distribution.

marg_rules : dict, optional
    Dictionary with rules for marginalizing over missing variables.
    The dictionary must contain the key "flag", which gives the flag
    that indicates a missing value. E.g. if missing values are given
    the value 99, the dictionary should contain {"flag": 99}.
    The dictionary must also contain {"name": callable} for any
    variables that will need to be marginalized over, where name is
    the name of the variable, and callable is a callable that takes
    the row of variables and returns a grid over which to marginalize
    the variable. E.g. {"y": lambda row: np.linspace(0, row["x"], 10)}.
    Note: the callable for a given name must *always* return an array
    of the same length, regardless of the input row.
    DEFAULT: the default marg_rules dict is
    {"flag": np.nan,
    "u": np.linspace(25, 31, 10),}

batch_size: int, default=None
    Size of batches in which to calculate posteriors. If None, all
    posteriors are calculated simultaneously. This is faster, but
    requires more memory.

nan_to_zero : bool, default=True
    Whether to convert NaNs to zero probability in the final PDFs.

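The err_samples behaviour described above (Gaussian errors, error columns named `<name>_err`, zero error for missing error columns) can be sketched as drawing noisy copies of each row; the posterior would then be averaged over these copies. This helper is hypothetical and only illustrates the sampling step, not pzflow's code.

```python
import random


def perturbed_samples(row: dict[str, float], n_samples: int,
                      seed: int) -> list[dict[str, float]]:
    """Draw n_samples noisy copies of a row, using `<name>_err` columns as Gaussian sigmas."""
    rng = random.Random(seed)
    values = {k: v for k, v in row.items() if not k.endswith("_err")}
    samples = []
    for _ in range(n_samples):
        draw = {}
        for name, value in values.items():
            sigma = row.get(f"{name}_err", 0.0)  # missing error column -> zero error
            draw[name] = rng.gauss(value, sigma) if sigma > 0 else value
        samples.append(draw)
    return samples
```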
Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • column ([str] (required)) – Column to compute posterior for

  • grid ([list] default=[]) – Grid over which the posterior is calculated

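  • err_samples ([int] default=10) – Number of samples from the error distribution to average over

  • seed ([int] default=12345) – Random seed for drawing samples from the error distribution

  • marg_rules ([dict] default={'flag': nan, 'mag_u_lsst': <function FlowPosterior.<lambda>>}) – Rules for marginalizing over missing variables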

  • batch_size ([int] default=10000)

  • nan_to_zero ([bool] default=True)

  • model (FlowHandle (INPUT))

  • input (PqHandle (INPUT))

  • output (QPHandle (OUTPUT))
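The marg_rules docstring describes a dictionary with a "flag" entry marking missing values and one callable per variable that returns the grid to marginalize over. The sketch below shows how such a dictionary can be interpreted row by row. pzflow performs the actual marginalization inside Flow.posterior; this hypothetical helper only finds the flagged columns and builds their grids. A NaN flag needs special handling because NaN never compares equal to itself.

```python
import math


def marginalization_grids(row: dict[str, float], marg_rules: dict) -> dict[str, list[float]]:
    """Return the grid to marginalize over for every flagged (missing) value in a row."""
    flag = marg_rules["flag"]

    def is_missing(value: float) -> bool:
        # NaN != NaN, so a NaN flag must be tested with math.isnan.
        if isinstance(flag, float) and math.isnan(flag):
            return math.isnan(value)
        return value == flag

    grids = {}
    for name, value in row.items():
        if is_missing(value):
            if name not in marg_rules:
                raise KeyError(f"no marginalization rule for missing column {name!r}")
            grids[name] = list(marg_rules[name](row))
    return grids


# Hypothetical rules: NaN marks a missing u magnitude, marginalized over 25 to 31 in 10 steps.
rules = {"flag": math.nan, "mag_u_lsst": lambda row: [25 + i * 6 / 9 for i in range(10)]}
```

As the docstring requires, each callable should return a grid of the same length for every row.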

__init__(args, **kwargs)

Constructor

Does standard PosteriorCalculator initialization

entrypoint_function: str | None = 'get_posterior'
inputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>), ('input', <class 'rail.core.data.PqHandle'>)]
interactive_function: str | None = 'flow_posterior'
name = 'FlowPosterior'
outputs = [('output', <class 'rail.core.data.QPHandle'>)]
run()

Run method

Calls Flow.posterior to use the Flow object to get the posterior distribution.

Notes

Gets the input data from the data store under this stage's ‘input’ tag and puts the results into the data store under this stage's ‘output’ tag.

stage_columns: list[str] | None
rail.stages.GCRLoader

alias of GCRCreator

class rail.stages.GPzEstimator

Bases: CatEstimator

Estimate stage for GPz_v1

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • zmin ([float] default=0.0)

  • zmax ([float] default=3.0)

  • nzbins ([int] default=301)

  • id_col ([str] default=object_id) – name of the object ID column

  • redshift_col ([str] default=redshift) – name of redshift column

  • calc_summary_stats ([bool] default=False) – Compute summary statistics

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])

  • ref_band ([str] default=mag_i_lsst)

  • log_errors ([bool] default=True) – If True, take the log of magnitude errors

  • replace_error_vals ([list] default=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1])

  • model (ModelHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))
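The nondetect_val, mag_limits, replace_error_vals and log_errors parameters suggest a per-band preprocessing step before GPz sees the data. The sketch below is a hypothetical version of that step. Which error values get replaced (here: NaN or non-positive errors) and the log base (here: natural log) are assumptions, not confirmed behaviour of the stage.

```python
import math


def prepare_band(mag: float, err: float, mag_limit: float, replacement_err: float,
                 nondetect_val: float = 99.0, log_errors: bool = True) -> tuple[float, float]:
    """Hypothetical per-band cleanup of one magnitude and its error."""
    # Non-detections (flagged with nondetect_val, or NaN) take the band's limiting magnitude.
    if math.isnan(mag) or mag == nondetect_val:
        mag = mag_limit
    # Unusable errors (NaN or non-positive) get the band's replacement value (assumption).
    if math.isnan(err) or err <= 0:
        err = replacement_err
    if log_errors:
        err = math.log(err)  # natural log; the base used by the stage is an assumption
    return mag, err
```

Applied band by band, e.g. zipping bands, err_bands, the mag_limits values and replace_error_vals, this yields finite inputs for every object.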

__init__(args, **kwargs)

Constructor: Do CatEstimator specific initialization

entrypoint_function: str | None = 'estimate'
interactive_function: str | None = 'gpz_estimator'
name = 'GPzEstimator'
stage_columns: list[str] | None
class rail.stages.GPzInformer

Bases: CatInformer

Inform stage for GPz_v1

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • train_frac ([float] default=0.75) – fraction of training data used to make tree, rest used to set best sigma

  • seed ([int] default=87) – random seed

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])

  • redshift_col ([str] default=redshift)

  • gpz_method ([str] default=VC) – method to be used in GPz, options are ‘GL’, ‘VL’, ‘GD’, ‘VD’, ‘GC’, and ‘VC’

  • n_basis ([int] default=50) – number of basis functions used

  • learn_jointly ([bool] default=True) – if True, jointly learns prior linear mean function

  • hetero_noise ([bool] default=True) – If True, learns a heteroscedastic noise process; set to False for point estimates

  • csl_method ([str] default=normal) – cost sensitive learning type, ‘balanced’, ‘normalized’, or ‘normal’

  • csl_binwidth ([float] default=0.1) – width of bin for ‘balanced’ cost sensitive learning

  • pca_decorrelate ([bool] default=True) – if True, decorrelate data using PCA as preprocessing stage

  • max_iter ([int] default=200) – max number of iterations

  • max_attempt ([int] default=100) – max iterations if no progress on validation

  • log_errors ([bool] default=True) – If True, take the log of magnitude errors

  • replace_error_vals ([list] default=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1])

  • input (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))
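The ‘balanced’ csl_method upweights objects at under-represented redshifts using bins of width csl_binwidth. A minimal sketch of that idea follows: each object is weighted inversely to the occupancy of its redshift bin, and the weights are rescaled to average 1. The rescaling convention and the exact formula are assumptions; GPz's own weighting may differ in detail.

```python
import math
from collections import Counter


def balanced_weights(redshifts: list[float], binwidth: float = 0.1) -> list[float]:
    """Weight each object inversely to the number of objects in its redshift bin."""
    bins = [math.floor(z / binwidth) for z in redshifts]
    counts = Counter(bins)
    weights = [1.0 / counts[b] for b in bins]
    # Rescale so the weights average to one (a convention assumed here).
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]
```

Three objects sharing one bin and a single object alone in another end up with equal total weight per bin, so the rare object counts as much as the crowded bin during training.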

__init__(args, **kwargs)

Constructor: do CatInformer-specific initialization

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'gpz_informer'
name = 'GPzInformer'
run()

Train the GPz model after splitting the training data into training and validation sets

stage_columns: list[str] | None
class rail.stages.GaussianPzEstimator

Bases: PzEstimator

Estimator which converts PDFs to Gaussian representations

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • model (ModelHandle (INPUT))

  • input (QPHandle (INPUT))

  • output (QPHandle (OUTPUT))
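One plausible way to convert a gridded PDF to a Gaussian representation is moment matching: keep the mean and standard deviation and discard the rest of the shape. The sketch below shows that technique in plain Python. This documentation does not say which conversion GaussianPzEstimator actually uses, so treat the method and the helper name as assumptions.

```python
import math


def gaussian_from_grid(z: list[float], pdf: list[float]) -> tuple[float, float]:
    """Moment-match a gridded PDF: return (mean, sigma) of the equivalent Gaussian."""
    def trapz(values: list[float]) -> float:
        # Trapezoid-rule integral of values tabulated on z.
        return sum(0.5 * (values[i] + values[i + 1]) * (z[i + 1] - z[i])
                   for i in range(len(z) - 1))

    norm = trapz(pdf)
    mean = trapz([zi * p for zi, p in zip(z, pdf)]) / norm
    var = trapz([(zi - mean) ** 2 * p for zi, p in zip(z, pdf)]) / norm
    return mean, math.sqrt(var)
```

Moment matching preserves the mean and width exactly but flattens multimodal PDFs into a single broad peak, which is the main information loss of a Gaussian representation.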

entrypoint_function: str | None = 'estimate'
interactive_function: str | None = 'gaussian_pz_estimator'
name = 'GaussianPzEstimator'
stage_columns: list[str] | None
class rail.stages.GaussianPzInformer

Bases: PzInformer

Placeholder Informer

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing

  • input (QPHandle (INPUT))

  • truth (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'gaussian_pz_informer'
model_handle: ModelHandle | None
name = 'GaussianPzInformer'
stage_columns: list[str] | None
class rail.stages.GaussianSkewtScatterSelector

Bases: Selector

Add a mock photometric redshift column to a dataframe with a Gaussian + skew Student-t error model

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • drop_rows ([bool] default=True) – Drop selected rows from output table

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • col_name ([str] default=photoz_mock) – Name of the mock photometric redshift column to make

  • col_name_mag_i ([str] default=mag_i) – Name of the i-band magnitude column

  • col_name_z ([str] default=z) – Name of the (true) redshift column

  • selector_model_dict ([dict] default={'mag_i_bin_edges': array([15.5, 22. , 23. , 24. , 29. ]), 'z_bin_edges': array([0. , 0.3, 0.7, 1. , 1.5, 2. , 2.5, 3. , 4. ]), 'bias_median_lookup_table': array([[ 0. , 0.002, -0.002, 0.001, 0.001, 0.001, 0.001, 0.001], [ 0. , -0. , -0.002, -0.004, 0.01 , 0.01 , 0.01 , 0.01 ], [ 0.003, -0. , -0.001, -0.006, -0.002, 0.024, 0.007, 0. ], [ 0.008, -0.005, 0.007, -0.015, -0.019, 0.017, 0.011, 0. ]]), 'bias_std_lookup_table': array([[0.01 , 0.02 , 0.026, 0.038, 0.038, 0.038, 0.038, 0.038], [0.011, 0.019, 0.025, 0.036, 0.062, 0.062, 0.062, 0.062], [0.011, 0.022, 0.027, 0.044, 0.063, 0.115, 0.093, 0.074], [0.013, 0.023, 0.025, 0.051, 0.069, 0.12 , 0.103, 0.069]]), 'f_tail_by_mag_i': array([0.088 , 0.1377, 0.4312, 0.4312]), 'tail_loc_by_mag_i': array([-0.0055, 0.1568, 0.2 , 0.2 ]), 'tail_scale_by_mag_i': array([0.2041, 0.3522, 0.237 , 0.237 ]), 'tail_a_by_mag_i': array([ 3.7662, 10.1149, 2. , 2. ]), 'tail_b_by_mag_i': array([ 4. , 11.2095, 4. , 4. ])}) – Dictionary of model parameters for the Gaussian core and skew-t tail distribution components

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))
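The default selector_model_dict has 4 i-magnitude bins (5 edges) and 8 redshift bins (9 edges), matching the 4 × 8 bias lookup tables and the length-4 tail arrays. The sketch below reads it as a per-object mixture: with probability f_tail the offset comes from a skew-t tail, otherwise from a Gaussian core with the tabulated median and width. Several parts are assumptions: the tail_a/tail_b names are taken to mean a Jones & Faddy skew-t, sampled via a Beta(a, b) variate, and the sketch returns only the raw offset. How that offset is turned into the photoz_mock column (for example, whether it is scaled by 1 + z) is not stated here.

```python
import bisect
import math
import random


def bin_index(edges, value):
    """Index i with edges[i] <= value < edges[i + 1], clamped to the valid bins."""
    i = bisect.bisect_right(edges, value) - 1
    return min(max(i, 0), len(edges) - 2)


def jones_faddy_skew_t(rng, a, b):
    """Draw from the Jones & Faddy (2003) skew-t distribution via a Beta(a, b) variate."""
    u = rng.betavariate(a, b)
    return math.sqrt(a + b) * (2.0 * u - 1.0) / (2.0 * math.sqrt(u * (1.0 - u)))


def draw_scatter(mag_i, z, model, rng):
    """Draw one redshift offset from a Gaussian core / skew-t tail mixture (sketch)."""
    im = bin_index(model["mag_i_bin_edges"], mag_i)
    iz = bin_index(model["z_bin_edges"], z)
    # With probability f_tail (set per i-magnitude bin), draw from the skew-t tail...
    if rng.random() < model["f_tail_by_mag_i"][im]:
        t = jones_faddy_skew_t(rng, model["tail_a_by_mag_i"][im], model["tail_b_by_mag_i"][im])
        return model["tail_loc_by_mag_i"][im] + model["tail_scale_by_mag_i"][im] * t
    # ...otherwise from the Gaussian core, using the (mag_i, z) lookup tables.
    return rng.gauss(model["bias_median_lookup_table"][im][iz],
                     model["bias_std_lookup_table"][im][iz])
```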

GaussianSkewtScatterSelector(sample, seed=None, **kwargs)
Parameters:
  • sample (Any)

  • seed (int | None)

  • kwargs (Any)

__init__(args, **kwargs)

Constructor: does standard Selector initialization

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'GaussianSkewtScatterSelector'
name = 'GaussianSkewtScatterSelector'
stage_columns: list[str] | None
class rail.stages.GridSelection

Bases: Selector

Uses the ratio of HSC spectroscopic galaxies to photometric galaxies to partition a sample into training and application samples. Optionally applies a color-based redshift cut in each pixel. The training sample can be degraded further by limiting it to galaxies below a redshift cutoff, specified by redshift_cut.

color_redshift_cut: True or False, implements a color-based redshift cut. Default is True.
    If True, ratio_file must include a second key called 'data' with magnitudes, colors and spec-z from the spectroscopic sample.
percentile_cut: If using the color-based redshift cut, the percentile in spec-z above which redshifts will be cut from the training sample. Default is 99.0
scaling_factor: Enables the user to adjust the ratios by this factor to change the overall number of galaxies kept. For example, if you wish
    to generate 100,000 galaxies but only 50,000 are selected by default, you can double the factor to return more galaxies.
redshift_cut: redshift above which all galaxies will be removed from the training sample. Default is 100
ratio_file: hdf5 file containing an array of spectroscopic vs. photometric galaxies in each pixel. Default is hsc_ratios_and_specz.hdf5 for an HSC-based selection
settings_file: pickled dictionary containing information about colors and magnitudes used in defining the pixels. Dictionary must include the following keys:
    'x_band_1': string, this is the band used for the magnitude in the color magnitude diagram. Default for HSC is 'i'.
    'x_band_2': string, this is the redder band used for the color in the color magnitude diagram.
    if x_band_2 string is not set to '' then the grid is assumed to be over color and x axis color is set to x_band_1 - x_band_2, default is ''.
    'y_band_1': string, this is the bluer band used for the color in the color magnitude grid. Default for HSC is 'g'.
    'y_band_2': string, this is the redder band used for the color in the color magnitude diagram.
    if y_band_2 is not set to '' then the y-axis is assumed to be over color and is set to y_band_1 - y_band_2.
    'x_limits': 2-element list, this is a list of the lower and upper limits of the magnitude. Default for HSC is [13, 16],
    'y_limits': 2-element list, this is a list of the lower and upper limits of the color. Default for HSC is [-2, 6]

NOTE: the default ‘HSC’ grid file, located in rail/examples_data/creation_data/data/hsc_ratios_and_specz.hdf5, is based on data from the Second HSC Data Release, details of which can be found here: Aihara, H., AlSayyad, Y., Ando, M., et al. 2019, PASJ, 71, 114 doi: 10.1093/pasj/psz103

Update (Apr 16 2024): Now inherits from Selector and implements _select() instead of run().

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • drop_rows ([bool] default=True) – Drop selected rows from output table

  • seed ([int] default=12345) – random seed for reproducibility

  • color_redshift_cut ([bool] default=True) – using color-based redshift cut

  • percentile_cut ([float] default=99.0) – percentile cut-off for each pixel in color-based redshift cut off

  • redshift_cut ([float] default=100.0) – cut redshifts above this value

  • ratio_file ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/hsc_ratios_and_specz.hdf5) – path to ratio file

  • settings_file ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/HSC_grid_settings.pkl) – path to pickled parameters file

  • scaling_factor ([float] default=1.588) – multiplicative factor for ratios to adjust number of galaxies kept

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

def_ratio_file = '/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/hsc_ratios_and_specz.hdf5'
def_set_file = '/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/HSC_grid_settings.pkl'
entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'grid_selection'
name = 'GridSelection'
stage_columns: list[str] | None
class rail.stages.HyperbolicMagnitudes

Bases: PhotometryManipulator

Convert a set of classical magnitudes to hyperbolic magnitudes (Lupton et al. 1999). Requires input from the initial stage (HyperbolicSmoothing) to supply optimal values for the smoothing parameters (b).
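For reference, the Lupton et al. (1999) asinh ("hyperbolic") magnitude of a flux x, measured in units of the zeropoint flux, is mu = -(2.5/ln 10) [asinh(x / 2b) + ln b]. The sketch below implements that formula and its propagated error. It assumes the flux is already normalized to the zeropoint; how RAIL handles zeropoints and flux/magnitude inputs internally may differ.

```python
import numpy as np

# 2.5 / ln(10): converts natural logarithms to magnitudes
A = 2.5 / np.log(10.0)


def hyperbolic_magnitude(flux, flux_err, b):
    """Asinh magnitude (Lupton et al. 1999) and its 1-sigma error.

    flux and flux_err are in units of the zeropoint flux (f / f0), so the
    classical magnitude would be -2.5 * log10(flux). b is the per-band
    smoothing parameter.
    """
    flux = np.asarray(flux, dtype=float)
    flux_err = np.asarray(flux_err, dtype=float)
    mag = -A * (np.arcsinh(flux / (2.0 * b)) + np.log(b))
    # Error propagation: d(mu)/dx = -A / sqrt(x**2 + 4 b**2)
    mag_err = A * flux_err / np.sqrt(flux**2 + 4.0 * b**2)
    return mag, mag_err


# Bright source: the asinh magnitude approaches the classical value (10 here).
mag, err = hyperbolic_magnitude(1e-4, 1e-6, b=1e-10)
# Zero flux still yields a finite magnitude, -2.5 * log10(b).
mag0, err0 = hyperbolic_magnitude(0.0, 1e-10, b=1e-10)
```

Unlike classical magnitudes, the result stays finite for zero or negative fluxes, which is the main motivation for the conversion.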

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • value_columns ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – list of columns that provide photometric measurements (fluxes or magnitudes)

  • error_columns ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst']) – list of columns with errors corresponding to value_columns (assuming same ordering)

  • zeropoints ([list] default=[]) – optional list of magnitude zeropoints for value_columns (assuming same ordering, defaults to 0.0)

  • is_flux ([bool] default=False) – whether the provided quantities are fluxes or magnitudes

  • input (PqHandle (INPUT))

  • parameters (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

compute(data, parameters, **kwargs)

Main method to call. Outputs hyperbolic magnitudes computed from a set of smoothing parameters and an input catalogue with classical magnitudes and their respective errors.

Parameters:
  • data (PqHandle) – Input table with photometry (magnitudes or flux columns and their respective uncertainties) as defined by the configuration.

  • parameters (PqHandle) – Table with smoothing parameters per photometric band, determined by HyperbolicSmoothing.

Returns:

Output table containing hyperbolic magnitudes and their uncertainties. If the columns in the input table contain a prefix mag_, this output table will replace the prefix with hyp_mag_, otherwise the column names will be identical to the input table.

Return type:

PqHandle

entrypoint_function: str | None = 'compute'
inputs = [('input', <class 'rail.core.data.PqHandle'>), ('parameters', <class 'rail.core.data.PqHandle'>)]
interactive_function: str | None = 'hyperbolic_magnitudes'
name = 'HyperbolicMagnitudes'
outputs = [('output', <class 'rail.core.data.PqHandle'>)]
run()

Compute hyperbolic magnitudes and their error based on the parameters determined by HyperbolicSmoothing.

class rail.stages.HyperbolicSmoothing

Bases: PhotometryManipulator

Initial stage to compute hyperbolic magnitudes (Lupton et al. 1999). Estimates the smoothing parameter b that is used by the second stage (HyperbolicMagnitudes) to convert classical to hyperbolic magnitudes.
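Lupton et al. (1999) recommend setting the smoothing parameter to b = sqrt(2.5 log10 e) * sigma, roughly 1.042 sigma, where sigma is the typical flux noise in the band. The sketch below estimates b for one band from classical magnitudes and their errors. Taking sigma as the median flux error of the sample is an assumption made for illustration; RAIL's HyperbolicSmoothing may derive sigma differently.

```python
import numpy as np

A = 2.5 / np.log(10.0)  # 2.5 log10(e), about 1.0857


def smoothing_parameter(mags, mag_errs, zeropoint=0.0):
    """Estimate the asinh smoothing parameter b for a single band.

    Assumption: the band's typical flux noise is the median flux error.
    """
    mags = np.asarray(mags, dtype=float)
    mag_errs = np.asarray(mag_errs, dtype=float)
    flux = 10.0 ** (-0.4 * (mags - zeropoint))  # flux relative to the zeropoint
    flux_err = flux * mag_errs / A               # sigma_f = f * sigma_m / A
    sigma = np.median(flux_err)
    return np.sqrt(A) * sigma                    # b = sqrt(a) * sigma


b = smoothing_parameter([25.0, 25.0, 25.0], [0.1, 0.1, 0.1])
```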

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • value_columns ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – list of columns that provide photometric measurements (fluxes or magnitudes)

  • error_columns ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst']) – list of columns with errors corresponding to value_columns (assuming same ordering)

  • zeropoints ([list] default=[]) – optional list of magnitude zeropoints for value_columns (assuming same ordering, defaults to 0.0)

  • is_flux ([bool] default=False) – whether the provided quantities are fluxes or magnitudes

  • input (PqHandle (INPUT))

  • parameters (PqHandle (OUTPUT))

compute(data, **kwargs)

Main method to call. Computes the set of smoothing parameters (b) for an input catalogue with classical photometry and their respective errors. These parameters are required by the follow-up stage HyperbolicMagnitudes and are passed as tabular data.

Parameters:

data (PqHandle) – Input table with magnitude and magnitude error columns as defined in the configuration.

Returns:

Table with smoothing parameters per photometric band and additional meta data.

Return type:

PqHandle

entrypoint_function: str | None = 'compute'
inputs = [('input', <class 'rail.core.data.PqHandle'>)]
interactive_function: str | None = 'hyperbolic_smoothing'
name = 'HyperbolicSmoothing'
outputs = [('parameters', <class 'rail.core.data.PqHandle'>)]
run()

Computes the smoothing parameter b (see Lupton et al. 1999) per photometric band.

class rail.stages.IGMExtinctionModel

Bases: Noisifier

Degrader that simulates IGM extinction.

Note that the extinction is only applied to the u and g bands, assuming that the maximum redshift of the sample is < ~3.

Note also that the code assumes the first two input bands are always u and g. These bands are also needed to compute the UV slope.

An initial UV slope of -2 is assumed. There is an option to update the UV slope through one iteration, based on the u and g fluxes.
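A minimal sketch of how a UV slope beta (defined by f_lambda proportional to lambda**beta) can be read off two AB magnitudes. For AB magnitudes f_nu is proportional to lambda**(beta + 2), so m_u - m_g = -2.5 (beta + 2) log10(lambda_u / lambda_g), and equal u and g magnitudes give the default beta = -2. The effective wavelengths below are approximate values assumed for illustration (the stage itself works from the filter curves in filter_list), and the sketch ignores the IGM correction that the stage iterates over.

```python
import numpy as np

# Approximate effective wavelengths in Angstrom (assumed, illustrative only).
LAMBDA_U = 3670.0
LAMBDA_G = 4830.0


def uv_slope(mag_u, mag_g, lam_u=LAMBDA_U, lam_g=LAMBDA_G):
    """UV slope beta (f_lambda ~ lambda**beta) from u and g AB magnitudes."""
    color = np.asarray(mag_u, dtype=float) - np.asarray(mag_g, dtype=float)
    # Invert m_u - m_g = -2.5 * (beta + 2) * log10(lam_u / lam_g)
    return -color / (2.5 * np.log10(lam_u / lam_g)) - 2.0


flat = uv_slope(24.0, 24.0)  # flat f_nu -> beta = -2
```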

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • data_path ([str] default=None) – file path to the FILTER directories. If left at the default None, the install directory for rail + rail/examples_data/estimation_data/data is used

  • filter_list ([list] default=['DC2LSST_u', 'DC2LSST_g', 'DC2LSST_r', 'DC2LSST_i', 'DC2LSST_z', 'DC2LSST_y'])

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • redshift_col ([str] default=redshift)

  • compute_uv_slope ([bool] default=True) – whether to compute the UV slope. If not, the initial value of -2 will be used

  • optical_depth_interpolator ([bool] default=True) – whether to precompute the optical depth as a function of wavelength and redshift and interpolate on the grid. Note that if False, the computation loops over all objects, and hence can take a very long time!

  • redshift_grid ([list] default=[1.5, 4, 100]) – the redshift grid to interpolate on; enter a list containing z_min, z_max, number_of_grid. The default values should give sufficient precision for most purposes

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))
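The redshift_grid and optical_depth_interpolator options describe a tabulate-then-interpolate strategy. A minimal sketch of that idea follows, in one dimension at a fixed wavelength. The optical-depth function is a hypothetical toy placeholder, not the IGM model used by the stage, which tabulates over both wavelength and redshift.

```python
import numpy as np


def toy_optical_depth(z):
    """Hypothetical smooth stand-in for an IGM optical depth (NOT physical)."""
    return 0.01 * (1.0 + np.asarray(z, dtype=float)) ** 3


# redshift_grid = [z_min, z_max, number_of_grid], as in the default above
z_min, z_max, n_grid = [1.5, 4, 100]
z_grid = np.linspace(z_min, z_max, int(n_grid))
tau_grid = toy_optical_depth(z_grid)  # expensive model evaluated once on the grid

# Per-object lookup by linear interpolation instead of a per-object model call
z_obj = np.array([1.7, 2.25, 3.9])
tau_obj = np.interp(z_obj, z_grid, tau_grid)
```

Note that np.interp returns the end-point values for redshifts outside [z_min, z_max], so the grid should cover the redshift range of the sample.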

__init__(args, **kwargs)
entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'igm_extinction_model'
name = 'IGMExtinctionModel'
stage_columns: list[str] | None
class rail.stages.InvRedshiftIncompleteness

Bases: Selector

Degrader that simulates incompleteness with a selection function inversely proportional to redshift.

The survival probability of this selection function is p(z) = min(1, z_p/z), where z_p is the pivot redshift.
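A minimal sketch of drawing a survival mask from p(z) = min(1, z_p/z). This illustrates the selection function only; it is not the stage's own code, and the function name is hypothetical.

```python
import numpy as np


def inv_redshift_keep_mask(redshift, pivot_redshift, seed=None):
    """Boolean survival mask for p(z) = min(1, z_p / z)."""
    z = np.asarray(redshift, dtype=float)
    rng = np.random.default_rng(seed)
    # z_p / 0 -> inf, so objects at z = 0 get p = 1 like everything below z_p
    with np.errstate(divide="ignore"):
        p_keep = np.minimum(1.0, pivot_redshift / z)
    # Keep each object with probability p_keep
    return rng.random(z.size) < p_keep


low_z_mask = inv_redshift_keep_mask([0.0, 0.5, 1.0], pivot_redshift=1.0, seed=42)
high_z_mask = inv_redshift_keep_mask(np.full(100_000, 2.0), pivot_redshift=1.0, seed=42)
```

Objects at or below the pivot are always kept; at z = 2 z_p roughly half survive.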

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • drop_rows ([bool] default=True) – Drop selected rows from output table

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • pivot_redshift ([float] (required)) – redshift at which the incompleteness begins

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

__init__(args, **kwargs)
entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'inv_redshift_incompleteness'
name = 'InvRedshiftIncompleteness'
stage_columns: list[str] | None
class rail.stages.KDEBinOverlap

Bases: RailStage

Stage KDEBinOverlap

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • redshift_col ([str] default=redshift) – name of redshift column

  • bin_name ([str] default=class_id) – Groupname for the tomographic bin index in the hdf5 handle

  • truth (TableHandle (INPUT))

  • bin_index (Hdf5Handle (INPUT))

  • output (Hdf5Handle (OUTPUT))

entrypoint_function: str | None = 'evaluate'
evaluate(bin_index, truth, **kwargs)

Evaluate function for KDEBinOverlap

Parameters:
  • bin_index (TableLike) – bin_index

  • truth (TableLike) – truth

Returns:

Output data

Return type:

Hdf5Handle

inputs = [('truth', <class 'rail.core.data.TableHandle'>), ('bin_index', <class 'rail.core.data.Hdf5Handle'>)]
interactive_function: str | None = 'kde_bin_overlap'
name = 'KDEBinOverlap'
outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
class rail.stages.KNearNeighEstimator

Bases: CatEstimator

KNN-based estimator

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • zmin ([float] default=0.0)

  • zmax ([float] default=3.0)

  • nzbins ([int] default=301)

  • id_col ([str] default=object_id) – name of the object ID column

  • redshift_col ([str] default=redshift)

  • calc_summary_stats ([bool] default=False) – Compute summary statistics

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • ref_band ([str] default=mag_i_lsst)

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • model (ModelHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do Estimator specific initialization

entrypoint_function: str | None = 'estimate'
interactive_function: str | None = 'k_near_neigh_estimator'
name = 'KNearNeighEstimator'
open_model(**kwargs)

Load the model and/or attach it to this Stage

Parameters:
  • tag – Input tag associated to the model

  • **kwargs – Should include ‘model’, see notes

Notes

The keyword argument ‘model’ should be either

  1. an object with a trained model,

  2. a path pointing to a file that can be read to obtain the trained model,

  3. or a ModelHandle providing access to the trained model.

Returns:

The object encapsulating the trained model.

Return type:

Any

stage_columns: list[str] | None
class rail.stages.KNearNeighInformer

Bases: CatInformer

Train a KNN-based estimator

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry)

  • zmin ([float] default=0.0)

  • zmax ([float] default=3.0)

  • nzbins ([int] default=301)

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • ref_band ([str] default=mag_i_lsst)

  • redshift_col ([str] default=redshift)

  • train_frac ([float] default=0.75) – fraction of training data used to make tree, rest used to set best sigma

  • seed ([int] default=0) – Random number seed for NN training

  • sigma_grid_min ([float] default=0.01) – minimum value of sigma for grid check

  • sigma_grid_max ([float] default=0.075) – maximum value of sigma for grid check

  • ngrid_sigma ([int] default=10) – number of grid points in sigma check

  • leaf_size ([int] default=15) – min leaf size for KDTree

  • nneigh_min ([int] default=3) – min number of near neighbors to use for PDF fit

  • nneigh_max ([int] default=7) – max number of near neighbors to use for PDF fit

  • only_colors ([bool] default=False) – if True, do not use the ref_band magnitude; use only colors

  • input (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do CatInformer-specific initialization, then check the bands

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'k_near_neigh_informer'
name = 'KNearNeighInformer'
run()

Train a KDTree on a fraction of the training data
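A minimal sketch of one common way to turn k nearest neighbors into a redshift PDF: a Gaussian mixture centered on the neighbors' redshifts, evaluated on the grid np.linspace(zmin, zmax, nzbins). Per the parameters above, the informer holds back the (1 - train_frac) fraction of the training data to choose sigma from the grid between sigma_grid_min and sigma_grid_max and a neighbor count between nneigh_min and nneigh_max. Two assumptions here are not confirmed by this page: the inverse-distance weighting of neighbors, and the brute-force distance search used in place of a KDTree to keep the example dependency-free.

```python
import numpy as np


def knn_pdf(train_feats, train_z, test_feats, zgrid, n_neigh=5, sigma=0.05):
    """Gaussian-mixture PDF on zgrid built from each test object's neighbors."""
    train_feats = np.asarray(train_feats, dtype=float)
    train_z = np.asarray(train_z, dtype=float)
    test_feats = np.atleast_2d(np.asarray(test_feats, dtype=float))
    zgrid = np.asarray(zgrid, dtype=float)
    # Brute-force Euclidean distances, shape (n_test, n_train)
    dist = np.linalg.norm(test_feats[:, None, :] - train_feats[None, :, :], axis=-1)
    idx = np.argsort(dist, axis=1)[:, :n_neigh]
    near_d = np.take_along_axis(dist, idx, axis=1)
    # Assumed inverse-distance weights, normalized per test object
    weights = 1.0 / np.maximum(near_d, 1e-12)
    weights /= weights.sum(axis=1, keepdims=True)
    centers = train_z[idx]  # (n_test, n_neigh)
    gauss = np.exp(-0.5 * ((zgrid[None, None, :] - centers[:, :, None]) / sigma) ** 2)
    gauss /= sigma * np.sqrt(2.0 * np.pi)
    # Weighted sum over neighbors -> (n_test, n_z)
    return np.einsum("tk,tkz->tz", weights, gauss)


zgrid = np.linspace(0.0, 3.0, 301)  # zmin, zmax, nzbins defaults
pdf = knn_pdf([[0.0], [0.1], [0.2], [5.0], [5.1]], [0.5, 0.5, 0.5, 2.5, 2.5],
              [[0.05]], zgrid, n_neigh=3, sigma=0.05)
```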

stage_columns: list[str] | None
class rail.stages.LSSTErrorModel

Bases: PhotoErrorModel

The LSST Error model, defined by peLsstErrorParams and peLsstErrorModel

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'lsst_error_model'
name = 'LSSTErrorModel'
stage_columns: list[str] | None
class rail.stages.LSSTFluxToMagConverter

Bases: RailStage

Utility stage that converts from fluxes to magnitudes

Note, this is hardwired to take parquet files as input and provide hdf5 files as output

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • bands ([list] default=['u', 'g', 'r', 'i', 'z', 'y']) – Names of the bands

  • flux_name ([str] default={band}_gaap1p0Flux) – Template for flux column names

  • flux_err_name ([str] default={band}_gaap1p0FluxErr) – Template for flux error column names

  • mag_name ([str] default=mag_{band}_lsst) – Template for magnitude column names

  • mag_err_name ([str] default=mag_err_{band}_lsst) – Template for magnitude error column names

  • copy_col_dict ([dict] default={}) – Map of other columns to copy

  • mag_offset ([float] default=31.4) – Magnitude offset value

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'
inputs = [('input', <class 'rail.core.data.PqHandle'>)]
interactive_function: str | None = 'lsst_flux_to_mag_converter'
mag_conv = np.float64(0.9210340371976184)
name = 'LSSTFluxToMagConverter'
outputs = [('output', <class 'rail.core.data.PqHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.
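A minimal sketch of the conversion the parameters above describe: column names come from the {band} templates, magnitudes are -2.5 log10(flux) + mag_offset (31.4 is the AB magnitude of 1 nJy, so this assumes fluxes in nJy), and errors use mag_conv = 0.4 ln 10, which matches the class attribute value. The error formula flux_err / (flux * mag_conv) is standard first-order propagation and is assumed, not confirmed, to be what the stage does; copy_col_dict is not handled here.

```python
import numpy as np

MAG_OFFSET = 31.4            # default mag_offset
MAG_CONV = 0.4 * np.log(10)  # same value as the mag_conv attribute


def flux_to_mag(table, bands=("u", "g", "r", "i", "z", "y"),
                flux_name="{band}_gaap1p0Flux",
                flux_err_name="{band}_gaap1p0FluxErr",
                mag_name="mag_{band}_lsst",
                mag_err_name="mag_err_{band}_lsst",
                mag_offset=MAG_OFFSET):
    """Return a dict of magnitude and magnitude-error columns."""
    out = {}
    for band in bands:
        flux = np.asarray(table[flux_name.format(band=band)], dtype=float)
        flux_err = np.asarray(table[flux_err_name.format(band=band)], dtype=float)
        out[mag_name.format(band=band)] = -2.5 * np.log10(flux) + mag_offset
        # sigma_m = (2.5 / ln 10) * sigma_f / f = sigma_f / (f * mag_conv)
        out[mag_err_name.format(band=band)] = flux_err / (flux * MAG_CONV)
    return out


mags = flux_to_mag({"i_gaap1p0Flux": [1.0, 100.0],
                    "i_gaap1p0FluxErr": [0.1, 1.0]}, bands=("i",))
```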

class rail.stages.LephareEstimator

Bases: CatEstimator

LePhare-based CatEstimator

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • id_col ([str] default=object_id) – name of the object ID column

  • redshift_col ([str] default=redshift)

  • calc_summary_stats ([bool] default=False) – Compute summary statistics

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • ref_band ([str] default=mag_i_lsst)

  • err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])

  • lephare_config ([dict] default={}) – The lephare config keymap. If unset we load it from the model.

  • use_inform_offsets ([bool] default=True) – Use the zero point offsets computed in the inform stage.

  • posterior_output ([int] default=11) – Which posterior distribution to output: MASS: 0, SFR: 1, SSFR: 2, LDUST: 3, LIR: 4, AGE: 5, COL1: 6, COL2: 7, MREF: 8, MIN_ZG: 9, MIN_ZQ: 10, BAY_ZG: 11, BAY_ZQ: 12

  • output_keys ([list] default=['Z_BEST', 'CHI_BEST', 'ZQ_BEST', 'CHI_QSO', 'MOD_STAR', 'CHI_STAR']) – The output keys to add to ancil. These must be in the output para file. By default we include the best galaxy and QSO redshift and best star alongside their respective chi squared.

  • run_dir ([str] default=None) – Override for the LEPHAREWORK directory. If None we load it from the model which is set during the inform stage. This is to facilitate manually moving intermediate files.

  • model (ModelHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))
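Because posterior_output takes an integer code, a small lookup table built from the list above can make configurations easier to read. The dictionary and helper below are hypothetical conveniences for illustration, not part of RAIL or LePhare; only the name-to-code pairs come from the parameter description.

```python
# Name -> code pairs taken from the posterior_output description above.
POSTERIOR_CODES = {
    "MASS": 0, "SFR": 1, "SSFR": 2, "LDUST": 3, "LIR": 4, "AGE": 5,
    "COL1": 6, "COL2": 7, "MREF": 8, "MIN_ZG": 9, "MIN_ZQ": 10,
    "BAY_ZG": 11, "BAY_ZQ": 12,
}


def posterior_output_code(name):
    """Hypothetical helper: map a posterior name to its integer code."""
    try:
        return POSTERIOR_CODES[name.upper()]
    except KeyError:
        raise ValueError(
            f"unknown posterior {name!r}; choose from {sorted(POSTERIOR_CODES)}"
        ) from None


# e.g. request the Bayesian QSO redshift posterior instead of the default 11
config_overrides = {"posterior_output": posterior_output_code("BAY_ZQ")}
```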

__init__(args, **kwargs)

Initialize Estimator

entrypoint_function: str | None = 'estimate'
interactive_function: str | None = 'lephare_estimator'
lephare_config: dict
name = 'LephareEstimator'
nzbins: int | None
open_model(**kwargs)

Load the model and/or attach it to this Stage

Parameters:
  • tag – Input tag associated to the model

  • **kwargs – Should include ‘model’, see notes

Notes

The keyword argument ‘model’ should be either

  1. an object with a trained model,

  2. a path pointing to a file that can be read to obtain the trained model,

  3. or a ModelHandle providing access to the trained model.

Returns:

The object encapsulating the trained model.

Return type:

Any

stage_columns: list[str] | None
zmax: float | None
zmin: float | None
class rail.stages.LephareInformer

Bases: CatInformer

Inform stage for LephareEstimator

This class will set templates and filters required for photoz estimation.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • zmin ([float] default=0.0)

  • zmax ([float] default=3.0)

  • nzbins ([int] default=301)

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])

  • ref_band ([str] default=mag_i_lsst)

  • redshift_col ([str] default=redshift)

  • lephare_config ([dict] default={...}) – The lephare config keymap.

  • star_config ([dict] default={'LIB_ASCII': 'YES'}) – Star config overrides.

  • gal_config ([dict] default={'LIB_ASCII': 'YES', 'MOD_EXTINC': '18,26,26,33,26,33,26,33', 'EXTINC_LAW': 'SMC_prevot.dat,SB_calzetti.dat,SB_calzetti_bump1.dat,SB_calzetti_bump2.dat', 'EM_LINES': 'EMP_UV', 'EM_DISPERSION': '0.5,0.75,1.,1.5,2.'}) – Galaxy config overrides.

  • qso_config ([dict] default={'LIB_ASCII': 'YES', 'MOD_EXTINC': '0,1000', 'EB_V': '0.,0.1,0.2,0.3', 'EXTINC_LAW': 'SB_calzetti.dat'}) – QSO config overrides.

  • input (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))

__init__(args, **kwargs)

Init function: initialize the configuration (copied from rail_bpz)

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'lephare_informer'
name = 'LephareInformer'
run()

Run rail_lephare inform stage.

This informer takes the config and templates and makes the inputs required for the run.

In addition to the three lephare stages making the filter, SED, and magnitude libraries, we also do some tasks required by all rail inform stages.

stage_columns: list[str] | None
validate()

Check that the inputs actually have the data needed for execution. This is called before the run method. It is an optional stage, meant for checking that the input to the stage is actually in the form and shape needed before an expensive run is executed.

class rail.stages.LineConfusion

Bases: Noisifier

Degrader that simulates emission line confusion.

degrader = LineConfusion(true_wavelen=3727,
                         wrong_wavelen=5007,
                         frac_wrong=0.05)

is a degrader that misidentifies 5% of OII lines (at 3727 angstroms) as OIII lines (at 5007 angstroms), which results in a larger spectroscopic redshift.

Note that when selecting the galaxies for which the lines are confused, the degrader ignores galaxies for which this line confusion would result in a negative redshift, which can occur for low redshift galaxies when wrong_wavelen < true_wavelen.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • true_wavelen ([float] (required)) – wavelength of the true emission line

  • wrong_wavelen ([float] (required)) – wavelength of the wrong emission line

  • frac_wrong ([float] (required)) – fraction of galaxies with confused emission lines

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

__init__(args, **kwargs)
entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'line_confusion'
name = 'LineConfusion'
stage_columns: list[str] | None
class rail.stages.MiniSOMInformer

Bases: CatInformer

Summarizer that uses a SOM to construct a weighted sum of spec-z objects in the same SOM cell as each photometric galaxy in order to estimate the overall N(z). This is closely related to the NZDir estimator, though that estimator actually reverses this process and looks for photometric neighbors around each spectroscopic galaxy, which can lead to problems if there are photometric galaxies with no nearby spec-z objects (NZDir is not aware that such objects exist and thus can hide biases). Part of the SimpleSOM estimator will be a check for cells that contain photometric objects but no corresponding training/spec-z objects; those unmatched objects will be flagged for possible removal from the input sample. The inform stage simply constructs a 2D grid SOM using minisom from a large sample of input photometric data and saves it as an output. This may be a computationally intensive stage, though it will hopefully be run once and used by the estimate/summarize stage many times without needing to be re-run.

We can make the SOM with all colors, with one magnitude and N colors, or with an arbitrary set of columns. The code includes a flag, column_usage, to set this. If set to “colors”, it will take the difference of each adjacent pair of columns in bands as the colors. If set to “magandcolors”, it will use these colors plus one magnitude as specified by ref_band. If set to “columns”, it will take as inputs all of the columns specified by bands (they can be magnitudes, colors, or any other input specified by the user). NOTE: any custom bands parameters must have an accompanying nondetect_val dictionary that will replace nondetections with the nondetect_val values!
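A minimal sketch of the three column_usage modes described above, building the SOM input matrix from a dict of columns. Placing the ref_band magnitude as the first column in “magandcolors” mode is an assumption about ordering, nondetection replacement is omitted, and the helper name is hypothetical.

```python
import numpy as np


def som_features(data, bands, ref_band, column_usage="magandcolors"):
    """Build the SOM input matrix for one of the three column_usage modes."""
    if column_usage not in ("colors", "magandcolors", "columns"):
        raise ValueError(f"invalid column_usage: {column_usage!r}")
    cols = [np.asarray(data[b], dtype=float) for b in bands]
    if column_usage == "columns":
        # Use every listed column as-is
        return np.column_stack(cols)
    # Colors from each adjacent pair of bands, e.g. u-g, g-r, ...
    colors = [cols[i] - cols[i + 1] for i in range(len(cols) - 1)]
    if column_usage == "colors":
        return np.column_stack(colors)
    # magandcolors: ref_band magnitude (assumed first) plus the colors
    return np.column_stack([np.asarray(data[ref_band], dtype=float)] + colors)


data = {"u": [24.0, 25.0], "g": [23.0, 24.5], "r": [22.5, 24.0]}
colors_only = som_features(data, ["u", "g", "r"], "r", "colors")
mag_and_colors = som_features(data, ["u", "g", "r"], "r", "magandcolors")
raw_columns = som_features(data, ["u", "g", "r"], "r", "columns")
```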

This will make a pickle file containing the minisom SOM object that will be used by the estimation/summarization stage

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry)

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • ref_band ([str] default=mag_i_lsst)

  • column_usage ([str] default=magandcolors) – switch for how SOM uses columns, valid values are ‘colors’, ‘magandcolors’, and ‘columns’

  • seed ([int] default=0) – Random number seed

  • m_dim ([int] default=31) – number of cells in SOM y dimension

  • n_dim ([int] default=31) – number of cells in SOM x dimension

  • som_sigma ([float] default=1.5) – sigma param in SOM training

  • som_learning_rate ([float] default=0.5) – SOM learning rate

  • som_iterations ([int] default=10000) – number of iterations in SOM training

  • input (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do Informer specific initialization

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'mini_som_informer'
name = 'MiniSOMInformer'
run()

Build a SOM from photometric data NOT spectroscopic data!

stage_columns: list[str] | None
class rail.stages.MiniSOMSummarizer

Bases: SZPZSummarizer

Quick implementation of a SOM-based summarizer that constructs an N(z) estimate via a weighted sum of the empirical N(z), consisting of the normalized histogram of spec-z values contained in the same SOM cell as each photometric galaxy.

There are some general guidelines for choosing the geometry and total number of cells in the SOM. This paper: http://www.giscience2010.org/pdfs/paper_230.pdf recommends 5*sqrt(num rows * num data columns) as a rough guideline. Some authors state that a SOM with one dimension roughly twice as long as the other is better, while others find that square SOMs with equal X and Y dimensions are best; the user can set the dimensions using the n_dim and m_dim parameters. For more discussion of SOMs and photo-z calibration, see the KiDS paper on the topic, http://arxiv.org/abs/1909.09632, particularly the appendices.

Note that several parameters, e.g. the columns used, are stored in the model file. This ensures that the same columns used in constructing the SOM are used when finding the winning SOM cell for the test data. Two additional files are also written out:

  • cellid_output outputs the ‘winning’ SOM cell for each photometric galaxy, in both raveled and 2D SOM cell coordinates. If the objectID or galaxy_id is present, it will also be included in this file; if not, the coordinates will be written in the same order in which the data is read in.

  • uncovered_cell_file outputs the raveled cell IDs of cells that contain photometric galaxies but no corresponding spectroscopic objects. These objects should be removed from the sample, as they cannot be accounted for properly in the summarizer. Some iteration on data cuts may be necessary to remove or mitigate these ‘uncovered’ objects.
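A minimal sketch of the weighted-sum N(z) described above, assuming the winning SOM cell of every photometric and spectroscopic object is already known. Each occupied cell contributes its normalized spec-z histogram, weighted by the number of photometric galaxies in it; cells with photometric but no spectroscopic objects are reported as uncovered and left out. Object weights (phot_weightcol, spec_weightcol) and bootstrap sampling are omitted, and the function name is hypothetical.

```python
import numpy as np


def som_nz(phot_cells, spec_cells, spec_z, z_edges):
    """Return (normalized N(z) histogram, list of uncovered cell IDs)."""
    phot_cells = np.asarray(phot_cells)
    spec_cells = np.asarray(spec_cells)
    spec_z = np.asarray(spec_z, dtype=float)
    nz = np.zeros(len(z_edges) - 1)
    uncovered = []
    # Number of photometric galaxies in each occupied cell
    cells, n_phot = np.unique(phot_cells, return_counts=True)
    for cell, count in zip(cells, n_phot):
        hist, _ = np.histogram(spec_z[spec_cells == cell], bins=z_edges)
        if hist.sum() == 0:
            # Photometric galaxies here have no spec-z counterpart
            uncovered.append(int(cell))
            continue
        # Normalized histogram of this cell, weighted by its photometric count
        nz += count * hist / hist.sum()
    if nz.sum() > 0:
        nz /= nz.sum()
    return nz, uncovered


nz, uncovered = som_nz(phot_cells=[0, 0, 1, 2], spec_cells=[0, 1, 1],
                       spec_z=[0.25, 0.75, 0.75], z_edges=[0.0, 0.5, 1.0])
```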

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • zmin ([float] default=0.0)

  • zmax ([float] default=3.0)

  • nzbins ([int] default=301)

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • hdf5_groupname ([str] default=photometry)

  • redshift_col ([str] default=redshift)

  • objid_name ([str] default=) – name of ID column, if present will be written to cellid_output

  • spec_groupname ([str] default=photometry) – name of hdf5 group for spec data, if None, then set to ‘’

  • seed ([int] default=12345) – random seed

  • phot_weightcol ([str] default=) – name of photometry weight, if present

  • spec_weightcol ([str] default=) – name of specz weight col, if present

  • n_samples ([int] default=20) – number of bootstrap samples to generate

  • input (TableHandle (INPUT))

  • spec_input (TableHandle (INPUT))

  • model (ModelHandle (INPUT))

  • output (QPHandle (OUTPUT))

  • single_NZ (QPHandle (OUTPUT))

  • cellid_output (TableHandle (OUTPUT))

  • uncovered_cell_file (TableHandle (OUTPUT))

__init__(args, **kwargs)

Initialize Estimator that can sample galaxy data.

entrypoint_function: str | None = 'summarize'
interactive_function: str | None = 'mini_som_summarizer'
name = 'MiniSOMSummarizer'
outputs = [('output', <class 'rail.core.data.QPHandle'>), ('single_NZ', <class 'rail.core.data.QPHandle'>), ('cellid_output', <class 'rail.core.data.TableHandle'>), ('uncovered_cell_file', <class 'rail.core.data.TableHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

stage_columns: list[str] | None
class rail.stages.Modeler

Bases: RailStage

Base class for creating a model of redshift and photometry.

__init__(args, **kwargs)

Initialize Modeler

Parameters:
  • args (Any)

  • kwargs (Any)

entrypoint_function: str | None = 'fit_model'
fit_model(input_data, **kwargs)

Produce a creation model from which photometry and redshifts can be generated.

Parameters:

input_data (DataHandle) –

???

Returns:

A wrapper around a file; the file type and format depend entirely on the modeling approach.

Return type:

ModelHandle

inputs = [('input', <class 'rail.core.data.DataHandle'>)]
name = 'Modeler'
outputs = [('model', <class 'rail.core.data.ModelHandle'>)]
stage_columns: list[str] | None
class rail.stages.NZDirInformer

Bases: CatInformer

Quick implementation of an NZ Estimator that creates weights for each input object using sklearn’s NearestNeighbors. It is very basic; a more sophisticated SOM-based DIR method could be created in the future. This inform stage just creates a nearest-neighbor model of the spec-z data and computes the distances to the N-th neighbor that will be used in the estimate stage.

The model it creates is a dictionary containing the nearest-neighbor model and the parameters used by the estimate stage.
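The neighbor-radius step can be sketched with a brute-force NumPy distance matrix. This is an illustration under assumptions, not the NZDirInformer source: the helper name is hypothetical, the object itself is excluded from its own neighbor count (the docstring does not say how RAIL treats it), and the real stage uses sklearn’s NearestNeighbors with the configured kalgo and kmetric instead of an O(N²) matrix.

```python
import numpy as np


def kth_neighbor_radii(spec_features, n_neigh=10, distance_delta=1e-6):
    """Distance from each spec-z object to its n_neigh-th nearest spec-z neighbor.

    Brute-force O(N^2) version for illustration only; a KD-tree scales much
    better for real catalogs.
    """
    x = np.asarray(spec_features, dtype=float)
    if not 0 < n_neigh < len(x):
        raise ValueError("n_neigh must be between 1 and len(spec_features) - 1")
    # Pairwise Euclidean distances (the kmetric='euclidean' case)
    dist = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    dist.sort(axis=1)
    # Column 0 is each object's zero distance to itself, so column n_neigh is
    # the n_neigh-th *other* object; distance_delta pads the radius slightly.
    return dist[:, n_neigh] + distance_delta
```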

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • usecols ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – columns from sz_data for Neighbor calculation

  • n_neigh ([int] default=10) – number of neighbors to use

  • kalgo ([str] default=kd_tree) – Neighbor algorithm to use

  • kmetric ([str] default=euclidean) – Knn metric to use

  • sz_name ([str] default=redshift) – name of specz column in sz_data

  • szweightcol ([str] default=) – name of sz weight column

  • distance_delta ([float] default=1e-06) – padding for distance calculation

  • input (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))

bands = ['u', 'g', 'r', 'i', 'z', 'y']
default_usecols = ['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']
entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'nz_dir_informer'
name = 'NZDirInformer'
run()

Run the stage and return the execution status.

Subclasses must implement this method.

stage_columns: list[str] | None
class rail.stages.NZDirSummarizer

Bases: CatEstimator

Quick implementation of a summarizer that creates weights for each input object using sklearn’s NearestNeighbors. It is very basic; a more sophisticated SOM-based DIR method could be created in the future.
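A common form of nearest-neighbor DIR weighting counts, for each spec-z object, the photometric objects that fall inside its neighbor radius and uses that count as the object’s weight in a weighted spec-z histogram. The sketch below follows that general technique under stated assumptions; it is not taken from the NZDirSummarizer source. The helper name is hypothetical, the radii are assumed to come from an inform step like the one described for NZDirInformer, and brute-force distances stand in for the KD-tree built on the test data.

```python
import numpy as np


def dir_weights_and_nz(spec_features, spec_z, radii, phot_features, n_neigh, z_edges):
    """Nearest-neighbor DIR weights and the weighted spec-z N(z) (hypothetical helper)."""
    s = np.asarray(spec_features, dtype=float)
    p = np.asarray(phot_features, dtype=float)
    # (n_spec, n_phot) Euclidean distances between spec-z and photometric objects
    dist = np.sqrt(((s[:, None, :] - p[None, :, :]) ** 2).sum(axis=-1))
    # Number of photometric objects inside each spec-z object's neighbor radius
    counts = (dist <= np.asarray(radii, dtype=float)[:, None]).sum(axis=1)
    # Each radius encloses a fixed number of spec-z objects, so counts / n_neigh
    # tracks the local photometric-to-spectroscopic density ratio up to a constant
    weights = counts / n_neigh
    nz, _ = np.histogram(spec_z, bins=z_edges, weights=weights)
    total = nz.sum()
    return weights, (nz / total if total > 0 else nz)
```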

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • zmin ([float] default=0.0)

  • zmax ([float] default=3.0)

  • nzbins ([int] default=301)

  • id_col ([str] default=object_id) – name of the object ID column

  • redshift_col ([str] default=redshift) – name of redshift column

  • calc_summary_stats ([bool] default=False) – Compute summary statistics

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • seed ([int] default=87) – random seed

  • usecols ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – columns from sz_data for Neighbor calculation

  • leafsize ([int] default=40) – leaf size for testdata KDTree

  • phot_weightcol ([str] default=) – name of photometry weight, if present

  • n_samples ([int] default=20) – number of bootstrap samples to generate

  • model (ModelHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))

  • single_NZ (QPHandle (OUTPUT))

__init__(args, **kwargs)

Initialize Estimator

bands = ['u', 'g', 'r', 'i', 'z', 'y']
default_usecols = ['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']
entrypoint_function: str | None = 'estimate'
initialize_handle(tag, data, npdf)
interactive_function: str | None = 'nz_dir_summarizer'
join_histograms()
name = 'NZDirSummarizer'
open_model(**kwargs)

Load the model and/or attach it to this Stage

Parameters:
  • tag – Input tag associated to the model

  • **kwargs – Should include ‘model’, see notes

Notes

The keyword argument ‘model’ should be one of the following:

  1. an object with a trained model,

  2. a path pointing to a file that can be read to obtain the trained model,

  3. or a ModelHandle providing access to the trained model.

Returns:

The object encapsulating the trained model.

Return type:

Any

outputs = [('output', <class 'rail.core.data.QPHandle'>), ('single_NZ', <class 'rail.core.data.QPHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

stage_columns: list[str] | None
class rail.stages.NaiveStackInformer

Bases: PzInformer

Placeholder Informer

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • input (QPHandle (INPUT))

  • truth (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'naive_stack_informer'
model_handle: ModelHandle | None
name = 'NaiveStackInformer'
stage_columns: list[str] | None
class rail.stages.NaiveStackMaskedSummarizer

Bases: NaiveStackSummarizer

Stage NaiveStackMaskedSummarizer

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • seed ([int] default=87) – random seed

  • n_samples ([int] default=1000) – Number of sample distributions to create

  • selected_bin ([int] default=-1) – bin to use

  • input (QPHandle (INPUT))

  • tomography_bins (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))

  • single_NZ (QPHandle (OUTPUT))

entrypoint_function: str | None = 'summarize'
inputs = [('input', <class 'rail.core.data.QPHandle'>), ('tomography_bins', <class 'rail.core.data.TableHandle'>)]
interactive_function: str | None = 'naive_stack_masked_summarizer'
name = 'NaiveStackMaskedSummarizer'
outputs = [('output', <class 'rail.core.data.QPHandle'>), ('single_NZ', <class 'rail.core.data.QPHandle'>)]
stage_columns: list[str] | None
summarize(input_data, tomo_bins=None, **kwargs)

Override the Summarizer.summarize() method to take tomo bins as an additional input

Parameters:
  • input_data (qp.Ensemble) – Per-galaxy p(z), and any ancillary data associated with it

  • tomo_bins (TableLike | None, optional) – Tomographic bins file, by default None

Returns:

Ensemble with n(z), and any ancillary data

Return type:

QPHandle
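The masking step can be pictured as keeping only the galaxies whose tomographic-bin label equals selected_bin. This sketch rests on two assumptions not stated in the docstring: the per-galaxy p(z) values are a NumPy array aligned row by row with the bin labels, and a negative selected_bin (the default -1) means no masking. The helper name is hypothetical.

```python
import numpy as np


def select_tomographic_bin(pdf_values, bin_labels, selected_bin=-1):
    """Return the p(z) rows belonging to one tomographic bin (hypothetical helper)."""
    pdf_values = np.asarray(pdf_values, dtype=float)
    bin_labels = np.asarray(bin_labels)
    if selected_bin < 0:
        # Assumption: a negative bin index means "use every galaxy"
        return pdf_values
    return pdf_values[bin_labels == selected_bin]
```

The selected rows would then be stacked exactly as in the unmasked summarizer.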

zgrid: ndarray | None
class rail.stages.NaiveStackSummarizer

Bases: PZSummarizer

Summarizer which stacks individual P(z)

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • seed ([int] default=87) – random seed

  • n_samples ([int] default=1000) – Number of sample distributions to create

  • input (QPHandle (INPUT))

  • output (QPHandle (OUTPUT))

  • single_NZ (QPHandle (OUTPUT))
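Stacking amounts to summing the per-galaxy p(z) on a common redshift grid and normalizing, while n_samples suggests a set of resampled realizations. The sketch below assumes each p(z) has already been evaluated on a uniform grid and uses a bootstrap over galaxies for the realizations; treating the stack as single_NZ and the bootstrap set as output is an assumption, and naive_stack is a hypothetical name rather than the RAIL code.

```python
import numpy as np


def naive_stack(pdf_values, zgrid, n_samples=1000, seed=87):
    """Stacked n(z) plus bootstrap realizations (hypothetical helper)."""
    pdfs = np.asarray(pdf_values, dtype=float)   # shape (n_gal, n_z)
    zgrid = np.asarray(zgrid, dtype=float)
    dz = zgrid[1] - zgrid[0]                      # assumes a uniform grid

    def normalize(stack):
        return stack / (stack.sum() * dz)         # unit integral (Riemann sum)

    single_nz = normalize(pdfs.sum(axis=0))

    rng = np.random.default_rng(seed)
    n_gal = pdfs.shape[0]
    samples = np.empty((n_samples, zgrid.size))
    for i in range(n_samples):
        idx = rng.integers(0, n_gal, size=n_gal)  # resample galaxies with replacement
        samples[i] = normalize(pdfs[idx].sum(axis=0))
    return single_nz, samples
```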

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'summarize'
inputs = [('input', <class 'rail.core.data.QPHandle'>)]
interactive_function: str | None = 'naive_stack_summarizer'
name = 'NaiveStackSummarizer'
outputs = [('output', <class 'rail.core.data.QPHandle'>), ('single_NZ', <class 'rail.core.data.QPHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
summarize(input_data, **kwargs)

Summarizer for NaiveStack which returns multiple items

Parameters:

input_data (qp.Ensemble) – Per-galaxy p(z), and any ancillary data associated with it

Returns:

Ensemble with n(z), and any ancillary data. The return type depends on output_mode.

Return type:

QPHandle | dict[str, QPHandle]

zgrid: ndarray | None
class rail.stages.Noisifier

Bases: RailStage

Base class Noisifier, which adds noise to the input catalog

Noisifiers take “input” data in the form of pandas DataFrames in Parquet files and provide as “output” another pandas DataFrame written to Parquet files.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))
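The core contract of a noisifier, taking a table and returning a perturbed copy with seed controlling reproducibility, can be sketched with a dict of NumPy columns standing in for the pandas DataFrame. The Gaussian scatter and the helper name are illustrative assumptions; concrete subclasses such as PhotoErrorModel or ObsCondition compute their own noise.

```python
import numpy as np


def add_gaussian_noise(columns, sigma, seed=None):
    """Return a noisy copy of a column -> values mapping; the input is not modified."""
    rng = np.random.default_rng(seed)  # seed=None gives non-reproducible noise
    return {
        name: np.asarray(values, dtype=float) + rng.normal(0.0, sigma, size=len(values))
        for name, values in columns.items()
    }
```

Calling it twice with the same integer seed yields identical output, which mirrors the role of the seed parameter.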

entrypoint_function: str | None = '__call__'
inputs = [('input', <class 'rail.core.data.PqHandle'>)]
name = 'Noisifier'
outputs = [('output', <class 'rail.core.data.PqHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
class rail.stages.ObsCondition

Bases: Noisifier

Photometric errors based on observation conditions

This degrader calculates spatially-varying photometric errors using input survey condition maps. The error is based on the LSSTErrorModel from the PhotErr python package.

mask: str, optional
    Path to the mask covering the survey
    footprint in HEALPIX format. Note that
    all negative values will be set to zero.
weight: str, optional
    Path to the weights in HEALPIX format, used
    to assign sample galaxies to pixels. Default
    is weight="", which uses uniform weighting.
tot_nVis_flag: bool, optional
    If any map for nVisYr is provided, this flag
    indicates whether the map shows the total number of
    visits in nYrObs (tot_nVis_flag=True), or the average
    number of visits per year (tot_nVis_flag=False). The
    default is True.
map_dict: dict, optional
    A dictionary that contains the paths to the
    survey condition maps in HEALPIX format. This dictionary
    uses the same arguments as LSSTErrorModel (from PhotErr).
    The following arguments, if supplied, may contain either
    a single number (as in the case of LSSTErrorModel), or a path:
    [m5, nVisYr, airmass, gamma, msky, theta, km, tvis, EBV]
    For the following keys:
    [m5, nVisYr, gamma, msky, theta, km]
    numbers/paths for specific bands must be passed.
    Example:
    {"m5": {"u": path, ...}, "theta": {"u": path, ...},}
    Other LSSTErrorModel parameters can also be passed
    in this dictionary (e.g. a necessary one may be [nYrObs]
    or the survey condition maps).
    If any argument is not passed, the default value in
    PhotErr's LsstErrorModel is adopted.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • seed ([int] default=42) – random seed for reproducibility

  • nside ([int] default=128) – nside for the input maps in HEALPIX format.

  • mask ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/creation/degraders/../../examples_data/creation_data/data/survey_conditions/DC2-mask-neg-nside-128.fits) – mask for the input maps in HEALPIX format.

  • weight ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/creation/degraders/../../examples_data/creation_data/data/survey_conditions/DC2-dr6-galcounts-i20-i25.3-nside-128.fits) – weight for assigning pixels to galaxies in HEALPIX format.

  • tot_nVis_flag ([bool] default=True) – flag indicating whether nVisYr is the total or average per year if supplied.

  • map_dict ([dict] default={'m5': {'i': '/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/creation/degraders/../../examples_data/creation_data/data/survey_conditions/minion_1016_dc2_Median_fiveSigmaDepth_i_and_nightlt1825_HEAL.fits'}, 'nYrObs': 5.0}) – dictionary containing the paths to the survey condition maps and/or additional LSSTErrorModel parameters.

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

BAND_A_EBV = {'g': 3.64, 'i': 2.06, 'r': 2.7, 'u': 4.81, 'y': 1.31, 'z': 1.58}
STANDARD_BANDS = ['u', 'g', 'r', 'i', 'z', 'y']
__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

apply_galactic_extinction(pixel, pixel_cat)

Apply Milky Way (MW) extinction reddening to the magnitudes

Parameters:
  • pixel (int)

  • pixel_cat (DataFrame)

Return type:

DataFrame
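Extinction reddening adds A_band = R_band × E(B−V) to each magnitude, making objects fainter. The sketch assumes BAND_A_EBV holds these A_band/E(B−V) coefficients and that the pixel’s E(B−V) value comes from an EBV map; it works on a plain band-to-magnitude dict rather than the stage’s per-pixel DataFrame, and the helper name is hypothetical.

```python
BAND_A_EBV = {"g": 3.64, "i": 2.06, "r": 2.7, "u": 4.81, "y": 1.31, "z": 1.58}


def redden_magnitudes(mags, ebv, coeffs=BAND_A_EBV):
    """Add A_band = coeff * E(B-V) to each band's magnitude (hypothetical helper)."""
    return {band: mag + coeffs[band] * ebv for band, mag in mags.items()}
```

For E(B−V) = 0.1, an i-band magnitude of 24.0 becomes 24.206, while the u band, with the largest coefficient, shifts by 0.481.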

assign_pixels(catalog)

Assign pixels to the input catalog. If the catalog contains position information, assign pixels according to ra and dec; otherwise, assign them randomly.

Parameters:

catalog (DataFrame)

Return type:

DataFrame
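The random branch can be sketched as weighted sampling over the pixels inside the mask, with negative mask values clipped to zero as documented for mask. Using the weight map as sampling probabilities, and uniform probabilities when no weight map is supplied, is an assumption about how the stage works, and the helper name is hypothetical. A position-based branch would typically convert ra and dec to pixel indices with a HEALPix library (for example healpy.ang2pix with lonlat=True), which is not shown here.

```python
import numpy as np


def assign_random_pixels(n_gal, mask, weight=None, seed=42):
    """Draw a footprint pixel for each galaxy (hypothetical helper)."""
    mask = np.clip(np.asarray(mask, dtype=float), 0.0, None)  # negative mask values -> 0
    valid = np.flatnonzero(mask > 0)                           # pixels inside the footprint
    if weight is None:
        probs = np.ones(valid.size)                            # uniform weighting
    else:
        probs = np.asarray(weight, dtype=float)[valid]
    probs = probs / probs.sum()
    rng = np.random.default_rng(seed)
    return rng.choice(valid, size=n_gal, p=probs)
```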

entrypoint_function: str | None = '__call__'
get_pixel_conditions(pixel)

Get the map values at the given pixel. The output is a dictionary that contains only the LSSTErrorModel keys.

Parameters:

pixel (int)

Return type:

dict
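Per-pixel lookup can be sketched for a map_dict whose entries are either scalars or already-loaded HEALPix arrays, with per-band keys nested one level deeper, as in the map_dict example above. Reading the FITS files from their paths is not shown, and the helper names are hypothetical.

```python
import numpy as np


def _value_at(entry, pixel):
    """A scalar is used as-is; an array is treated as a HEALPix map indexed by pixel."""
    return float(entry) if np.ndim(entry) == 0 else float(np.asarray(entry)[pixel])


def pixel_conditions(maps, pixel):
    """Collapse loaded condition maps to the values at one pixel (hypothetical helper)."""
    out = {}
    for key, entry in maps.items():
        if isinstance(entry, dict):  # per-band keys such as m5 or theta
            out[key] = {band: _value_at(v, pixel) for band, v in entry.items()}
        else:                        # band-independent keys such as nYrObs
            out[key] = _value_at(entry, pixel)
    return out
```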

interactive_function: str | None = 'obs_condition'
name = 'ObsCondition'
stage_columns: list[str] | None
class rail.stages.OldEvaluator

Bases: RailStage

Evaluate the performance of a photo-z estimator

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • pit_metrics ([str] default=all) – PIT-based metrics to include

  • point_metrics ([str] default=all) – Point-estimate metrics to include

  • hdf5_groupname ([str] default=) – name of hdf5 group for data, if None, then set to ‘’

  • do_cde ([bool] default=True) – Evaluate CDE Metric

  • redshift_col ([str] default=redshift) – name of redshift column

  • input (QPHandle (INPUT))

  • truth (Hdf5Handle (INPUT))

  • output (Hdf5Handle (OUTPUT))
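PIT-based metrics start from the probability integral transform, the value of each galaxy’s CDF at its true redshift. The sketch below assumes the PDFs are tabulated on a shared grid and integrates them with the trapezoid rule; the real stage works on qp ensembles, and pit_values is a hypothetical name.

```python
import numpy as np


def pit_values(zgrid, pdfs, z_true):
    """CDF of each gridded PDF evaluated at its true redshift (hypothetical helper)."""
    zgrid = np.asarray(zgrid, dtype=float)
    pdfs = np.atleast_2d(np.asarray(pdfs, dtype=float))      # shape (n_gal, n_z)
    # Cumulative trapezoid integral along the redshift axis, starting at 0
    steps = 0.5 * (pdfs[:, 1:] + pdfs[:, :-1]) * np.diff(zgrid)
    cdf = np.concatenate([np.zeros((pdfs.shape[0], 1)), np.cumsum(steps, axis=1)], axis=1)
    cdf /= cdf[:, -1:]                                        # force CDF(zmax) = 1
    return np.array([np.interp(z, zgrid, c) for z, c in zip(z_true, cdf)])
```

For a well-calibrated estimator, the PIT values are expected to be close to uniform on [0, 1].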

entrypoint_function: str | None = 'evaluate'
evaluate(data, truth, **kwargs)

Evaluate the performance of an estimator

This will attach the input data and truth to this Evaluator (for introspection and provenance tracking). Then it will call the run() and finalize() methods, which need to be implemented by the sub-classes. The run() method will need to register the data that it creates to this Evaluator by using self.add_data(‘output’, output_data).

Parameters:
  • data (qp.Ensemble) – The sample to evaluate

  • truth (Any) – Table with the truth information

Returns:

The evaluation metrics

Return type:

Hdf5Handle

inputs = [('input', <class 'rail.core.data.QPHandle'>), ('truth', <class 'rail.core.data.Hdf5Handle'>)]
interactive_function: str | None = 'old_evaluator'
name = 'OldEvaluator'
outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]
run()

Run method: evaluate all the metrics and put them into a table.

Notes

Gets the input data from the data store under this stage’s ‘input’ tag, gets the truth data from the data store under this stage’s ‘truth’ tag, and puts the data into the data store under this stage’s ‘output’ tag.

Return type:

None

stage_columns: list[str] | None
class rail.stages.PZClassifier

Bases: RailStage

The base class for assigning classes (tomographic bins) to per-galaxy PZ estimates.

PZClassifier takes as “input” a qp.Ensemble with per-galaxy PDFs, and provides as “output” tabular data which can be appended to the catalogue.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • input (QPHandle (INPUT))

  • output (Hdf5Handle (OUTPUT))

__init__(args, **kwargs)

Initialize the PZClassifier.

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

classify(input_data, **kwargs)

The main run method for the classifier, should be implemented in the specific subclass.

This will attach the input_data to this PZClassifier (for introspection and provenance tracking).

Then it will call the run() and finalize() methods, which need to be implemented by the sub-classes.

The run() method will need to register the data that it creates to this Classifier by using self.add_data(‘output’, output_data).

The run() method relies on the _process_chunk() method, which should be implemented by subclasses to perform the actual classification on each chunk of data. The results from each chunk are then combined in the _finalize_run() method. (Alternatively, override run() in a subclass to perform the classification without parallelization.)

Finally, this will return a TableHandle providing access to that output data.

Parameters:

input_data (qp.Ensemble) – Per-galaxy p(z), and any ancillary data associated with it

Returns:

Class assignment for each galaxy, typically in the form of a dictionary with IDs and class labels.

Return type:

TableHandle

entrypoint_function: str | None = 'classify'
inputs = [('input', <class 'rail.core.data.QPHandle'>)]
name = 'PZClassifier'
outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]
run()

Processes the input data in chunks and performs classification.

This method iterates over chunks of the input data, calling the _process_chunk method for each chunk to perform the actual classification.

The _process_chunk method should be implemented by subclasses to define the specific classification logic.

Return type:

None
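The chunked flow described above, with a per-chunk step followed by a combine step, can be sketched with a hypothetical classifier that assigns each galaxy to a tomographic bin from its point estimate. The classification rule, bin edges, and function name are illustrative assumptions and do not reproduce the _process_chunk or _finalize_run methods of any RAIL subclass.

```python
import numpy as np


def classify_by_point_estimate(zmode, bin_edges, chunk_size=10000):
    """Label galaxies with a tomographic bin index, chunk by chunk (hypothetical helper)."""
    zmode = np.asarray(zmode, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)
    labels = []
    for start in range(0, zmode.size, chunk_size):
        chunk = zmode[start:start + chunk_size]
        # Per-chunk step (the role _process_chunk plays): bin index, or -1 if outside
        idx = np.digitize(chunk, bin_edges) - 1
        idx[(chunk < bin_edges[0]) | (chunk >= bin_edges[-1])] = -1
        labels.append(idx)
    # Combine step (the role _finalize_run plays)
    return np.concatenate(labels) if labels else np.empty(0, dtype=int)
```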

stage_columns: list[str] | None
class rail.stages.PZFlowEstimator

Bases: CatEstimator

CatEstimator which uses PZFlow

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • zmin ([float] default=0.0)

  • zmax ([float] default=3.0)

  • nzbins ([int] default=301)

  • id_col ([str] default=object_id) – name of the object ID column

  • redshift_col ([str] default=redshift)

  • calc_summary_stats ([bool] default=False) – Compute summary statistics

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • seed ([int] default=0) – seed for flow

  • ref_band ([str] default=mag_i_lsst)

  • column_names ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – column names to be used in flow

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • include_mag_errors ([bool] default=False) – Boolean flag on whether to marginalize over mag errors (NOTE: much slower on CPU!)

  • err_names_dict ([dict] default={'mag_err_u_lsst': 'mag_u_lsst_err', 'mag_err_g_lsst': 'mag_g_lsst_err', 'mag_err_r_lsst': 'mag_r_lsst_err', 'mag_err_i_lsst': 'mag_i_lsst_err', 'mag_err_z_lsst': 'mag_z_lsst_err', 'mag_err_y_lsst': 'mag_y_lsst_err'}) – dictionary to rename error columns

  • n_error_samples ([int] default=1000) – number of error samples in marginalization

  • model (FlowHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))

__init__(args, **kwargs)

Initialize Estimator

entrypoint_function: str | None = 'estimate'
inputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>), ('input', <class 'rail.core.data.TableHandle'>)]
interactive_function: str | None = 'pz_flow_estimator'
name = 'PZFlowEstimator'
stage_columns: list[str] | None
class rail.stages.PZFlowInformer

Bases: CatInformer

Subclass to train a pzflow-based estimator

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • zmin ([float] default=0.0)

  • zmax ([float] default=3.0)

  • nzbins ([int] default=301)

  • seed ([int] default=0) – seed for flow

  • ref_band ([str] default=mag_i_lsst)

  • column_names ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst']) – column names to be used in flow

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • include_mag_errors ([bool] default=False) – Boolean flag on whether to marginalize over mag errors (NOTE: much slower on CPU!)

  • err_names_dict ([dict] default={'mag_err_u_lsst': 'mag_u_lsst_err', 'mag_err_g_lsst': 'mag_g_lsst_err', 'mag_err_r_lsst': 'mag_r_lsst_err', 'mag_err_i_lsst': 'mag_i_lsst_err', 'mag_err_z_lsst': 'mag_z_lsst_err', 'mag_err_y_lsst': 'mag_y_lsst_err'}) – dictionary to rename error columns

  • n_error_samples ([int] default=1000) – number of error samples in marginalization

  • soft_sharpness ([int] default=10) – sharpening parameter for SoftPlus

  • soft_idx_col ([int] default=0) – index column for SoftPlus

  • redshift_col ([str] default=redshift)

  • n_training_epochs ([int] default=50) – number of flow training epochs

  • input (TableHandle (INPUT))

  • model (FlowHandle (OUTPUT))

__init__(args, **kwargs)

Constructor, build the CatInformer, then do PZFlow specific setup

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'pz_flow_informer'
name = 'PZFlowInformer'
outputs = [('model', <class 'rail.tools.flow_handle.FlowHandle'>)]
run()

Train a flow based on the training data. This is mostly based on the pzflow example notebook.

stage_columns: list[str] | None
class rail.stages.PZSummarizer

Bases: RailStage

The base class for classes that go from per-galaxy PZ estimates to ensemble NZ estimates

PZSummarizer takes as “input” a qp.Ensemble with per-galaxy PDFs, and provides as “output” a qp.Ensemble with per-ensemble n(z).

entrypoint_function: str | None = 'summarize'
inputs = [('model', <class 'rail.core.data.ModelHandle'>), ('input', <class 'rail.core.data.QPHandle'>)]
name = 'PZtoNZSummarizer'
outputs = [('output', <class 'rail.core.data.QPHandle'>)]
stage_columns: list[str] | None
summarize(input_data, **kwargs)

The main run method for the summarization, should be implemented in the specific subclass.

This will attach the input_data to this PZtoNZSummarizer (for introspection and provenance tracking).

Then it will call the run() and finalize() methods, which need to be implemented by the sub-classes.

The run() method will need to register the data that it creates to this Estimator by using self.add_data(‘output’, output_data).

Finally, this will return a QPHandle providing access to that output data.

Parameters:

input_data (qp.Ensemble) – Per-galaxy p(z), and any ancillary data associated with it

Returns:

Ensemble with n(z), and any ancillary data

Return type:

QPHandle

class rail.stages.PhotoErrorModel

Bases: Noisifier

The Base Model for photometric errors.

This is a wrapper around the error model from PhotErr. The parameter docstring below is dynamically added by the installed version of PhotErr:

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))
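The random photometric error in the LSST model that PhotErr builds on (Ivezić et al. 2019) is σ²_rand = (0.04 − γ)x + γx², with x = 10^(0.4(m − m5)), so that σ = 0.2 mag at the 5σ depth m5. The sketch below implements only that term. The full PhotErr model includes further options that are omitted here. The default γ = 0.039 is the value commonly quoted for most LSST bands and should be treated as an assumption, as should the hypothetical helper name.

```python
import numpy as np


def lsst_random_mag_error(mag, m5, gamma=0.039):
    """Random magnitude error from the LSST error-model formula (sketch)."""
    x = 10.0 ** (0.4 * (np.asarray(mag, dtype=float) - m5))
    return np.sqrt((0.04 - gamma) * x + gamma * x ** 2)
```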

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'photo_error_model'
name = 'PhotoErrorModel'
reload_pars(args)

This is needed because the parameters are dynamically defined, so they have to be reloaded after they have been defined.

set_params(peparams)

Set the photometric error parameters from photerr to the ceci config

stage_columns: list[str] | None
class rail.stages.PointEstHistInformer

Bases: PzInformer

Placeholder Informer

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • input (QPHandle (INPUT))

  • truth (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'point_est_hist_informer'
model_handle: ModelHandle | None
name = 'PointEstHistInformer'
stage_columns: list[str] | None
class rail.stages.PointEstHistMaskedSummarizer

Bases: PointEstHistSummarizer

Summarizer which simply histograms a point estimate

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • seed ([int] default=87) – random seed

  • point_estimate_key ([str] default=zmode) – Which point estimate to use

  • n_samples ([int] default=1000) – Number of sample distributions to return

  • selected_bin ([int] default=-1) – bin to use

  • input (QPHandle (INPUT))

  • tomography_bins (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))

  • single_NZ (QPHandle (OUTPUT))

bincents: ndarray | None
entrypoint_function: str | None = 'summarize'
inputs = [('input', <class 'rail.core.data.QPHandle'>), ('tomography_bins', <class 'rail.core.data.TableHandle'>)]
interactive_function: str | None = 'point_est_hist_masked_summarizer'
name = 'PointEstHistMaskedSummarizer'
outputs = [('output', <class 'rail.core.data.QPHandle'>), ('single_NZ', <class 'rail.core.data.QPHandle'>)]
stage_columns: list[str] | None
summarize(input_data, tomo_bins=None, **kwargs)

Override the Summarizer.summarize() method to take tomo bins as an additional input

Parameters:
  • input_data (qp.Ensemble) – Per-galaxy p(z), and any ancillary data associated with it

  • tomo_bins (TableLike | None, optional) – Tomographic bins file, by default None

Returns:

Ensemble with n(z), and any ancillary data

Return type:

QPHandle

zgrid: ndarray | None
class rail.stages.PointEstHistSummarizer

Bases: PZSummarizer

Summarizer which simply histograms a point estimate

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • seed ([int] default=87) – random seed

  • point_estimate_key ([str] default=zmode) – Which point estimate to use

  • n_samples ([int] default=1000) – Number of sample distributions to return

  • input (QPHandle (INPUT))

  • output (QPHandle (OUTPUT))

  • single_NZ (QPHandle (OUTPUT))
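Histogramming a point estimate such as zmode, with bootstrap resampling to produce n_samples realizations, can be sketched as follows. The bin layout (nzbins equal-width bins between zmin and zmax), the use of a bootstrap for the samples, and the helper name are assumptions, not the stage’s documented behavior.

```python
import numpy as np


def point_estimate_hist(zmode, zmin=0.0, zmax=3.0, nzbins=301, n_samples=1000, seed=87):
    """Histogram of point estimates plus bootstrap histograms (hypothetical helper)."""
    zmode = np.asarray(zmode, dtype=float)
    edges = np.linspace(zmin, zmax, nzbins + 1)  # assumption: nzbins equal-width bins
    single, _ = np.histogram(zmode, bins=edges)
    rng = np.random.default_rng(seed)
    samples = np.empty((n_samples, nzbins))
    for i in range(n_samples):
        resampled = rng.choice(zmode, size=zmode.size, replace=True)
        samples[i], _ = np.histogram(resampled, bins=edges)
    return edges, single, samples
```

Bin centers, the likely meaning of the bincents attribute, would follow as 0.5 * (edges[1:] + edges[:-1]).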

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

bincents: ndarray | None
entrypoint_function: str | None = 'summarize'
inputs = [('input', <class 'rail.core.data.QPHandle'>)]
interactive_function: str | None = 'point_est_hist_summarizer'
name = 'PointEstHistSummarizer'
outputs = [('output', <class 'rail.core.data.QPHandle'>), ('single_NZ', <class 'rail.core.data.QPHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
zgrid: ndarray | None
class rail.stages.PointToPointBinnedEvaluator

Bases: Evaluator

Evaluate the performance of a photo-z estimator against a reference point estimate

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • metrics ([list] default=[]) – The metrics you want to evaluate.

  • exclude_metrics ([list] default=[]) – List of metrics to exclude

  • metric_config ([dict] default={}) – configuration of individual_metrics

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • seed ([float] default=None) – Random seed value to use for reproducible results.

  • force_exact ([bool] default=True) – Force the exact calculation. This will not allow parallelization

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • reference_dictionary_key ([str] default=redshift) – The key in the truth dictionary where the redshift data is stored.

  • point_estimate_key ([str] default=zmode) – The key in the point estimate table.

  • bin_col ([str] default=redshift) – The column metrics are binned by

  • bin_min ([float] default=0.0) – The minimum value of the binning edge

  • bin_max ([float] default=3.0) – The maximum value of the binning edge

  • nbin ([int] default=10) – The number of bins

  • input (QPHandle (INPUT))

  • truth (TableHandle (INPUT))

  • output (Hdf5Handle (OUTPUT))

  • summary (Hdf5Handle (OUTPUT))

  • single_distribution_summary (QPDictHandle (OUTPUT))
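A rough sketch of what "binned" means here, assuming objects are grouped by `bin_col` into `nbin` equal-width bins between `bin_min` and `bin_max` and a metric is evaluated in each bin. The metric (the mean of (z_est - z_true)/(1 + z_true)) and the helper name `binned_bias` are illustrative assumptions, not the metrics provided by PointToPointMetric.

```python
def binned_bias(z_true, z_est, bin_min=0.0, bin_max=3.0, nbin=10):
    """Mean of (z_est - z_true) / (1 + z_true) in nbin equal-width bins of z_true.

    Hypothetical helper: the metric choice is an assumption for illustration.
    """
    width = (bin_max - bin_min) / nbin
    sums = [0.0] * nbin
    counts = [0] * nbin
    for zt, ze in zip(z_true, z_est):
        if not (bin_min <= zt < bin_max):
            continue  # objects outside [bin_min, bin_max) are ignored
        # Guard against floating-point round-up at the top edge.
        i = min(int((zt - bin_min) / width), nbin - 1)
        sums[i] += (ze - zt) / (1.0 + zt)
        counts[i] += 1
    # Empty bins are reported as None rather than a misleading 0.
    return [s / c if c else None for s, c in zip(sums, counts)]


bias = binned_bias([0.5, 0.5, 1.5], [0.65, 0.35, 1.5], bin_min=0.0, bin_max=2.0, nbin=2)
```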

entrypoint_function: str | None = 'evaluate'
inputs: list[tuple[str, type[DataHandle]]] = [('input', <class 'rail.core.data.QPHandle'>), ('truth', <class 'rail.core.data.TableHandle'>)]
interactive_function: str | None = 'point_to_point_binned_evaluator'
metric_base_class

alias of PointToPointMetric

name = 'PointToPointBinnedEvaluator'
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
class rail.stages.PointToPointEvaluator

Bases: Evaluator

Evaluate the performance of a photo-z estimator against reference point estimate

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • metrics ([list] default=[]) – The metrics you want to evaluate.

  • exclude_metrics ([list] default=[]) – List of metrics to exclude

  • metric_config ([dict] default={}) – configuration of individual_metrics

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • seed ([float] default=None) – Random seed value to use for reproducible results.

  • force_exact ([bool] default=False) – Force the exact calculation. This will not allow parallelization

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • reference_dictionary_key ([str] default=redshift) – The key in the truth dictionary where the redshift data is stored.

  • point_estimate_key ([str] default=zmode) – The key in the point estimate table.

  • input (QPHandle (INPUT))

  • truth (TableHandle (INPUT))

  • output (Hdf5Handle (OUTPUT))

  • summary (Hdf5Handle (OUTPUT))

  • single_distribution_summary (QPDictHandle (OUTPUT))
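For orientation, the kind of quantity a point-to-point evaluation produces can be sketched as below. The definitions (median bias, population standard deviation as scatter, and an outlier threshold of 0.15 on |dz|) are common photo-z conventions assumed for illustration; they are not necessarily how the PointToPointMetric subclasses define their metrics, and `point_to_point_metrics` is hypothetical.

```python
import statistics


def point_to_point_metrics(z_true, z_est, outlier_cut=0.15):
    """Common photo-z point metrics on dz = (z_est - z_true) / (1 + z_true).

    Hypothetical helper: metric definitions are assumed conventions.
    """
    dz = [(ze - zt) / (1.0 + zt) for zt, ze in zip(z_true, z_est)]
    n_out = sum(1 for d in dz if abs(d) > outlier_cut)
    return {
        "bias": statistics.median(dz),
        "scatter": statistics.pstdev(dz),
        "outlier_fraction": n_out / len(dz),
    }


m = point_to_point_metrics([1.0, 1.0, 1.0, 1.0], [1.0, 1.02, 0.98, 2.0])
```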

entrypoint_function: str | None = 'evaluate'
inputs: list[tuple[str, type[DataHandle]]] = [('input', <class 'rail.core.data.QPHandle'>), ('truth', <class 'rail.core.data.TableHandle'>)]
interactive_function: str | None = 'point_to_point_evaluator'
metric_base_class

alias of PointToPointMetric

name = 'PointToPointEvaluator'
stage_columns: list[str] | None
class rail.stages.PosteriorCalculator

Bases: RailStage

Base class for object that calculates the posterior distribution of a particular field in a table of photometric data (typically the redshift).

The posteriors will be contained in a qp Ensemble.

__init__(args, **kwargs)

Initialize PosteriorCalculator

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'get_posterior'
get_posterior(input_data, **kwargs)

Return posteriors for the given column.

This is a method for running a PosteriorCalculator in interactive mode. In pipeline mode, the subclass run() method will be called by itself.

Parameters:
  • input_data (TableLike) – A table of the galaxies for which posteriors are calculated

  • **kwargs (Any) – Used to update configuration

Returns:

Posterior Estimate

Return type:

QPHandle

Notes

This will put the data argument into this stage's DataStore using this stage's input tag.

This will put the additional keyword arguments into this stage's configuration data.

It will then call self.run() and return the QPHandle associated with the output tag.
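The call sequence described in these notes can be mimicked with a toy class. `ToyStage`, its dict-based `data_store`, and the column-doubling `run()` are hypothetical stand-ins chosen to keep the example self-contained; they are not the RailStage or DataStore API.

```python
class ToyStage:
    """Hypothetical stand-in for a RAIL stage; not the real RailStage API."""

    def __init__(self, **config):
        self.config = dict(config)
        self.data_store = {}

    def run(self):
        # A real subclass would compute posteriors; this toy doubles one column.
        table = self.data_store["input"]
        self.data_store["output"] = [2 * x for x in table[self.config["column"]]]

    def get_posterior(self, input_data, **kwargs):
        self.data_store["input"] = input_data  # 1. store data under the input tag
        self.config.update(kwargs)             # 2. merge keyword arguments into the config
        self.run()                             # 3. run the stage
        return self.data_store["output"]       # 4. return what sits under the output tag


result = ToyStage(column="a").get_posterior({"a": [1, 2], "b": [5]}, column="b")
```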

inputs = [('model', <class 'rail.core.data.ModelHandle'>), ('input', <class 'rail.core.data.TableHandle'>)]
name = 'PosteriorCalculator'
outputs = [('output', <class 'rail.core.data.QPHandle'>)]
stage_columns: list[str] | None
class rail.stages.PzEstimator

Bases: RailStage, PointEstimationMixin

The base class for making photo-z posterior estimates from other pz inputs

Estimators use a generic “model”, the details of which depend on the sub-class.

Estimators take as “input” a QPEnsemble containing other estimates, and provide as “output” a QPEnsemble with per-object p(z).

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • model (ModelHandle (INPUT))

  • input (QPHandle (INPUT))

  • output (QPHandle (OUTPUT))

__init__(args, **kwargs)

Initialize Estimator

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'estimate'
estimate(input_data, **kwargs)

The main interface method for the photo-z estimation

This will attach the input data (defined in inputs as “input”) to this Estimator (for introspection and provenance tracking). Then call the run(), validate(), and finalize() methods.

The run method will call _process_chunk(), which needs to be implemented in the subclass, to process input data in batches. See RandomGaussEstimator for a simple example.

Finally, this will return a QPHandle for access to that output data.
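The batch processing described above comes down to walking the input in `chunk_size` pieces. This is a minimal sketch of that index arithmetic, assuming plain integer row ranges; `iter_chunks` is a hypothetical helper, and the real stage iterates through its data handles rather than like this.

```python
def iter_chunks(n_objects, chunk_size=10000):
    """Yield (start, end) index pairs covering n_objects in chunk_size batches.

    Hypothetical helper: the last chunk may be shorter than chunk_size.
    """
    for start in range(0, n_objects, chunk_size):
        yield start, min(start + chunk_size, n_objects)


chunks = list(iter_chunks(25, chunk_size=10))
```

Each (start, end) pair would then be handed to a per-chunk routine such as `_process_chunk()`.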

Parameters:

input_data (QPHandle) – A dictionary of all input data

Returns:

Handle providing access to QP ensemble with output data

Return type:

QPHandle

inputs = [('model', <class 'rail.core.data.ModelHandle'>), ('input', <class 'rail.core.data.QPHandle'>)]
name = 'PzEstimator'
outputs = [('output', <class 'rail.core.data.QPHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
class rail.stages.PzInformer

Bases: RailStage

The base class for informing models used to make photo-z data products from existing ensembles of p(z) distributions.

PzInformer can use a generic “model”, the details of which depend on the sub-class. Some summarizers will have associated PzInformer classes, which can be used to inform those models.

(Note, “Inform” is more generic than “Train” as it also applies to algorithms that are template-based rather than machine learning-based.)

PzInformer will produce as output a generic “model”, the details of which depend on the sub-class.

They take as “input” a qp.Ensemble of per-galaxy p(z) data, which is used to “inform” the model.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • input (QPHandle (INPUT))

  • truth (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))

__init__(args, **kwargs)

Initialize Informer that can inform models for redshift estimation

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'inform'
inform(training_data='None', truth_data='None', **kwargs)

The main interface method for Informers

This will attach the input_data to this Informer (for introspection and provenance tracking).

Then it will call the run(), validate() and finalize() methods, which need to be implemented by the sub-classes.

The run() method will need to register the model that it creates with this Informer by using self.add_data(‘model’, model).

Finally, this will return a ModelHandle providing access to the trained model.

Parameters:
  • training_data (qp.Ensemble | str, optional) – Per-galaxy p(z), and any ancillary data associated with it, by default “None”

  • truth_data (TableLike | str, optional) – Table with the true redshifts, by default “None”

Returns:

Handle providing access to trained model

Return type:

dict[str, ModelHandle]

inputs = [('input', <class 'rail.core.data.QPHandle'>), ('truth', <class 'rail.core.data.TableHandle'>)]
model_handle: ModelHandle | None
name = 'PzInformer'
outputs = [('model', <class 'rail.core.data.ModelHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
class rail.stages.QuantityCut

Bases: Selector

Degrader that applies a cut to the given columns.

Note that if a galaxy fails any of the cuts on any one of its columns, that galaxy is removed from the sample.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • drop_rows ([bool] default=True) – Drop selected rows from output table

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • cuts ([dict] (required)) – Cuts to apply

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

__init__(args, **kwargs)

Constructor.

Performs standard Degrader initialization as well as defining the cuts to be applied.

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

cuts: dict | None
entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'quantity_cut'
name = 'QuantityCut'
set_cuts(cuts)

Defines the cuts to be applied.

Parameters:

cuts (dict) – A dictionary of cuts to make on the data

Return type:

None

Notes

The cut keys should be the names of columns you wish to make cuts on.

The cut values should be either:

  • a number, which is the maximum value. E.g. if the dictionary contains “i”: 25, then values of i > 25 are cut from the sample.

  • an iterable, which is the range of acceptable values. E.g. if the dictionary contains “redshift”: (1.5, 2.3), then redshifts outside that range are cut from the sample.
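The cut semantics in these notes can be sketched as a per-row check. This assumes rows are plain dicts and that range limits are inclusive at both ends (the notes do not specify boundary handling); `passes_cuts` is a hypothetical helper, not the stage's implementation.

```python
def passes_cuts(row, cuts):
    """Return True if a row (dict of column -> value) survives every cut.

    Hypothetical helper: boundary inclusivity is an assumption.
    """
    for col, cut in cuts.items():
        value = row[col]
        if isinstance(cut, (int, float)):
            # A single number is an upper limit: values above it are cut.
            if value > cut:
                return False
        else:
            # Otherwise the cut is an acceptable (low, high) range.
            low, high = cut
            if not (low <= value <= high):
                return False
    return True


cuts = {"i": 25, "redshift": (1.5, 2.3)}
rows = [
    {"i": 24.0, "redshift": 2.0},  # kept
    {"i": 26.0, "redshift": 2.0},  # too faint
    {"i": 24.0, "redshift": 1.0},  # redshift out of range
]
kept = [passes_cuts(r, cuts) for r in rows]
```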

stage_columns: list[str] | None
class rail.stages.RandomForestClassifier

Bases: CatClassifier

Classifier that assigns tomographic bins based on random forest method

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • id_name ([str] default=) – Column name for the object ID in the input data, if empty the row index is used as the ID.

  • class_bands ([list] default=['r', 'i', 'z']) – Which bands to use for classification

  • band_map ([dict] default={'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst'}) – column names for the bands

  • model (ModelHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (Hdf5Handle (OUTPUT))

entrypoint_function: str | None = 'classify'
interactive_function: str | None = 'random_forest_classifier'
model: ModelLike | None
name = 'RandomForestClassifier'
open_model(**kwargs)

Load the model and/or attach it to this Stage

Parameters:
  • tag – Input tag associated to the model

  • **kwargs – Should include ‘model’, see notes

Notes

The keyword argument ‘model’ should be either

  1. an object with a trained model,

  2. a path pointing to a file that can be read to obtain the trained model,

  3. or a ModelHandle providing access to the trained model.
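The three accepted forms of `model` can be illustrated with a small dispatcher. `ToyModelHandle` and its `.data` attribute are hypothetical stand-ins for a ModelHandle, and reading a path with `pickle` is an assumption about the file format made only for this sketch.

```python
import os
import pickle


class ToyModelHandle:
    """Hypothetical stand-in for a ModelHandle; exposes the model via .data."""

    def __init__(self, data):
        self.data = data


def resolve_model(model):
    """Return a trained model from a handle, a pickle file path, or the object itself."""
    if isinstance(model, ToyModelHandle):
        return model.data  # case 3: a handle giving access to the model
    if isinstance(model, (str, os.PathLike)):
        with open(model, "rb") as fin:  # case 2: a path (pickle format assumed)
            return pickle.load(fin)
    return model  # case 1: already a trained model object
```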

Returns:

The object encapsulating the trained model.

Return type:

Any

outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]
run()

Apply the classifier to the measured magnitudes

stage_columns: list[str] | None
class rail.stages.RandomForestInformer

Bases: CatInformer

Train the random forest classifier

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • class_bands ([list] default=['r', 'i', 'z']) – Which bands to use for classification

  • band_map ([dict] default={'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst'}) – column names for the bands

  • redshift_col ([str] default=sz) – Name of the redshift column

  • bin_edges ([list] default=[0, 0.5, 1.0]) – Binning for training data

  • seed ([int] (required)) – random seed

  • no_assign ([int] default=-99) – Value for no assignment flag

  • input (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))
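One plausible reading of `bin_edges` and `no_assign` is that training redshifts are labelled with the index of the bin they fall in, and objects outside all bins get the `no_assign` flag. The sketch below shows that labelling under this assumption, with half-open bins [edge_i, edge_i+1); `assign_bins` is a hypothetical helper.

```python
import bisect


def assign_bins(redshifts, bin_edges=(0.0, 0.5, 1.0), no_assign=-99):
    """Label each redshift with a bin index, or no_assign if outside all bins.

    Hypothetical helper: the labelling rule is an assumption for illustration.
    """
    labels = []
    for z in redshifts:
        if bin_edges[0] <= z < bin_edges[-1]:
            # bisect_right - 1 gives i with bin_edges[i] <= z < bin_edges[i + 1].
            labels.append(bisect.bisect_right(bin_edges, z) - 1)
        else:
            labels.append(no_assign)
    return labels


labels = assign_bins([0.1, 0.5, 0.99, 1.2, -0.1])
```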

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'random_forest_informer'
name = 'RandomForestInformer'
outputs = [('model', <class 'rail.core.data.ModelHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

stage_columns: list[str] | None
class rail.stages.RandomGaussEstimator

Bases: CatEstimator

Random CatEstimator

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • id_col ([str] default=object_id) – name of the object ID column

  • redshift_col ([str] default=redshift) – name of redshift column

  • calc_summary_stats ([bool] default=False) – Compute summary statistics

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • rand_width ([float] default=0.025) – ad hoc width of the PDF

  • seed ([int] default=87) – random seed

  • column_name ([str] default=mag_i_lsst) – name of a column with one entry per galaxy, used to determine the number of galaxies

  • input (TableHandle (INPUT))

  • model (ModelHandle (INPUT))

  • output (QPHandle (OUTPUT))
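As a rough picture of a "random" Gaussian estimator, each galaxy gets a Gaussian p(z) of width `rand_width` on the `zmin`/`zmax`/`nzbins` grid, centred at a random redshift drawn with the given `seed`. Drawing centres uniformly over the grid and normalizing with a Riemann sum are assumptions of this sketch, `random_gauss_pdfs` is hypothetical, and the real stage returns a qp ensemble rather than lists.

```python
import math
import random


def random_gauss_pdfs(n_objects, zmin=0.0, zmax=3.0, nzbins=301, rand_width=0.025, seed=87):
    """Return the z grid and one normalized random Gaussian p(z) per object.

    Hypothetical helper: centre distribution and normalization are assumptions.
    """
    rng = random.Random(seed)  # seeded generator for reproducible output
    step = (zmax - zmin) / (nzbins - 1)
    zgrid = [zmin + i * step for i in range(nzbins)]
    pdfs = []
    for _ in range(n_objects):
        mu = rng.uniform(zmin, zmax)
        vals = [math.exp(-0.5 * ((z - mu) / rand_width) ** 2) for z in zgrid]
        norm = sum(vals) * step  # Riemann-sum normalization on the grid
        pdfs.append([v / norm for v in vals])
    return zgrid, pdfs


zgrid, pdfs = random_gauss_pdfs(3)
```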

__init__(args, **kwargs)

Constructor: Do CatEstimator specific initialization

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'estimate'
inputs = [('input', <class 'rail.core.data.TableHandle'>), ('model', <class 'rail.core.data.ModelHandle'>)]
interactive_function: str | None = 'random_gauss_estimator'
name = 'RandomGaussEstimator'
stage_columns: list[str] | None
validate()

Validation which checks if the required column names by the stage exist in the data

Return type:

None

class rail.stages.RandomGaussInformer

Bases: CatInformer

Placeholder Informer

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • input (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'random_gauss_informer'
name = 'RandomGaussInformer'
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
class rail.stages.Reddener

Bases: DustMapBase

Utility stage that does reddening

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • ra_name ([str] default=ra) – Name of the RA column

  • dec_name ([str] default=dec) – Name of the DEC column

  • mag_name ([str] default=mag_{band}_lsst) – Template for the magnitude columns

  • band_a_env ([dict] default={'mag_u_lsst': 4.81, 'mag_g_lsst': 3.64, 'mag_r_lsst': 2.7, 'mag_i_lsst': 2.06, 'mag_z_lsst': 1.58, 'mag_y_lsst': 1.31})

  • dustmap_name ([str] default=sfd) – Name of the dustmap in question

  • dustmap_dir ([str] (required)) – Directory with dustmaps

  • copy_cols ([list] default=[]) – Additional columns to copy

  • copy_all_cols ([bool] default=False) – Copy all the columns

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))
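The values in `band_a_env` read like per-band extinction coefficients, and a common convention is A_band = coefficient * E(B-V), with E(B-V) taken from the dust map at each object's `ra`/`dec`. The sketch below assumes that convention and takes E(B-V) as a given number instead of querying a map; `redden` is a hypothetical helper.

```python
def redden(mags, ebv, band_a_env):
    """Add extinction A_band = coeff * E(B-V) to each magnitude column present.

    Hypothetical helper: assumes the linear A = coeff * E(B-V) convention.
    """
    out = dict(mags)  # leave the input row untouched
    for col, coeff in band_a_env.items():
        if col in out:
            # Reddening makes objects fainter, so magnitudes increase.
            out[col] = out[col] + coeff * ebv
    return out


row = redden({"mag_u_lsst": 24.0, "ra": 1.0}, 0.1, {"mag_u_lsst": 4.81, "mag_g_lsst": 3.64})
```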

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'reddener'
name = 'Reddener'
class rail.stages.RomanDeepErrorModel

Bases: PhotoErrorModel

The Roman Deep Error model, defined by peRomanDeepErrorParams and peRomanDeepErrorModel

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'roman_deep_error_model'
name = 'RomanDeepErrorModel'
stage_columns: list[str] | None
class rail.stages.RomanErrorModel

Bases: PhotoErrorModel

The Roman Error model, defined by peRomanErrorParams and peRomanErrorModel

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'roman_error_model'
name = 'RomanErrorModel'
stage_columns: list[str] | None
class rail.stages.RomanMediumErrorModel

Bases: PhotoErrorModel

The Roman Medium Error model, defined by peRomanMediumErrorParams and peRomanMediumErrorModel

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'roman_medium_error_model'
name = 'RomanMediumErrorModel'
stage_columns: list[str] | None
class rail.stages.RomanUltraDeepErrorModel

Bases: PhotoErrorModel

The Roman UltraDeep Error model, defined by peRomanUltraDeepErrorParams and peRomanUltraDeepErrorModel

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'roman_ultra_deep_error_model'
name = 'RomanUltraDeepErrorModel'
stage_columns: list[str] | None
class rail.stages.RomanWideErrorModel

Bases: PhotoErrorModel

The Roman Wide Error model, defined by peRomanWideErrorParams and peRomanWideErrorModel

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'roman_wide_error_model'
name = 'RomanWideErrorModel'
stage_columns: list[str] | None
class rail.stages.RowSelector

Bases: RailStage

Utility Stage that sub-selects rows from a table by index

  1. This operates on pandas dataframes in parquet files.

  2. In short, this does: output_data = input_data[self.config.start_row:self.config.stop_row]

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • start_row ([int] (required)) – starting row number

  • stop_row ([int] (required)) – Stopping row number

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'
inputs = [('input', <class 'rail.core.data.PqHandle'>)]
interactive_function: str | None = 'row_selector'
name = 'RowSelector'
outputs = [('output', <class 'rail.core.data.PqHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

class rail.stages.SOMSpecSelector

Bases: Selector

Class that creates a spec-z sample by training a SOM on data with spec-z, classifying all galaxies from a larger sample via the SOM, and then selecting the same number of galaxies in each SOM cell as there are in the spec-z sample. If fewer galaxies are available in the large sample for a cell, it takes as many as possible, so the distributions can still be mismatched. For example, if you have many bright galaxies with spec-z from a very wide survey like SDSS, and the second dataset does not have the same areal coverage, there may not be enough bright objects in the second dataset to select, and you will end up with fewer.
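The per-cell matching step can be sketched once every galaxy already has a SOM cell ID (SOM training and cell assignment are omitted). The helper returns a boolean mask, in keeping with the Selector design of computing a mask; `match_cell_counts` and the random choice of which galaxies to keep within a cell are assumptions of this sketch.

```python
import random
from collections import Counter, defaultdict


def match_cell_counts(spec_cells, phot_cells, seed=None):
    """Mask photometric galaxies so each SOM cell matches the spec-z cell counts.

    Hypothetical helper: cells are precomputed integer IDs.
    """
    rng = random.Random(seed)
    wanted = Counter(spec_cells)  # spec-z galaxies per cell
    by_cell = defaultdict(list)   # photometric indices per cell
    for idx, cell in enumerate(phot_cells):
        by_cell[cell].append(idx)
    mask = [False] * len(phot_cells)
    for cell, n_wanted in wanted.items():
        candidates = by_cell.get(cell, [])
        # If the cell is short on galaxies, take all that are available.
        n_take = min(n_wanted, len(candidates))
        for idx in rng.sample(candidates, n_take):
            mask[idx] = True
    return mask


mask = match_cell_counts([0, 0, 1, 2], [0, 0, 0, 1, 3], seed=1)
```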

For the columns used to construct the SOM, there are two sets of columns. noncolor_cols is a config option where you supply a list of columns that will be used directly in the SOM, e.g. redshift, i-magnitude, etc. color_cols, on the other hand, is a config parameter where the user supplies an ordered list of columns that will be differenced before being used as SOM inputs; e.g. if you supply [‘u’, ‘g’, ‘r’], the code will compute u-g and g-r and use those in SOM construction. The code combines the noncolor_cols and color_cols features, and all are used in construction of the SOM.
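The way non-color and color columns combine into one SOM input vector can be sketched as follows; `make_som_features` is a hypothetical helper, and nondetection replacement (`noncolor_nondet`, `color_nondet`) is left out for brevity.

```python
def make_som_features(row, noncolor_cols, color_cols):
    """Concatenate non-color columns with adjacent differences of color columns.

    Hypothetical helper: nondetection handling is omitted.
    """
    features = [row[c] for c in noncolor_cols]
    # ['u', 'g', 'r'] -> [u - g, g - r]
    features += [row[a] - row[b] for a, b in zip(color_cols, color_cols[1:])]
    return features


row = {"i": 22.0, "redshift": 0.5, "u": 24.0, "g": 23.0, "r": 22.5}
features = make_som_features(row, ["i", "redshift"], ["u", "g", "r"])
```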

As this degrader inherits from Selector, it simply computes a mask; the Selector parent class code performs the masking and returns the final dataset that mimics the input reference sample.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • drop_rows ([bool] default=True) – Drop selected rows from output table

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • nondetect_val ([float] default=99.0)

  • noncolor_cols ([list] default=['i', 'redshift']) – data columns used for the SOM; can be a single band if you will also be using color data in ‘color_cols’, or can be as many as you want

  • noncolor_nondet ([list] default=[28.62, -1.0]) – list of nondetect replacement values for the non-color cols

  • color_cols ([list] default=['u', 'g', 'r', 'i', 'z', 'y']) – columns that will be differenced to make colors. This will be done in order, so list them in order of increasing wavelength

  • color_nondet ([list] default=[27.79, 29.04, 29.06, 28.62, 27.98, 27.05]) – list of nondetect replacement vals for color columns

  • som_size ([list] default=[32, 32]) – tuple containing the size (x, y) of the SOM

  • n_epochs ([int] default=10) – number of training epochs.

  • spec_data (TableHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (PqHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

entrypoint_function: str | None = '__call__'
inputs = [('spec_data', <class 'rail.core.data.TableHandle'>), ('input', <class 'rail.core.data.TableHandle'>)]
interactive_function: str | None = 'som_spec_selector'
make_data_selection(df)

Make the data used to train the SOM or used as input to the SOM.

name = 'SOMSpecSelector'
stage_columns: list[str] | None
class rail.stages.SOMocluInformer

Bases: CatInformer

Summarizer that uses a SOM to construct a weighted sum of spec-z objects in the same SOM cell as each photometric galaxy in order to estimate the overall N(z). This is closely related to the NZDir estimator, though that estimator actually reverses this process and looks for photometric neighbors around each spectroscopic galaxy, which can lead to problems if there are photometric galaxies with no nearby spec-z objects (NZDir is not aware that such objects exist and thus can hide biases).

We apply somoclu package (https://somoclu.readthedocs.io/) to train the SOM.

Part of the SOM estimator will be a check for cells which contain photometric objects but do not contain any corresponding training/spec-z objects; those unmatched objects will be flagged for possible removal from the input sample. The inform stage will simply construct a 2D grid SOM using somoclu from a large sample of input photometric data and save this as an output. This may be a computationally intensive stage, though it will hopefully be run once and used by the estimate/summarize stage many times without needing to be re-run.

We can make the SOM either with all colors, with one magnitude and N colors, or with an arbitrary set of columns. The code includes a flag column_usage to set the usage. If set to “colors”, it will take the difference of each adjacent pair of columns in bands as the colors. If set to “magandcolors”, it will use these colors plus one magnitude as specified by ref_band. If set to “columns”, it will take as inputs all of the columns specified by bands (they can be magnitudes, colors, or any other input specified by the user). NOTE: any custom bands parameters must have an accompanying nondetect_val dictionary that will replace nondetections with the nondetect_val values!
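The three `column_usage` modes described above can be sketched for a single row. This follows the wording of this paragraph (including the value "columns"); placing the reference magnitude first, replacing nondetections with `mag_limits` entries, and requiring `ref_band` to be one of `bands` are assumptions, and `som_inputs` is hypothetical.

```python
def som_inputs(row, bands, column_usage="magandcolors", ref_band=None,
               nondetect_val=99.0, mag_limits=None):
    """Build one SOM input vector from a row of magnitudes.

    Hypothetical helper: ref_band is assumed to be one of bands.
    """
    mag_limits = mag_limits or {}
    # Replace nondetections with the per-band limiting value, when one is given.
    vals = {b: (mag_limits.get(b, row[b]) if row[b] == nondetect_val else row[b])
            for b in bands}
    if column_usage == "columns":
        return [vals[b] for b in bands]
    # Colors are differences of adjacent bands, in the order given.
    colors = [vals[a] - vals[b] for a, b in zip(bands, bands[1:])]
    if column_usage == "colors":
        return colors
    if column_usage == "magandcolors":
        return [vals[ref_band]] + colors
    raise ValueError(f"unknown column_usage: {column_usage!r}")


row = {"u": 99.0, "g": 23.0, "r": 22.5}  # u is a nondetection
limits = {"u": 25.0}
```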

This creates a pickle file containing the somoclu SOM object that will be used by the estimation/summarization stage.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry)

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • err_bands ([list] default=['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'])

  • ref_band ([str] default=mag_i_lsst)

  • redshift_col ([str] default=redshift)

  • column_usage ([str] default=magandcolors) – switch for how SOM uses columns, valid values are ‘colors’,’magandcolors’, and ‘mags’

  • seed ([int] default=0) – Random number seed

  • n_rows ([int] default=31) – number of cells in SOM y dimension

  • n_columns ([int] default=31) – number of cells in SOM x dimension

  • grid_type ([str] default=rectangular) – Optional parameter to specify the grid form of the nodes: ‘rectangular’ (rectangular neurons, the default) or ‘hexagonal’ (hexagonal neurons)

  • n_epochs ([int] default=10) – number of training epochs.

  • initialization ([str] default=pca) – method of initializing the SOM: ‘pca’ (principal component analysis, the default) or ‘random’ (randomly initialize the SOM)

  • maptype ([str] default=planar) – Optional parameter to specify the map topology: ‘planar’ (planar map, the default) or ‘toroid’ (toroid map)

  • std_coeff ([float] default=1.5) – Optional parameter to set the coefficient in the Gaussian neighborhood function exp(-||x-y||^2/(2*(coeff*radius)^2)). Default: 1.5

  • som_learning_rate ([float] default=0.5) – Initial SOM learning rate (scale0 param in Somoclu)

  • input (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))
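The neighborhood function quoted for `std_coeff` can be written out directly. This only evaluates the formula for two grid positions; how somoclu shrinks `radius` over `n_epochs` is not shown, and `gaussian_neighborhood` is a hypothetical helper.

```python
import math


def gaussian_neighborhood(node, bmu, radius, std_coeff=1.5):
    """exp(-||x - y||^2 / (2 * (std_coeff * radius)^2)) between two grid positions.

    Hypothetical helper: evaluates the formula only, not somoclu's training loop.
    """
    dist2 = sum((a - b) ** 2 for a, b in zip(node, bmu))
    return math.exp(-dist2 / (2.0 * (std_coeff * radius) ** 2))


# A larger std_coeff widens the neighborhood, so distant nodes are updated more.
w = gaussian_neighborhood((3, 0), (0, 0), radius=2.0)
```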

__init__(args, **kwargs)

Constructor: Do Informer specific initialization

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'somoclu_informer'
name = 'SOMocluInformer'
run()

Build a SOM from photometric data NOT spectroscopic data!

stage_columns: list[str] | None
class rail.stages.SOMocluSummarizer

Bases: SZPZSummarizer

Quick implementation of a SOM-based summarizer. It groups a pre-trained SOM into hierarchical clusters and assigns a galaxy sample to SOM cells and clusters. It then constructs an N(z) estimate via a weighted sum of the empirical N(z), i.e. the normalized histogram of spec-z values contained in the same SOM cluster as each photometric galaxy.

There are some general guidelines for choosing the geometry and the total number of cells in the SOM. This paper: http://www.giscience2010.org/pdfs/paper_230.pdf recommends 5*sqrt(num rows * num data columns) as a rough guideline. Some authors state that a SOM with one dimension roughly twice as long as the other is better, while others find that square SOMs with equal X and Y dimensions are best; the user can set the dimensions using the n_columns and n_rows parameters. For more discussion on SOMs and photo-z calibration, see the KiDS paper on the topic: http://arxiv.org/abs/1909.09632, particularly the appendices.

Note that several parameters are stored in the model file, e.g. the columns used. This ensures that the same columns used in constructing the SOM are used when finding the winning SOM cell with the test data.

Two additional files are also written out:

  • cellid_output outputs the ‘winning’ SOM cell for each photometric galaxy, in both raveled and 2D SOM cell coordinates. If the objectID or galaxy_id is present, it will also be included in this file; if not, the coordinates will be written in the same order in which the data is read in.

  • uncovered_cell_file outputs the raveled cell IDs of cells that contain photometric galaxies but no corresponding spectroscopic objects. These objects should be removed from the sample, as they cannot be accounted for properly in the summarizer. Some iteration on data cuts may be necessary to remove or mitigate these ‘uncovered’ objects.
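The core weighted-sum idea can be sketched with precomputed cluster IDs. Each cluster's spec-z histogram is normalized and weighted by the number of photometric galaxies in that cluster, and clusters with photometric but no spectroscopic objects are reported as uncovered. Hierarchical clustering, weights (`phot_weightcol`, `spec_weightcol`), and bootstrap samples are omitted, and `som_nz` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def som_nz(phot_clusters, spec_clusters, spec_z, edges):
    """Weighted sum of per-cluster normalized spec-z histograms, plus uncovered clusters.

    Hypothetical helper: unweighted, no bootstrap, clusters precomputed.
    """
    nbins = len(edges) - 1
    hist = defaultdict(lambda: [0] * nbins)
    for c, z in zip(spec_clusters, spec_z):
        for i in range(nbins):
            if edges[i] <= z < edges[i + 1]:
                hist[c][i] += 1
                break
    n_phot = Counter(phot_clusters)
    nz = [0.0] * nbins
    uncovered = []
    for c, n in n_phot.items():
        total = sum(hist[c]) if c in hist else 0
        if total == 0:
            uncovered.append(c)  # photometric objects with no spec-z counterpart
            continue
        for i in range(nbins):
            nz[i] += n * hist[c][i] / total
    norm = sum(nz)
    return ([v / norm for v in nz] if norm else nz), sorted(uncovered)


nz, uncovered = som_nz([0, 0, 1, 2], [0, 1, 1], [0.2, 1.2, 1.4], [0.0, 1.0, 2.0])
```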

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

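  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • nondetect_val ([float] default=99.0) – value that flags a non-detection; such entries are replaced with the corresponding mag_limits value

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05}) – per-band magnitude limits used to replace non-detections

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • redshift_col ([str] default=redshift) – name of redshift column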

  • spec_groupname ([str] default=photometry) – name of hdf5 group for spec data, if None, then set to ‘’

  • n_clusters ([int] default=-1) – The number of hierarchical clusters of SOM cells. If not provided, the SOM cells will not be clustered.

  • objid_name ([str] default=) – name of ID column, if present will be written to cellid_output

  • seed ([int] default=12345) – random seed

  • phot_weightcol ([str] default=) – name of photometry weight column, if present

  • spec_weightcol ([str] default=) – name of specz weight col, if present

  • som_split_size ([int] default=200) – the size of data chunks when calculating the distances between the codebook and data

  • n_samples ([int] default=20) – number of bootstrap samples to generate

  • useful_clusters ([list] default=[]) – the cluster indices that are used for calibration. If not given, all the clusters containing spectroscopic objects are used.

  • input (TableHandle (INPUT))

  • spec_input (TableHandle (INPUT))

  • model (ModelHandle (INPUT))

  • output (QPHandle (OUTPUT))

  • single_NZ (QPHandle (OUTPUT))

  • cellid_output (Hdf5Handle (OUTPUT))

  • uncovered_cluster_file (TableHandle (OUTPUT))

__init__(args, **kwargs)

Initialize Estimator that can sample galaxy data.

entrypoint_function: str | None = 'summarize'
get_som_coordinates(data, weight_col)

Find the best-matching unit (BMU) coordinates of each item in the data.
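
Finding the best-matching unit is a nearest-neighbour search against the SOM codebook. The sketch below assumes a codebook array of shape (n_rows, n_columns, n_features) and Euclidean distance, and processes the data in chunks in the spirit of som_split_size. The function name find_bmus is hypothetical; this is not the stage's own code.

```python
import numpy as np


def find_bmus(data, codebook, split_size=200):
    """Hypothetical helper: raveled and 2D BMU coordinates for each data row.

    data: array of shape (n_obj, n_features)
    codebook: array of shape (n_rows, n_columns, n_features) (assumed layout)
    """
    data = np.asarray(data, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    n_rows, n_cols, n_feat = codebook.shape
    flat = codebook.reshape(n_rows * n_cols, n_feat)
    bmus = np.empty(len(data), dtype=int)
    # Chunking bounds the size of the (chunk, n_cells) distance matrix.
    for start in range(0, len(data), split_size):
        chunk = data[start:start + split_size]
        d2 = ((chunk[:, None, :] - flat[None, :, :]) ** 2).sum(axis=-1)
        bmus[start:start + split_size] = np.argmin(d2, axis=1)
    # Convert raveled cell indices into (row, column) SOM coordinates.
    rows, cols = np.unravel_index(bmus, (n_rows, n_cols))
    return bmus, rows, cols
```

Returning both the raveled index and the 2D coordinates matches the two forms written to cellid_output.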

interactive_function: str | None = 'somoclu_summarizer'
name = 'SOMocluSummarizer'
outputs = [('output', <class 'rail.core.data.QPHandle'>), ('single_NZ', <class 'rail.core.data.QPHandle'>), ('cellid_output', <class 'rail.core.data.Hdf5Handle'>), ('uncovered_cluster_file', <class 'rail.core.data.TableHandle'>)]
replace_non_detections(data)

Replace non-detected data with magnitude limits.
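
A minimal sketch of this replacement step, assuming the data is a dict of per-column NumPy arrays and that mag_limits maps column names to limits as in the default shown above. The helper name replace_nondetects is hypothetical. A NaN nondetect_val is handled with np.isnan because NaN never compares equal to itself.

```python
import numpy as np


def replace_nondetects(data, mag_limits, nondetect_val=99.0):
    """Hypothetical helper: swap non-detection flags for per-band magnitude limits.

    data: dict mapping column name -> 1D array of values
    mag_limits: dict mapping magnitude column name -> limiting magnitude
    Returns a new dict; the input arrays are not modified.
    """
    out = dict(data)
    for col, limit in mag_limits.items():
        if col not in out:
            continue  # band not present in this table
        vals = np.array(out[col], dtype=float)  # np.array copies by default
        if np.isnan(nondetect_val):
            mask = np.isnan(vals)  # NaN never equals itself, so test explicitly
        else:
            mask = np.isclose(vals, nondetect_val)
        vals[mask] = limit
        out[col] = vals
    return out
```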

run()

Run the stage and return the execution status.

Subclasses must implement this method.

set_weight_column(data, weight_col)

Assign weights from weight_col if that column is present; otherwise set all weights to 1.0. weight_col: column name of the weights.

stage_columns: list[str] | None
class rail.stages.SZPZSummarizer

Bases: RailStage

The base class for summarizers that use two sets of data, a photometric sample with spec-z values and a photometric sample with unknown redshifts (e.g. minisom_som), and output a QP Ensemble with bootstrap realizations of the N(z) distribution.
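
Bootstrap realizations of N(z) can be illustrated by resampling the spectroscopic redshifts with replacement and histogramming each resample. This generic sketch ignores the SOM cell weighting that a real subclass would apply; bootstrap_nz is a hypothetical name, and n_samples and seed simply mirror the parameter names and defaults used by SOMocluSummarizer.

```python
import numpy as np


def bootstrap_nz(spec_z, z_edges, n_samples=20, seed=12345):
    """Hypothetical helper: n_samples bootstrap N(z) histograms (unit area each)."""
    rng = np.random.default_rng(seed)
    spec_z = np.asarray(spec_z, dtype=float)
    z_edges = np.asarray(z_edges, dtype=float)
    widths = np.diff(z_edges)
    realizations = np.empty((n_samples, widths.size))
    for k in range(n_samples):
        # Resample the spectroscopic set with replacement.
        resampled = rng.choice(spec_z, size=spec_z.size, replace=True)
        counts, _ = np.histogram(resampled, bins=z_edges)
        # Convert counts to a density so each realization integrates to 1.
        realizations[k] = counts / (counts.sum() * widths)
    return realizations
```

The spread across rows gives a rough uncertainty on N(z); in the actual stages the realizations are returned as a QP Ensemble.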

__init__(args, **kwargs)

Initialize Estimator that can sample galaxy data.

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'summarize'
inputs = [('input', <class 'rail.core.data.TableHandle'>), ('spec_input', <class 'rail.core.data.TableHandle'>), ('model', <class 'rail.core.data.ModelHandle'>)]
name = 'SZPZtoNZSummarizer'
outputs = [('output', <class 'rail.core.data.QPHandle'>)]
stage_columns: list[str] | None
summarize(input_data, spec_data, **kwargs)

The main run method for the summarization, should be implemented in the specific subclass.

This will attach the input_data to this SZandPhottoNZSummarizer (for introspection and provenance tracking).

Then it will call the run() and finalize() methods, which need to be implemented by the sub-classes.

The run() method will need to register the data that it creates to this Estimator by using self.add_data(‘output’, output_data).

Finally, this will return a QPHandle providing access to that output data.

Parameters:
  • input_data (qp.Ensemble) – Per-galaxy p(z), and any ancillary data associated with it

  • spec_data (np.ndarray) – Spectroscopic data

Returns:

Ensemble with n(z), and any ancillary data

Return type:

qp.Ensemble

class rail.stages.Selector

Bases: RailStage

Base class Selector, which applies a selection to a catalog.

Selectors take “input” data in the form of pandas dataframes in Parquet files and provide as “output” another pandas dataframe written to a Parquet file.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • drop_rows ([bool] default=True) – Drop selected rows from output table

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'
inputs = [('input', <class 'rail.core.data.PqHandle'>)]
name = 'Selector'
outputs = [('output', <class 'rail.core.data.PqHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
class rail.stages.SingleEvaluator

Bases: Evaluator

Evaluate the performance of a photo-z estimator

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • metrics ([list] default=[]) – The metrics you want to evaluate.

  • exclude_metrics ([list] default=[]) – List of metrics to exclude

  • metric_config ([dict] default={}) – configuration of individual metrics

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • seed ([float] default=None) – Random seed value to use for reproducible results.

  • force_exact ([bool] default=False) – Force the exact calculation. This will not allow parallelization

  • point_estimates ([list] default=[]) – List of point estimates to use

  • truth_point_estimates ([list] default=[]) – List of true point values to use

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • input (QPOrTableHandle (INPUT))

  • truth (QPOrTableHandle (INPUT))

  • output (Hdf5Handle (OUTPUT))

  • summary (Hdf5Handle (OUTPUT))

  • single_distribution_summary (QPDictHandle (OUTPUT))

__init__(args, **kwargs)

Initialize Evaluator

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'evaluate'
inputs: list[tuple[str, type[DataHandle]]] = [('input', <class 'rail.core.data.QPOrTableHandle'>), ('truth', <class 'rail.core.data.QPOrTableHandle'>)]
interactive_function: str | None = 'single_evaluator'
metric_base_class

alias of BaseMetric

name = 'SingleEvaluator'
run()

Run method

Evaluate all the metrics and put them into a table

Notes

Get the input data from the data store under this stage’s ‘input’ tag. Get the truth data from the data store under this stage’s ‘truth’ tag. Put the results into the data store under this stage’s ‘output’ tag.

Return type:

None

stage_columns: list[str] | None
class rail.stages.SklNeurNetEstimator

Bases: CatEstimator

Subclass implementing a simple point-estimate neural-net photo-z. Rather than predicting a full PDF, it currently predicts a point estimate zb and assigns an error of width*(1+zb). A “real” NN photo-z is planned for later.
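
To illustrate how a point estimate zb can become a PDF with error width*(1+zb), the sketch below evaluates a profile of that width on the grid defined by zmin, zmax and nzbins. The Gaussian shape and the helper name point_estimate_to_pdf are assumptions for illustration; the text above does not specify the profile used by the stage.

```python
import numpy as np


def point_estimate_to_pdf(zb, width=0.05, zmin=0.0, zmax=3.0, nzbins=301):
    """Hypothetical helper: gridded PDFs centred on zb with sigma = width * (1 + zb)."""
    zgrid = np.linspace(zmin, zmax, nzbins)
    zb = np.atleast_1d(np.asarray(zb, dtype=float))
    sigma = width * (1.0 + zb)  # redshift-dependent width, per the description
    # Gaussian profile (an assumption), one row per galaxy.
    pdfs = np.exp(-0.5 * ((zgrid[None, :] - zb[:, None]) / sigma[:, None]) ** 2)
    # Normalize each row to unit area with the trapezoidal rule.
    area = (0.5 * (pdfs[:, 1:] + pdfs[:, :-1]) * np.diff(zgrid)).sum(axis=1)
    return zgrid, pdfs / area[:, None]
```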

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • id_col ([str] default=object_id) – name of the object ID column

  • redshift_col ([str] default=redshift) – name of redshift column

  • calc_summary_stats ([bool] default=False) – Compute summary statistics

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, and ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • width ([float] default=0.05) – The ad hoc base width of the PDFs

  • ref_band ([str] default=mag_i_lsst)

  • nondetect_val ([float] default=99.0)

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • model (ModelHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do CatEstimator specific initialization

entrypoint_function: str | None = 'estimate'
interactive_function: str | None = 'skl_neur_net_estimator'
name = 'SklNeurNetEstimator'
stage_columns: list[str] | None
class rail.stages.SklNeurNetInformer

Bases: CatInformer

Subclass to train a simple point-estimate neural-net photo-z. Rather than predicting a full PDF, the estimator currently predicts a point estimate zb and assigns an error of width*(1+zb). A “real” NN photo-z is planned for later.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • nondetect_val ([float] default=99.0)

  • mag_limits ([dict] default={'mag_u_lsst': 27.79, 'mag_g_lsst': 29.04, 'mag_r_lsst': 29.06, 'mag_i_lsst': 28.62, 'mag_z_lsst': 27.98, 'mag_y_lsst': 27.05})

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • ref_band ([str] default=mag_i_lsst)

  • redshift_col ([str] default=redshift) – name of redshift column

  • width ([float] default=0.05) – The ad hoc base width of the PDFs

  • max_iter ([int] default=500) – maximum number of iterations while training the neural net. Too low a value will cause a warning to be printed (though the code will still work, just not optimally)

  • input (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do CatInformer specific initialization

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'skl_neur_net_informer'
name = 'SklNeurNetInformer'
run()

Train the NN model

stage_columns: list[str] | None
class rail.stages.SpecSelection

Bases: Selector

The base class of spectroscopic selections.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • drop_rows ([bool] default=True) – Drop selected rows from output table

  • seed ([int] default=42) – random seed for reproducibility

  • n_tot ([int] default=10000) – Number of selected sources

  • nondetect_val ([float] default=99.0)

  • downsample ([bool] default=True) – If true, downsample the selected sources into a total number of n_tot

  • success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.

  • percentile_cut ([int] default=100) – cut redshifts above this percentile

  • colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

downsampling_n_tot()

Randomly sample down the objects to a given number of data objects.

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'spec_selection'
invalid_cut(data)

Removes entries in the data that have invalid magnitude values (NaN or nondetect_val).
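
The two helper methods above (downsampling_n_tot and invalid_cut) can be sketched as follows, assuming magnitudes are provided as a 2D array (objects by bands) and that downsampling draws n_tot objects without replacement using seed. The function names are hypothetical, and the stage's actual implementation may differ, e.g. in how it seeds its random generator.

```python
import numpy as np


def valid_magnitude_mask(mags, nondetect_val=99.0):
    """Hypothetical helper: True for rows whose magnitudes are all valid."""
    mags = np.asarray(mags, dtype=float)  # shape (n_objects, n_bands)
    bad = np.isnan(mags) | np.isclose(mags, nondetect_val)
    return ~bad.any(axis=1)


def downsample_indices(n_selected, n_tot=10000, seed=42):
    """Hypothetical helper: indices of at most n_tot objects, drawn without replacement."""
    if n_selected <= n_tot:
        return np.arange(n_selected)  # nothing to remove
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_selected, size=n_tot, replace=False))
```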

name = 'SpecSelection'
selection(data)

Selection function.

This should be overridden by the subclasses corresponding to different spectroscopic selections.

stage_columns: list[str] | None
validate_colnames(data)

Validate the column names of the data table to make sure they contain the necessary information for each selection.

Parameters:

colnames (list of str) – A list of column names

class rail.stages.SpecSelection_BOSS

Bases: SpecSelection

The class of spectroscopic selections with BOSS.

BOSS selection function is based on http://www.sdss3.org/dr9/algorithms/boss_galaxy_ts.php

The selection has changed slightly compared to Dawson+13.

BOSS covers an area of 9100 deg^2 with 893,319 galaxies.

For BOSS selection, the data should at least include gri bands.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • drop_rows ([bool] default=True) – Drop selected rows from output table

  • seed ([int] default=42) – random seed for reproducibility

  • n_tot ([int] default=10000) – Number of selected sources

  • nondetect_val ([float] default=99.0)

  • downsample ([bool] default=True) – If true, downsample the selected sources into a total number of n_tot

  • success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.

  • percentile_cut ([int] default=100) – cut redshifts above this percentile

  • colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'spec_selection_BOSS'
name = 'SpecSelection_BOSS'
selection(data)

The BOSS selection function.

stage_columns: list[str] | None
class rail.stages.SpecSelection_DEEP2

Bases: SpecSelection

The class of spectroscopic selections with DEEP2.

DEEP2 has a sky coverage of 2.8 deg^2 with ~53000 spectra.

For DEEP2, one needs the R-band magnitude and B-R/R-I colors, which are not available for the time being, so LSST gri bands are used for now. When the conversion degrader is ready, this subclass will be updated accordingly.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • drop_rows ([bool] default=True) – Drop selected rows from output table

  • seed ([int] default=42) – random seed for reproducibility

  • n_tot ([int] default=10000) – Number of selected sources

  • nondetect_val ([float] default=99.0)

  • downsample ([bool] default=True) – If true, downsample the selected sources into a total number of n_tot

  • success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.

  • percentile_cut ([int] default=100) – cut redshifts above this percentile

  • colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'spec_selection_DEEP2'
name = 'SpecSelection_DEEP2'
photometryCut(data)

Applies DEEP2 photometric cut based on Newman+13.

This modified selection gives the best match to the data n(z) with its cut at z~0.75 and the B-R/R-I distribution (Newman+13, Fig. 12).

Notes

We cannot apply the surface brightness cut and do not apply the Gaussian weighted sampling near the original colour cuts.

selection(data)

DEEP2 selection function.

speczSuccess(data)

Spec-z success rate as a function of r_AB for Q>=3, read off Figure 13 of Newman+13 for DEEP2 fields 2-4. Values are binned in steps of 0.2 mag, with the first and last bins centered on 19 and 24.
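
The binning described above (0.2 mag steps, first and last bin centred on 19 and 24, hence 26 bins) can be turned into a random acceptance mask as sketched below. The success-rate values themselves come from Newman+13 and are not reproduced here, so they are a caller-supplied array. Clipping objects outside 19 to 24 into the edge bins, and the function name, are assumptions.

```python
import numpy as np

# Bin centres: 19.0, 19.2, ..., 24.0 (26 bins, 0.2 mag apart).
BIN_CENTERS = 19.0 + 0.2 * np.arange(26)


def specz_success_mask(r_mag, success_rate, seed=42):
    """Hypothetical helper: randomly keep objects with their bin's success rate.

    success_rate: one value in [0, 1] per entry of BIN_CENTERS (caller-supplied).
    """
    success_rate = np.asarray(success_rate, dtype=float)
    if success_rate.shape != BIN_CENTERS.shape:
        raise ValueError("need one success rate per 0.2 mag bin")
    r_mag = np.asarray(r_mag, dtype=float)
    # Nearest bin centre; objects outside 19-24 fall into the edge bins (assumption).
    idx = np.clip(np.rint((r_mag - 19.0) / 0.2).astype(int), 0, BIN_CENTERS.size - 1)
    rng = np.random.default_rng(seed)
    return rng.random(r_mag.size) < success_rate[idx]
```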

stage_columns: list[str] | None
class rail.stages.SpecSelection_DEEP2_LSST

Bases: SpecSelection

The class of spectroscopic selections with DEEP2.

Approximate Rubin->CFHT12K transforms, based on CWWSB SED colors:

B = g + 0.35 * (g-r)

R = r - 0.3 * (r-i)

I = i - 0.5 * (r-i)

The cuts are transformed accordingly. In addition, the original cut B-R < 0.5 is modified to B-R < 0.33 to exclude a few more low-z galaxies. speczSuccess is left unchanged from the original implementation.
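
The approximate transforms above are simple linear combinations of the LSST magnitudes. The sketch below applies them and forms the B-R and R-I colours used by the DEEP2 colour cut. Only the modified B-R < 0.33 threshold is stated in this section, so the full cut is not reproduced; the function name is hypothetical.

```python
import numpy as np


def lsst_to_cfht12k(g, r, i):
    """Hypothetical helper: approximate CFHT12K B, R, I from LSST g, r, i."""
    g, r, i = (np.asarray(x, dtype=float) for x in (g, r, i))
    b_mag = g + 0.35 * (g - r)
    r_mag = r - 0.3 * (r - i)
    i_mag = i - 0.5 * (r - i)
    return b_mag, r_mag, i_mag


# Example: one galaxy with g = 22.0, r = 21.0, i = 20.5.
b_mag, r_mag, i_mag = lsst_to_cfht12k([22.0], [21.0], [20.5])
b_minus_r = b_mag - r_mag
r_minus_i = r_mag - i_mag
passes_blue_term = b_minus_r < 0.33  # modified threshold (originally B-R < 0.5)
```

How B-R < 0.33 combines with the other DEEP2 colour terms is not spelled out here, so the sketch stops at the individual threshold.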

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • drop_rows ([bool] default=True) – Drop selected rows from output table

  • seed ([int] default=42) – random seed for reproducibility

  • n_tot ([int] default=10000) – Number of selected sources

  • nondetect_val ([float] default=99.0)

  • downsample ([bool] default=True) – If true, downsample the selected sources into a total number of n_tot

  • success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.

  • percentile_cut ([int] default=100) – cut redshifts above this percentile

  • colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'spec_selection_DEEP2_LSST'
name = 'SpecSelection_DEEP2_LSST'
photometryCut(data)

Applies DEEP2 photometric cut based on Newman+13.

This modified selection gives the best match to the data n(z) with its cut at z~0.75 and the B-R/R-I distribution (Newman+13, Fig. 12).

Notes

We cannot apply the surface brightness cut and do not apply the Gaussian weighted sampling near the original colour cuts.

selection(data)

DEEP2 selection function.

speczSuccess(data)

Spec-z success rate as a function of r_AB for Q>=3, read off Figure 13 of Newman+13 for DEEP2 fields 2-4. Values are binned in steps of 0.2 mag, with the first and last bins centered on 19 and 24.

stage_columns: list[str] | None
class rail.stages.SpecSelection_DESI_BGS

Bases: SpecSelection

The class of spectroscopic selections with DESI BGS.

Implements a minimal DESI Bright Galaxy Survey (BGS) selection using:
  • r < 19.5

Required bands in data (via config.colnames): r

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • drop_rows ([bool] default=True) – Drop selected rows from output table

  • seed ([int] default=42) – random seed for reproducibility

  • n_tot ([int] default=10000) – Number of selected sources

  • nondetect_val ([float] default=99.0)

  • downsample ([bool] default=True) – If true, downsample the selected sources into a total number of n_tot

  • success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.

  • percentile_cut ([int] default=100) – cut redshifts above this percentile

  • colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'spec_selection_DESI_BGS'
name = 'SpecSelection_DESI_BGS'
selection(data)

The DESI BGS selection function (simplified cut).

stage_columns: list[str] | None
class rail.stages.SpecSelection_DESI_ELG_LOP

Bases: SpecSelection

The class of spectroscopic selections with DESI ELG LOP.

Implements the simplified DESI ELG_LOP photometric selection using:
  • (g > 20) AND (gfib < 24.1)

  • 0.15 < (r − z)

  • (g − r) < 0.5 × (r − z) + 0.1

  • (g − r) < −1.2 × (r − z) + 1.3

All of the above are combined with AND.

Required bands in data (via config.colnames): g, r, z, gfib
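
The ELG_LOP cuts listed above translate directly into a vectorized boolean mask. This sketch takes plain arrays for g, r, z and the fibre magnitude gfib; in the stage these columns come from config.colnames, and the function name is hypothetical.

```python
import numpy as np


def desi_elg_lop_mask(g, r, z, gfib):
    """Hypothetical helper: simplified DESI ELG_LOP cuts, combined with AND."""
    g, r, z, gfib = (np.asarray(x, dtype=float) for x in (g, r, z, gfib))
    g_r = g - r
    r_z = r - z
    return (
        (g > 20) & (gfib < 24.1)
        & (r_z > 0.15)
        & (g_r < 0.5 * r_z + 0.1)
        & (g_r < -1.2 * r_z + 1.3)
    )
```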

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • drop_rows ([bool] default=True) – Drop selected rows from output table

  • seed ([int] default=42) – random seed for reproducibility

  • n_tot ([int] default=10000) – Number of selected sources

  • nondetect_val ([float] default=99.0)

  • downsample ([bool] default=True) – If true, downsample the selected sources into a total number of n_tot

  • success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.

  • percentile_cut ([int] default=100) – cut redshifts above this percentile

  • colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'spec_selection_ELG_LOP'
name = 'SpecSelection_DESI_ELG_LOP'
selection(data)

The DESI ELG_LOP selection function.

stage_columns: list[str] | None
class rail.stages.SpecSelection_DESI_LRG

Bases: SpecSelection

The class of spectroscopic selections with DESI LRG (simplified).

This implements a simplified DESI LRG photometric selection using:
  • zfiber < 21.60 (here approximated with z)

  • z − W1 > 0.8 × (r − z) − 0.6

  • (g − W1 > 2.9) OR (r − W1 > 1.8)

  • [ ((r − W1 > 1.8 × (W1 − 17.14)) AND (r − W1 > W1 − 16.33)) OR (r − W1 > 3.3) ]

All of the above are combined with AND.

Required bands in data (via config.colnames): g, r, z, W1
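
A minimal sketch of the simplified LRG cuts above as a boolean mask, with the z-band magnitude standing in for zfiber as the description states. The function name and the intermediate variable names are hypothetical.

```python
import numpy as np


def desi_lrg_mask(g, r, z, w1):
    """Hypothetical helper: simplified DESI LRG cuts, combined with AND."""
    g, r, z, w1 = (np.asarray(x, dtype=float) for x in (g, r, z, w1))
    r_w1 = r - w1
    cut_zfib = z < 21.60  # zfiber approximated by z
    cut_zw1 = (z - w1) > 0.8 * (r - z) - 0.6
    cut_gw1_or_rw1 = ((g - w1) > 2.9) | (r_w1 > 1.8)
    cut_rw1_vs_w1 = ((r_w1 > 1.8 * (w1 - 17.14)) & (r_w1 > w1 - 16.33)) | (r_w1 > 3.3)
    return cut_zfib & cut_zw1 & cut_gw1_or_rw1 & cut_rw1_vs_w1
```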

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • drop_rows ([bool] default=True) – Drop selected rows from output table

  • seed ([int] default=42) – random seed for reproducibility

  • n_tot ([int] default=10000) – Number of selected sources

  • nondetect_val ([float] default=99.0)

  • downsample ([bool] default=True) – If true, downsample the selected sources into a total number of n_tot

  • success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.

  • percentile_cut ([int] default=100) – cut redshifts above this percentile

  • colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'W1': 'W1', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'spec_selection_DESI_LRG'
name = 'SpecSelection_DESI_LRG'
selection(data)

The DESI LRG selection function (simplified).

stage_columns: list[str] | None
class rail.stages.SpecSelection_DESI_Phy

Bases: Selector

DESI tracer selector based on pre-computed redshift-dependent thresholds.

Applies a selection to a simulation catalog by comparing a physical parameter column against a threshold that varies with redshift. The threshold table is provided externally (e.g. from abundance matching) and is not computed by this stage.

All supported DESI tracer types (bgs, lrg, elg) select objects whose physical parameter value is above the redshift-interpolated threshold.

Inputs

input : PqHandle

Simulation catalog containing the physical parameter column and a redshift column.

threshold_table : TableHandle

Table with two columns:
  • z : redshift bin centers

  • thresh : threshold values at those redshift centers

Output

output : PqHandle

Catalog after applying the DESI selection mask.
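
The threshold comparison described above can be sketched with linear interpolation of the threshold table in redshift. np.interp holds the end values constant outside the tabulated range; whether the stage extrapolates the same way, and the helper name, are assumptions.

```python
import numpy as np


def desi_phy_mask(redshift, values, table_z, table_thresh):
    """Hypothetical helper: keep objects whose value exceeds the interpolated threshold."""
    table_z = np.asarray(table_z, dtype=float)
    table_thresh = np.asarray(table_thresh, dtype=float)
    order = np.argsort(table_z)  # np.interp needs increasing x values
    thresh = np.interp(np.asarray(redshift, dtype=float),
                       table_z[order], table_thresh[order])
    return np.asarray(values, dtype=float) > thresh
```

For example, with desi_type='lrg' the values would be the column named by threshold_col, such as log_peak_sub_halo_mass.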

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • drop_rows ([bool] default=True) – Drop selected rows from output table

  • seed ([type not specified] default=None) – Set to an int to force reproducible results.

  • desi_type ([str] default=lrg) – DESI tracer type: ‘bgs’, ‘lrg’, or ‘elg’

  • threshold_col ([str] default=None) – Column in the input catalog used for threshold-based selection (e.g. ‘log_peak_sub_halo_mass’ for bgs/lrg, ‘log_sfr’ for elg)

  • redshift_col ([str] default=redshift) – Column name for redshift in the input catalog

  • threshold_table ([str] default=None) – Filename of the threshold file

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'
inputs = [('input', <class 'rail.core.data.PqHandle'>)]
interactive_function: str | None = 'spec_selection_desi_phy'
name = 'SpecSelection_DESI_Phy'
outputs = [('output', <class 'rail.core.data.PqHandle'>)]
stage_columns: list[str] | None
class rail.stages.SpecSelection_GAMA

Bases: SpecSelection

The class of spectroscopic selections with GAMA.

The GAMA survey covers an area of 286 deg^2, with ~238000 objects.

The only necessary column is the r-band magnitude.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • drop_rows ([bool] default=True) – Drop selected rows from output table

  • seed ([int] default=42) – random seed for reproducibility

  • n_tot ([int] default=10000) – Number of selected sources

  • nondetect_val ([float] default=99.0)

  • downsample ([bool] default=True) – If true, downsample the selected sources into a total number of n_tot

  • success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.

  • percentile_cut ([int] default=100) – cut redshifts above this percentile

  • colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'spec_selection_GAMA'
name = 'SpecSelection_GAMA'
selection(data)

GAMA selection function.

stage_columns: list[str] | None
class rail.stages.SpecSelection_HSC

Bases: SpecSelection

The class of spectroscopic selections with HSC.

For HSC, the data should at least include giz bands and redshift.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • drop_rows ([bool] default=True) – Drop selected rows from output table

  • seed ([int] default=42) – random seed for reproducibility

  • n_tot ([int] default=10000) – Number of selected sources

  • nondetect_val ([float] default=99.0)

  • downsample ([bool] default=True) – If true, downsample the selected sources into a total number of n_tot

  • success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.

  • percentile_cut ([int] default=100) – cut redshifts above this percentile

  • colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) – a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))
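The `percentile_cut`, `downsample`, `n_tot`, and `seed` parameters describe a trimming step applied to the selected sources. The sketch below shows one plausible reading under stated assumptions, not the stage's implementation: `trim_selection` is a hypothetical name, the percentile uses the nearest-rank definition (the stage may interpolate), and downsampling draws `n_tot` objects without replacement from a seeded generator.

```python
import math
import random


def trim_selection(redshifts, percentile_cut=100, downsample=True, n_tot=10000, seed=42):
    """Return sorted indices of objects kept after a percentile cut and downsampling.

    Assumptions (not taken from RAIL): nearest-rank percentile, and sampling
    without replacement when more than `n_tot` objects remain.
    """
    indices = list(range(len(redshifts)))
    if percentile_cut < 100 and redshifts:
        ordered = sorted(redshifts)
        # Nearest-rank percentile: the smallest value with at least p% of data <= it.
        rank = max(1, math.ceil(percentile_cut / 100 * len(ordered)))
        threshold = ordered[rank - 1]
        indices = [i for i in indices if redshifts[i] <= threshold]
    if downsample and len(indices) > n_tot:
        rng = random.Random(seed)  # seeded for reproducibility
        indices = sorted(rng.sample(indices, n_tot))
    return indices


z = [0.1 * k for k in range(100)]
kept = trim_selection(z, percentile_cut=90, n_tot=20, seed=42)
```

Using a dedicated `random.Random(seed)` instance keeps the draw reproducible without touching the global random state.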

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'spec_selection_HSC'
name = 'SpecSelection_HSC'
photometryCut(data)

HSC galaxies were binned in color-magnitude space with i-band magnitude from 13 to 26 and g-z color from -2 to 6.

selection(data)

Selection functions.

This should be overridden by the subclasses corresponding to different spectroscopic selections.

speczSuccess(data)

HSC galaxies were binned in color-magnitude space with i-band magnitude from 13 to 26 and g-z color from -2 to 6 (200 bins in each direction). The ratio of galaxies with spectroscopic redshifts (training galaxies) to galaxies with only photometry in the HSC Wide field (application galaxies) was computed for each pixel. We divide the data into the same pixels and randomly select galaxies into the training sample based on the HSC ratios.
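The pixel-ratio selection described above can be sketched as follows. This is an illustration under assumptions rather than the stage's code: the ratio grid is a small hypothetical array instead of the HSC success-rate data, pixels are uniform in magnitude and color, objects outside the grid are rejected, and each object is kept with probability equal to its pixel's ratio.

```python
import random


def pixel_index(value, lo, hi, nbins):
    """Return the 0-based uniform bin index of `value` in [lo, hi), or None."""
    if not lo <= value < hi:
        return None
    # min() guards against floating-point rounding just below `hi`.
    return min(int((value - lo) / (hi - lo) * nbins), nbins - 1)


def select_by_ratio(mags, colors, ratio_grid, mag_range, color_range, seed=42):
    """Keep each object with probability given by its (mag, color) pixel ratio.

    `ratio_grid[m][c]` is a hypothetical grid of training-to-application ratios
    with len(ratio_grid) magnitude bins and len(ratio_grid[0]) color bins.
    """
    rng = random.Random(seed)
    n_mag, n_col = len(ratio_grid), len(ratio_grid[0])
    keep = []
    for mag, color in zip(mags, colors):
        m = pixel_index(mag, mag_range[0], mag_range[1], n_mag)
        c = pixel_index(color, color_range[0], color_range[1], n_col)
        if m is None or c is None:
            keep.append(False)  # outside the grid: never selected
            continue
        keep.append(rng.random() < ratio_grid[m][c])
    return keep


# Toy 2x2 grid with made-up ranges, for illustration only.
toy_grid = [[1.0, 0.0], [0.0, 1.0]]
mask = select_by_ratio([19.0, 23.0, 19.0, 30.0], [0.5, 1.5, 1.5, 0.5],
                       toy_grid, mag_range=(18.0, 24.0), color_range=(0.0, 2.0))
```

With ratios of exactly 0 or 1 the outcome is deterministic, which makes the toy grid convenient for checking the pixel lookup.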

stage_columns: list[str] | None
class rail.stages.SpecSelection_VVDSf02

Bases: SpecSelection

The class of spectroscopic selections with VVDSf02.

It covers an area of 0.5 deg^2 with ~10000 sources.

Necessary columns are i band magnitude and redshift.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • drop_rows ([bool] default=True) – Drop selected rows from output table

  • seed ([int] default=42) – random seed for reproducibility

  • n_tot ([int] default=10000) – Number of selected sources

  • nondetect_val ([float] default=99.0)

  • downsample ([bool] default=True) – If true, downsample the selected sources into a total number of n_tot

  • success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.

  • percentile_cut ([int] default=100) – cut redshifts above this percentile

  • colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) –

    a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'spec_selection_VVDSf02'
name = 'SpecSelection_VVDSf02'
photometryCut(data)

Photometric cut of VVDS 2h-field based on LeFèvre+05.

Notes

The oversight of 1.0 magnitudes on the bright end misses 0.2% of galaxies.

selection(data)

Selection functions.

This should be overridden by the subclasses corresponding to different spectroscopic selections.

speczSuccess(data)

Success rate of VVDS 2h-field.

Notes

We apply the redshift-based and I-band-based success rates independently because their correlation is unknown; this makes the combined success rate worse than in reality.

Spec-z success rate as a function of i_AB, read off Figure 16 in LeFevre+05 for the VVDS 2h field. Values are binned in steps of 0.5 mag, with the first bin starting at 17 and the last bin ending at 24.
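Treating the redshift-based and I-band-based success rates as independent amounts to multiplying them into one acceptance probability. In this sketch the I-band table follows the stated binning (0.5 mag steps from 17 to 24), but every rate value and the redshift bins are hypothetical placeholders, not numbers read off LeFèvre+05; `step_lookup` and `vvds_like_success` are illustrative names.

```python
import bisect
import random


def step_lookup(value, edges, rates):
    """Return rates[k] for edges[k] <= value < edges[k+1]; 0.0 outside the table."""
    k = bisect.bisect_right(edges, value) - 1
    if 0 <= k < len(rates):
        return rates[k]
    return 0.0


# I-band bins: 0.5 mag steps from 17 to 24 (14 bins), as described in the text.
I_EDGES = [17.0 + 0.5 * k for k in range(15)]
# Placeholder rates, NOT the published VVDS values.
I_RATES = [0.95] * 10 + [0.9, 0.85, 0.8, 0.7]
# Hypothetical redshift bins and rates.
Z_EDGES = [0.0, 1.0, 2.0, 5.0]
Z_RATES = [0.9, 0.8, 0.5]


def vvds_like_success(i_mags, redshifts, seed=42):
    """Keep each object with probability p_I(i) * p_z(z) (independence assumed)."""
    rng = random.Random(seed)
    keep = []
    for i_mag, z in zip(i_mags, redshifts):
        p = step_lookup(i_mag, I_EDGES, I_RATES) * step_lookup(z, Z_EDGES, Z_RATES)
        keep.append(rng.random() < p)
    return keep


keep = vvds_like_success([18.2, 22.7, 23.8], [0.4, 1.2, 2.5])
```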

stage_columns: list[str] | None
class rail.stages.SpecSelection_zCOSMOS

Bases: SpecSelection

The class of spectroscopic selections with zCOSMOS.

It covers an area of 1.7 deg^2 with ~20000 galaxies.

For zCOSMOS, the data should at least include i band and redshift.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • drop_rows ([bool] default=True) – Drop selected rows from output table

  • seed ([int] default=42) – random seed for reproducibility

  • n_tot ([int] default=10000) – Number of selected sources

  • nondetect_val ([float] default=99.0)

  • downsample ([bool] default=True) – If true, downsample the selected sources into a total number of n_tot

  • success_rate_dir ([str] default=/home/docs/checkouts/readthedocs.org/user_builds/rail-hub/conda/latest/lib/python3.14/site-packages/rail/examples_data/creation_data/data/success_rate_data) – The path to the directory containing success rate files.

  • percentile_cut ([int] default=100) – cut redshifts above this percentile

  • colnames ([dict] default={'u': 'mag_u_lsst', 'g': 'mag_g_lsst', 'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst', 'y': 'mag_y_lsst', 'redshift': 'redshift'}) –

    a dictionary that includes necessary columns (magnitudes, colors and redshift) for selection. For magnitudes, the keys are ugrizy; for colors, the keys are, for example, gr standing for g-r; for redshift, the key is ‘redshift’

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'spec_selection_zCOSMOS'
name = 'SpecSelection_zCOSMOS'
photometryCut(data)

Photometry cut for zCOSMOS based on Lilly+09.

Updates the internal state.

NOTE: This only includes zCOSMOS bright.

selection(data)

Selection functions.

This should be overridden by the subclasses corresponding to different spectroscopic selections.

speczSuccess(data)

Spec-z success rate as a function of redshift (x) and I_AB (y), read off Figure 3 in Lilly+09 for the zCOSMOS bright sample.

stage_columns: list[str] | None
class rail.stages.TableConverter

Bases: RailStage

Utility stage that converts tables from one format to another.

FIXME: this is hardwired to convert Parquet tables to HDF5 tables. It would be nice to have more options here.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • output_format ([str] (required)) – Format of output table

  • input (PqHandle (INPUT))

  • output (Hdf5Handle (OUTPUT))

entrypoint_function: str | None = '__call__'
inputs = [('input', <class 'rail.core.data.PqHandle'>)]
interactive_function: str | None = 'table_converter'
name = 'TableConverter'
outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

class rail.stages.TrainZEstimator

Bases: CatEstimator

CatEstimator which returns a global PDF for all galaxies

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • id_col ([str] default=object_id) – name of the object ID column

  • redshift_col ([str] default=redshift) – name of redshift column

  • calc_summary_stats ([bool] default=False) – Compute summary statistics

  • calculated_point_estimates ([list] default=[]) – List of strings defining which point estimates to automatically calculate using qp.Ensemble. Options include ‘mean’, ‘mode’, and ‘median’.

  • recompute_point_estimates ([bool] default=False) – Force recomputation of point estimates

  • model (ModelHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (QPHandle (OUTPUT))

__init__(args, **kwargs)

Initialize Estimator

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'estimate'
interactive_function: str | None = 'train_z_estimator'
name = 'TrainZEstimator'
open_model(**kwargs)

Load the model and/or attach it to this Stage

Parameters:
  • tag – Input tag associated with the model

  • **kwargs (Any) – Should include ‘model’, see notes

Return type:

None

Notes

The keyword argument ‘model’ should be one of

  1. an object with a trained model,

  2. a path pointing to a file that can be read to obtain the trained model,

  3. or a ModelHandle providing access to the trained model.

Returns:

The object encapsulating the trained model.

Return type:

Any

Parameters:

kwargs (Any)
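The three accepted forms of `model` can be resolved with a simple type dispatch. This is a hypothetical sketch, not the stage's code: it assumes a path refers to a pickle file and that a handle-like object exposes a `read()` method. `resolve_model` and `FakeHandle` (a stand-in for a ModelHandle) are invented for illustration.

```python
import os
import pickle


def resolve_model(model):
    """Return a trained model from an object, a file path, or a handle-like object.

    Assumptions: paths point to pickle files; handles expose a `read()` method.
    """
    if isinstance(model, (str, os.PathLike)):
        with open(model, "rb") as fin:
            return pickle.load(fin)
    if callable(getattr(model, "read", None)):
        return model.read()
    return model  # already an in-memory model object


class FakeHandle:
    """Hypothetical stand-in for a handle that provides a model on demand."""

    def __init__(self, payload):
        self._payload = payload

    def read(self):
        return self._payload


trained = {"zgrid": [0.0, 0.5, 1.0], "pdf": [0.2, 0.5, 0.3]}
model = resolve_model(FakeHandle(trained))
```

Checking for a path first matters here, because strings are neither handles nor usable models.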

stage_columns: list[str] | None
train_pdf: np.ndarray | None
zgrid: np.ndarray | None
zmode: np.ndarray | None
class rail.stages.TrainZInformer

Bases: CatInformer

Train an Estimator which returns a global PDF for all galaxies

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • redshift_col ([str] default=redshift) – name of redshift column

  • input (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))
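Conceptually, TrainZInformer builds one redshift distribution from the training sample, and TrainZEstimator assigns that same PDF to every galaxy. A minimal sketch, assuming the distribution is a normalized histogram on the `zmin`/`zmax`/`nzbins` grid with each training redshift snapped to the nearest grid point; the stage's actual density estimate may differ, and `train_global_pdf` is a hypothetical name.

```python
def train_global_pdf(train_z, zmin=0.0, zmax=3.0, nzbins=301):
    """Return (zgrid, pdf, zmode) from training redshifts.

    The grid has `nzbins` points; each training redshift inside [zmin, zmax]
    is assigned to the nearest grid point, and the counts are scaled so that
    sum(pdf) * dz == 1 (a Riemann-sum normalization).
    """
    dz = (zmax - zmin) / (nzbins - 1)
    zgrid = [zmin + k * dz for k in range(nzbins)]
    counts = [0] * nzbins
    for z in train_z:
        if zmin <= z <= zmax:
            counts[round((z - zmin) / dz)] += 1
    norm = sum(counts) * dz
    pdf = [c / norm for c in counts] if norm > 0 else [0.0] * nzbins
    zmode = zgrid[max(range(nzbins), key=lambda k: pdf[k])]
    return zgrid, pdf, zmode


zgrid, pdf, zmode = train_global_pdf([0.5, 0.5, 1.0])
# Every galaxy would then receive this same `pdf` as its estimate.
```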

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'train_z_informer'
name = 'TrainZInformer'
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
validate()

Checks that the column names required by the stage exist in the data.

Return type:

None

class rail.stages.TrueNZHistogrammer

Bases: RailStage

Summarizer-like stage which simply histograms the true redshift

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • redshift_col ([str] default=redshift) – name of redshift column

  • selected_bin ([int] default=-1) – Which tomography bin to consider

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • input (TableHandle (INPUT))

  • tomography_bins (TableHandle (INPUT))

  • true_NZ (QPHandle (OUTPUT))
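The histogramming step can be sketched with plain Python. Assumptions not taken from the source: `tomo_bins` is a list of integer labels aligned with the catalog, `selected_bin=-1` keeps every object, the `nzbins` grid points serve as bin edges, and the result is normalized to unit integral. `true_nz_histogram` is a hypothetical name.

```python
def true_nz_histogram(redshifts, tomo_bins, selected_bin=-1,
                      zmin=0.0, zmax=3.0, nzbins=301):
    """Return (bin_centers, density) for the true redshifts of one tomographic bin.

    Assumptions: the `nzbins` grid points are bin edges, selected_bin=-1 keeps
    every object, and the density integrates to 1 when any object is in range.
    """
    width = (zmax - zmin) / (nzbins - 1)
    edges = [zmin + k * width for k in range(nzbins)]
    centers = [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]
    counts = [0] * (nzbins - 1)
    for z, tomo in zip(redshifts, tomo_bins):
        if selected_bin != -1 and tomo != selected_bin:
            continue  # object belongs to a different tomographic bin
        if zmin <= z < zmax:
            # min() guards against floating-point rounding just below zmax.
            counts[min(int((z - zmin) / width), nzbins - 2)] += 1
    n = sum(counts)
    density = [c / (n * width) for c in counts] if n else [0.0] * len(counts)
    return centers, density


centers, dens = true_nz_histogram([0.05, 0.15, 0.15, 2.0], [1, 1, 2, 1],
                                  selected_bin=1, zmin=0.0, zmax=1.0, nzbins=11)
```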

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

bincents: ndarray | None
entrypoint_function: str | None = 'histogram'
histogram(catalog, tomo_bins, **kwargs)

The main interface method for TrueNZHistogrammer.

Creates a histogram of the true redshifts, N(z_true).

This will attach the sample to this Stage (for introspection and provenance tracking).

Then it will call the run() and finalize() methods, which need to be implemented by the sub-classes.

The run() method will need to register the data that it creates to this Estimator by using self.add_data('output', output_data).

Finally, this will return a PqHandle providing access to that output data.

Parameters:
  • catalog (TableLike) – The sample with the true NZ column

  • tomo_bins (TableLike) – Tomographic bin assignments

Returns:

A handle giving access to the histogram in QP format

Return type:

PqHandle

inputs = [('input', <class 'rail.core.data.TableHandle'>), ('tomography_bins', <class 'rail.core.data.TableHandle'>)]
interactive_function: str | None = 'true_nz_histogrammer'
name = 'TrueNZHistogrammer'
outputs = [('true_NZ', <class 'rail.core.data.QPHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
zgrid: ndarray | None
class rail.stages.UniformBinningClassifier

Bases: PZClassifier

Classifier that simply assigns tomographic bins based on a point estimate according to the SRD.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • object_id_col ([str] default=) – name of object id column

  • point_estimate_key ([str] default=zmode) – Which point estimate to use

  • zbin_edges ([list] default=[]) – The tomographic redshift bin edges. If this is given (contains two or more entries), all settings below will be ignored.

  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • n_tom_bins ([int] default=5) – Number of tomographic bins

  • no_assign ([int] default=-99) – Value for no assignment flag

  • input (QPHandle (INPUT))

  • output (Hdf5Handle (OUTPUT))
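The bin-assignment rule described by these parameters can be sketched as follows. Assumptions: explicit `zbin_edges` (two or more entries) override uniform edges built from `zmin`, `zmax`, and `n_tom_bins`; intervals are half-open [lo, hi); bins are numbered from 1; point estimates outside the edges receive `no_assign`. The numbering and interval conventions are guesses, and `make_edges`/`assign_bins` are hypothetical names.

```python
import bisect


def make_edges(zbin_edges=(), zmin=0.0, zmax=3.0, n_tom_bins=5):
    """Use explicit edges when given (two or more entries), else uniform edges."""
    if len(zbin_edges) >= 2:
        return list(zbin_edges)
    width = (zmax - zmin) / n_tom_bins
    return [zmin + k * width for k in range(n_tom_bins + 1)]


def assign_bins(point_estimates, edges, no_assign=-99):
    """Return a 1-based tomographic bin for each point estimate, or `no_assign`."""
    labels = []
    for z in point_estimates:
        k = bisect.bisect_right(edges, z)  # number of edges <= z
        labels.append(k if 1 <= k < len(edges) else no_assign)
    return labels


labels = assign_bins([0.1, 0.6, 2.9, 3.0, -0.1], make_edges())
```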

entrypoint_function: str | None = 'classify'
interactive_function: str | None = 'uniform_binning_classifier'
name = 'UniformBinningClassifier'
outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]
stage_columns: list[str] | None
class rail.stages.UnrecBlModel

Bases: Degrader

Model for Creating Unrecognized Blends.

Finds objects near each other and merges them into one blended source, using Friends-of-Friends (FoF) for matching. The blended source takes the averaged RA and Dec of its members and the summed flux in each band. Shape matching and merged shapes may be implemented in the future.

Requires gcc, which, depending on your installation, may be difficult for the caller (fast3tree, a dependency of FoFCatalogMatching) to find. A conda-installed gcc seems to fix this.
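The Friends-of-Friends grouping and merge can be illustrated with a small union-find sketch. Simplifying assumptions that differ from the stage, which relies on FoFCatalogMatching: positions use a small-angle approximation in the same units as the linking length, RA wrap-around is ignored, pairs are found by brute force (O(N²)), and a single band is merged for brevity. Because the merged magnitude comes from summed fluxes, the zero point cancels, so `zp_dict` is not needed in this sketch. `fof_groups` and `merge_groups` are hypothetical names.

```python
import math


def fof_groups(ra, dec, linking_length):
    """Group objects whose separations chain together within `linking_length`.

    Returns one group id per object (the union-find root index).
    """
    parent = list(range(len(ra)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(ra)):
        for j in range(i + 1, len(ra)):
            cos_dec = math.cos(math.radians(0.5 * (dec[i] + dec[j])))
            sep = math.hypot((ra[i] - ra[j]) * cos_dec, dec[i] - dec[j])
            if sep <= linking_length:
                parent[find(i)] = find(j)  # link the two groups
    return [find(i) for i in range(len(ra))]


def merge_groups(ra, dec, mags, group_ids):
    """Average RA/Dec and sum fluxes per group; return {group_id: (ra, dec, mag)}."""
    merged = {}
    for gid in sorted(set(group_ids)):
        members = [k for k, g in enumerate(group_ids) if g == gid]
        flux = sum(10 ** (-0.4 * mags[k]) for k in members)
        merged[gid] = (
            sum(ra[k] for k in members) / len(members),
            sum(dec[k] for k in members) / len(members),
            -2.5 * math.log10(flux),  # zero point cancels in the flux sum
        )
    return merged


ra = [10.0, 10.0001, 20.0]
dec = [0.0, 0.0, 0.0]
groups = fof_groups(ra, dec, linking_length=0.001)
blends = merge_groups(ra, dec, [20.0, 20.0, 22.0], groups)
```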

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • seed ([int] default=12345) – Random number seed

  • ra_label ([str] default=ra) – ra column name

  • dec_label ([str] default=dec) – dec column name

  • linking_lengths ([float] default=1.0) – linking length for FoF matching

  • bands ([list] default=['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'])

  • zp_dict ([dict] default={'u': 12.65, 'g': 14.69, 'r': 14.56, 'i': 14.38, 'z': 13.99, 'y': 13.02}) – magnitude zeropoints dictionary

  • ref_band ([str] default=mag_i_lsst)

  • redshift_col ([str] default=redshift)

  • match_size ([bool] default=False) – consider object size for finding blends

  • match_shape ([bool] default=False) – consider object shape for finding blends

  • obj_size ([str] default=obj_size) – object size column name

  • a ([str] default=semi_major) – semi major axis column name

  • b ([str] default=semi_minor) – semi minor axis column name

  • theta ([str] default=orientation) – orientation angle column name

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

  • compInd (PqHandle (OUTPUT))

blend_info_cols = ['group_id', 'n_obj', 'brightest_flux', 'total_flux', 'z_brightest', 'z_weighted', 'z_mean', 'z_stdev']
entrypoint_function: str | None = '__call__'
interactive_function: str | None = 'unrec_bl_model'
name = 'UnrecBlModel'
outputs = [('output', <class 'rail.core.data.PqHandle'>), ('compInd', <class 'rail.core.data.PqHandle'>)]
run()

Return pandas DataFrame with blending errors.

stage_columns: list[str] | None
class rail.stages.VarInfStackInformer

Bases: PzInformer

Placeholder Informer

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – name of hdf5 group for data, if None, then set to ‘’

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • input (QPHandle (INPUT))

  • truth (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'var_inf_stack_informer'
model_handle: ModelHandle | None
name = 'VarInfStackInformer'
stage_columns: list[str] | None
class rail.stages.VarInfStackSummarizer

Bases: PZSummarizer

Variational inference summarizer based on a notebook created by Markus Rau. The summarizer is appropriate for the likelihoods returned by template-based codes, for which the NaiveSummarizer is not appropriate.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing or to evaluate per loop in single node processing

  • zmin ([float] default=0.0) – The minimum redshift of the z grid or sample

  • zmax ([float] default=3.0) – The maximum redshift of the z grid or sample

  • nzbins ([int] default=301) – The number of gridpoints in the z grid

  • seed ([int] default=87) – random seed

  • n_iter ([int] default=100) – The number of iterations in the variational inference

  • n_samples ([int] default=500) – The number of samples used to estimate the Dirichlet uncertainty

  • input (QPHandle (INPUT))

  • output (QPHandle (OUTPUT))

  • single_NZ (QPHandle (OUTPUT))
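The core idea of a variational-inference stack can be sketched with the standard mean-field update for a Dirichlet prior over the n(z) bin fractions. Each galaxy's responsibilities are its likelihoods reweighted by exp(ψ(α_k) − ψ(Σα)), and α is refreshed as the prior plus the summed responsibilities. This is a generic textbook sketch, assumed to be close in spirit to the stage but not taken from it. The flat prior, the initialization, and the function names (`vi_stack`, `sample_dirichlet`, `digamma`) are illustrative. `digamma` uses a recurrence plus an asymptotic series so that only the standard library is needed, and the `n_samples` uncertainty draws are Dirichlet samples made by normalizing Gamma variates.

```python
import math
import random


def digamma(x):
    """Digamma function for x > 0 via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def vi_stack(likelihoods, n_iter=100, alpha0=None):
    """Return Dirichlet concentrations for the n(z) bin fractions.

    `likelihoods[i][k]` is galaxy i's likelihood in redshift bin k. A flat
    Dirichlet prior (all ones) is assumed unless `alpha0` is given.
    """
    n_bins = len(likelihoods[0])
    prior = list(alpha0) if alpha0 is not None else [1.0] * n_bins
    alpha = list(prior)
    for _ in range(n_iter):
        # exp(E[log pi_k]) under the current Dirichlet.
        total = digamma(sum(alpha))
        weights = [math.exp(digamma(a) - total) for a in alpha]
        new_alpha = list(prior)
        for like in likelihoods:
            resp = [lk * w for lk, w in zip(like, weights)]
            norm = sum(resp)
            if norm > 0.0:  # skip galaxies with zero likelihood everywhere
                for k in range(n_bins):
                    new_alpha[k] += resp[k] / norm
        alpha = new_alpha
    return alpha


def sample_dirichlet(alpha, n_samples=500, seed=87):
    """Draw n(z) realizations by normalizing independent Gamma(alpha_k, 1) variates."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_samples):
        gammas = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(gammas)
        draws.append([g / total for g in gammas])
    return draws


alpha = vi_stack([[0.2, 0.8, 0.0], [0.1, 0.6, 0.3], [0.0, 0.1, 0.9]], n_iter=100)
nz_samples = sample_dirichlet(alpha, n_samples=500, seed=87)
```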

__init__(args, **kwargs)

Constructor: Do RailStage specific initialization

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

entrypoint_function: str | None = 'summarize'
inputs = [('input', <class 'rail.core.data.QPHandle'>)]
interactive_function: str | None = 'var_inf_stack_summarizer'
name = 'VarInfStackSummarizer'
outputs = [('output', <class 'rail.core.data.QPHandle'>), ('single_NZ', <class 'rail.core.data.QPHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
summarize(input_data, **kwargs)

Summarizer for VarInfStack which returns multiple items

Parameters:

input_data (qp.Ensemble) – Per-galaxy p(z), and any ancillary data associated with it

Returns:

Ensemble with n(z) and any ancillary data. The return type depends on output_mode.

Return type:

QPHandle | dict[str, QPHandle]

zgrid: ndarray | None
class rail.stages.YawAutoCorrelate

Bases: YawRailStage

Wrapper stage for yaw.autocorrelate to compute a sample’s angular autocorrelation amplitude.

Generally used for the reference sample to compute an estimate of its galaxy bias as a function of redshift. Data is provided as a single cache directory that must have redshifts and randoms with redshifts attached.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • rmin ([float] (required)) – Single or sequence of lower scale limits in given ‘unit’.

  • rmax ([float] (required)) – Single or sequence of upper scale limits in given ‘unit’.

  • unit ([str] default=kpc) – The unit of the lower and upper scale limits.

  • rweight ([float] default=None) – Power-law exponent used to weight pairs by their separation.

  • resolution ([int] default=None) – Number of radial logarithmic bins used to approximate the weighting by separation.

  • zmin ([float] default=None) – Lowest redshift bin edge to generate (alternatively use ‘edges’).

  • zmax ([float] default=None) – Highest redshift bin edge to generate (alternatively use ‘edges’).

  • num_bins ([int] default=30) – Number of redshift bins to generate between ‘zmin’ and ‘zmax’.

  • method ([str] default=linear) – Method used to compute the spacing of bin edges.

  • edges ([float] default=None) – Use these custom bin edges instead of generating them.

  • closed ([str] default=right) – String indicating the side of the bin intervals that are closed.

  • max_workers ([int] default=None) – configure a custom maximum number of parallel workers to use

  • verbose ([str] default=info) – lowest log level emitted by yet_another_wizz

  • sample (YawCacheHandle (INPUT))

  • output (YawCorrFuncHandle (OUTPUT))
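The redshift-binning parameters (`zmin`, `zmax`, `num_bins`, `edges`, `closed`) can be illustrated as follows. This sketch covers linear spacing only (`method='linear'`); other spacing methods are not reproduced, and `linear_edges`/`bin_index` are hypothetical helpers rather than yet_another_wizz functions. With `closed='right'` each interval is (lo, hi], so a value equal to the lowest edge falls outside every bin.

```python
import bisect


def linear_edges(zmin, zmax, num_bins=30, edges=None):
    """Return custom `edges` if given, else num_bins + 1 linearly spaced edges."""
    if edges is not None:
        return list(edges)
    step = (zmax - zmin) / num_bins
    return [zmin + k * step for k in range(num_bins + 1)]


def bin_index(z, edges, closed="right"):
    """Return the 0-based bin containing z, or None if z is outside all bins.

    closed='right' uses intervals (lo, hi]; closed='left' uses [lo, hi).
    """
    if closed == "right":
        k = bisect.bisect_left(edges, z) - 1
    elif closed == "left":
        k = bisect.bisect_right(edges, z) - 1
    else:
        raise ValueError("closed must be 'left' or 'right'")
    return k if 0 <= k < len(edges) - 1 else None


edges = linear_edges(0.0, 2.0, num_bins=2)
```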

algo_parameters: set[str] = {'closed', 'edges', 'max_workers', 'method', 'num_bins', 'resolution', 'rmax', 'rmin', 'rweight', 'unit', 'zmax', 'zmin'}

Lists the names of all algorithm-specific parameters that were added when subclassing.

correlate(sample, **kwargs)

Measure the angular autocorrelation amplitude in bins of redshift.

Parameters:

sample (YawCache) – Input cache which must have randoms attached and redshifts for both data set and randoms.

Returns:

A handle for the yaw.CorrFunc instance that holds the pair counts.

Return type:

YawCorrFuncHandle

entrypoint_function: str | None = 'correlate'
inputs = [('sample', <class 'rail.yaw_rail.handles.YawCacheHandle'>)]
interactive_function: str | None = 'yaw_auto_correlate'
name = 'YawAutoCorrelate'
outputs = [('output', <class 'rail.yaw_rail.handles.YawCorrFuncHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
class rail.stages.YawCacheCreate

Bases: YawRailStage

Create a new cache directory to hold a data set and optionally its matching random catalog.

Both input data sets are split into consistent spatial patches that are required by yet_another_wizz for correlation function covariance estimates. Each patch is stored separately for efficient access.

The cache can be constructed from input files or tabular data in memory. Column names for sky coordinates are required; redshifts and per-object weights are optional. One of three patch-creation methods must be specified:

  1. Splitting the data into predefined patches (from ASCII file or an existing cache instance, linked as optional stage input).

  2. Splitting the data based on a column with patch indices.

  3. Generating approximately equal-size patches using k-means clustering of object positions (preferably randoms, if provided).

Note: The cache directory must be deleted manually when it is no longer needed. (The reference sample cache may be reused when operating on tomographic bins.)
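Option 3 can be illustrated with a plain Lloyd's k-means on positions. This is a simplified flat-coordinate sketch (no spherical geometry, no RA wrap-around, random initial centers); yet_another_wizz has its own implementation, which this does not reproduce, and `kmeans_patches` is a hypothetical name.

```python
import random


def kmeans_patches(points, patch_num, n_iter=20, seed=12345):
    """Assign each (x, y) point to one of `patch_num` patches with Lloyd's k-means.

    Returns (centers, labels). Flat-coordinate approximation only.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, patch_num)  # distinct initial centers
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        for idx, (x, y) in enumerate(points):
            labels[idx] = min(
                range(patch_num),
                key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2,
            )
        # Update step: move each center to the mean of its members.
        for c in range(patch_num):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:  # keep the old center if a patch became empty
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centers, labels


pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
centers, labels = kmeans_patches(pts, patch_num=2)
```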

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • path ([str] (required)) – path to cache directory, must not exist

  • overwrite ([bool] default=None) – overwrite the path if it is an existing cache directory

  • ra_name ([str] default=ra) – column name of right ascension (in degrees)

  • dec_name ([str] default=dec) – column name of declination (in degrees)

  • weight_name ([str] default=None) – column name of weight

  • redshift_name ([str] default=None) – column name of redshift

  • degrees ([bool] default=True) – Whether the input coordinates are in degrees or radians.

  • patch_file ([str] default=None) – path to an ASCII file that lists patch centers (one per line) as pairs of R.A./Dec. in radians, separated by a single space or tab

  • patch_name ([str] default=None) – column name of patch index (starting from 0)

  • patch_num ([int] default=None) – number of spatial patches to create using k-means clustering on the coordinates of randoms

  • probe_size ([int] default=-1) – The approximate number of objects to sample from the input file when generating patch centers.

  • max_workers ([int] default=None) – configure a custom maximum number of parallel workers to use

  • verbose ([str] default=info) – lowest log level emitted by yet_another_wizz

  • data (TableHandle (INPUT))

  • rand (TableHandle (INPUT))

  • patch_source (YawCacheHandle (INPUT))

  • output (YawCacheHandle (OUTPUT))
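The `patch_file` format (one center per line, R.A. and Dec. in radians separated by whitespace) can be written and read with the standard library. A sketch assuming the centers start out in degrees; `write_patch_file` and `read_patch_file` are hypothetical helpers, not yaw_rail functions.

```python
import math
import os
import tempfile


def write_patch_file(path, centers_deg):
    """Write (ra, dec) centers given in degrees as radian pairs, one per line."""
    with open(path, "w") as fout:
        for ra, dec in centers_deg:
            fout.write(f"{math.radians(ra)!r} {math.radians(dec)!r}\n")


def read_patch_file(path):
    """Read whitespace-separated (ra, dec) radian pairs, skipping blank lines."""
    centers = []
    with open(path) as fin:
        for line in fin:
            fields = line.split()  # accepts a single space or a tab
            if fields:
                centers.append((float(fields[0]), float(fields[1])))
    return centers


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "patch_centers.txt")
    write_patch_file(path, [(150.0, 2.0), (35.0, -5.0)])
    centers = read_patch_file(path)
```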

algo_parameters: set[str] = {'dec_name', 'degrees', 'max_workers', 'overwrite', 'patch_file', 'patch_name', 'patch_num', 'path', 'probe_size', 'ra_name', 'redshift_name', 'weight_name'}

Lists the names of all algorithm-specific parameters that were added when subclassing.

create(data, rand=None, patch_source=None, **kwargs)

Create the new cache directory and split the input data into spatial patches.

Parameters:
  • data (DataFrame) – The data set to split into patches and cache.

  • rand (DataFrame, optional) – The randoms to split into patches and cache, positions used to automatically generate patch centers if provided and stage is configured with patch_num. For interactive mode RAIL, set to the string “none” if not desired.

  • patch_source (YawCache, optional) – An existing cache instance that provides the patch centers. Use to ensure consistent patch centers when running cross-correlations. Takes precedence over any configuration parameters. For interactive mode RAIL, set to the string “none” if not desired.

Returns:

A handle for the newly created cache directory.

Return type:

YawCacheHandle

entrypoint_function: str | None = 'create'
inputs = [('data', <class 'rail.core.data.TableHandle'>), ('rand', <class 'rail.core.data.TableHandle'>), ('patch_source', <class 'rail.yaw_rail.handles.YawCacheHandle'>)]
interactive_function: str | None = 'yaw_cache_create'
name = 'YawCacheCreate'
outputs = [('output', <class 'rail.yaw_rail.handles.YawCacheHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
class rail.stages.YawCrossCorrelate

Bases: YawRailStage

Wrapper stage for yaw.crosscorrelate to compute the angular cross-correlation amplitude between the reference and the unknown sample.

Used to measure the clustering signal between the reference and unknown samples as a function of redshift, which is the basis for the clustering redshift estimate. Data sets are provided as cache directories. The reference sample must have redshifts and at least one cache must have randoms attached.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • rmin ([float] (required)) – Single or sequence of lower scale limits in given ‘unit’.

  • rmax ([float] (required)) – Single or sequence of upper scale limits in given ‘unit’.

  • unit ([str] default=kpc) – The unit of the lower and upper scale limits.

  • rweight ([float] default=None) – Power-law exponent used to weight pairs by their separation.

  • resolution ([int] default=None) – Number of radial logarithmic bins used to approximate the weighting by separation.

  • zmin ([float] default=None) – Lowest redshift bin edge to generate (alternatively use ‘edges’).

  • zmax ([float] default=None) – Highest redshift bin edge to generate (alternatively use ‘edges’).

  • num_bins ([int] default=30) – Number of redshift bins to generate between ‘zmin’ and ‘zmax’.

  • method ([str] default=linear) – Method used to compute the spacing of bin edges.

  • edges ([float] default=None) – Use these custom bin edges instead of generating them.

  • closed ([str] default=right) – String indicating the side of the bin intervals that are closed.

  • max_workers ([int] default=None) – configure a custom maximum number of parallel workers to use

  • verbose ([str] default=info) – lowest log level emitted by yet_another_wizz

  • reference (YawCacheHandle (INPUT))

  • unknown (YawCacheHandle (INPUT))

  • output (YawCorrFuncHandle (OUTPUT))

algo_parameters: set[str] = {'closed', 'edges', 'max_workers', 'method', 'num_bins', 'resolution', 'rmax', 'rmin', 'rweight', 'unit', 'zmax', 'zmin'}

Lists the names of all algorithm-specific parameters that were added when subclassing.

correlate(reference, unknown, **kwargs)

Measure the angular cross-correlation amplitude in bins of redshift.

Parameters:
  • reference (YawCache) – Cache for the reference data, must have redshifts. If no randoms are attached, the unknown data cache must provide them.

  • unknown (YawCache) – Cache for the unknown data. If no randoms are attached, the reference data cache must provide them.

Returns:

A handle for the yaw.CorrFunc instance that holds the pair counts.

Return type:

YawCorrFuncHandle

entrypoint_function: str | None = 'correlate'
inputs = [('reference', <class 'rail.yaw_rail.handles.YawCacheHandle'>), ('unknown', <class 'rail.yaw_rail.handles.YawCacheHandle'>)]
interactive_function: str | None = 'yaw_cross_correlate'
name = 'YawCrossCorrelate'
outputs = [('output', <class 'rail.yaw_rail.handles.YawCorrFuncHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
class rail.stages.YawSummarize

Bases: YawRailStage

A summarizer that computes a clustering redshift estimate from the measured correlation amplitudes.

Evaluates the cross-correlation pair counts with the provided estimator. Additionally corrects for galaxy sample bias if autocorrelation measurements are provided as stage inputs.

Note: This summarizer does not produce a PDF, but a ratio of correlation functions, which may result in negative values. Further modelling of the output is required.
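The bias correction can be illustrated with the commonly used estimator n_u(z) ∝ w_ur(z) / sqrt(w_rr(z) · w_uu(z)), where w_ur is the cross-correlation amplitude and w_rr, w_uu are the reference and unknown autocorrelation amplitudes; omitting an autocorrelation leaves the corresponding bias uncorrected. This is a per-bin sketch on plain amplitudes, not yaw's implementation, which works from pair counts, handles jackknife samples, and may include further normalization factors. Returning NaN for non-positive autocorrelations is an assumption of this sketch, and `clustering_nz` is a hypothetical name.

```python
import math


def clustering_nz(w_ur, w_rr=None, w_uu=None):
    """Return per-bin clustering-redshift amplitudes, bias-corrected when possible.

    Each argument is a list of correlation amplitudes, one per redshift bin.
    The result is not normalized and may contain negative values.
    """
    result = []
    for k, cross in enumerate(w_ur):
        denom = 1.0
        valid = True
        for auto in (w_rr, w_uu):
            if auto is not None:
                if auto[k] <= 0.0:
                    valid = False  # bias correction undefined for this bin
                denom *= auto[k]
        result.append(cross / math.sqrt(denom) if valid else float("nan"))
    return result


nz = clustering_nz([0.02, -0.01], w_rr=[0.04, 0.04], w_uu=[0.01, 0.01])
```

Negative entries, such as the second bin here, pass through unchanged, consistent with the note that the output is a ratio of correlation functions rather than a PDF.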

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • verbose ([str] default=info) – lowest log level emitted by yet_another_wizz

  • cross_corr (YawCorrFuncHandle (INPUT))

  • auto_corr_ref (YawCorrFuncHandle (INPUT))

  • auto_corr_unk (YawCorrFuncHandle (INPUT))

  • output (ModelHandle (OUTPUT))

algo_parameters: set[str] = {}

Lists the names of all algorithm-specific parameters that were added when subclassing.

entrypoint_function: str | None = 'summarize'
inputs = [('cross_corr', <class 'rail.yaw_rail.handles.YawCorrFuncHandle'>), ('auto_corr_ref', <class 'rail.yaw_rail.handles.YawCorrFuncHandle'>), ('auto_corr_unk', <class 'rail.yaw_rail.handles.YawCorrFuncHandle'>)]
interactive_function: str | None = 'yaw_summarize'
name = 'YawSummarize'
outputs = [('output', <class 'rail.core.data.ModelHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
summarize(cross_corr, auto_corr_ref=None, auto_corr_unk=None, **kwargs)

Compute a clustering redshift estimate and convert it to a PDF.

Parameters:
  • cross_corr (CorrFunc) – Pair counts from the cross-correlation measurement, basis for the clustering redshift estimate.

  • auto_corr_ref (CorrFunc, optional) – Pair counts from the reference sample autocorrelation measurement, used to correct for the reference sample galaxy bias.

  • auto_corr_unk (CorrFunc, optional) – Pair counts from the unknown sample autocorrelation measurement, used to correct for the unknown sample galaxy bias. Typically only available when using simulated data sets. For interactive mode RAIL, set to the string “none” if not desired.

Returns:

The clustering redshift estimate, spatial (jackknife) samples thereof, and its covariance matrix.

Return type:

YawRedshiftDataHandle