rail.estimation.algos.cc_yaw module

This file implements all stages required to wrap yet_another_wizz in RAIL. These are:

  • YawCacheCreate: Preprocessing input data and arranging them in spatial patches for efficient access.

  • YawAutoCorrelate: Computing the autocorrelation by running the pair counting in spatial patches. Used for galaxy bias mitigation.

  • YawCrossCorrelate: Computing the cross-correlation by running the pair counting in spatial patches. Represents a biased redshift estimate.

  • YawSummarize: Transforming the correlation function pair counts to a redshift estimate (not a PDF!).
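The order in which these stages can run follows from the inputs and outputs documented for each class below. A minimal sketch, assuming only those documented connections (the dictionary is an illustration and not an object provided by RAIL):

```python
from graphlib import TopologicalSorter

# Stage -> stages whose outputs it consumes, read off the documented
# inputs/outputs of each class. The autocorrelation inputs of
# YawSummarize are optional; they are included here for completeness.
dependencies = {
    "YawCacheCreate": set(),
    "YawAutoCorrelate": {"YawCacheCreate"},
    "YawCrossCorrelate": {"YawCacheCreate"},
    "YawSummarize": {"YawCrossCorrelate", "YawAutoCorrelate"},
}

# Any valid execution order starts with the cache and ends with the summary.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The two correlation stages are independent of each other, so their relative order in the result is arbitrary.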

class rail.estimation.algos.cc_yaw.YawAutoCorrelate

Bases: YawRailStage

Wrapper stage for yaw.autocorrelate to compute a sample’s angular autocorrelation amplitude.

Generally used for the reference sample to compute an estimate for its galaxy bias as a function of redshift. Data is provided as a single cache directory that must have redshifts and randoms with redshift attached.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • rmin ([float] (required)) – Single or sequence of lower scale limits in given ‘unit’.

  • rmax ([float] (required)) – Single or sequence of upper scale limits in given ‘unit’.

  • unit ([str] default=kpc) – The unit of the lower and upper scale limits.

  • rweight ([float] default=None) – Power-law exponent used to weight pairs by their separation.

  • resolution ([int] default=None) – Number of radial logarithmic bins used to approximate the weighting by separation.

  • zmin ([float] default=None) – Lowest redshift bin edge to generate (alternatively use ‘edges’).

  • zmax ([float] default=None) – Highest redshift bin edge to generate (alternatively use ‘edges’).

  • num_bins ([int] default=30) – Number of redshift bins to generate between ‘zmin’ and ‘zmax’.

  • method ([str] default=linear) – Method used to compute the spacing of bin edges.

  • edges ([float] default=None) – Use these custom bin edges instead of generating them.

  • closed ([str] default=right) – String indicating the side of the bin intervals that are closed.

  • max_workers ([int] default=None) – configure a custom maximum number of parallel workers to use

  • verbose ([str] default=info) – lowest log level emitted by yet_another_wizz

  • sample (YawCacheHandle (INPUT))

  • output (YawCorrFuncHandle (OUTPUT))
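The binning parameters can be illustrated with a short sketch. It assumes that method='linear' means equally spaced edges between 'zmin' and 'zmax' and that closed='right' means intervals of the form (lo, hi]; the helper names are hypothetical and the actual yet_another_wizz implementation may handle edge cases differently.

```python
from bisect import bisect_left


def linear_edges(zmin, zmax, num_bins=30):
    """Return num_bins + 1 equally spaced edges from zmin to zmax."""
    step = (zmax - zmin) / num_bins
    edges = [zmin + i * step for i in range(num_bins)]
    edges.append(zmax)  # append exactly to avoid round-off at the upper edge
    return edges


def bin_index(z, edges):
    """Index of the right-closed bin (lo, hi] containing z, or None if outside."""
    if z <= edges[0] or z > edges[-1]:
        return None
    return bisect_left(edges, z) - 1


edges = linear_edges(0.0, 1.0, num_bins=4)
print(edges)
print(bin_index(0.5, edges))  # 0.5 belongs to (0.25, 0.5], not (0.5, 0.75]
```

With right-closed intervals a redshift that coincides with the lowest edge falls outside the binning, which is why the sketch returns None for it.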

algo_parameters: set[str] = {'closed', 'edges', 'max_workers', 'method', 'num_bins', 'resolution', 'rmax', 'rmin', 'rweight', 'unit', 'zmax', 'zmin'}

Lists the names of all algorithm-specific parameters that were added when subclassing.

correlate(sample, **kwargs)

Measure the angular autocorrelation amplitude in bins of redshift.

Parameters:

sample (YawCache) – Input cache which must have randoms attached and redshifts for both data set and randoms.

Returns:

A handle for the yaw.CorrFunc instance that holds the pair counts.

Return type:

YawCorrFuncHandle

entrypoint_function: str | None = 'correlate'
inputs = [('sample', <class 'rail.yaw_rail.handles.YawCacheHandle'>)]
interactive_function: str | None = 'yaw_auto_correlate'
name = 'YawAutoCorrelate'
outputs = [('output', <class 'rail.yaw_rail.handles.YawCorrFuncHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

class rail.estimation.algos.cc_yaw.YawCacheCreate

Bases: YawRailStage

Create a new cache directory to hold a data set and optionally its matching random catalog.

Both input data sets are split into consistent spatial patches that are required by yet_another_wizz for correlation function covariance estimates. Each patch is stored separately for efficient access.

The cache can be constructed from input files or tabular data in memory. Column names for sky coordinates are required; redshifts and per-object weights are optional. One of three patch creation methods must be specified:

  1. Splitting the data into predefined patches (from ASCII file or an existing cache instance, linked as optional stage input).

  2. Splitting the data based on a column with patch indices.

  3. Generating approximately equal size patches using k-means clustering of object positions (preferably randoms if provided).

Note: The cache directory must be deleted manually when it is no longer needed. (The reference sample cache may be reused when operating on tomographic bins.)
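For the first method, splitting by predefined patch centers, a minimal sketch is given here. It assumes that each object is assigned to the center with the smallest angular separation, which this page does not state explicitly; the function names are hypothetical.

```python
import math


def to_unit_vector(ra, dec):
    """Convert R.A./Dec. in radians to a 3D unit vector."""
    cos_dec = math.cos(dec)
    return (cos_dec * math.cos(ra), cos_dec * math.sin(ra), math.sin(dec))


def assign_patches(coords, centers):
    """Return, for each (ra, dec) in radians, the index of the nearest center."""
    center_vecs = [to_unit_vector(ra, dec) for ra, dec in centers]
    patch_ids = []
    for ra, dec in coords:
        vec = to_unit_vector(ra, dec)
        # The largest dot product corresponds to the smallest angular separation.
        dots = [sum(a * b for a, b in zip(vec, c)) for c in center_vecs]
        patch_ids.append(dots.index(max(dots)))
    return patch_ids


# Made-up coordinates, given in degrees and converted to radians.
centers = [(math.radians(10.0), math.radians(0.0)),
           (math.radians(50.0), math.radians(0.0))]
objects = [(math.radians(12.0), math.radians(1.0)),
           (math.radians(48.0), math.radians(-2.0)),
           (math.radians(20.0), math.radians(5.0))]
patch_ids = assign_patches(objects, centers)
print(patch_ids)
```

Applying the same centers to the data set and to the randoms is what keeps the patches consistent between the two catalogs.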

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • path ([str] (required)) – path to cache directory, must not exist

  • overwrite ([bool] default=None) – overwrite the path if it is an existing cache directory

  • ra_name ([str] default=ra) – column name of right ascension (in degrees)

  • dec_name ([str] default=dec) – column name of declination (in degrees)

  • weight_name ([str] default=None) – column name of weight

  • redshift_name ([str] default=None) – column name of redshift

  • degrees ([bool] default=True) – Whether the input coordinates are in degrees or radians.

  • patch_file ([str] default=None) – path to ASCII file that lists patch centers (one per line) as pair of R.A./Dec. in radians, separated by a single space or tab

  • patch_name ([str] default=None) – column name of patch index (starting from 0)

  • patch_num ([int] default=None) – number of spatial patches to create using k-means clustering on object coordinates (preferably those of the randoms)

  • probe_size ([int] default=-1) – The approximate number of objects to sample from the input file when generating patch centers.

  • max_workers ([int] default=None) – configure a custom maximum number of parallel workers to use

  • verbose ([str] default=info) – lowest log level emitted by yet_another_wizz

  • data (TableHandle (INPUT))

  • rand (TableHandle (INPUT))

  • patch_source (YawCacheHandle (INPUT))

  • output (YawCacheHandle (OUTPUT))
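The layout described for 'patch_file' (one center per line, R.A. and Dec. in radians, separated by a space or tab) can be read with a few lines of Python. This is a sketch of the file format only; the parser name is hypothetical, and skipping blank lines is an assumption of the sketch rather than documented behavior.

```python
def parse_patch_centers(text):
    """Parse patch centers from text with one 'ra dec' pair (radians) per line."""
    centers = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines (an assumption of this sketch)
        fields = line.split()  # splits on spaces and tabs alike
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 values, got {len(fields)}"
            )
        ra, dec = (float(value) for value in fields)
        centers.append((ra, dec))
    return centers


# Two made-up centers, one space-separated and one tab-separated.
example = "0.1745 0.0\n0.8727\t-0.0349\n"
centers = parse_patch_centers(example)
print(centers)
```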

algo_parameters: set[str] = {'dec_name', 'degrees', 'max_workers', 'overwrite', 'patch_file', 'patch_name', 'patch_num', 'path', 'probe_size', 'ra_name', 'redshift_name', 'weight_name'}

Lists the names of all algorithm-specific parameters that were added when subclassing.

create(data, rand=None, patch_source=None, **kwargs)

Create the new cache directory and split the input data into spatial patches.

Parameters:
  • data (DataFrame) – The data set to split into patches and cache.

  • rand (DataFrame, optional) – The randoms to split into patches and cache. If provided and the stage is configured with patch_num, their positions are used to automatically generate the patch centers. For interactive mode RAIL, set to the string “none” if not desired.

  • patch_source (YawCache, optional) – An existing cache instance that provides the patch centers. Use to ensure consistent patch centers when running cross-correlations. Takes precedence over any configuration parameters. For interactive mode RAIL, set to the string “none” if not desired.

Returns:

A handle for the newly created cache directory.

Return type:

YawCacheHandle

entrypoint_function: str | None = 'create'
inputs = [('data', <class 'rail.core.data.TableHandle'>), ('rand', <class 'rail.core.data.TableHandle'>), ('patch_source', <class 'rail.yaw_rail.handles.YawCacheHandle'>)]
interactive_function: str | None = 'yaw_cache_create'
name = 'YawCacheCreate'
outputs = [('output', <class 'rail.yaw_rail.handles.YawCacheHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

class rail.estimation.algos.cc_yaw.YawCrossCorrelate

Bases: YawRailStage

Wrapper stage for yaw.crosscorrelate to compute the angular cross-correlation amplitude between the reference and the unknown sample.

Generally used to compute a biased estimate of the redshift distribution of the unknown sample. Data sets are provided as cache directories. The reference sample must have redshifts and at least one cache must have randoms attached.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • rmin ([float] (required)) – Single or sequence of lower scale limits in given ‘unit’.

  • rmax ([float] (required)) – Single or sequence of upper scale limits in given ‘unit’.

  • unit ([str] default=kpc) – The unit of the lower and upper scale limits.

  • rweight ([float] default=None) – Power-law exponent used to weight pairs by their separation.

  • resolution ([int] default=None) – Number of radial logarithmic bins used to approximate the weighting by separation.

  • zmin ([float] default=None) – Lowest redshift bin edge to generate (alternatively use ‘edges’).

  • zmax ([float] default=None) – Highest redshift bin edge to generate (alternatively use ‘edges’).

  • num_bins ([int] default=30) – Number of redshift bins to generate between ‘zmin’ and ‘zmax’.

  • method ([str] default=linear) – Method used to compute the spacing of bin edges.

  • edges ([float] default=None) – Use these custom bin edges instead of generating them.

  • closed ([str] default=right) – String indicating the side of the bin intervals that are closed.

  • max_workers ([int] default=None) – configure a custom maximum number of parallel workers to use

  • verbose ([str] default=info) – lowest log level emitted by yet_another_wizz

  • reference (YawCacheHandle (INPUT))

  • unknown (YawCacheHandle (INPUT))

  • output (YawCorrFuncHandle (OUTPUT))
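The interplay of 'rweight' and 'resolution' can be sketched as follows. The sketch assumes that each pair at separation r is weighted by r**rweight and that 'resolution' logarithmic bins between 'rmin' and 'rmax' replace r by the geometric bin center; this is one plausible reading of the parameter descriptions, not the yet_another_wizz implementation, and all function names are hypothetical.

```python
import math


def log_bin_edges(rmin, rmax, resolution):
    """Return resolution + 1 logarithmically spaced edges from rmin to rmax."""
    log_min, log_max = math.log(rmin), math.log(rmax)
    step = (log_max - log_min) / resolution
    return [math.exp(log_min + i * step) for i in range(resolution + 1)]


def weighted_pair_count(separations, rmin, rmax, rweight, resolution):
    """Sum of pair weights r**rweight, with r replaced by its log-bin center."""
    edges = log_bin_edges(rmin, rmax, resolution)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        n_pairs = sum(1 for r in separations if lo <= r < hi)
        center = math.sqrt(lo * hi)  # geometric mean of the bin edges
        total += n_pairs * center**rweight
    return total


separations = [120.0, 300.0, 800.0]  # made-up pair separations in kpc
approx = weighted_pair_count(separations, 100.0, 1000.0, rweight=-1.0, resolution=10)
exact = sum(r**-1.0 for r in separations)
print(approx, exact)
```

A larger 'resolution' brings the binned sum closer to the exact per-pair weighting at the cost of counting pairs in more bins.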

algo_parameters: set[str] = {'closed', 'edges', 'max_workers', 'method', 'num_bins', 'resolution', 'rmax', 'rmin', 'rweight', 'unit', 'zmax', 'zmin'}

Lists the names of all algorithm-specific parameters that were added when subclassing.

correlate(reference, unknown, **kwargs)

Measure the angular cross-correlation amplitude in bins of redshift.

Parameters:
  • reference (YawCache) – Cache for the reference data, must have redshifts. If no randoms are attached, the unknown data cache must provide them.

  • unknown (YawCache) – Cache for the unknown data. If no randoms are attached, the reference data cache must provide them.

Returns:

A handle for the yaw.CorrFunc instance that holds the pair counts.

Return type:

YawCorrFuncHandle

entrypoint_function: str | None = 'correlate'
inputs = [('reference', <class 'rail.yaw_rail.handles.YawCacheHandle'>), ('unknown', <class 'rail.yaw_rail.handles.YawCacheHandle'>)]
interactive_function: str | None = 'yaw_cross_correlate'
name = 'YawCrossCorrelate'
outputs = [('output', <class 'rail.yaw_rail.handles.YawCorrFuncHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

class rail.estimation.algos.cc_yaw.YawSummarize

Bases: YawRailStage

A summarizer that computes a clustering redshift estimate from the measured correlation amplitudes.

Evaluates the cross-correlation pair counts with the provided estimator. Additionally corrects for galaxy sample bias if autocorrelation measurements are provided as stage inputs.

Note: This summarizer does not produce a PDF, but a ratio of correlation functions, which may result in negative values. Further modelling of the output is required.
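Why the output can be negative is easiest to see in a small numerical sketch. It assumes the commonly used form of the bias-corrected estimate, the cross-correlation amplitude divided by the square root of the product of the available autocorrelation amplitudes, and omits any normalisation by the redshift bin width. The function name and the input numbers are made up for illustration.

```python
import math


def clustering_redshift_ratio(w_sp, w_ss=None, w_pp=None):
    """Per-bin ratio w_sp / sqrt(w_ss * w_pp); a missing term counts as 1."""
    estimate = []
    for i, cross in enumerate(w_sp):
        norm = 1.0
        if w_ss is not None:
            norm *= w_ss[i]
        if w_pp is not None:
            norm *= w_pp[i]
        # A non-positive normalisation makes the square root undefined.
        estimate.append(cross / math.sqrt(norm) if norm > 0 else float("nan"))
    return estimate


w_sp = [0.02, -0.004, 0.01]  # cross-correlation amplitudes per redshift bin
w_ss = [0.04, 0.04, 0.01]    # reference sample autocorrelation amplitudes
estimate = clustering_redshift_ratio(w_sp, w_ss=w_ss)
print(estimate)
```

Noise can push a measured cross-correlation amplitude below zero, and the ratio inherits that sign, so the result cannot be read as a probability density without further modelling.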

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • verbose ([str] default=info) – lowest log level emitted by yet_another_wizz

  • cross_corr (YawCorrFuncHandle (INPUT))

  • auto_corr_ref (YawCorrFuncHandle (INPUT))

  • auto_corr_unk (YawCorrFuncHandle (INPUT))

  • output (ModelHandle (OUTPUT))

algo_parameters: set[str] = {}

Lists the names of all algorithm-specific parameters that were added when subclassing.

entrypoint_function: str | None = 'summarize'
inputs = [('cross_corr', <class 'rail.yaw_rail.handles.YawCorrFuncHandle'>), ('auto_corr_ref', <class 'rail.yaw_rail.handles.YawCorrFuncHandle'>), ('auto_corr_unk', <class 'rail.yaw_rail.handles.YawCorrFuncHandle'>)]
interactive_function: str | None = 'yaw_summarize'
name = 'YawSummarize'
outputs = [('output', <class 'rail.core.data.ModelHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

summarize(cross_corr, auto_corr_ref=None, auto_corr_unk=None, **kwargs)

Compute a clustering redshift estimate from the measured pair counts.

Parameters:
  • cross_corr (CorrFunc) – Pair counts from the cross-correlation measurement, basis for the clustering redshift estimate.

  • auto_corr_ref (CorrFunc, optional) – Pair counts from the reference sample autocorrelation measurement, used to correct for the reference sample galaxy bias.

  • auto_corr_unk (CorrFunc, optional) – Pair counts from the unknown sample autocorrelation measurement, used to correct for the unknown sample galaxy bias. Typically only available when using simulated data sets. For interactive mode RAIL, set to the string “none” if not desired.

Returns:

The clustering redshift estimate, spatial (jackknife) samples thereof, and its covariance matrix.
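How a covariance matrix follows from spatial jackknife samples can be sketched with the standard delete-one jackknife formula. Whether yet_another_wizz uses exactly this convention is an assumption; the function name and sample values are illustrative.

```python
def jackknife_covariance(samples):
    """Covariance from delete-one jackknife samples (rows: samples, columns: bins)."""
    n_samples = len(samples)
    n_bins = len(samples[0])
    mean = [sum(row[j] for row in samples) / n_samples for j in range(n_bins)]
    # Jackknife samples are strongly correlated, hence the (N - 1) / N factor
    # instead of the usual 1 / (N - 1).
    scale = (n_samples - 1) / n_samples
    cov = [[0.0] * n_bins for _ in range(n_bins)]
    for j in range(n_bins):
        for k in range(n_bins):
            cov[j][k] = scale * sum(
                (row[j] - mean[j]) * (row[k] - mean[k]) for row in samples
            )
    return cov


# Three made-up jackknife samples of a two-bin estimate.
samples = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
cov = jackknife_covariance(samples)
print(cov)
```

Each jackknife sample corresponds to leaving out one spatial patch, which is why consistent patches are required for the covariance estimate.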

Return type:

YawRedshiftDataHandle

rail.estimation.algos.cc_yaw.create_yaw_cache_alias(suffix)

Create an alias mapping for all YawCacheCreate stage in- and outputs.

Useful when creating a new stage with make_stage, e.g. by setting aliases=create_yaw_cache_alias("suffix").

Parameters:
  • suffix (str) – The suffix to append to the in- and output tags, e.g. “data_suffix”.

Returns:

Mapping from original to aliased in- and output tags.

Return type:

dict
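The shape of the returned mapping can be sketched with a stand-in function. It assumes that the tag and the suffix are joined with an underscore, as the “data_suffix” example suggests, and uses the in- and output tags documented for YawCacheCreate; the function below is hypothetical and is not the RAIL implementation.

```python
def sketch_cache_alias(suffix):
    """Hypothetical stand-in: map each YawCacheCreate tag to '<tag>_<suffix>'."""
    tags = ("data", "rand", "patch_source", "output")
    return {tag: f"{tag}_{suffix}" for tag in tags}


aliases = sketch_cache_alias("ref")
print(aliases)
```

Distinct suffixes allow several cache stages, for example one per sample, to coexist in a pipeline without their tags colliding.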