rail.estimation.classifier module

Abstract base classes defining classifiers.

class rail.estimation.classifier.CatClassifier

Bases: RailStage

The base class for assigning classes to catalogue-like tables.

The classifier uses a generic “model”, the details of which depend on the subclass.

CatClassifier takes as “input” a catalogue-like table, assigns each object to a tomographic bin, and provides as “output” tabular data that can be appended to the catalogue.
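As an illustration of the input/output contract described above, the following minimal sketch assigns each row of a catalogue-like table to a tomographic bin. It assumes the table is a dict of NumPy arrays and that a fixed set of bin edges on a single column stands in for the trained “model”; the helper name assign_tomo_bins and the column names object_id, z_phot and class_id are hypothetical, not part of the RAIL API.

```python
import numpy as np


def assign_tomo_bins(catalog, column, bin_edges):
    """Assign each object in a catalogue-like table to a tomographic bin.

    Hypothetical helper: `column`, `bin_edges` and the output keys are
    illustrative names, not RAIL's API. Fixed bin edges on one column stand
    in for a trained model. Objects outside the edges get class -1.
    """
    values = np.asarray(catalog[column], dtype=float)
    edges = np.asarray(bin_edges, dtype=float)
    # np.digitize returns 1..len(edges)-1 for values inside the edges
    idx = np.digitize(values, edges) - 1
    # Mark objects below the first edge or at/above the last edge as unassigned
    idx[(values < edges[0]) | (values >= edges[-1])] = -1
    # Tabular output: one row per input object, ready to append to the catalogue
    return {"object_id": np.asarray(catalog["object_id"]), "class_id": idx}


catalog = {
    "object_id": np.array([10, 11, 12, 13]),
    "z_phot": np.array([0.15, 0.45, 0.95, 1.60]),
}
out = assign_tomo_bins(catalog, "z_phot", [0.0, 0.3, 0.6, 1.2])
print(out["class_id"].tolist())  # [0, 1, 2, -1]
```

Here the three in-range objects land in bins 0, 1 and 2, while the object at z_phot = 1.60 lies beyond the last edge and is marked -1. Returning one row per input object keeps the output aligned with the catalogue, so it can be appended column-wise.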

__init__(args, **kwargs)

Initialize the CatClassifier.

Parameters:

args (Any)
kwargs (Any)

Return type:

None

classify(input_data, **kwargs)

The main run method for the classifier; the classification logic itself should be implemented in the specific subclass.

This will attach the input_data to this CatClassifier (for introspection and provenance tracking).

Then it will call the run() and finalize() methods, which need to be implemented by the subclasses.

The run() method will need to register the data that it creates with this Classifier by calling self.add_data('output', output_data).

Finally, this will return a TableHandle providing access to that output data.
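The classify() workflow is a template method: the base class fixes the order of the steps and the subclass fills in run(). The sketch below reproduces that control flow with a hypothetical ToyClassifierStage. The method names follow the text above, but the class is a stand-in written for illustration, not RAIL's RailStage, and it returns the output data directly rather than a TableHandle.

```python
class ToyClassifierStage:
    """Hypothetical stand-in for the classify() control flow.

    Method names mirror the documentation (classify, run, finalize,
    add_data); the bodies are illustrative, not RAIL's implementation.
    """

    def __init__(self):
        self._data = {}

    def set_data(self, tag, data):
        # Attach data under a tag, for introspection and provenance tracking
        self._data[tag] = data

    def add_data(self, tag, data):
        # Register data created by the stage itself
        self._data[tag] = data

    def get_data(self, tag):
        return self._data[tag]

    def classify(self, input_data):
        self.set_data("input", input_data)  # 1. attach the input
        self.run()                          # 2. subclass does the work
        self.finalize()                     # 3. wrap-up step
        return self.get_data("output")      # 4. hand back the registered output

    def run(self):
        raise NotImplementedError("subclasses implement run()")

    def finalize(self):
        # Default no-op; subclasses may override
        pass


class ThresholdClassifier(ToyClassifierStage):
    """Toy subclass: class 1 if a value exceeds the threshold, else 0."""

    def __init__(self, threshold):
        super().__init__()
        self.threshold = threshold

    def run(self):
        values = self.get_data("input")["value"]
        labels = [1 if v > self.threshold else 0 for v in values]
        # run() registers its result under the 'output' tag
        self.add_data("output", {"class_id": labels})


result = ThresholdClassifier(threshold=0.5).classify({"value": [0.2, 0.7, 0.5]})
print(result["class_id"])  # [0, 1, 0]
```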

Parameters:

input_data (TableLike) – A dictionary of all input data

Returns:

Class assignment for each galaxy.

Return type:

TableHandle

entrypoint_function: str | None = 'classify'

inputs = [('model', <class 'rail.core.data.ModelHandle'>), ('input', <class 'rail.core.data.TableHandle'>)]

name = 'CatClassifier'

outputs = [('output', <class 'rail.core.data.TableHandle'>)]

class rail.estimation.classifier.PZClassifier

Bases: RailStage

The base class for assigning classes (tomographic bins) to per-galaxy PZ estimates.

PZClassifier takes as “input” a qp.Ensemble with per-galaxy PDFs, and provides as “output” tabular data which can be appended to the catalogue.
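To make the PDF-to-bin step concrete, here is a minimal sketch that assigns each galaxy to a tomographic bin using the peak (mode) of its p(z). It assumes the PDFs have already been evaluated on a common redshift grid and stored as a plain (n_galaxies, n_grid) NumPy array standing in for a qp.Ensemble. The mode rule, the function name classify_by_pdf_mode and the output keys are illustrative choices; concrete PZClassifier subclasses may use other criteria.

```python
import numpy as np


def classify_by_pdf_mode(zgrid, pdfs, bin_edges):
    """Assign each galaxy to a tomographic bin using the mode of its p(z).

    Hypothetical sketch: `pdfs` is a plain (n_gal, n_grid) array standing in
    for a qp.Ensemble evaluated on `zgrid`; the mode rule is one simple choice.
    """
    zgrid = np.asarray(zgrid, dtype=float)
    pdfs = np.asarray(pdfs, dtype=float)
    # Redshift at which each galaxy's PDF peaks
    zmode = zgrid[np.argmax(pdfs, axis=1)]
    # Bin index 0..n_bins-1; galaxies outside the edges get -1
    idx = np.digitize(zmode, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    idx[(idx < 0) | (idx >= n_bins)] = -1
    return {"row_index": np.arange(len(pdfs)), "class_id": idx}


zgrid = np.linspace(0.0, 2.0, 201)
centers = np.array([0.2, 0.8, 1.5])
# Gaussian p(z) for three galaxies, peaked at the given centres
pdfs = np.exp(-0.5 * ((zgrid[None, :] - centers[:, None]) / 0.05) ** 2)
out = classify_by_pdf_mode(zgrid, pdfs, [0.0, 0.5, 1.0, 2.0])
print(out["class_id"].tolist())  # [0, 1, 2]
```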

Parameters:

output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.
chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing
input (QPHandle (INPUT))
output (Hdf5Handle (OUTPUT))
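The chunk_size parameter controls how the input is partitioned. A minimal sketch of that arithmetic follows. The rank/size round-robin assignment of chunks to processes is an assumption about one common way to spread chunks across parallel workers, and chunk_ranges is a hypothetical helper rather than RAIL's internal iterator.

```python
def chunk_ranges(n_objects, chunk_size=10000, rank=0, size=1):
    """Yield (start, end) index pairs for the chunks handled by one process.

    Hypothetical helper. With size=1 (single node) every chunk is yielded;
    with size>1, chunk i goes to the process whose rank equals i % size
    (an assumed round-robin scheme, for illustration only).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for i, start in enumerate(range(0, n_objects, chunk_size)):
        if i % size != rank:
            continue
        # The last chunk may be shorter than chunk_size
        yield start, min(start + chunk_size, n_objects)


print(list(chunk_ranges(25000)))
# [(0, 10000), (10000, 20000), (20000, 25000)]
```

With the default chunk_size of 10000, 25000 objects give two full chunks and one partial chunk of 5000 objects.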

__init__(args, **kwargs)

Initialize the PZClassifier.

Parameters:

args (Any)
kwargs (Any)

Return type:

None

classify(input_data, **kwargs)

The main run method for the classifier; the classification logic itself should be implemented in the specific subclass.

This will attach the input_data to this PZClassifier (for introspection and provenance tracking).

Then it will call the run() and finalize() methods, which need to be implemented by the subclasses.

The run() method will need to register the data that it creates with this Classifier by calling self.add_data('output', output_data).

The run() method relies on the _process_chunk() method, which should be implemented by subclasses to perform the actual classification on each chunk of data. The results from each chunk are then combined in the _finalize_run() method. (Alternatively, override run() in a subclass to perform the classification without parallelization.)

Finally, this will return a TableHandle providing access to that output data.

Parameters:

input_data (qp.Ensemble) – Per-galaxy p(z), and any ancillary data associated with it

Returns:

Class assignment for each galaxy, typically in the form of a dictionary with IDs and class labels.

Return type:

TableHandle

entrypoint_function: str | None = 'classify'

inputs = [('input', <class 'rail.core.data.QPHandle'>)]

name = 'PZClassifier'

outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]

run()

Processes the input data in chunks and performs classification.

This method iterates over chunks of the input data, calling the _process_chunk method for each chunk to perform the actual classification.

The _process_chunk method should be implemented by subclasses to define the specific classification logic.

Return type:

None
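The division of labour between run(), _process_chunk() and _finalize_run() can be sketched as follows. This is a hypothetical toy, not RAIL's implementation: run() takes the data as an argument for simplicity, each chunk's labels are stored together with its start index, and _finalize_run() concatenates them in input order. The subclass ThresholdBinClassifier and its z_split parameter are illustrative.

```python
class ToyChunkedClassifier:
    """Hypothetical sketch of the chunked run() pattern.

    run() walks the input in chunk_size pieces and calls _process_chunk on
    each; _finalize_run combines the per-chunk results. Method names follow
    the documentation; the bodies are illustrative only.
    """

    def __init__(self, chunk_size=10000):
        self.chunk_size = chunk_size
        self._chunk_results = []
        self.output = None

    def run(self, data):
        self._chunk_results = []
        n = len(data)
        for start in range(0, n, self.chunk_size):
            end = min(start + self.chunk_size, n)
            self._process_chunk(start, end, data[start:end])
        self._finalize_run()

    def _process_chunk(self, start, end, chunk):
        # Subclasses put the per-chunk classification logic here
        raise NotImplementedError("subclasses implement _process_chunk()")

    def _finalize_run(self):
        # Sort by start index so the combined labels follow input order
        ordered = sorted(self._chunk_results, key=lambda item: item[0])
        self.output = [label for _, labels in ordered for label in labels]


class ThresholdBinClassifier(ToyChunkedClassifier):
    """Toy subclass: bin 0 below z_split, bin 1 at or above it."""

    def __init__(self, z_split, chunk_size=10000):
        super().__init__(chunk_size)
        self.z_split = z_split

    def _process_chunk(self, start, end, chunk):
        labels = [0 if z < self.z_split else 1 for z in chunk]
        self._chunk_results.append((start, labels))


clf = ThresholdBinClassifier(z_split=0.5, chunk_size=2)
clf.run([0.1, 0.7, 0.3, 0.9, 0.4])
print(clf.output)  # [0, 1, 0, 1, 0]
```

Sorting by start index is unnecessary in serial execution, but it keeps the combined output in input order if chunks are processed out of order, as can happen when they are spread across processes.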