rail.estimation.classifier module

Abstract base classes defining classifiers.

class rail.estimation.classifier.CatClassifier(args, comm=None)

Bases: RailStage

The base class for assigning classes to objects in a catalogue-like table.

The classifier uses a generic “model”, the details of which depend on the subclass.

CatClassifier takes as “input” a catalogue-like table, assigns each object to a tomographic bin, and provides as “output” tabular data that can be appended to the catalogue.
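To make the input/output contract concrete, here is a minimal sketch that does not use RAIL at all. It assumes a hypothetical catalogue stored as a dict of NumPy arrays with a point-estimate column named `zmode`, and hypothetical bin edges; a real subclass would derive its assignments from its trained model rather than from fixed edges.

```python
import numpy as np


def assign_tomographic_bins(table, bin_edges, column="zmode"):
    """Assign each row of a catalogue-like table to a tomographic bin.

    Hypothetical helper: `column` and `bin_edges` are illustrative only.
    Objects outside [bin_edges[0], bin_edges[-1]) get class -1.
    """
    values = np.asarray(table[column])
    # np.digitize returns 1..len(edges)-1 for in-range values; shift to 0-based
    labels = np.digitize(values, bin_edges) - 1
    out_of_range = (labels < 0) | (labels >= len(bin_edges) - 1)
    labels[out_of_range] = -1
    # Output is row-aligned with the input, so it can be appended as a column
    return {"class_id": labels}


catalogue = {"zmode": np.array([0.1, 0.45, 0.8, 1.7, 3.5])}
result = assign_tomographic_bins(catalogue, bin_edges=[0.0, 0.5, 1.0, 2.0, 3.0])
print(result["class_id"])  # → [ 0  0  1  2 -1]
```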

classify(input_data)

The main run method for the classifier. The classification logic itself should be implemented in the specific subclass.

This will attach the input_data to this CatClassifier (for introspection and provenance tracking).

Then it will call the run() and finalize() methods, which need to be implemented by the subclasses.

The run() method must register the data it creates with this Classifier by calling self.add_data('output', output_data).

Finally, this will return a TableHandle providing access to that output data.

Parameters:: input_data (dict) – A dictionary of all input data
Returns:: output – Class assignment for each galaxy.
Return type:: dict
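The classify() control flow described above (attach input, call run() and finalize(), return a handle to 'output') can be sketched with plain Python stand-ins. `ToyHandle`, `ToyClassifier`, and `ThresholdClassifier` are hypothetical names, and their `add_data`/`get_handle` methods are simplified imitations, not the RailStage implementations.

```python
class ToyHandle:
    """Hypothetical stand-in for a RAIL data handle: just wraps data."""

    def __init__(self, tag, data):
        self.tag = tag
        self.data = data


class ToyClassifier:
    """Mimics the classify() -> run() -> finalize() flow; not RAIL code."""

    def __init__(self):
        self._handles = {}

    def add_data(self, tag, data):
        # Register data under a tag, as run() is expected to do for 'output'
        self._handles[tag] = ToyHandle(tag, data)

    def get_handle(self, tag):
        return self._handles[tag]

    def classify(self, input_data):
        self.add_data("input", input_data)  # attach input for provenance
        self.run()                          # subclass-specific work
        self.finalize()
        return self.get_handle("output")    # handle to the output data

    def run(self):
        raise NotImplementedError("subclasses implement run()")

    def finalize(self):
        pass


class ThresholdClassifier(ToyClassifier):
    def run(self):
        # Hypothetical two-class rule on a point-estimate column
        data = self.get_handle("input").data
        labels = [0 if z < 1.0 else 1 for z in data["zmode"]]
        self.add_data("output", {"class_id": labels})


handle = ThresholdClassifier().classify({"zmode": [0.3, 1.2, 0.9]})
print(handle.data)  # → {'class_id': [0, 1, 0]}
```

The point of the split is that the base class owns data bookkeeping, so a subclass only has to produce and register its output inside run().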

config_options = {'chunk_size': 10000, 'hdf5_groupname': <class 'str'>, 'output_mode': <ceci.config.StageParameter object>}

inputs = [('model', <class 'rail.core.data.ModelHandle'>), ('input', <class 'rail.core.data.TableHandle'>)]

name = 'CatClassifier'

open_model(**kwargs)

Load the model and/or attach it to this Classifier.

Parameters:: model (object, str or ModelHandle) – Either an object with a trained model, a path pointing to a file that can be read to obtain the trained model, or a ModelHandle providing access to the trained model.
Returns:: self.model – The object encapsulating the trained model.
Return type:: object
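The three accepted forms of `model` can be resolved with simple type dispatch. This is a sketch under stated assumptions: the real method takes keyword arguments and stores the result on self.model, whereas this hypothetical free function returns it; `ToyModelHandle` stands in for ModelHandle; and pickled model files are assumed.

```python
import os
import pickle
import tempfile


class ToyModelHandle:
    """Hypothetical stand-in for rail.core.data.ModelHandle (not the real class)."""

    def __init__(self, path):
        self.path = path

    def read(self):
        # Load the pickled model this handle points to
        with open(self.path, "rb") as f:
            return pickle.load(f)


def open_model(model):
    """Resolve `model` into a trained-model object (assumes pickled files)."""
    if isinstance(model, ToyModelHandle):
        return model.read()           # a handle providing access to the model
    if isinstance(model, str):
        with open(model, "rb") as f:  # a path to a file holding the model
            return pickle.load(f)
    return model                      # already an object with a trained model


trained = {"bin_edges": [0.0, 0.5, 1.0, 2.0, 3.0]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(trained, f)
    assert open_model(trained) is trained
    assert open_model(path) == trained
    assert open_model(ToyModelHandle(path)) == trained
```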

outputs = [('output', <class 'rail.core.data.TableHandle'>)]

class rail.estimation.classifier.PZClassifier(args, comm=None)

Bases: RailStage

The base class for assigning classes (tomographic bins) to per-galaxy PZ estimates.

PZClassifier takes as “input” a qp.Ensemble with per-galaxy PDFs, and provides as “output” tabular data which can be appended to the catalogue.

Configuration Parameters:
output_mode [str]: What to do with the outputs (default=default)
chunk_size [int]: (default=10000)
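Unlike CatClassifier, PZClassifier starts from per-galaxy PDFs. The sketch below does not use qp; it assumes each galaxy's p(z) has been evaluated on a shared redshift grid (a NumPy array) and assigns the bin containing the PDF's mode. Using the mode is one possible choice made here for illustration; a subclass could equally use the mean, or integrate p(z) over each bin and pick the largest.

```python
import numpy as np


def classify_pdfs_by_mode(zgrid, pdfs, bin_edges):
    """Assign tomographic bins from gridded per-galaxy p(z).

    `pdfs` has shape (n_gal, n_grid); row i is galaxy i's p(z) on `zgrid`.
    Galaxies whose mode falls outside the bin edges get class -1.
    """
    zgrid = np.asarray(zgrid)
    zmode = zgrid[np.argmax(pdfs, axis=1)]  # grid point at each PDF's peak
    labels = np.digitize(zmode, bin_edges) - 1
    labels[(labels < 0) | (labels >= len(bin_edges) - 1)] = -1
    return {"zmode": zmode, "class_id": labels}


zgrid = np.linspace(0.0, 3.0, 301)
centers = np.array([0.3, 0.7, 1.5])
# Narrow Gaussian p(z) per galaxy, centred on `centers`
pdfs = np.exp(-0.5 * ((zgrid[None, :] - centers[:, None]) / 0.05) ** 2)
out = classify_pdfs_by_mode(zgrid, pdfs, bin_edges=[0.0, 0.5, 1.0, 2.0, 3.0])
print(out["class_id"])  # → [0 1 2]
```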

classify(input_data)

The main run method for the classifier. The classification logic itself should be implemented in the specific subclass.

This will attach the input_data to this PZClassifier (for introspection and provenance tracking).

Then it will call the run() and finalize() methods, which need to be implemented by the subclasses.

The run() method must register the data it creates with this Classifier by calling self.add_data('output', output_data).

The run() method relies on the _process_chunk() method, which should be implemented by subclasses to perform the actual classification on each chunk of data. The results from each chunk are then combined in the _finalize_run() method. (Alternatively, override run() in a subclass to perform the classification without parallelization.)

Finally, this will return a handle providing access to that output data (an Hdf5Handle, per the outputs attribute below).

Parameters:: input_data (qp.Ensemble) – Per-galaxy p(z) and any ancillary data associated with it
Returns:: output – Class assignment for each galaxy, typically in the form of a dictionary with IDs and class labels.
Return type:: dict

config_options = {'chunk_size': 10000, 'output_mode': <ceci.config.StageParameter object>}

inputs = [('input', <class 'rail.core.data.QPHandle'>)]

name = 'PZClassifier'

outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]

run()

Processes the input data in chunks and performs classification.

This method iterates over chunks of the input data, calling the _process_chunk method for each chunk to perform the actual classification.

The _process_chunk method should be implemented by subclasses to define the specific classification logic.
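The division of labour between run(), _process_chunk(), and _finalize_run() can be sketched as follows. This is a hypothetical, RAIL-independent mimic: the argument lists of `_process_chunk` and `_finalize_run` here are assumptions, the input is a plain array of gridded p(z) rows rather than a qp.Ensemble, and chunks are processed serially.

```python
import numpy as np


class ToyChunkedPZClassifier:
    """Hypothetical mimic of the chunked run() pattern; not RAIL code."""

    def __init__(self, zgrid, bin_edges, chunk_size=10000):
        self.zgrid = np.asarray(zgrid)
        self.bin_edges = np.asarray(bin_edges)
        self.chunk_size = chunk_size

    def run(self, pdfs):
        # Iterate over consecutive row ranges of at most chunk_size rows
        results = []
        for start in range(0, len(pdfs), self.chunk_size):
            end = min(start + self.chunk_size, len(pdfs))
            results.append(self._process_chunk(start, end, pdfs[start:end]))
        return self._finalize_run(results)

    def _process_chunk(self, start, end, chunk):
        # Per-chunk classification logic: the part a subclass would supply
        zmode = self.zgrid[np.argmax(chunk, axis=1)]
        labels = np.digitize(zmode, self.bin_edges) - 1
        labels[(labels < 0) | (labels >= len(self.bin_edges) - 1)] = -1
        return {"id": np.arange(start, end), "class_id": labels}

    def _finalize_run(self, results):
        # Combine per-chunk outputs into one table of IDs and class labels
        if not results:
            return {"id": np.array([], dtype=int), "class_id": np.array([], dtype=int)}
        return {key: np.concatenate([r[key] for r in results]) for key in results[0]}


zgrid = np.linspace(0.0, 3.0, 301)
centers = np.array([0.2, 0.6, 1.2, 2.5, 0.4])
pdfs = np.exp(-0.5 * ((zgrid[None, :] - centers[:, None]) / 0.05) ** 2)
clf = ToyChunkedPZClassifier(zgrid, [0.0, 0.5, 1.0, 2.0, 3.0], chunk_size=2)
out = clf.run(pdfs)  # three chunks: rows 0-1, 2-3, 4
print(out["class_id"])  # → [0 1 2 3 0]
```

Because each chunk returns row IDs alongside labels, the combined table stays aligned with the input regardless of chunk size.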