rail.estimation.classifier module
Abstract base classes defining classifiers.
- class rail.estimation.classifier.CatClassifier(args, comm=None)[source]
Bases:
RailStage
The base class for assigning classes to a catalogue-like table.
The classifier uses a generic “model”, the details of which depend on the sub-class.
CatClassifier takes as “input” a catalogue-like table, assigns each object to a tomographic bin, and provides as “output” tabular data which can be appended to the catalogue.
- classify(input_data)[source]
The main run method for the classifier; the classification logic itself should be implemented in the specific sub-class.
This will attach the input_data to this CatClassifier (for introspection and provenance tracking).
Then it will call the run() and finalize() methods, which need to be implemented by the sub-classes.
The run() method will need to register the data that it creates to this Classifier by using self.add_data('output', output_data).
Finally, this will return a TableHandle providing access to that output data.
- Parameters:
input_data (dict) – A dictionary of all input data
- Returns:
output – Class assignment for each galaxy.
- Return type:
dict
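The control flow described for classify() can be sketched without RAIL installed. This is a minimal illustration only: SketchStage, SketchCatClassifier, ThresholdClassifier, the get_data() helper, the column names and the magnitude cut are all hypothetical stand-ins, not part of the rail API.

```python
class SketchStage:
    """Hypothetical stand-in for RailStage; not the real RAIL class."""

    def __init__(self):
        self._data = {}

    def add_data(self, tag, data):
        # Register data under a tag so later steps can look it up.
        self._data[tag] = data
        return data

    def get_data(self, tag):
        return self._data[tag]


class SketchCatClassifier(SketchStage):
    """Hypothetical base class mirroring the classify() flow described above."""

    def classify(self, input_data):
        # Attach the input (for introspection and provenance tracking).
        self.add_data("input", input_data)
        # Delegate the actual work to the sub-class.
        self.run()
        self.finalize()
        # Hand back whatever run() registered as 'output'.
        return self.get_data("output")

    def run(self):
        raise NotImplementedError("sub-classes implement run()")

    def finalize(self):
        # Nothing to clean up in this sketch.
        pass


class ThresholdClassifier(SketchCatClassifier):
    """Toy sub-class: bin 0 if mag_i < 24, otherwise bin 1 (made-up rule)."""

    def run(self):
        table = self.get_data("input")
        bins = [0 if mag < 24.0 else 1 for mag in table["mag_i"]]
        output = {"object_id": list(table["object_id"]), "class_id": bins}
        self.add_data("output", output)


catalog = {"object_id": [1, 2, 3], "mag_i": [22.5, 24.7, 23.9]}
result = ThresholdClassifier().classify(catalog)
print(result["class_id"])  # [0, 1, 0]
```

The point of the pattern is that callers only ever use classify(), while sub-classes only override run() (and, if needed, finalize()).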
- config_options = {'chunk_size': 10000, 'hdf5_groupname': <class 'str'>, 'output_mode': <ceci.config.StageParameter object>}
- inputs = [('model', <class 'rail.core.data.ModelHandle'>), ('input', <class 'rail.core.data.TableHandle'>)]
- name = 'CatClassifier'
- open_model(**kwargs)[source]
Load the model and/or attach it to this Classifier
- Parameters:
model (object, str or ModelHandle) – Either an object with a trained model, a path pointing to a file that can be read to obtain the trained model, or a ModelHandle providing access to the trained model.
- Returns:
self.model – The object encapsulating the trained model.
- Return type:
object
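The three accepted forms of the model argument (an object, a file path, or a handle) suggest a simple dispatch on type. The sketch below shows one way such a dispatch could look. It assumes a pickle file on disk, and SketchModelHandle and resolve_model are hypothetical names introduced for illustration; the real open_model() may resolve the model differently.

```python
import os
import pickle
import tempfile


class SketchModelHandle:
    """Hypothetical stand-in for a handle that wraps a model file path."""

    def __init__(self, path):
        self.path = path

    def read(self):
        with open(self.path, "rb") as f:
            return pickle.load(f)


def resolve_model(model):
    """Return a usable model object from a handle, a path, or an object."""
    if isinstance(model, SketchModelHandle):
        # A handle knows how to read its own file.
        return model.read()
    if isinstance(model, str):
        # A string is treated as a path to a pickled model.
        with open(model, "rb") as f:
            return pickle.load(f)
    # Anything else is assumed to be an already-loaded model object.
    return model


trained = {"bin_edges": [0.0, 0.5, 1.0]}  # toy "model" for the sketch
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(trained, f)
    from_object = resolve_model(trained)
    from_path = resolve_model(path)
    from_handle = resolve_model(SketchModelHandle(path))
```

All three calls yield an equivalent model, so downstream code does not need to care how the model was supplied.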
- outputs = [('output', <class 'rail.core.data.TableHandle'>)]
- class rail.estimation.classifier.PZClassifier(args, comm=None)[source]
Bases:
RailStage
The base class for assigning classes (tomographic bins) to per-galaxy PZ estimates.
PZClassifier takes as “input” a qp.Ensemble with per-galaxy PDFs, and provides as “output” tabular data which can be appended to the catalogue.
Configuration Parameters:
output_mode [str]: What to do with the outputs (default=default)
chunk_size [int]: (default=10000)
- classify(input_data)[source]
The main run method for the classifier; the classification logic itself should be implemented in the specific sub-class.
This will attach the input_data to this PZClassifier (for introspection and provenance tracking).
Then it will call the run() and finalize() methods, which need to be implemented by the sub-classes.
The run() method will need to register the data that it creates to this Classifier by using self.add_data('output', output_data).
The run() method relies on the _process_chunk() method, which should be implemented by subclasses to perform the actual classification on each chunk of data. The results from each chunk are then combined in the _finalize_run() method. (Alternatively, override run() in a subclass to perform the classification without parallelization.)
Finally, this will return a TableHandle providing access to that output data.
- Parameters:
input_data (qp.Ensemble) – Per-galaxy p(z), and any ancillary data associated with it
- Returns:
output – Class assignment for each galaxy, typically in the form of a dictionary with IDs and class labels.
- Return type:
dict
- config_options = {'chunk_size': 10000, 'output_mode': <ceci.config.StageParameter object>}
- inputs = [('input', <class 'rail.core.data.QPHandle'>)]
- name = 'PZClassifier'
- outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]
- run()[source]
Processes the input data in chunks and performs classification.
This method iterates over chunks of the input data, calling the _process_chunk method for each chunk to perform the actual classification.
The _process_chunk method should be implemented by subclasses to define the specific classification logic.
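The chunked flow described for run(), _process_chunk() and _finalize_run() can be sketched in plain Python. This is an illustration under stated assumptions, not the RAIL implementation: the PDFs are plain lists evaluated on a shared redshift grid rather than a qp.Ensemble, and the class names, the method signatures, the iter_chunks helper and the "bin by PDF mode" rule are all hypothetical.

```python
import bisect


def iter_chunks(n_rows, chunk_size):
    """Yield (start, end) index pairs covering range(n_rows)."""
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


class SketchPZClassifier:
    """Hypothetical stand-in for the chunked flow; not the RAIL class."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self._partial = []  # per-chunk results, combined at the end
        self.output = None

    def run(self, z_grid, pdfs):
        # Iterate over chunks and delegate the classification of each one.
        for start, end in iter_chunks(len(pdfs), self.chunk_size):
            self._process_chunk(start, end, z_grid, pdfs[start:end])
        self._finalize_run()

    def _process_chunk(self, start, end, z_grid, pdf_chunk):
        raise NotImplementedError("sub-classes implement _process_chunk()")

    def _finalize_run(self):
        # Concatenate the per-chunk results into one table-like dict.
        rows, labels = [], []
        for chunk_rows, chunk_labels in self._partial:
            rows.extend(chunk_rows)
            labels.extend(chunk_labels)
        self.output = {"row_index": rows, "class_id": labels}


class ModeBinClassifier(SketchPZClassifier):
    """Toy sub-class: assign each galaxy to the bin containing its PDF mode."""

    def __init__(self, chunk_size, bin_edges):
        super().__init__(chunk_size)
        self.bin_edges = bin_edges

    def _process_chunk(self, start, end, z_grid, pdf_chunk):
        labels = []
        for pdf in pdf_chunk:
            z_mode = z_grid[pdf.index(max(pdf))]
            # Index of the bin whose lower edge is <= z_mode; a mode
            # outside the edges is not handled specially in this sketch.
            labels.append(bisect.bisect_right(self.bin_edges, z_mode) - 1)
        self._partial.append((list(range(start, end)), labels))


z_grid = [0.1, 0.3, 0.6, 0.9]
pdfs = [
    [0.7, 0.2, 0.1, 0.0],
    [0.0, 0.1, 0.3, 0.6],
    [0.1, 0.6, 0.2, 0.1],
    [0.0, 0.2, 0.7, 0.1],
    [0.5, 0.3, 0.1, 0.1],
]
clf = ModeBinClassifier(chunk_size=2, bin_edges=[0.0, 0.5, 1.0])
clf.run(z_grid, pdfs)
print(clf.output["class_id"])  # [0, 1, 0, 1, 0]
```

With chunk_size=2 the five rows are processed as three chunks (two, two, and one row), and the combined output keeps the original row order.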