rail.estimation.algos.random_forest module

An example classifier that uses catalogue information to classify objects into tomographic bins using a random forest. This is the base method in TXPipe, adapted from TXpipe/binning/random_forest.py. Note: this stage adds a dependence on sklearn and requires an input training file.

class rail.estimation.algos.random_forest.RandomForestClassifier

Bases: CatClassifier

Classifier that assigns tomographic bins using a random forest.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • chunk_size ([int] default=10000) – Number of objects per chunk for parallel processing, or to evaluate per loop in single-node processing

  • hdf5_groupname ([str] default=photometry) – Name of the HDF5 group containing the data; if None, it is set to ‘’

  • id_name ([str] default=‘’) – Column name for the object ID in the input data; if empty, the row index is used as the ID.

  • class_bands ([list] default=['r', 'i', 'z']) – Which bands to use for classification

  • band_map ([dict] default={'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst'}) – column names for the bands

  • model (ModelHandle (INPUT))

  • input (TableHandle (INPUT))

  • output (Hdf5Handle (OUTPUT))
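As an illustration of how class_bands and band_map work together, the sketch below turns catalogue columns into one feature row per object. It assumes, in the spirit of the TXPipe scheme this module is adapted from, that the features are each band's magnitude plus pairwise colours; the function name build_features and the exact feature set are hypothetical and may not match what the stage actually computes.

```python
from typing import List, Mapping, Optional, Sequence, Tuple

# Defaults copied from the parameter list above.
DEFAULT_BANDS = ("r", "i", "z")
DEFAULT_BAND_MAP = {"r": "mag_r_lsst", "i": "mag_i_lsst", "z": "mag_z_lsst"}


def build_features(
    table: Mapping[str, Sequence[float]],
    class_bands: Sequence[str] = DEFAULT_BANDS,
    band_map: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], List[List[float]]]:
    """Return (feature_names, rows), one row per object.

    ASSUMPTION (hypothetical): features are each band's magnitude followed
    by its colours with every later band in class_bands.
    """
    band_map = DEFAULT_BAND_MAP if band_map is None else band_map
    names: List[str] = []
    columns: List[List[float]] = []
    for i, b1 in enumerate(class_bands):
        mag1 = list(table[band_map[b1]])  # map band name -> catalogue column
        names.append(b1)
        columns.append(mag1)
        for b2 in class_bands[i + 1:]:
            mag2 = table[band_map[b2]]
            names.append(f"{b1}-{b2}")
            columns.append([m1 - m2 for m1, m2 in zip(mag1, mag2)])
    # Transpose the per-feature columns into per-object rows
    rows = [list(row) for row in zip(*columns)]
    return names, rows
```

Keeping band_map separate from class_bands means the same short band names can be reused across catalogues whose column names differ.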

entrypoint_function: str | None = 'classify'
interactive_function: str | None = 'random_forest_classifier'
name = 'RandomForestClassifier'
open_model(**kwargs)

Load the model and/or attach it to this Stage

Parameters:
  • tag – Input tag associated with the model

  • **kwargs – Should include ‘model’, see notes

Notes

The keyword argument ‘model’ should be either

  1. an object with a trained model,

  2. a path pointing to a file that can be read to obtain the trained model,

  3. or a ModelHandle providing access to the trained model.

Returns:

The object encapsulating the trained model.

Return type:

Any
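A minimal sketch of how the three accepted forms of ‘model’ could be resolved into a trained model. HypotheticalModelHandle and resolve_model are illustrative names, not RAIL APIs, and the sketch assumes that model files are pickles.

```python
import os
import pickle
from typing import Any


class HypotheticalModelHandle:
    """Stand-in for a ModelHandle; NOT the real RAIL class."""

    def __init__(self, path: str) -> None:
        self.path = path

    def read(self) -> Any:
        with open(self.path, "rb") as f:
            return pickle.load(f)


def resolve_model(model: Any) -> Any:
    """Return a trained model from one of the three accepted forms.

    ASSUMPTION: files are pickles and handles expose read(); the real
    open_model relies on RAIL's own data-handle machinery instead.
    """
    if isinstance(model, HypotheticalModelHandle):
        return model.read()  # case 3: a handle giving access to the model
    if isinstance(model, (str, os.PathLike)):
        with open(model, "rb") as f:  # case 2: a path to a model file
            return pickle.load(f)
    return model  # case 1: already an object holding the trained model
```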

outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]
run()

Apply the classifier to the measured magnitudes
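The chunk_size and id_name parameters suggest a loop along the lines of the sketch below. It is written under assumptions: predict stands in for the trained classifier, and the output keys "id" and "bin" are hypothetical rather than the stage's actual output columns.

```python
from typing import Callable, Dict, Iterator, List, Optional, Sequence


def iter_chunks(n_rows: int, chunk_size: int = 10000) -> Iterator[range]:
    """Yield consecutive row-index ranges of at most chunk_size rows."""
    for start in range(0, n_rows, chunk_size):
        yield range(start, min(start + chunk_size, n_rows))


def classify_in_chunks(
    rows: Sequence[Sequence[float]],
    predict: Callable[[List[Sequence[float]]], List[int]],
    chunk_size: int = 10000,
    ids: Optional[Sequence[int]] = None,
) -> Dict[str, List[int]]:
    """Assign a bin to every row, evaluating one chunk per loop.

    HYPOTHETICAL: `predict` stands in for the trained classifier and
    `ids` for the id_name column; without ids the row index is used.
    """
    out: Dict[str, List[int]] = {"id": [], "bin": []}
    for idx in iter_chunks(len(rows), chunk_size):
        chunk = [rows[i] for i in idx]
        out["bin"].extend(predict(chunk))
        out["id"].extend((ids[i] if ids is not None else i) for i in idx)
    return out
```

Processing fixed-size chunks bounds memory use, and each chunk is an independent unit of work that could be handed to a separate process.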

class rail.estimation.algos.random_forest.RandomForestInformer

Bases: CatInformer

Train the random forest classifier

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • hdf5_groupname ([str] default=photometry) – Name of the HDF5 group containing the data; if None, it is set to ‘’

  • class_bands ([list] default=['r', 'i', 'z']) – Which bands to use for classification

  • band_map ([dict] default={'r': 'mag_r_lsst', 'i': 'mag_i_lsst', 'z': 'mag_z_lsst'}) – column names for the bands

  • redshift_col ([str] default=sz) – Name of the redshift column

  • bin_edges ([list] default=[0, 0.5, 1.0]) – Redshift bin edges used to label the training data

  • seed ([int] (required)) – random seed

  • no_assign ([int] default=-99) – Value for no assignment flag

  • input (TableHandle (INPUT))

  • model (ModelHandle (OUTPUT))
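The sketch below shows one way bin_edges and no_assign could turn true redshifts into training labels for the classifier. It assumes half-open bins [lo, hi); the real stage may treat the edges differently (for example with strict inequalities on both sides), and assign_training_bins is a hypothetical name.

```python
import bisect
from typing import List, Sequence


def assign_training_bins(
    z: Sequence[float],
    bin_edges: Sequence[float] = (0.0, 0.5, 1.0),
    no_assign: int = -99,
) -> List[int]:
    """Label each object with its redshift-bin index, or no_assign.

    ASSUMPTION: bin k covers bin_edges[k] <= z < bin_edges[k + 1].
    """
    n_bins = len(bin_edges) - 1
    labels: List[int] = []
    for zj in z:
        # bisect_right finds the first edge strictly greater than zj
        k = bisect.bisect_right(bin_edges, zj) - 1
        labels.append(k if 0 <= k < n_bins else no_assign)
    return labels
```

With the defaults, z = 0.5 falls in bin 1, while z = 1.0 and any negative redshift receive the no_assign flag.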

entrypoint_function: str | None = 'inform'
interactive_function: str | None = 'random_forest_informer'
name = 'RandomForestInformer'
outputs = [('model', <class 'rail.core.data.ModelHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

class rail.estimation.algos.random_forest.randomForestmodel

Bases: object

Temporary class to store the trained model.

__init__(skl_classifier, features)
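Storing the feature names next to the fitted classifier lets the classifier stage rebuild its inputs in the training column order. A hypothetical equivalent container, with an added consistency check that is not part of randomForestmodel, might look like this:

```python
from dataclasses import dataclass, field
from typing import Any, List, Sequence


@dataclass
class ModelContainerSketch:
    """Hypothetical equivalent of randomForestmodel (not the RAIL class)."""

    skl_classifier: Any  # the fitted classifier object
    features: List[str] = field(default_factory=list)  # training column order

    def check_features(self, names: Sequence[str]) -> None:
        # Refuse to proceed if the columns differ from the training ones
        if list(names) != self.features:
            raise ValueError(f"expected features {self.features}, got {list(names)}")
```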