Interactive Usage

This page details the usage of RAIL in interactive mode. RAIL can be run interactively, such as in a Jupyter notebook. This interactive mode provides a user-friendly front-end to RAIL functionality. It is the recommended mode of usage for:

exploring the functionality, stages, and objects of RAIL
developing new workflows and pipelines
operating on smaller data sets
integrating other code bases

To see examples of how to use RAIL in interactive mode, visit the Interactive Mode Notebooks.

To see the interactive mode API reference, visit the Interactive API.

To learn how to run RAIL in pipeline mode, visit Pipeline Usage.

Note

When using the interactive module in certain code editors, tab completion may suggest that functions and submodules are present which cannot actually be used.

If you encounter a function which you would like to run, but find yourself unable, use the information in the docstring to identify which RAIL package defines the related RailStage, and install it.

Finding your way around RAIL

The below table shows the large-scale structure of the RAIL package.

These hierarchical namespaces used to organize objects and algorithms. A namespace package’s modules and content belong to the same section of functionality, and serve the same purpose. For example, the estimation.algos namespace contains modules of different estimation algorithms.

Namespace	Description
`creation`	create and degrade synthetic photometric data
`creation.engines`	algorithms to generate synthetic photometric data
`creation.degraders`	algorithms to apply degradations to synthetic photometric data
`estimation`	derive redshift information from photometric data
`estimation.algos`	algorithms to estimate per-galaxy photo-z PDFs
`evaluation`	evaluate photo-z estimator performance
`evaluation.metrics`	metrics for evaluation of redshift estimation
`utils`	utility functions, e.g. catalog, testing
`tools`	utility stages, e.g. photometry, tables
`pipelines`	create ‘mini runner’ pipelines
`pipelines.degradation`	pre-defined degradation pipelines
`pipelines.estimation`	pre-defined estimation pipelines
`cli`	utility functions for the command line

The interactive module mirrors the same import structure as Pipeline Mode RAIL, but with ‘interactive’ inserted into the path. Thus a RailStage typically imported from rail.estimation.algos.random_gauss will have an interactive function defined in rail.interactive.estimation.algos.random_gauss().

Cookbook

This cookbook contains tutorials for a variety of interactive mode RAIL use cases.

Running a RAIL algorithm in interactive mode

To run a RAIL stage in interactive mode, you run the corresponding interactive function. Here’s an example of running the inform stage of the K-Nearest Neighbours estimation algorithm. First we use find_rail_file and tables_io to read in the input data file we need:

>>> import rail.interactive as ri #import all rail interactive functions
>>> import tables_io #for reading and writing data
>>> from rail.utils.path_utils import find_rail_file #for getting our training data
>>> training_data_file = find_rail_file("examples_data/testdata/test_dc2_training_9816.hdf5")
>>> training_data = tables_io.read(calibration_data_file) #read in training data file

Then we can just run the k_near_neigh_informer function, giving it the required training_data argument (we’ve not included the output from the actual function running since it’s a bit long). We’ll take a look at the output it returns:

>>> knn_model = ri.estimation.algos.k_nearneigh.k_near_neigh_informer(training_data=training_data)
>>> print(knn_model) #output is returned as a dictionary
{'model': {'kdtree': <sklearn.neighbors._kd_tree.KDTree object at 0x5d6fbc75d130>,
    'bestsig': np.float64(0.023333333333333334), 'nneigh': 7,
    'truezs': array([0.02043499, 0.01936132, 0.03672067, ..., 2.97927326, 2.98694714,
    2.97646626], shape=(10225,)), 'only_colors': False}}

In general, these functions return outputs as a dictionary, with the relevant data tables or objects as values in the dictionary.

Tip

If you’d like a more detailed example of using RAIL interactive functions, take a look at the introduction to RAIL interactive notebook

Running RAIL with different column names

RAIL stages typically expect a certain default set of column names for photometry and error columns. These are just supplied by default arguments to some common configuration parameters, and can be easily overridden by providing your own arguments to those parameters to tell the code to expect the columns that your data has. Depending on the stage, you may only need some of these values, but often supplying all of these values to every stage will not provide issues, as additional unnecessary arguments are just ignored.

For example, if you are using a data table that is a pandas DataFrame with the following column names: ["u", "g", "r", "i", "z", "y", "J", "H"], then you need to provide the following parameters as shown in the example below.

hdf5_groupname
bands
err_bands
mag_limits
ref_band

First we set up the bands, err_bands, and mag_limits parameters, which all rely on each other:

>>> bands = ["u", "g", "r", "i", "z", "y", "J", "H"] #the column names of our photometry bands
>>> errbands = []
>>> maglims = {}
>>> limvals = [27.8, 29.0, 29.1, 28.6, 28.0, 27.0, 26.4, 26.4] #magnitude limits for the bands
>>> for band, limval in zip(bands, limvals): #populate the empty errbands and maglims
>>>     errbands.append(f"{band}_err")
>>>     maglims[band] = limval
>>> print(bands)
>>> print(errbands)
>>> print(maglims)
['u', 'g', 'r', 'i', 'z', 'y', 'J', 'H']
['u_err', 'g_err', 'r_err', 'i_err', 'z_err', 'y_err', 'J_err', 'H_err']
{'u': 27.8, 'g': 29.0, 'r': 29.1, 'i': 28.6, 'z': 28.0, 'y': 27.0, 'J': 26.4, 'H': 26.4}

Now that we have these set up, we can put them into a dictionary with the other two parameters we need. hdf5_groupname tells the code what key it needs to access the data table from a dictionary, or if there is no key needed, just give it an empty string. The ref_band parameter identifies one of the photometry bands to use as a reference, so you just need to provide it with the column name of that band:

>>> columns_dict = dict(hdf5_groupname="", bands=bands, err_bands=errbands,
>>>                     mag_limits=maglims, ref_band="i")
{'hdf5_groupname': '', 'bands': ['u', 'g', 'r', 'i', 'z', 'y', 'J', 'H'],
    'err_bands': ['u_err', 'g_err', 'r_err', 'i_err', 'z_err', 'y_err', 'J_err', 'H_err'],
    'mag_limits': {'u': 27.8, 'g': 29.0, 'r': 29.1, 'i': 28.6, 'z': 28.0, 'y': 27.0, 'J': 26.4,
    'H': 26.4}, 'ref_band': 'i'}

Now we can pass this dictionary into any of the RAIL stages, along with any of the required parameters. Here we’re assuming that we have a data table with the column names in columns_dict called training_data:

>>> ri.estimation.algos.k_nearneigh.k_near_neigh_informer(training_data=training_data, **columns_dict)

The function should run as normal, and return to you any output with the same column names as what you input.

Tip

Take a look at the Running with different data notebook for a bit more detailed example of this.

Renaming columns

RAIL also has a utility for renaming the columns in your table, which you could use to make your column names match the expected versions, for example:

>>> rename_dict = {band: f"mag_{band}_lsst" for band in bands}
>>> print(rename_dict)
{'u': 'mag_u_lsst',
 'g': 'mag_g_lsst',
 'r': 'mag_r_lsst',
 'i': 'mag_i_lsst',
 'z': 'mag_z_lsst',
 'y': 'mag_y_lsst',
 'J': 'mag_J_lsst',
 'H': 'mag_H_lsst'}

You can then use this dictionary to rename the columns in your pandas DataFrame:

>>> train_data_pq = ri.tools.table_tools.column_mapper(data=train_data["output"], columns=rename_dict)
>>> print(train_data_pq["output"].columns)
Index(['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst',
       'mag_y_lsst', 'mag_J_lsst', 'mag_H_lsst'],
      dtype='object')

Tip

Check out the introduction to RAIL interactive notebook to see this RAIL utility being used in an example RAIL workflow.

Converting tables to other formats

RAIL has a stage that converts tabular RAIL data to different formats. This will not work on data in Ensemble format, or on models. The formats are those supported by tables_io, and you can see the supported tabular formats table for a list of the options. Just provide the “Tabular format name” to the function as shown here:

>>> train_data = ri.tools.table_tools.table_converter(data=train_data_pq["output"],
>>>              output_format="numpyDict")
>>> type(train_data["output"])
collections.OrderedDict

In this case, “numpyDict” refers to an OrderedDict of numpy arrays, and we can see here that the data has in fact been converted to that format.

Tip

Check out the Goldenspike notebook to see this RAIL utility used in an example workflow.

Saving the outputs of stages to file

Inform stages typically output ‘models’, which can be different kinds of objects depending on the algorithm. If you want to save these to a file to speed up later workflows, we recommend pickling them. For example, assuming that you’ve just run an inform stage and gotten knn_inform as the returned output dictionary:

>>> import pickle
>>> with open("./knn_model.pkl", "wb") as fout:
>>>    pickle.dump(obj=knn_inform["model"], file=fout, protocol=pickle.HIGHEST_PROTOCOL)

You can then provide the name of the file to the interactive functions.

Tip

Take a look at the Using Photometry to Estimate Photmetric Redshifts notebook to see more details and how saving a model file can be useful in a RAIL workflow.

Most estimate stages output qp Ensembles, which have their own write and read methods. The following will write an Ensemble to an HDF5 file, assuming that you have just run an estimate stage and gotten estimate_output as the return dictionary:

>>> import qp
>>> estimate_output["output"].write_to("output_ensemble.hdf5")

To read the Ensemble back in, you can also use qp:

>>> import qp
>>> estimate_ens = qp.read("output_ensemble.hdf5")
>>> estimate_ens
Ensemble(the_class=hist,shape=(2, 50))

If you’d like to convert the above Ensemble to a more typical array, you can do the following to get an array of PDF values at a set of redshift grid points:

>>> import numpy as np
>>> zgrid = np.linspace(0,3,200)
>>> output_photoz_pdf_values = estimate_output["output"].pdf(zgrid)
>>> print(output_photoz_pdf_values.shape)
(2, 200)

You can now turn the array into a table or save it to a file directly using your preferred method.

Tip

Take a look at the Using Photometry to Estimate Photmetric Redshifts notebook for an introduction to Ensembles in RAIL and to see them being saved and converted in a RAIL workflow.

For more details about how qp stores Ensembles in files, take a look at the notebook on Exploring the structure of an Ensemble file

Note

If you have any examples that you think should be added to this list, you can open an issue in RAIL, or if you are a part of the LSST Slack, send a message in the #desc-pz-rail Slack channel!