Pipeline Mode Notebooks

RAIL comes with several notebooks that demonstrate how to use it to analyze data in a number of different ways.

Here we describe the various notebooks and suggest other ways in which you might study the data.

Starting out: overview notebooks

We recommend starting with the Goldenspike notebook, which demonstrates a relatively simple end-to-end analysis. The analysis begins by building a model that can generate synthetic catalogs of photometric data. It then uses that model to create sets of synthetic data for training and testing per-object redshift estimators, i.e., estimators that compute p(z). From there it trains and tests a few estimators based on common algorithms and evaluates their performance. Finally, it shows a few methods that convert the p(z) for a set of objects into an ensemble distribution n(z).
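
To make the last step concrete, here is a minimal sketch of one simple way to turn per-object p(z) into an ensemble n(z): averaging the p(z) values on a shared redshift grid, often called naive stacking. This is an illustration only, not necessarily what the notebook or RAIL does. The function name `stack_pz`, the grid spacing, and the toy input values are all hypothetical.

```python
def stack_pz(pz_values, dz):
    """Average per-object p(z) on a shared grid and renormalize to unit area.

    pz_values: list of lists, one row per object, one column per grid point.
    dz: spacing of the (assumed uniform) redshift grid.
    """
    if not pz_values:
        raise ValueError("need at least one p(z)")
    n_obj = len(pz_values)
    n_bins = len(pz_values[0])
    # Mean over objects at each grid point.
    mean_pz = [sum(p[i] for p in pz_values) / n_obj for i in range(n_bins)]
    # Renormalize so that the Riemann sum of n(z) over the grid is 1.
    norm = sum(mean_pz) * dz
    return [value / norm for value in mean_pz]


# Toy example (made-up numbers): two objects peaked at opposite grid ends.
dz = 0.5
pz_values = [[2.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
nz = stack_pz(pz_values, dz)
```

The result puts equal weight at both ends of the grid, as expected for two equally weighted objects.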

The estimation notebook focuses more on the estimation parts of the analysis, and demonstrates a few additional estimation algorithms.

The evaluation of the estimator performance is described in more depth in its own notebook.

Finally, we have collected demonstrations of useful utilities for exploring which packages and algorithms are available in the current RAIL installation.

Deeper dives into synthetic data creation

The notebooks in the creation directory demonstrate how to generate synthetic photometric data, and also how to “degrade” the synthetic data by applying various effects to it.

These notebooks demonstrate utilities for preparing data for analysis, e.g., converting fluxes to magnitudes, applying dereddening, and converting fluxes to hyperbolic magnitudes.
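
As a rough illustration of what these conversions involve, the sketch below writes out the standard formulas in plain Python. It does not use RAIL's utilities. The function names are hypothetical, and the default zero point of 31.4 assumes fluxes in nanojanskys on the AB system. The softening parameter `b` and the band coefficient are inputs you would need to choose for your own data.

```python
import math


def flux_to_mag(flux, zero_point=31.4):
    """Classical magnitude; only defined for flux > 0.

    zero_point=31.4 assumes AB magnitudes with flux in nJy (an assumption).
    """
    return -2.5 * math.log10(flux) + zero_point


def flux_to_hyperbolic_mag(flux, b, zero_point=31.4):
    """Hyperbolic (asinh) magnitude, defined for zero and negative flux too.

    b is the dimensionless softening parameter, in units of the
    zero-magnitude flux f0.
    """
    a = 2.5 / math.log(10)  # Pogson's ratio
    f0 = 10 ** (zero_point / 2.5)  # flux of a zero-magnitude source
    return -a * (math.asinh(flux / (2 * b * f0)) + math.log(b))


def deredden_mag(mag, ebv, band_coefficient):
    """Remove Galactic extinction A = band_coefficient * E(B-V)."""
    return mag - band_coefficient * ebv


bright = 1.0e6  # nJy, made-up value
classical = flux_to_mag(bright)
hyperbolic = flux_to_hyperbolic_mag(bright, b=1e-10)
```

For bright sources the two magnitude definitions agree closely. They differ at low signal-to-noise, where the hyperbolic form stays finite even for zero or negative measured flux.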

Examples of using specific estimators

The notebooks in this directory demonstrate specific p(z) estimators in more detail, for example the CMNN, GPz, and NZDIR algorithms.

These two notebooks demonstrate self-organizing map (SOM) based algorithms that estimate the ensemble n(z) distribution: the first works with the SOM directly, while the second clusters the SOM cells to reduce statistical fluctuations.

Finally, the test_sampled_summarizers notebook demonstrates converting collections of per-object p(z) estimates to ensemble n(z) estimates.

Deeper explanations of RAIL concepts

This notebook demonstrates how to convert a notebook into a ceci analysis pipeline.

Additionally, the Iterate_Tabular_Data notebook demonstrates the mechanisms we use to iterate over tabular data, which are needed to avoid reading entire object catalogs into memory.
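
The general idea of chunked iteration can be sketched in a few lines of plain Python. This is a generic illustration, not RAIL's actual mechanism, which the notebook covers. The names `iter_chunks`, `chunked_mean`, and `read_rows` are hypothetical, and an in-memory list stands in for a catalog column on disk.

```python
def iter_chunks(n_rows, chunk_size):
    """Yield (start, end) row ranges that together cover n_rows."""
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


def chunked_mean(read_rows, n_rows, chunk_size):
    """Compute a column mean while holding only one chunk at a time.

    read_rows(start, end) is a caller-supplied function that returns the
    values for rows [start, end), e.g. by slicing a file on disk.
    """
    total = 0.0
    for start, end in iter_chunks(n_rows, chunk_size):
        chunk = read_rows(start, end)  # only this slice is held in memory
        total += sum(chunk)
    return total / n_rows


# Stand-in for an on-disk column: a small in-memory list.
column = list(range(10))
ranges = list(iter_chunks(len(column), 4))
mean = chunked_mean(lambda s, e: column[s:e], len(column), 4)
```

The same pattern applies to any computation that can be accumulated chunk by chunk, so peak memory depends on the chunk size rather than the catalog size.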