rail.estimation.algos.somocluSOM module

class rail.estimation.algos.somocluSOM.Inform_somocluSOMSummarizer(args, comm=None)[source]

Bases: CatInformer

Summarizer that uses a SOM to construct a weighted sum of spec-z objects in the same SOM cell as each photometric galaxy in order to estimate the overall N(z). This is closely related to the NZDir estimator, though that estimator reverses the process and looks for photometric neighbors around each spectroscopic galaxy, which can lead to problems if there are photometric galaxies with no nearby spec-z objects (NZDir is not aware that such objects exist and thus can hide biases).

We use the somoclu package (https://somoclu.readthedocs.io/) to train the SOM.

Part of the SOM estimator is a check for cells that contain photometric objects but no corresponding training/spec-z objects; those unmatched objects will be flagged for possible removal from the input sample. The inform stage simply constructs a 2D grid SOM using somoclu from a large sample of input photometric data and saves it as an output. This may be a computationally intensive stage, though it should only need to be run once, after which the estimate/summarize stage can reuse the result many times without re-running it.
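The check for unmatched cells can be illustrated with a minimal sketch. It assumes each object has already been assigned a raveled (flattened) SOM cell ID; the function name `find_uncovered_cells` and the example arrays are hypothetical and are not part of the RAIL API.

```python
import numpy as np


def find_uncovered_cells(phot_cells, spec_cells):
    """Return raveled cell IDs that hold photometric objects but no spec-z objects."""
    phot_occupied = np.unique(phot_cells)
    spec_occupied = np.unique(spec_cells)
    # Cells present in the photometric set but absent from the spec-z set
    return np.setdiff1d(phot_occupied, spec_occupied)


# Hypothetical raveled cell assignments for a tiny sample
phot_cells = np.array([0, 1, 1, 2, 5, 5, 7])
spec_cells = np.array([0, 1, 2, 2, 3])

uncovered = find_uncovered_cells(phot_cells, spec_cells)
# Boolean mask marking the photometric objects that sit in uncovered cells
flagged = np.isin(phot_cells, uncovered)
```

Objects where `flagged` is True are the candidates for removal from the input sample.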

We can make the SOM either with all colors, or one magnitude and N colors, or an arbitrary set of columns. The code includes a flag, column_usage, to select the mode. If set to “colors” it will take the difference of each adjacent pair of columns in bands as the colors. If set to “magandcolors” it will use these colors plus one magnitude as specified by ref_band. If set to “columns” it will take as inputs all of the columns specified by bands (they can be magnitudes, colors, or any other input specified by the user). NOTE: any custom bands parameters must have an accompanying nondetect_val dictionary that will replace nondetections with the nondetect_val values!
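The three column_usage modes described above can be sketched as follows. This is an illustration under stated assumptions rather than the RAIL implementation: the helper `build_som_features` is hypothetical, the input is assumed to be a dict-like table of equal-length columns, and non-detection replacement is deliberately left out.

```python
import numpy as np


def build_som_features(data, bands, column_usage="colors", ref_band=None):
    """Assemble an (n_objects, n_features) array for the chosen column_usage mode."""
    cols = [np.asarray(data[band], dtype=float) for band in bands]
    if column_usage == "columns":
        # Use every listed column as-is
        return np.column_stack(cols)
    # Colors are differences of each adjacent pair of columns in `bands`
    colors = [first - second for first, second in zip(cols[:-1], cols[1:])]
    if column_usage == "colors":
        return np.column_stack(colors)
    if column_usage == "magandcolors":
        # One reference magnitude followed by the colors
        ref = np.asarray(data[ref_band], dtype=float)
        return np.column_stack([ref] + colors)
    raise ValueError(f"unknown column_usage: {column_usage!r}")


# Hypothetical three-band table with two objects
data = {"u": [24.0, 25.0], "g": [23.0, 24.5], "r": [22.5, 24.0]}
bands = ["u", "g", "r"]

colors_only = build_som_features(data, bands, "colors")
mag_and_colors = build_som_features(data, bands, "magandcolors", ref_band="r")
raw_columns = build_som_features(data, bands, "columns")
```

With N bands, “colors” yields N-1 features, “magandcolors” yields N, and “columns” yields N.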

Returns:

  • model (pickle file) – pickle file containing the somoclu SOM object that will be used by the estimation/summarization stage

config_options = {'bands': <shared defaults>, 'column_usage': <ceci.config.StageParameter object>, 'err_bands': <shared defaults>, 'gridtype': <ceci.config.StageParameter object>, 'hdf5_groupname': <shared defaults>, 'mag_limits': <shared defaults>, 'maptype': <ceci.config.StageParameter object>, 'n_columns': <ceci.config.StageParameter object>, 'n_rows': <ceci.config.StageParameter object>, 'nondetect_val': <shared defaults>, 'output_mode': <ceci.config.StageParameter object>, 'redshift_col': <shared defaults>, 'ref_band': <shared defaults>, 'save_train': True, 'seed': <ceci.config.StageParameter object>, 'som_learning_rate': <ceci.config.StageParameter object>, 'std_coeff': <ceci.config.StageParameter object>}

Here <shared defaults> abbreviates the following dictionary, which is identical for every key marked that way:

{'bands': ['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'], 'dz': 0.01, 'err_bands': ['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'], 'hdf5_groupname': 'photometry', 'mag_limits': {'mag_g_lsst': 29.04, 'mag_i_lsst': 28.62, 'mag_r_lsst': 29.06, 'mag_u_lsst': 27.79, 'mag_y_lsst': 27.05, 'mag_z_lsst': 27.98}, 'nondetect_val': 99.0, 'nzbins': 301, 'redshift_col': 'redshift', 'ref_band': 'mag_i_lsst', 'zmax': 3.0, 'zmin': 0.0}
name = 'Inform_SOMoclu'
run()[source]

Build a SOM from photometric data, not spectroscopic data!

rail.estimation.algos.somocluSOM.get_bmus(som, data=None, split=200)[source]

Get the “best matching unit” (BMU) of each data vector on a pre-trained SOM. It works by multiprocessing chunks of the data.

Parameters:
  • som – a pre-trained Somoclu object

  • data (np.ndarray) – the data vectors. If None, the training data stored in the som object is used

  • split (int) – the size of the data chunks used when calculating the distances between the codebook and the data
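The idea of finding best matching units chunk by chunk can be sketched in plain NumPy. This is not the implementation of get_bmus: it runs serially instead of using multiprocessing, the function name `chunked_bmus` is hypothetical, and it assumes a codebook of shape (n_rows, n_columns, n_features) and returns (row, column) pairs as its own convention.

```python
import numpy as np


def chunked_bmus(codebook, data, split=200):
    """Return an (n_objects, 2) array of (row, column) BMU coordinates."""
    n_rows, n_columns, n_features = codebook.shape
    flat = codebook.reshape(n_rows * n_columns, n_features)
    best = np.empty(len(data), dtype=int)
    for start in range(0, len(data), split):
        chunk = data[start:start + split]
        # Squared Euclidean distance from every vector in the chunk to every cell
        dist2 = ((chunk[:, None, :] - flat[None, :, :]) ** 2).sum(axis=2)
        best[start:start + split] = dist2.argmin(axis=1)
    # Convert flat cell indices back to 2D grid coordinates
    rows, cols = np.unravel_index(best, (n_rows, n_columns))
    return np.column_stack([rows, cols])


# Hypothetical 2x2 SOM with a single feature per cell
codebook = np.array([[0.0, 1.0], [2.0, 3.0]]).reshape(2, 2, 1)
data = np.array([[0.1], [2.9], [1.2]])
bmus = chunked_bmus(codebook, data, split=2)
```

Processing the data in chunks keeps the intermediate distance array at `split × n_cells` entries instead of `n_objects × n_cells`, which is the reason for chunking in the first place.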

rail.estimation.algos.somocluSOM.plot_som(ax, som_map, grid_type='rectangular', colormap=<matplotlib.colors.ListedColormap object>, cbar_name=None, vmin=None, vmax=None)[source]

Plot a pre-trained SOM.

Parameters:
  • ax – the axis to plot on

  • som_map – a 2-D array containing the values of a pre-trained SOM. The values can be the number of sources in each cell or the mean of a feature in each cell

  • grid_type (str) – either ‘rectangular’ or ‘hexagonal’

  • colormap – the colormap used to show the values. Default: cm.viridis

  • cbar_name – the label on the color bar
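The two kinds of som_map mentioned above (source counts per cell and the mean of a feature per cell) can be built with a short sketch like the one below. The helper names are hypothetical, and the BMU array is assumed to hold (row, column) pairs, which may differ from the ordering used by somoclu or RAIL.

```python
import numpy as np


def occupancy_map(bmus, n_rows, n_columns):
    """Count how many sources fall in each SOM cell."""
    raveled = np.ravel_multi_index((bmus[:, 0], bmus[:, 1]), (n_rows, n_columns))
    counts = np.bincount(raveled, minlength=n_rows * n_columns)
    return counts.reshape(n_rows, n_columns)


def mean_feature_map(bmus, values, n_rows, n_columns):
    """Average a per-source quantity in each SOM cell (NaN where a cell is empty)."""
    raveled = np.ravel_multi_index((bmus[:, 0], bmus[:, 1]), (n_rows, n_columns))
    n_cells = n_rows * n_columns
    sums = np.bincount(raveled, weights=values, minlength=n_cells)
    counts = np.bincount(raveled, minlength=n_cells)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    return means.reshape(n_rows, n_columns)


# Hypothetical assignments of three sources on a 2x2 SOM
bmus = np.array([[0, 0], [0, 0], [1, 1]])
values = np.array([1.0, 3.0, 5.0])

counts_map = occupancy_map(bmus, 2, 2)
means_map = mean_feature_map(bmus, values, 2, 2)
```

Either array has the 2-D shape that the som_map argument is described as expecting.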

class rail.estimation.algos.somocluSOM.somocluSOMSummarizer(args, comm=None)[source]

Bases: SZPZSummarizer

Quick implementation of a SOM-based summarizer. It groups a pre-trained SOM into hierarchical clusters and assigns a galaxy sample to SOM cells and clusters. It then constructs an N(z) estimate via a weighted sum of the empirical N(z), consisting of the normalized histogram of spec-z values contained in the same SOM cluster as each photometric galaxy.

There are some general guidelines for choosing the geometry and total number of cells in the SOM. This paper: http://www.giscience2010.org/pdfs/paper_230.pdf recommends 5*sqrt(num rows * num data columns) as a rough guideline. Some authors state that a SOM with one dimension roughly twice as long as the other is better, while others find that square SOMs with equal X and Y dimensions are best. The user can set the dimensions using the n_columns and n_rows parameters. For more discussion on SOMs and photo-z calibration, see the KiDS paper on the topic: http://arxiv.org/abs/1909.09632, particularly the appendices.

Note that several parameters are stored in the model file, e.g. the columns used. This ensures that the same columns used in constructing the SOM are used when finding the winning SOM cell with the test data.

Two additional files are also written out:

cellid_output outputs the ‘winning’ SOM cell for each photometric galaxy, in both raveled and 2D SOM cell coordinates. If the objectID or galaxy_id is present, it will also be included in this file; if not, the coordinates will be written in the same order in which the data is read in.

uncovered_cell_file outputs the raveled cell IDs of cells that contain photometric galaxies but no corresponding spectroscopic objects. The galaxies in these cells should be removed from the sample, as they cannot be accounted for properly in the summarizer. Some iteration on data cuts may be necessary to remove/mitigate these ‘uncovered’ objects.
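The weighted-sum N(z) construction described above can be sketched as follows. This is a simplified illustration, not the stage's code: it works per cell rather than per hierarchical cluster, ignores spectroscopic weights, skips uncovered cells, and the function name `som_nz` and the example inputs are hypothetical.

```python
import numpy as np


def som_nz(phot_cells, spec_cells, spec_z, zgrid, phot_weights=None):
    """Weighted sum of per-cell normalized spec-z histograms."""
    if phot_weights is None:
        phot_weights = np.ones(len(phot_cells))
    nz = np.zeros(len(zgrid) - 1)
    for cell in np.unique(phot_cells):
        in_cell = spec_cells == cell
        if not in_cell.any():
            continue  # uncovered cell: no spec-z objects to describe it
        hist, _ = np.histogram(spec_z[in_cell], bins=zgrid)
        total = hist.sum()
        if total == 0:
            continue  # all spec-z values fall outside the z grid
        # Weight the cell's normalized histogram by its photometric weight
        cell_weight = phot_weights[phot_cells == cell].sum()
        nz += cell_weight * hist / total
    norm = nz.sum()
    return nz / norm if norm > 0 else nz


# Hypothetical cell assignments and redshifts
zgrid = np.linspace(0.0, 3.0, 4)  # three bins of width 1
phot_cells = np.array([0, 0, 0, 1, 2])
spec_cells = np.array([0, 0, 1])
spec_z = np.array([0.5, 1.5, 2.5])

nz = som_nz(phot_cells, spec_cells, spec_z, zgrid)
```

In this example cell 2 holds a photometric galaxy but no spec-z object, so it contributes nothing to the estimate; that silent omission is the bias the uncovered-cell output is meant to expose.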

Parameters:
  • zmin (float) – min redshift for z grid

  • zmax (float) – max redshift for z grid

  • nzbins (int) – number of bins in z grid

  • hdf5_groupname (str) – hdf5 group name for photometric data, set to “” if data is at top level of hdf5 file

  • spec_groupname (str) – hdf5 group name for spectroscopic data, set to “” if data is at top level of hdf5 file

  • phot_weightcol (str) – name of photometric weight column. If no weights are to be used, set to ‘’

  • spec_weightcol (str) – column name of the spectroscopic weight column. If no weights are to be used, set to ‘’

  • nsamples (int) – number of bootstrap spec-z samples to generate

  • n_clusters (int) – number of hierarchical clusters of the SOM cells. If not given, the SOM will not be grouped into clusters (or equivalently, n_clusters = the total number of SOM cells)

Returns:

qp_ens – ensemble of bootstrap realizations of the estimated N(z) for the input photometric data

Return type:

qp Ensemble
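The bootstrap realizations mentioned in the return value can be illustrated with a minimal sketch that resamples the spec-z sample with replacement. It is an assumption-laden illustration, not the stage's code: the function name `bootstrap_histograms` is hypothetical, SOM cell weighting is omitted, and the result is left as a plain array instead of being wrapped in a qp Ensemble.

```python
import numpy as np


def bootstrap_histograms(spec_z, zgrid, nsamples=20, seed=0):
    """Return an (nsamples, n_bins) array of normalized bootstrap histograms."""
    rng = np.random.default_rng(seed)
    n_obj = len(spec_z)
    out = np.zeros((nsamples, len(zgrid) - 1))
    for i in range(nsamples):
        # Draw n_obj indices with replacement from the spec-z sample
        idx = rng.integers(0, n_obj, size=n_obj)
        hist, _ = np.histogram(spec_z[idx], bins=zgrid)
        total = hist.sum()
        if total > 0:
            out[i] = hist / total
    return out


# Hypothetical spec-z sample and a coarse z grid
spec_z = np.array([0.2, 0.4, 1.1, 1.3, 1.7, 2.6])
zgrid = np.linspace(0.0, 3.0, 4)
realizations = bootstrap_histograms(spec_z, zgrid, nsamples=5, seed=42)
```

The scatter between rows gives a rough sense of the sampling uncertainty carried by the spec-z sample.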

config_options = {'chunk_size': 10000, 'hdf5_groupname': <shared defaults>, 'mag_limits': <shared defaults>, 'n_clusters': <ceci.config.StageParameter object>, 'nondetect_val': <shared defaults>, 'nsamples': <ceci.config.StageParameter object>, 'nzbins': <shared defaults>, 'objid_name': <ceci.config.StageParameter object>, 'output_mode': <ceci.config.StageParameter object>, 'phot_weightcol': <ceci.config.StageParameter object>, 'redshift_col': <shared defaults>, 'redshift_colname': <ceci.config.StageParameter object>, 'seed': <ceci.config.StageParameter object>, 'spec_groupname': <ceci.config.StageParameter object>, 'spec_weightcol': <ceci.config.StageParameter object>, 'split': <ceci.config.StageParameter object>, 'zmax': <shared defaults>, 'zmin': <shared defaults>}

Here <shared defaults> abbreviates the following dictionary, which is identical for every key marked that way:

{'bands': ['mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst', 'mag_i_lsst', 'mag_z_lsst', 'mag_y_lsst'], 'dz': 0.01, 'err_bands': ['mag_err_u_lsst', 'mag_err_g_lsst', 'mag_err_r_lsst', 'mag_err_i_lsst', 'mag_err_z_lsst', 'mag_err_y_lsst'], 'hdf5_groupname': 'photometry', 'mag_limits': {'mag_g_lsst': 29.04, 'mag_i_lsst': 28.62, 'mag_r_lsst': 29.06, 'mag_u_lsst': 27.79, 'mag_y_lsst': 27.05, 'mag_z_lsst': 27.98}, 'nondetect_val': 99.0, 'nzbins': 301, 'redshift_col': 'redshift', 'ref_band': 'mag_i_lsst', 'zmax': 3.0, 'zmin': 0.0}
get_som_coordinates(data, weight_col)[source]
name = 'somocluSOMSummarizer'
outputs = [('output', <class 'rail.core.data.QPHandle'>), ('single_NZ', <class 'rail.core.data.QPHandle'>), ('cellid_output', <class 'rail.core.data.Hdf5Handle'>), ('uncovered_cluster_file', <class 'rail.core.data.TableHandle'>)]
replace_non_detections(data)[source]
run()[source]

Run the stage and return the execution status

set_weight_column(data, weight_col)[source]