rail.core.data module

Rail-specific data management

class rail.core.data.DataHandle

Bases: object

Class that acts as a handle for a piece of data, associating it with a file and providing tools to read it from and write it to that file

__init__(tag, data=None, path=None, creator=None)

Constructor

Parameters:
  • tag (str) – The tag under which this data handle can be found in the store

  • data (DataLike | None) – The associated data

  • path (str | None) – The path to the associated file

  • creator (str | None) – The name of the stage that created this data handle

Return type:

None

close(**kwargs)

Close the associated file

Parameters:

kwargs (Any)

Return type:

None

data_size(**kwargs)

Return the size of the in memory data

Parameters:

kwargs (Any)

Return type:

int

fileObj: FileLike
finalize_write(**kwargs)

Finalize and close file written by chunks

Parameters:

**kwargs (Any) – Passed to call to write this chunk of data

Return type:

None

classmethod get_sub_class(class_name)

Get a particular subclass by name

Parameters:

class_name (str)

Return type:

type[DataHandle]

classmethod get_sub_classes()

Get all the subclasses

Return type:

dict[str, type[DataHandle]]
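As an illustration of a name-to-class lookup like the one described by get_sub_class and get_sub_classes, here is a minimal sketch. It does not import RAIL: BaseHandle, TableLikeHandle and Hdf5LikeHandle are hypothetical stand-ins, and walking `__subclasses__()` recursively is an assumption about how such a registry could be built, not a statement of how DataHandle does it.

```python
class BaseHandle:
    """Hypothetical stand-in for a handle base class (not rail.core.data.DataHandle)."""

    @classmethod
    def get_sub_classes(cls):
        """Return a dict mapping class name to class for every descendant."""
        found = {}
        for sub in cls.__subclasses__():
            found[sub.__name__] = sub
            # Recurse so that grandchildren are included as well
            found.update(sub.get_sub_classes())
        return found

    @classmethod
    def get_sub_class(cls, class_name):
        """Look up one descendant by name; raises KeyError if it is unknown."""
        return cls.get_sub_classes()[class_name]


class TableLikeHandle(BaseHandle):
    suffix = None


class Hdf5LikeHandle(TableLikeHandle):
    suffix = "hdf5"


registry = BaseHandle.get_sub_classes()
print(sorted(registry))
```

A lookup of this kind lets a configuration file refer to a handle type by its class name as a plain string.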

groups: GroupLike
property has_data: bool

Return true if the data for this handle are loaded

property has_path: bool

Return true if the path for the associated file is defined

initialize_write(data_length, **kwargs)

Initialize file to be written by chunks

Parameters:
  • data_length (int) – Number of rows of data that we will write, used to reserve space

  • **kwargs (Any) – Information about the columns we will write

Return type:

None

interactive_type = 'Data for RAIL'
property is_written: bool

Return true if the associated file has been written

iterator(**kwargs)

Iterator over the data

Parameters:

kwargs (Any)

Return type:

Iterable

length: int | None
classmethod make_name(tag)

Construct and return file name for a particular data tag

Parameters:

tag (str)

Return type:

str
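One plausible way to build a file name from a tag and the class-level suffix attribute is sketched below. The rule used here (append ".<suffix>" when a suffix is set, otherwise return the tag unchanged) is an assumption for illustration, and NamedHandle and PqLikeHandle are hypothetical classes, not part of RAIL.

```python
class NamedHandle:
    """Hypothetical base class with an empty suffix."""

    suffix = ""

    @classmethod
    def make_name(cls, tag):
        # Assumed rule: add ".<suffix>" only when the class defines a non-empty suffix
        if cls.suffix:
            return f"{tag}.{cls.suffix}"
        return tag


class PqLikeHandle(NamedHandle):
    """Hypothetical subclass that sets a parquet-style suffix."""

    suffix = "pq"


names = [NamedHandle.make_name("catalog"), PqLikeHandle.make_name("catalog")]
print(names)
```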

open(**kwargs)

Open and return the associated file

Parameters:

**kwargs (Any) – Passed to the call to open the file in question

Returns:

Newly opened file

Return type:

FileLike

Notes

This will simply open the file and return a FileLike object to the caller. It will not read or cache the data

partial: bool | None
classmethod print_sub_classes()

Print the list of all the subclasses

Return type:

None

read(force=False, **kwargs)

Read and return the data from the associated file

Parameters:
  • force (bool) – If true, force re-reading the data

  • **kwargs (Any) – Passed to the call to read the data

Returns:

Data that were read

Return type:

DataLike

Notes

This will read the entire file, and while useful for testing on small files, will not work on very large files.
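The force flag suggests that data already read are kept on the handle and reused. A minimal sketch of that pattern follows, assuming a JSON file for simplicity. CachedReader and its n_reads counter are hypothetical and only illustrate the caching idea; they are not the RAIL implementation.

```python
import json
import os
import tempfile


class CachedReader:
    """Hypothetical handle-like object that caches what it reads."""

    def __init__(self, path):
        self.path = path
        self.data = None
        self.n_reads = 0  # counts how often the file is actually read

    def read(self, force=False):
        # Reuse the cached data unless the caller forces a re-read
        if self.data is not None and not force:
            return self.data
        with open(self.path, "r", encoding="utf-8") as fin:
            self.data = json.load(fin)
        self.n_reads += 1
        return self.data


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.json")
    with open(path, "w", encoding="utf-8") as fout:
        json.dump({"z": [0.1, 0.2]}, fout)

    handle = CachedReader(path)
    first = handle.read()
    second = handle.read()           # served from the cache
    third = handle.read(force=True)  # re-reads the file
    n_reads = handle.n_reads
```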

set_data(data, partial=False)

Set the data for this handle, and set the partial flag to the value of partial

Parameters:
  • data (rail.core.data.DataLike)

  • partial (bool)

Return type:

None

size(**kwargs)

Return the size of the data associated to this handle

Parameters:

kwargs (Any)

Return type:

int

suffix: str | None = ''
write(**kwargs)

Write the data to the associated file

Parameters:

kwargs (Any)

Return type:

None

write_chunk(start, end, **kwargs)

Write a chunk of the data to the associated file

Parameters:
  • start (int) – Index of starting row for this chunk of data

  • end (int) – Index of ending row for this chunk of data

  • **kwargs (Any) – Passed to call to write this chunk of data

Return type:

None
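The three chunked-writing methods (initialize_write, write_chunk, finalize_write) are meant to be called in sequence. The sketch below shows that calling order with an in-memory stand-in. ChunkedWriter is hypothetical, it uses a Python list in place of a file, and it treats end as exclusive, which is an assumption rather than something the documentation states.

```python
class ChunkedWriter:
    """Hypothetical in-memory stand-in for a handle that writes by chunks."""

    def __init__(self):
        self.buffer = None   # stands in for the space reserved in a file
        self.data = None     # the chunk currently held by the handle
        self.closed = False

    def initialize_write(self, data_length):
        # Reserve space for all rows up front
        self.buffer = [None] * data_length

    def set_data(self, data):
        self.data = data

    def write_chunk(self, start, end):
        # Copy the current chunk into rows [start, end); end is exclusive here
        if end - start != len(self.data):
            raise ValueError("chunk length does not match start/end")
        self.buffer[start:end] = self.data

    def finalize_write(self):
        self.closed = True


rows = list(range(10))
chunk_size = 4
writer = ChunkedWriter()
writer.initialize_write(len(rows))
for start in range(0, len(rows), chunk_size):
    end = min(start + chunk_size, len(rows))
    writer.set_data(rows[start:end])
    writer.write_chunk(start, end)
writer.finalize_write()
```

Reserving the full length first means each chunk can be written in place, so the whole data set never has to be held in memory at once.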

class rail.core.data.DataStore

Bases: dict

Class to provide a transient data store

This class:

  1. associates data products with keys

  2. provides functions to read and write the various data products to associated files

__init__(**kwargs)

Build from keywords

Note

All of the values must be data handles or this will raise a TypeError

Parameters:

kwargs (Any)

Return type:

None
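The type check described in the note can be pictured with a small dict subclass. This is a sketch only: Handle and HandleStore are hypothetical stand-ins, and enforcing the check in `__setitem__` is an assumption about where such a check could live.

```python
class Handle:
    """Hypothetical minimal handle holding a tag, data and a path."""

    def __init__(self, tag, data=None, path=None):
        self.tag = tag
        self.data = data
        self.path = path


class HandleStore(dict):
    """Hypothetical dict that only accepts Handle instances as values."""

    def __init__(self, **kwargs):
        super().__init__()
        for key, value in kwargs.items():
            self[key] = value  # routed through __setitem__ for the type check

    def __setitem__(self, key, value):
        if not isinstance(value, Handle):
            raise TypeError(f"value for key {key!r} is not a Handle: {type(value)}")
        super().__setitem__(key, value)


store = HandleStore(input=Handle("input", data=[1, 2, 3]))

rejected = False
try:
    HandleStore(bad=3)  # a plain int is not a handle
except TypeError:
    rejected = True
```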

add_data(key, data, handle_class, path=None, creator='DataStore')

Create a handle for some data, and insert it into the DataStore

Parameters:
  • key (str)

  • data (rail.core.data.DataLike)

  • handle_class (type[DataHandle])

  • path (str | None)

  • creator (str)

Return type:

DataHandle

add_handle(key, handle_class, path, creator='DataStore')

Create a handle for the file at a given path, and insert it into the DataStore

Parameters:
  • key (str)

  • handle_class (type[DataHandle])

  • path (str)

  • creator (str)

Return type:

DataHandle

allow_overwrite = False
open(key, mode='r', **kwargs)

Open and return the file associated to a particular key

Parameters:
  • key (str)

  • mode (str)

  • kwargs (Any)

Return type:

rail.core.data.FileLike

read(key, force=False, **kwargs)

Read the data associated to a particular key

Parameters:
  • key (str)

  • force (bool)

  • kwargs (Any)

Return type:

rail.core.data.DataLike

read_file(key, handle_class, path, creator='DataStore', **kwargs)

Create a handle, use it to read a file, and insert it into the DataStore

Parameters:
  • key (str)

  • handle_class (type[DataHandle])

  • path (str)

  • creator (str)

  • kwargs (Any)

Return type:

DataHandle

write(key, **kwargs)

Write the data associated to a particular key

Parameters:
  • key (str)

  • kwargs (Any)

Return type:

None

write_all(force=False, **kwargs)

Write all the data in this DataStore

Parameters:
  • force (bool)

  • kwargs (Any)

Return type:

None

class rail.core.data.FitsHandle

Bases: TableHandle

DataHandle for a table written to fits

fileObj: FileLike
groups: GroupLike
length: int | None
partial: bool | None
suffix: str | None = 'fits'
class rail.core.data.Hdf5Handle

Bases: TableHandle

DataHandle for a table written to HDF5

fileObj: FileLike
groups: GroupLike
interactive_type = 'dict'
length: int | None
partial: bool | None
suffix: str | None = 'hdf5'
class rail.core.data.ModelDict

Bases: dict

A specialized dict to keep track of individual estimation model objects: this is just a dict with these additional features

  1. Keys are paths

  2. There is a read(path, force=False) method that reads a model object and inserts it into the dictionary

  3. There is a single static instance of this class
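The first two features in the list above (path keys and a caching read method) can be sketched with a small dict subclass. ModelCache is a hypothetical stand-in, not rail.core.data.ModelDict. Falling back to pickle when no reader is given is an assumption based on the default reader documented in this module.

```python
import os
import pickle
import tempfile


class ModelCache(dict):
    """Hypothetical path-keyed cache of model objects."""

    def read(self, path, force=False, reader=None):
        # Return the cached model unless a re-read is forced
        if path in self and not force:
            return self[path]
        if reader is None:
            with open(path, "rb") as fin:
                model = pickle.load(fin)
        else:
            model = reader(path)
        self[path] = model
        return model


with tempfile.TemporaryDirectory() as tmpdir:
    model_path = os.path.join(tmpdir, "model.pkl")
    with open(model_path, "wb") as fout:
        pickle.dump({"weights": [1.0, 2.0]}, fout)

    cache = ModelCache()
    model_a = cache.read(model_path)
    model_b = cache.read(model_path)              # cache hit, same object
    model_c = cache.read(model_path, force=True)  # re-read from disk
```

Keying on the path means several stages that point at the same model file can share one loaded object.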

open(path, mode, **kwargs)

Open the file and return the file handle

Parameters:
  • path (str)

  • mode (str)

  • kwargs (Any)

Return type:

rail.core.data.FileLike

read(path, force=False, reader=None, **_kwargs)

Read a model into this dict

Parameters:
  • path (str)

  • force (bool)

  • reader (Callable | None)

  • _kwargs (Any)

Return type:

rail.core.data.ModelLike

write(model, path, force=False, writer=None, **_kwargs)

Write the model. This default implementation uses pickle

Parameters:
  • model (rail.core.data.ModelLike)

  • path (str)

  • force (bool)

  • writer (Callable | None)

  • _kwargs (Any)

Return type:

None

class rail.core.data.ModelHandle

Bases: DataHandle

DataHandle for machine learning models

fileObj: FileLike
groups: GroupLike
interactive_type = 'numpy.ndarray'
length: int | None
partial: bool | None
suffix: str | None = 'pkl'
class rail.core.data.PqHandle

Bases: TableHandle

DataHandle for a parquet table

fileObj: FileLike
groups: GroupLike
interactive_type = 'pandas.core.frame.DataFrame'
length: int | None
partial: bool | None
suffix: str | None = 'pq'
class rail.core.data.QPDictHandle

Bases: DataHandle

DataHandle for dictionaries of qp ensembles

fileObj: FileLike
groups: GroupLike
length: int | None
partial: bool | None
suffix: str | None = 'hdf5'
class rail.core.data.QPHandle

Bases: DataHandle

DataHandle for qp ensembles

fileObj: FileLike
groups: GroupLike
interactive_type = 'qp.core.ensemble.Ensemble'
length: int | None
partial: bool | None
suffix: str | None = 'hdf5'
class rail.core.data.QPOrTableHandle

Bases: QPHandle, Hdf5Handle

DataHandle that will work with either qp.Ensembles or TableLike data

class PdfOrValue

Bases: Enum

both = 2
distribution = 0
has_dist()
Return type:

bool

has_point()
Return type:

bool

point_estimate = 1
unknown = -1
check_pdf_or_point()

Check the associated file to see if it is a QP pdf, point estimate or both

Return type:

PdfOrValue
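The PdfOrValue members and their two helper methods can be mirrored in a standalone enum. The member values below are copied from the listing above. The bodies of has_dist and has_point are an assumption about the intended semantics (a distribution is present for distribution and both, a point estimate for point_estimate and both), and PdfOrValueSketch is a hypothetical name.

```python
from enum import Enum


class PdfOrValueSketch(Enum):
    """Hypothetical mirror of the documented enum members."""

    unknown = -1
    distribution = 0
    point_estimate = 1
    both = 2

    def has_dist(self):
        # Assumed semantics: true when a distribution is available
        return self in (PdfOrValueSketch.distribution, PdfOrValueSketch.both)

    def has_point(self):
        # Assumed semantics: true when a point estimate is available
        return self in (PdfOrValueSketch.point_estimate, PdfOrValueSketch.both)


summary = {m.name: (m.has_dist(), m.has_point()) for m in PdfOrValueSketch}
print(summary)
```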

fileObj: FileLike
groups: GroupLike
is_qp()

Check if the associated data or file is a QP ensemble

Return type:

bool

length: int | None
partial: bool | None
suffix: str | None = 'hdf5'
class rail.core.data.TableHandle

Bases: DataHandle

DataHandle for single tables of data

fileObj: FileLike
groups: GroupLike
interactive_type = 'A tablesio-compatible table'
length: int | None
partial: bool | None
set_data(data, partial=False)

Set the data for a chunk, and set the partial flag if this is not all the data

Parameters:
  • data (rail.core.data.TableLike)

  • partial (bool)

Return type:

None

suffix: str | None = None
rail.core.data.default_model_read(modelfile)

Default function to read model files; it simply uses pickle.load

Parameters:

modelfile (str)

Return type:

rail.core.data.ModelLike

rail.core.data.default_model_write(model, path)

Write the model. This default implementation uses pickle

Parameters:
  • model (rail.core.data.ModelLike)

  • path (str)

Return type:

None
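Since both default functions are described as pickle-based, a round trip can be sketched with the standard library alone. model_write_sketch and model_read_sketch are hypothetical stand-ins with the same argument names as the functions above. They are not the RAIL functions themselves, and the file name is made up for the example.

```python
import os
import pickle
import tempfile


def model_write_sketch(model, path):
    """Pickle `model` to the file at `path`."""
    with open(path, "wb") as fout:
        pickle.dump(model, fout)


def model_read_sketch(modelfile):
    """Unpickle and return the object stored in `modelfile`."""
    with open(modelfile, "rb") as fin:
        return pickle.load(fin)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo_model.pkl")
    original = {"name": "demo", "coefficients": [0.5, 1.5]}
    model_write_sketch(original, path)
    restored = model_read_sketch(path)
```

Unpickling can execute arbitrary code, so pickle-based model files should only be loaded from trusted sources.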