rail.core.data module
RAIL-specific data management
- class rail.core.data.DataHandle
Bases:
object
Class to act as a handle for a piece of data, associating it with a file and providing tools to read and write it to that file
- __init__(tag, data=None, path=None, creator=None)
Constructor
- Parameters:
tag (str) – The tag under which this data handle can be found in the store
data (DataLike | None) – The associated data
path (str | None) – The path to the associated file
creator (str | None) – The name of the stage that created this data handle
- Return type:
None
- close(**kwargs)
Close the associated file
- Parameters:
kwargs (Any)
- Return type:
None
- data_size(**kwargs)
Return the size of the in-memory data
- Parameters:
kwargs (Any)
- Return type:
int
- fileObj: FileLike
- finalize_write(**kwargs)
Finalize and close file written by chunks
- Parameters:
**kwargs (Any) – Passed to call to write this chunk of data
- Return type:
None
- classmethod get_sub_class(class_name)
Get a particular subclass by name
- Parameters:
class_name (str)
- Return type:
type[DataHandle]
- classmethod get_sub_classes()
Get all the subclasses
- Return type:
dict[str, type[DataHandle]]
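A name-to-class lookup like the one described for get_sub_class and get_sub_classes can be built on Python's built-in `__subclasses__()`. The following is a minimal sketch under that assumption, not the RAIL implementation. `Handle`, `TableLikeHandle`, and `ParquetLikeHandle` are hypothetical stand-in classes.

```python
class Handle:
    """Hypothetical stand-in for a handle base class (not the RAIL class)."""

    @classmethod
    def get_sub_classes(cls) -> dict:
        """Map class name -> class for every direct or indirect subclass."""
        found = {}
        for sub in cls.__subclasses__():
            found[sub.__name__] = sub
            # Recurse so that subclasses of subclasses are included too.
            found.update(sub.get_sub_classes())
        return found

    @classmethod
    def get_sub_class(cls, class_name: str) -> type:
        """Look up one subclass by name; raises KeyError if it is unknown."""
        return cls.get_sub_classes()[class_name]


class TableLikeHandle(Handle):
    """Illustrative intermediate subclass."""


class ParquetLikeHandle(TableLikeHandle):
    """Illustrative leaf subclass."""


registry = Handle.get_sub_classes()
```

Because the lookup is computed from the class hierarchy, a newly defined subclass becomes discoverable by name without any explicit registration step.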
- groups: GroupLike
- property has_data: bool
Return true if the data for this handle are loaded
- property has_path: bool
Return true if the path for the associated file is defined
- initialize_write(data_length, **kwargs)
Initialize file to be written by chunks
- Parameters:
data_length (int) – Number of rows of data that we will write, used to reserve space
**kwargs (Any) – Information about the columns we will write
- Return type:
None
- interactive_type = 'Data for RAIL'
- property is_written: bool
Return true if the associated file has been written
- iterator(**kwargs)
Iterator over the data
- Parameters:
kwargs (Any)
- Return type:
Iterable
- length: int | None
- classmethod make_name(tag)
Construct and return file name for a particular data tag
- Parameters:
tag (str)
- Return type:
str
- open(**kwargs)
Open and return the associated file
- Parameters:
**kwargs (Any) – Passed to the call to open the file in question
- Returns:
Newly opened file
- Return type:
FileLike
Notes
This will simply open the file and return a FileLike object to the caller. It will not read or cache the data
- partial: bool | None
- classmethod print_sub_classes()
Print the list of all the subclasses
- Return type:
None
- read(force=False, **kwargs)
Read and return the data from the associated file
- Parameters:
force (bool) – If true, force re-reading the data
**kwargs (Any) – Passed to the call to read the data
- Returns:
Data that were read
- Return type:
DataLike
Notes
This will read the entire file, and while useful for testing on small files, will not work on very large files.
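When a file is too large to read in one call, the usual alternative is to walk over it in fixed-size chunks. The sketch below illustrates that pattern on an in-memory sequence. `iter_chunks` is a hypothetical helper, and both the `(start, end, chunk)` tuple shape and the exclusive `end` index are assumptions, not a documented property of DataHandle.iterator.

```python
from typing import Iterator, Sequence


def iter_chunks(data: Sequence, chunk_size: int) -> Iterator[tuple]:
    """Yield (start, end, chunk) triples that cover data in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        # The last chunk may be shorter than chunk_size.
        end = min(start + chunk_size, len(data))
        yield start, end, data[start:end]


chunks = list(iter_chunks(list(range(7)), 3))
```

Only one chunk is held at a time inside the loop, so peak memory depends on `chunk_size` rather than on the total number of rows.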
- set_data(data, partial=False)
Set the data for a chunk, and set the partial flag to the given value
- Parameters:
data (rail.core.data.DataLike)
partial (bool)
- Return type:
None
- size(**kwargs)
Return the size of the data associated with this handle
- Parameters:
kwargs (Any)
- Return type:
int
- suffix: str | None = ''
- write(**kwargs)
Write the data to the associated file
- Parameters:
kwargs (Any)
- Return type:
None
- write_chunk(start, end, **kwargs)
Write one chunk of the data to the associated file
- Parameters:
start (int) – Index of starting row for this chunk of data
end (int) – Index of ending row for this chunk of data
**kwargs (Any) – Passed to call to write this chunk of data
- Return type:
None
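The three chunked-write methods (initialize_write, write_chunk, finalize_write) are meant to be called in sequence: reserve space once, write each chunk into its row range, then finalize. The sketch below mimics that protocol with an in-memory list in place of a file. `ChunkedWriter` is a hypothetical stand-in. Passing the chunk explicitly and treating `end` as exclusive are assumptions made for this illustration.

```python
class ChunkedWriter:
    """Hypothetical stand-in that mimics the three-step chunked-write protocol."""

    def __init__(self) -> None:
        self.buffer = None   # reserved storage, created by initialize_write
        self.closed = False  # set once the output has been finalized

    def initialize_write(self, data_length: int) -> None:
        # Reserve space for every row up front.
        self.buffer = [None] * data_length

    def write_chunk(self, start: int, end: int, chunk: list) -> None:
        # Copy one chunk into rows [start, end) of the reserved storage.
        if end - start != len(chunk):
            raise ValueError("chunk length does not match [start, end)")
        self.buffer[start:end] = chunk

    def finalize_write(self) -> None:
        # Mark the output as complete; a real handle would close its file here.
        self.closed = True


rows = list(range(10))
chunk_size = 4

writer = ChunkedWriter()
writer.initialize_write(len(rows))
for start in range(0, len(rows), chunk_size):
    end = min(start + chunk_size, len(rows))
    writer.write_chunk(start, end, rows[start:end])
writer.finalize_write()
```

Knowing the total length before the first chunk arrives is what allows each chunk to be written straight into its final position.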
- class rail.core.data.DataStore
Bases:
dict
Class to provide a transient data store
This class:
associates data products with keys
provides functions to read and write the various data products to associated files
- __init__(**kwargs)
Build from keywords
Note
All of the values must be data handles or this will raise a TypeError
- Parameters:
kwargs (Any)
- Return type:
None
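The note above says that every value must be a data handle, otherwise a TypeError is raised. One way to enforce such a rule in a dict subclass is sketched below. `Handle` and `HandleStore` are hypothetical names, and checking inside `__setitem__` is an assumption about where the validation could live, not a description of the DataStore code.

```python
class Handle:
    """Hypothetical minimal handle type used only for this illustration."""

    def __init__(self, tag: str) -> None:
        self.tag = tag


class HandleStore(dict):
    """A dict that only accepts Handle instances as values."""

    def __init__(self, **kwargs) -> None:
        super().__init__()
        # Route every keyword through __setitem__ so it is type-checked.
        for key, value in kwargs.items():
            self[key] = value

    def __setitem__(self, key, value) -> None:
        if not isinstance(value, Handle):
            raise TypeError(
                f"value for {key!r} must be a Handle, got {type(value).__name__}"
            )
        super().__setitem__(key, value)


store = HandleStore(input=Handle("input"))

try:
    HandleStore(bad=3)
    rejected = False
except TypeError:
    rejected = True
```

Note that `dict.update` and `dict.setdefault` bypass an overridden `__setitem__`, so a complete implementation would need to cover those entry points as well.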
- add_data(key, data, handle_class, path=None, creator='DataStore')
Create a handle for some data, and insert it into the DataStore
- Parameters:
key (str)
data (rail.core.data.DataLike)
handle_class (type[DataHandle])
path (str | None)
creator (str)
- Return type:
- add_handle(key, handle_class, path, creator='DataStore')
Create a handle for the file at the given path, and insert it into the DataStore
- Parameters:
key (str)
handle_class (type[DataHandle])
path (str)
creator (str)
- Return type:
- allow_overwrite = False
- open(key, mode='r', **kwargs)
Open and return the file associated with a particular key
- Parameters:
key (str)
mode (str)
kwargs (Any)
- Return type:
rail.core.data.FileLike
- read(key, force=False, **kwargs)
Read the data associated with a particular key
- Parameters:
key (str)
force (bool)
kwargs (Any)
- Return type:
rail.core.data.DataLike
- read_file(key, handle_class, path, creator='DataStore', **kwargs)
Create a handle, use it to read a file, and insert it into the DataStore
- Parameters:
key (str)
handle_class (type[DataHandle])
path (str)
creator (str)
kwargs (Any)
- Return type:
- write(key, **kwargs)
Write the data associated with a particular key
- Parameters:
key (str)
kwargs (Any)
- Return type:
None
- write_all(force=False, **kwargs)
Write all the data in this DataStore
- Parameters:
force (bool)
kwargs (Any)
- Return type:
None
- class rail.core.data.FitsHandle
Bases:
TableHandle
DataHandle for a table written to FITS
- fileObj: FileLike
- groups: GroupLike
- length: int | None
- partial: bool | None
- suffix: str | None = 'fits'
- class rail.core.data.Hdf5Handle
Bases:
TableHandle
DataHandle for a table written to HDF5
- fileObj: FileLike
- groups: GroupLike
- interactive_type = 'dict'
- length: int | None
- partial: bool | None
- suffix: str | None = 'hdf5'
- class rail.core.data.ModelDict
Bases:
dict
A specialized dict to keep track of individual estimation model objects. This is just a dict with these additional features:
Keys are paths
There is a read(path, force=False) method that reads a model object and inserts it into the dictionary
There is a single static instance of this class
- open(path, mode, **kwargs)
Open the file and return the file handle
- Parameters:
path (str)
mode (str)
kwargs (Any)
- Return type:
rail.core.data.FileLike
- read(path, force=False, reader=None, **_kwargs)
Read a model into this dict
- Parameters:
path (str)
force (bool)
reader (Callable | None)
_kwargs (Any)
- Return type:
rail.core.data.ModelLike
- write(model, path, force=False, writer=None, **_kwargs)
Write the model; this default implementation uses pickle
- Parameters:
model (rail.core.data.ModelLike)
path (str)
force (bool)
writer (Callable | None)
_kwargs (Any)
- Return type:
None
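The read method described above combines a path-keyed cache with a pluggable reader. The sketch below shows that pattern in isolation. `ModelCache` is a hypothetical name, and interpreting `force=True` as "re-read even if the path is already cached" is an assumption borrowed from the DataHandle.read description.

```python
import pickle


class ModelCache(dict):
    """Hypothetical path-keyed cache of model objects."""

    def read(self, path: str, force: bool = False, reader=None):
        """Return the model for path, loading it only when needed."""
        if force or path not in self:
            if reader is None:
                # Default to pickle when no custom reader is supplied.
                with open(path, "rb") as fin:
                    model = pickle.load(fin)
            else:
                model = reader(path)
            self[path] = model
        return self[path]


calls = []


def counting_reader(path: str) -> dict:
    """Illustrative reader that records each call instead of touching disk."""
    calls.append(path)
    return {"loaded_from": path}


cache = ModelCache()
first = cache.read("model_a.pkl", reader=counting_reader)
second = cache.read("model_a.pkl", reader=counting_reader)  # served from cache
third = cache.read("model_a.pkl", force=True, reader=counting_reader)
```

The second call returns the cached object without invoking the reader, while the forced call loads a fresh object and replaces the cached entry.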
- class rail.core.data.ModelHandle
Bases:
DataHandle
DataHandle for machine learning models
- fileObj: FileLike
- groups: GroupLike
- interactive_type = 'numpy.ndarray'
- length: int | None
- partial: bool | None
- suffix: str | None = 'pkl'
- class rail.core.data.PqHandle
Bases:
TableHandle
DataHandle for a parquet table
- fileObj: FileLike
- groups: GroupLike
- interactive_type = 'pandas.core.frame.DataFrame'
- length: int | None
- partial: bool | None
- suffix: str | None = 'pq'
- class rail.core.data.QPDictHandle
Bases:
DataHandle
DataHandle for dictionaries of qp ensembles
- fileObj: FileLike
- groups: GroupLike
- length: int | None
- partial: bool | None
- suffix: str | None = 'hdf5'
- class rail.core.data.QPHandle
Bases:
DataHandle
DataHandle for qp ensembles
- fileObj: FileLike
- groups: GroupLike
- interactive_type = 'qp.core.ensemble.Ensemble'
- length: int | None
- partial: bool | None
- suffix: str | None = 'hdf5'
- class rail.core.data.QPOrTableHandle
Bases:
QPHandle, Hdf5Handle
DataHandle that will work with either qp.Ensembles or TableLike data
- class PdfOrValue
Bases:
Enum
- both = 2
- distribution = 0
- has_dist()
- Return type:
bool
- has_point()
- Return type:
bool
- point_estimate = 1
- unknown = -1
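The four members and two predicates of PdfOrValue can be sketched as a plain Enum. The member names and values below are taken from the listing above. The bodies of has_dist and has_point are not documented here, so the logic shown (true for the matching member or for `both`) is an assumption, and the class is named `PdfOrValueSketch` to keep it distinct from the real one.

```python
import enum


class PdfOrValueSketch(enum.Enum):
    """Illustrative re-creation of the documented members."""

    unknown = -1
    distribution = 0
    point_estimate = 1
    both = 2

    def has_dist(self) -> bool:
        # Assumed meaning: a distribution is available.
        return self in (PdfOrValueSketch.distribution, PdfOrValueSketch.both)

    def has_point(self) -> bool:
        # Assumed meaning: a point estimate is available.
        return self in (PdfOrValueSketch.point_estimate, PdfOrValueSketch.both)


kind = PdfOrValueSketch(2)  # look a member up by its integer value
```

Under these assumptions, `unknown` answers false to both predicates, so callers can treat it as "nothing usable found" without a separate check.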
- check_pdf_or_point()
Check the associated file to see if it is a QP pdf, point estimate or both
- Return type:
- fileObj: FileLike
- groups: GroupLike
- is_qp()
Check if the associated data or file is a QP ensemble
- Return type:
bool
- length: int | None
- partial: bool | None
- suffix: str | None = 'hdf5'
- class rail.core.data.TableHandle
Bases:
DataHandle
DataHandle for single tables of data
- fileObj: FileLike
- groups: GroupLike
- interactive_type = 'A tablesio-compatible table'
- length: int | None
- partial: bool | None
- set_data(data, partial=False)
Set the data for a chunk, and set the partial flag if this is not all the data
- Parameters:
data (rail.core.data.TableLike)
partial (bool)
- Return type:
None
- suffix: str | None = None
- rail.core.data.default_model_read(modelfile)
Default function to read model files; it simply uses pickle.load
- Parameters:
modelfile (str)
- Return type:
rail.core.data.ModelLike
- rail.core.data.default_model_write(model, path)
Write the model; this default implementation uses pickle
- Parameters:
model (rail.core.data.ModelLike)
path (str)
- Return type:
None
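Both default functions are described as thin wrappers around pickle. A round trip of that kind can be sketched with the standard library alone. `write_model` and `read_model` are hypothetical names chosen for this illustration, and the dictionary stands in for a real model object.

```python
import os
import pickle
import tempfile


def write_model(model, path: str) -> None:
    """Serialize a model object to path with pickle."""
    with open(path, "wb") as fout:
        pickle.dump(model, fout)


def read_model(path: str):
    """Load a pickled model object back from path."""
    with open(path, "rb") as fin:
        return pickle.load(fin)


model = {"weights": [0.1, 0.2], "bias": 1.5}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "model.pkl")
    write_model(model, path)
    restored = read_model(path)
```

The restored object is an equal but separate copy of the original. Since unpickling can execute arbitrary code, model files should only be loaded from trusted sources.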