rail.tools.table_tools module

Stages that implement utility functions

class rail.tools.table_tools.ColumnMapper

Bases: RailStage

Utility stage that remaps the names of columns.

  1. This operates on pandas dataframes in parquet files.

  2. In short, this does: output_data = input_data.rename(columns=self.config.columns, inplace=self.config.in_place)
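The rename step summarized above can be tried on a plain pandas dataframe outside of any stage. This is a minimal sketch, not the stage's actual implementation: the column names and the `columns` dictionary are hypothetical stand-ins for the `columns` configuration value, and the data are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the stage's `columns` configuration value:
# keys are existing column names, values are the new names.
columns = {"mag_i_lsst": "mag_i", "z_true": "redshift"}

# Made-up input table, used here in place of data read from a parquet file.
input_data = pd.DataFrame({"mag_i_lsst": [24.1, 23.5], "z_true": [0.4, 1.1]})

# pandas spells the keyword `inplace`. With inplace=False a new
# dataframe is returned and the input is left untouched.
output_data = input_data.rename(columns=columns, inplace=False)

print(list(output_data.columns))
```

In pandas itself, `rename` returns `None` when `inplace=True` and modifies the dataframe it was called on, which is why this sketch keeps `inplace=False`.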

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • columns ([dict] (required)) – Map from existing column names to new column names

  • in_place ([bool] default=False) – Update file in place

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'
inputs = [('input', <class 'rail.core.data.PqHandle'>)]
interactive_function: str | None = 'column_mapper'
name = 'ColumnMapper'
outputs = [('output', <class 'rail.core.data.PqHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
class rail.tools.table_tools.RowSelector

Bases: RailStage

Utility stage that sub-selects rows from a table by index.

  1. This operates on pandas dataframes in parquet files.

  2. In short, this does: output_data = input_data[self.config.start_row:self.config.stop_row]
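The slice summarized above behaves like ordinary Python slicing: `start_row` is included and `stop_row` is excluded. The following is a minimal sketch on a made-up pandas dataframe. The variable names `start_row` and `stop_row` mirror the configuration parameters, and the table contents are hypothetical.

```python
import pandas as pd

# Made-up input table with ten rows, used in place of a parquet file.
input_data = pd.DataFrame(
    {"id": list(range(10)), "mag_i": [20.0 + 0.5 * i for i in range(10)]}
)

# Hypothetical values for the two required configuration parameters.
start_row = 2
stop_row = 5

# An integer slice on a dataframe selects rows by position:
# start_row is included, stop_row is excluded.
output_data = input_data[start_row:stop_row]

print(output_data)
```

The selected rows keep their original index labels, so a caller who wants a fresh 0-based index would need to reset it separately.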

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • start_row ([int] (required)) – Starting row number

  • stop_row ([int] (required)) – Stopping row number

  • input (PqHandle (INPUT))

  • output (PqHandle (OUTPUT))

entrypoint_function: str | None = '__call__'
inputs = [('input', <class 'rail.core.data.PqHandle'>)]
interactive_function: str | None = 'row_selector'
name = 'RowSelector'
outputs = [('output', <class 'rail.core.data.PqHandle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None
class rail.tools.table_tools.TableConverter

Bases: RailStage

Utility stage that converts tables from one format to another.

FIXME, this is hardwired to convert parquet tables to Hdf5Tables. It would be nice to have more options here.

Parameters:
  • output_mode ([str] default=default) – What to do with the outputs. The options are ‘default’, where outputs will be written to files and some returned, and ‘return’, where outputs will only be returned and not written.

  • output_format ([str] (required)) – Format of output table

  • input (PqHandle (INPUT))

  • output (Hdf5Handle (OUTPUT))

entrypoint_function: str | None = '__call__'
inputs = [('input', <class 'rail.core.data.PqHandle'>)]
interactive_function: str | None = 'table_converter'
name = 'TableConverter'
outputs = [('output', <class 'rail.core.data.Hdf5Handle'>)]
run()

Run the stage and return the execution status.

Subclasses must implement this method.

Return type:

None

stage_columns: list[str] | None