medcat.utils.checkpoint

Module Contents

Classes

Checkpoint

The base class of checkpoint objects

CheckpointConfig

CheckpointManager

The class for managing checkpoints of a specific training type and their configuration

Attributes

T

logger

medcat.utils.checkpoint.T
medcat.utils.checkpoint.logger
class medcat.utils.checkpoint.Checkpoint(dir_path, *, steps=DEFAULT_STEP, max_to_keep=DEFAULT_MAX_TO_KEEP)

Bases: object

The base class of checkpoint objects

Parameters:
  • dir_path (str) – The path to the parent directory of checkpoint files.

  • steps (int) – The number of processed sentences/documents before a checkpoint is saved (N.B.: A small number could result in the error “no space left on device”).

  • max_to_keep (int) – The maximum number of checkpoints to keep (N.B.: A large number could result in the error “no space left on device”).
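
As a rough illustration of how steps and max_to_keep interact, the standalone sketch below simulates which counts trigger a save and which checkpoints would remain on disk afterwards. It does not call MedCAT: simulate_checkpoints is a hypothetical helper, and it assumes a checkpoint is written whenever the count is a multiple of steps and that only the newest max_to_keep checkpoints are retained.

```python
from typing import List, Tuple


def simulate_checkpoints(total: int, steps: int, max_to_keep: int) -> Tuple[List[int], List[int]]:
    """Return (counts at which a save happens, counts still kept at the end).

    Assumption: a save happens every `steps` processed items and older
    checkpoints beyond `max_to_keep` are deleted.
    """
    saved = [count for count in range(1, total + 1) if count % steps == 0]
    # Guard against max_to_keep == 0, because saved[-0:] would keep everything.
    kept = saved[-max_to_keep:] if max_to_keep > 0 else []
    return saved, kept


saved, kept = simulate_checkpoints(total=5000, steps=1000, max_to_keep=2)
print(saved)  # [1000, 2000, 3000, 4000, 5000]
print(kept)   # [4000, 5000]
```

Under these assumptions, a smaller steps value means more frequent writes and a larger max_to_keep means more files held at once, which is where the disk space warnings above come from.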

property steps: int
Return type:

int

property max_to_keep: int
Return type:

int

property count: int
Return type:

int

property dir_path: str
Return type:

str

DEFAULT_STEP = 1000
DEFAULT_MAX_TO_KEEP = 1
__init__(dir_path, *, steps=DEFAULT_STEP, max_to_keep=DEFAULT_MAX_TO_KEEP)
Parameters:
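  • dir_path (str) – The path to the parent directory of checkpoint files.

  • steps (int) – The number of processed sentences/documents before a checkpoint is saved.

  • max_to_keep (int) – The maximum number of checkpoints to keep.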

Return type:

None

classmethod from_latest(dir_path)

Retrieve the latest checkpoint from the parent directory.

Parameters:

dir_path (str) – The path to the directory containing checkpoint files.

Returns:

T – A new checkpoint object.

Raises:

Exception – If no checkpoint is found.

Return type:

T

save(cdb, count)

Save the CDB as the latest checkpoint.

Parameters:
  • cdb (CDB) – The MedCAT CDB object to be checkpointed.

  • count (int) – The number of finished steps.

Return type:

None

restore_latest_cdb()

Restore the CDB from the latest checkpoint.

Returns:

cdb (CDB) – The MedCAT CDB object.

Raises:

Exception – If no checkpoint is found.

Return type:

medcat.cdb.CDB
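
Since from_latest is documented to raise an Exception when no checkpoint exists, a common way to use it is a "resume or start fresh" pattern. The sketch below shows that control flow only. resume_or_start and FakeCheckpoint are hypothetical names introduced for illustration; FakeCheckpoint is an in-memory stand-in that mimics the documented from_latest, save and restore_latest_cdb methods so the example runs without MedCAT installed, and a plain dict stands in for the CDB.

```python
from typing import Any, Callable, Dict, Tuple


def resume_or_start(checkpoint_cls: Any, dir_path: str,
                    new_cdb: Callable[[], Any]) -> Tuple[Any, Any, bool]:
    """Return (checkpoint, cdb, resumed) for the given directory."""
    try:
        checkpoint = checkpoint_cls.from_latest(dir_path)
    except Exception:
        # Documented behaviour: from_latest raises when nothing is found,
        # so fall back to a fresh checkpoint and a fresh CDB.
        return checkpoint_cls(dir_path), new_cdb(), False
    return checkpoint, checkpoint.restore_latest_cdb(), True


class FakeCheckpoint:
    """In-memory stand-in (not the MedCAT class), used only for this demo."""

    store: Dict[str, Any] = {}

    def __init__(self, dir_path: str) -> None:
        self.dir_path = dir_path

    @classmethod
    def from_latest(cls, dir_path: str) -> "FakeCheckpoint":
        if dir_path not in cls.store:
            raise Exception("Checkpoints not found")
        return cls(dir_path)

    def save(self, cdb: Any, count: int) -> None:
        type(self).store[self.dir_path] = cdb

    def restore_latest_cdb(self) -> Any:
        return type(self).store[self.dir_path]


first_ckpt, first_cdb, first_resumed = resume_or_start(
    FakeCheckpoint, "run-1", lambda: {"concepts": 0})
first_cdb["concepts"] = 42
first_ckpt.save(first_cdb, count=1000)

second_ckpt, second_cdb, second_resumed = resume_or_start(
    FakeCheckpoint, "run-1", lambda: {"concepts": 0})
print(first_resumed, second_resumed, second_cdb)  # False True {'concepts': 42}
```

With the real classes, checkpoint_cls would presumably be Checkpoint and new_cdb a function that builds an empty CDB; that wiring is left out here because it depends on the installed MedCAT version.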

static _get_ckpt_file_paths(dir_path)
Parameters:

dir_path (str) – The path to the directory containing checkpoint files.

Return type:

List[str]

static _get_steps_and_count(file_path)
Return type:

Tuple[int, int]
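
The two private helpers above suggest that the steps and count values are recoverable from each checkpoint file path. The sketch below shows one way such a lookup could work. It is not MedCAT's implementation: the file naming scheme "checkpoint-<steps>-<count>" is an assumption, and parse_steps_and_count and latest_checkpoint_path are hypothetical names.

```python
import os
import tempfile
from typing import Tuple


def parse_steps_and_count(file_path: str) -> Tuple[int, int]:
    """Extract (steps, count) from a name like 'checkpoint-1000-3000' (assumed scheme)."""
    steps, count = os.path.basename(file_path).split("-")[-2:]
    return int(steps), int(count)


def latest_checkpoint_path(dir_path: str) -> str:
    """Return the checkpoint file with the highest count in dir_path."""
    paths = [os.path.join(dir_path, name)
             for name in os.listdir(dir_path)
             if name.startswith("checkpoint-")]
    if not paths:
        raise Exception("Checkpoints not found")
    # Compare counts numerically; a plain string sort would rank
    # 'checkpoint-1000-2000' above 'checkpoint-1000-10000'.
    return max(paths, key=lambda path: parse_steps_and_count(path)[1])


with tempfile.TemporaryDirectory() as tmp:
    for count in (1000, 2000, 10000):
        open(os.path.join(tmp, f"checkpoint-1000-{count}"), "w").close()
    latest = os.path.basename(latest_checkpoint_path(tmp))

print(latest)  # checkpoint-1000-10000
```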

class medcat.utils.checkpoint.CheckpointConfig

Bases: object

output_dir: str = 'checkpoints'
steps: int
max_to_keep: int
class medcat.utils.checkpoint.CheckpointManager(name, checkpoint_config)

Bases: object

The class for managing checkpoints of a specific training type and their configuration

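Parameters:
  • name (str) – The name of the checkpoint manager.

  • checkpoint_config (CheckpointConfig) – The checkpoint configuration object.
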
__init__(name, checkpoint_config)
Parameters:
  • name (str) –

  • checkpoint_config (CheckpointConfig) –

Return type:

None

create_checkpoint(dir_path=None)

Create a new checkpoint inside the checkpoint base directory.

Parameters:

dir_path (str) – The path to the checkpoint directory.

Returns:

Checkpoint – A checkpoint object.

Return type:

Checkpoint

get_latest_checkpoint(base_dir_path=None)

Retrieve the latest checkpoint from the checkpoint base directory.

Parameters:

base_dir_path (str) – The path to the directory containing checkpoint files.

Returns:

Checkpoint – A checkpoint object.

Return type:

Checkpoint

classmethod get_latest_training_dir(base_dir_path)

Retrieve the latest training directory containing all checkpoints.

Parameters:

base_dir_path (str) – The path to the directory containing all checkpointed trainings.

Returns:

str – The path to the latest training directory containing all checkpoints.

Raises:

ValueError – If no checkpoint is found.

Return type:

str
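
To make the lookup described for get_latest_training_dir concrete, here is a minimal standalone sketch of picking the latest training directory under a base directory. It is not the library's code: latest_training_dir is a hypothetical function, and it assumes that training directory names sort chronologically (for example, date or timestamp names), so the lexicographically last name is the latest one.

```python
import os
import tempfile


def latest_training_dir(base_dir_path: str) -> str:
    """Return the absolute path of the last sub-directory by name."""
    # Assumption: directory names (e.g. dates) sort in chronological order.
    dir_paths = sorted(entry.path for entry in os.scandir(base_dir_path)
                       if entry.is_dir())
    if not dir_paths:
        raise ValueError(f"No checkpointed training found in {base_dir_path}")
    return os.path.abspath(dir_paths[-1])


with tempfile.TemporaryDirectory() as base:
    for name in ("2023-01-05", "2023-03-17", "2022-11-30"):
        os.mkdir(os.path.join(base, name))
    latest_name = os.path.basename(latest_training_dir(base))

    # A directory without sub-directories triggers the ValueError branch.
    try:
        latest_training_dir(os.path.join(base, "2023-03-17"))
        raised_value_error = False
    except ValueError:
        raised_value_error = True

print(latest_name)         # 2023-03-17
print(raised_value_error)  # True
```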