medcat.utils.cdb_state

Module Contents

Functions

copy_cdb_state(cdb)

Creates a (deep) copy of the CDB state.

save_cdb_state(cdb, file_path)

Saves CDB state in a file.

apply_cdb_state(cdb, state)

Apply the specified state to the specified CDB.

load_and_apply_cdb_state(cdb, file_path)

Delete current CDB state and apply CDB state from file.

captured_state_cdb(cdb[, save_state_to_disk])

A context manager that captures and re-applies the initial CDB state.

in_memory_state_capture(cdb)

Capture the CDB state in memory.

on_disk_memory_capture(cdb)

Capture the CDB state in a temporary file.

Attributes

logger

CDBState

CDB State.

medcat.utils.cdb_state.logger
medcat.utils.cdb_state.CDBState

CDB State.

This is a dictionary of the parts of the CDB that change during (supervised) training. It can be used to store and restore the state of a CDB after modifying it.

Currently, the following fields are saved:
  • name2cuis

  • snames

  • cui2names

  • cui2snames

  • cui2context_vectors

  • cui2count_train

  • name_isupper

  • vocab

medcat.utils.cdb_state.copy_cdb_state(cdb)

Creates a (deep) copy of the CDB state.

Grabs the fields that correspond to the state, creates deep copies, and returns the copies.

Parameters:

cdb – The CDB from which to grab the state.

Returns:

CDBState – The copied state.

Return type:

CDBState
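The copy step described above can be pictured with a minimal sketch that does not import MedCAT. `FakeCDB`, `STATE_FIELDS` and `copy_state_sketch` are hypothetical stand-ins, the CUI is made up, and only a subset of the CDBState fields is used; the library's actual implementation may differ.

```python
import copy

# Hypothetical subset of the fields listed under CDBState.
STATE_FIELDS = ("name2cuis", "cui2names", "cui2count_train")


class FakeCDB:
    """Stand-in for a MedCAT CDB; not the real class."""

    def __init__(self):
        # "C001" is a made-up CUI used only for illustration.
        self.name2cuis = {"example~name": ["C001"]}
        self.cui2names = {"C001": {"example~name"}}
        self.cui2count_train = {"C001": 3}


def copy_state_sketch(cdb):
    """Return deep copies of the state fields, keyed by field name."""
    return {field: copy.deepcopy(getattr(cdb, field)) for field in STATE_FIELDS}


cdb = FakeCDB()
state = copy_state_sketch(cdb)

# Mutating the CDB afterwards leaves the copied state untouched.
cdb.cui2count_train["C001"] += 10
```

Because the copies are deep, nested containers such as the per-CUI name sets are duplicated as well, so later training cannot alter the captured state.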

medcat.utils.cdb_state.save_cdb_state(cdb, file_path)

Saves CDB state in a file.

Currently uses dill.dump to save the relevant fields/values.

Parameters:
  • cdb – The CDB from which to grab the state.

  • file_path (str) – The file to dump the state to.

Return type:

None
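As a rough illustration of dumping a state dictionary to a file, the sketch below uses the standard-library `pickle` module in place of `dill`, purely to keep the example dependency-free; `dill.dump` and `dill.load` are called the same way. The `state` dictionary and `save_state_sketch` are hypothetical and not part of MedCAT.

```python
import os
import pickle
import tempfile

# Hypothetical state dictionary with a made-up CUI.
state = {"cui2count_train": {"C001": 3}}


def save_state_sketch(state, file_path):
    """Serialise the state dictionary to file_path."""
    with open(file_path, "wb") as f:
        pickle.dump(state, f)


# Write to a temporary file, read it back, then clean up.
fd, path = tempfile.mkstemp(suffix=".pkl")
os.close(fd)
try:
    save_state_sketch(state, path)
    with open(path, "rb") as f:
        restored = pickle.load(f)
finally:
    os.remove(path)
```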

medcat.utils.cdb_state.apply_cdb_state(cdb, state)

Apply the specified state to the specified CDB.

This overwrites the current state of the CDB with the one provided.

Parameters:
  • cdb – The CDB to apply the state to.

  • state (CDBState) – The state to use.

Return type:

None

medcat.utils.cdb_state.load_and_apply_cdb_state(cdb, file_path)

Delete current CDB state and apply CDB state from file.

This first deletes the current state of the CDB in order to save memory. The point of saving the state on disk is to reduce RAM usage, which would not work well if two copies of the state were held in memory at once while loading.

Parameters:
  • cdb – The CDB to apply the state to.

  • file_path (str) – The file the state has been saved to.

Return type:

None
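The delete-then-load order can be sketched as follows. This is an illustration under stated assumptions, not MedCAT's code: `FakeCDB`, `STATE_FIELDS` and `load_and_apply_sketch` are hypothetical, the CUI is made up, and `pickle` stands in for `dill`.

```python
import os
import pickle
import tempfile

# Hypothetical subset of the CDBState fields.
STATE_FIELDS = ("name2cuis", "cui2count_train")


class FakeCDB:
    """Stand-in for a MedCAT CDB; not the real class."""

    def __init__(self):
        self.name2cuis = {"example~name": ["C001"]}
        self.cui2count_train = {"C001": 3}


def load_and_apply_sketch(cdb, file_path):
    # Drop the current values first so that the old and the loaded
    # state are never both referenced by the CDB at the same time.
    for field in STATE_FIELDS:
        delattr(cdb, field)
    with open(file_path, "rb") as f:
        state = pickle.load(f)
    for field, value in state.items():
        setattr(cdb, field, value)


cdb = FakeCDB()
fd, path = tempfile.mkstemp(suffix=".pkl")
os.close(fd)
try:
    # Save the initial state, then modify the CDB.
    with open(path, "wb") as f:
        pickle.dump({field: getattr(cdb, field) for field in STATE_FIELDS}, f)
    cdb.cui2count_train["C001"] = 99
    # Restore the saved state from disk.
    load_and_apply_sketch(cdb, path)
finally:
    os.remove(path)
```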

medcat.utils.cdb_state.captured_state_cdb(cdb, save_state_to_disk=False)

A context manager that captures and re-applies the initial CDB state.

The context manager captures (copies) the initial state of the CDB on entry. The user can then modify the state (e.g. by training). On exit, the initial CDB state is re-applied.

If RAM is an issue, it is recommended to set save_state_to_disk to True. Otherwise, the copy of the original state is held in memory. If the state is saved on disk, a temporary file is used and removed afterwards.

Parameters:
  • cdb – The CDB to use.

  • save_state_to_disk (bool) – Whether to save the state on disk or hold it in memory. Defaults to False.

Yields:

None
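The capture-and-restore pattern can be sketched with `contextlib` for the in-memory case. `FakeCDB`, `STATE_FIELDS` and `captured_state_sketch` are hypothetical stand-ins with a made-up CUI, and restoring inside a `finally` block is an assumption of this sketch rather than a documented property of the library.

```python
import contextlib
import copy

# Hypothetical subset of the CDBState fields.
STATE_FIELDS = ("name2cuis", "cui2count_train")


class FakeCDB:
    """Stand-in for a MedCAT CDB; not the real class."""

    def __init__(self):
        self.name2cuis = {"example~name": ["C001"]}
        self.cui2count_train = {"C001": 3}


@contextlib.contextmanager
def captured_state_sketch(cdb):
    # Capture deep copies of the state fields on entry.
    state = {f: copy.deepcopy(getattr(cdb, f)) for f in STATE_FIELDS}
    try:
        yield
    finally:
        # Re-apply the captured state on exit, even after an error.
        for field, value in state.items():
            setattr(cdb, field, value)


cdb = FakeCDB()
with captured_state_sketch(cdb):
    # Simulate training that changes the CDB.
    cdb.cui2count_train["C001"] = 99
    inside = cdb.cui2count_train["C001"]
```

With MedCAT installed, the documented call takes the same shape: `with captured_state_cdb(cdb, save_state_to_disk=True): ...`, with the training code placed inside the block.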

medcat.utils.cdb_state.in_memory_state_capture(cdb)

Capture the CDB state in memory.

Parameters:

cdb – The CDB to use.

Yields:

None

medcat.utils.cdb_state.on_disk_memory_capture(cdb)

Capture the CDB state in a temporary file.

Parameters:

cdb – The CDB to use.

Yields:

None