medcat.utils.saving.serializer

This module is responsible for the (new) methods of saving and loading parts of MedCAT.

The aim is to move away from saving MedCAT files with dill/pickle, and to save and load them in another way instead.

Module Contents

Classes

JsonSetSerializer

JSON serializer with set comprehension.

CDBSerializer

A (potentially) semi-JSON based serializer for CDB.

Attributes

logger

__SPECIALITY_NAMES_CUI

__SPECIALITY_NAMES_NAME

__SPECIALITY_NAMES_OTHER

ONE2MANY

SPECIALITY_NAMES

medcat.utils.saving.serializer.logger
medcat.utils.saving.serializer.__SPECIALITY_NAMES_CUI
medcat.utils.saving.serializer.__SPECIALITY_NAMES_NAME
medcat.utils.saving.serializer.__SPECIALITY_NAMES_OTHER
medcat.utils.saving.serializer.ONE2MANY
medcat.utils.saving.serializer.SPECIALITY_NAMES
class medcat.utils.saving.serializer.JsonSetSerializer(folder, name)

JSON serializer with set comprehension.

This serializer allows serializing and deserializing sets through JSON.

Parameters:
  • folder (str) –

  • name (str) –

__init__(folder, name)
Parameters:
  • folder (str) –

  • name (str) –

Return type:

None

write(d)

Write the specified dictionary to this serializer’s file.

Parameters:

d (dict) – The dict to write to the file.

Return type:

None

read()

Read the JSON file specified by this serializer.

Returns:

dict – The dict represented by this JSON file.

Return type:

dict
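JSON has no set type, so a serializer like this has to tag sets when writing and restore them when reading. The following is a minimal sketch of such a write/read round trip, assuming sets are stored as an object under a marker key and rebuilt through a json object_hook. The class SetJsonFile, the marker "==SET==", and the <name>.json file naming are hypothetical; the actual JsonSetSerializer may encode sets and name its file differently.

```python
import json
import os
import tempfile
from typing import Any

SET_MARKER = "==SET=="  # hypothetical marker key, not necessarily MedCAT's


class _SetEncoder(json.JSONEncoder):
    """Encode Python sets as {"==SET==": [...]} so they survive JSON."""

    def default(self, o: Any) -> Any:
        if isinstance(o, set):
            return {SET_MARKER: list(o)}
        return super().default(o)


def _set_hook(d: dict) -> Any:
    """Called for every decoded JSON object; turns tagged objects back into sets."""
    return set(d[SET_MARKER]) if SET_MARKER in d else d


class SetJsonFile:
    """Hypothetical stand-in mirroring JsonSetSerializer(folder, name)."""

    def __init__(self, folder: str, name: str) -> None:
        # Assumption: one JSON file per name inside the given folder.
        self.file_name = os.path.join(folder, name + ".json")

    def write(self, d: dict) -> None:
        with open(self.file_name, "w", encoding="utf-8") as f:
            json.dump(d, f, cls=_SetEncoder)

    def read(self) -> dict:
        with open(self.file_name, encoding="utf-8") as f:
            return json.load(f, object_hook=_set_hook)


with tempfile.TemporaryDirectory() as folder:
    SetJsonFile(folder, "cui2names").write({"C001": {"kidney", "renal"}})
    # A fresh instance pointing at the same folder/name reads the data back.
    loaded = SetJsonFile(folder, "cui2names").read()
```

The nested set comes back as a set rather than a list, which is the point of the "set comprehension" in the serializer's description.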

class medcat.utils.saving.serializer.CDBSerializer(main_path, json_path=None)

A (potentially) semi-JSON based serializer for CDB.

The parts that take up the most space within a CDB can be saved in JSON files. Specifically, these are the following attributes of a CDB:

  • name2cuis

  • name2cuis2status

  • snames

  • cui2names

  • cui2snames

  • cui2type_ids

  • name_isupper

  • addl_info

These are specified at the top of the module (in SPECIALITY_NAMES).

The rest of the information (i.e. the config and other less memory-intensive parts) is still saved using dill, as before.
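The split described above amounts to partitioning a CDB's attributes by name. This is a minimal illustration, assuming the attributes are available as a plain dict; split_cdb_state is a hypothetical helper and the sample values (including the config entry) are made up, while the eight attribute names are taken from the list above.

```python
from typing import Any, Dict, Tuple

# The attribute names listed above (the module keeps these in SPECIALITY_NAMES).
SPECIALITY_NAMES = {
    "name2cuis", "name2cuis2status", "snames", "cui2names",
    "cui2snames", "cui2type_ids", "name_isupper", "addl_info",
}


def split_cdb_state(state: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: split attributes into (JSON part, dill part)."""
    json_part = {k: v for k, v in state.items() if k in SPECIALITY_NAMES}
    dill_part = {k: v for k, v in state.items() if k not in SPECIALITY_NAMES}
    return json_part, dill_part


# Made-up sample data standing in for a CDB's attributes.
state = {
    "name2cuis": {"kidney": ["C001"]},
    "cui2names": {"C001": {"kidney"}},
    "config": {"example": True},
}
json_part, dill_part = split_cdb_state(state)
```

Here the two large mappings land in the JSON part and only the config is left for dill.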

The same object can be used for both serializing and deserializing. JSON (de)serialization is performed only if the json_path parameter is passed.

Parameters:
  • main_path (str) – The path for the main part (i.e. the config and other less memory-intensive parts).

  • json_path (str, optional) – The directory for the JSON files. Defaults to None.

__init__(main_path, json_path=None)
Parameters:
  • main_path (str) –

  • json_path (Optional[str]) –

Return type:

None

serialize(cdb, overwrite=False)

Used to dump the CDB to a file or multiple files.

If json_path was specified to the constructor, the parts that take up the most memory are serialized to JSON files in that directory, and the rest of the information is saved to the main_path passed to the constructor. Otherwise, everything is saved to main_path using dill.dump, as before.

Parameters:
  • cdb (CDB) – The concept database (CDB).

  • overwrite (bool) – Whether to allow overwriting existing files. Defaults to False.

Raises:

ValueError – If the file(s) already exist(s) and overwrite is False.

Return type:

None
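A minimal sketch of the serialize flow, including the overwrite check, follows. It is not MedCAT's implementation: serialize_sketch, the <attribute>.json file names and the two-attribute JSON_ATTRS subset are hypothetical, pickle stands in for dill (whose dump/load functions mirror pickle's), and sets are written as plain lists for brevity.

```python
import json
import os
import pickle
import tempfile
from types import SimpleNamespace
from typing import Any, Optional

JSON_ATTRS = ("name2cuis", "cui2names")  # illustrative subset of SPECIALITY_NAMES


def _to_jsonable(o: Any) -> Any:
    # Simplification: store sets as lists (the real serializer restores sets).
    if isinstance(o, set):
        return list(o)
    raise TypeError(f"not JSON serializable: {type(o).__name__}")


def serialize_sketch(cdb: Any, main_path: str, json_path: Optional[str] = None,
                     overwrite: bool = False) -> None:
    state = dict(vars(cdb))
    targets = [main_path]
    if json_path is not None:
        targets += [os.path.join(json_path, f"{name}.json") for name in JSON_ATTRS]
    # Refuse to clobber existing output unless explicitly allowed.
    existing = [p for p in targets if os.path.exists(p)]
    if existing and not overwrite:
        raise ValueError(f"File(s) already exist: {existing}")
    if json_path is not None:
        os.makedirs(json_path, exist_ok=True)
        for name in JSON_ATTRS:
            path = os.path.join(json_path, f"{name}.json")
            with open(path, "w", encoding="utf-8") as f:
                json.dump(state.pop(name), f, default=_to_jsonable)
    # Whatever is left (everything, if json_path is None) goes into the main file.
    with open(main_path, "wb") as f:
        pickle.dump(state, f)


with tempfile.TemporaryDirectory() as tmp:
    cdb = SimpleNamespace(name2cuis={"kidney": ["C001"]},
                          cui2names={"C001": {"kidney"}},
                          config={"example": True})
    main = os.path.join(tmp, "cdb.dat")
    jdir = os.path.join(tmp, "json")
    serialize_sketch(cdb, main, jdir)
    try:
        serialize_sketch(cdb, main, jdir)  # files now exist, overwrite=False
        raised = False
    except ValueError:
        raised = True
    written = sorted(os.listdir(jdir))
```

Checking for existing files before writing anything means a refused call leaves the previous output untouched.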

deserialize(cdb_cls)

Deserializes the specified file(s) into a CDB.

If the json_path was specified to the constructor, the JSON serialized files are used. Otherwise, everything is loaded from the main_path file.

Parameters:

cdb_cls – CDB class.

Returns:

CDB – The resulting CDB.
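The matching load path can be sketched the same way: read the main file, then, if a JSON directory is given, overlay the JSON-stored attributes before setting them on a new instance of cdb_cls. This is a minimal sketch under stated assumptions: deserialize_sketch, TinyCDB, the <attribute>.json naming and the no-argument constructor call are hypothetical, and pickle again stands in for dill.

```python
import json
import os
import pickle
import tempfile
from typing import Any, Optional, Type

JSON_ATTRS = ("name2cuis", "cui2names")  # illustrative subset of SPECIALITY_NAMES


class TinyCDB:
    """Hypothetical stand-in for the real CDB class."""


def deserialize_sketch(cdb_cls: Type[Any], main_path: str,
                       json_path: Optional[str] = None) -> Any:
    cdb = cdb_cls()  # assumption: the class can be built without arguments
    # The main file holds everything not stored as JSON.
    with open(main_path, "rb") as f:
        state = pickle.load(f)
    if json_path is not None:
        for name in JSON_ATTRS:
            with open(os.path.join(json_path, f"{name}.json"), encoding="utf-8") as f:
                state[name] = json.load(f)
    for key, value in state.items():
        setattr(cdb, key, value)
    return cdb


with tempfile.TemporaryDirectory() as tmp:
    main = os.path.join(tmp, "cdb.dat")
    with open(main, "wb") as f:
        pickle.dump({"config": {"example": True}}, f)
    for name, value in {"name2cuis": {"kidney": ["C001"]},
                        "cui2names": {"C001": ["kidney"]}}.items():
        with open(os.path.join(tmp, f"{name}.json"), "w", encoding="utf-8") as f:
            json.dump(value, f)
    cdb = deserialize_sketch(TinyCDB, main, json_path=tmp)
```

Passing the class rather than an instance lets the caller decide which CDB type is built from the stored data.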