medcat.utils.saving.serializer
This module is responsible for the (new) methods of saving and loading parts of MedCAT.
The idea is to move away from saving MedCAT files using dill/pickle, and to save and load them in some other way.
Module Contents
Classes
JsonSetSerializer | JSON serializer with set comprehension.
CDBSerializer | A (potentially) semi-JSON based serializer for CDB.
Attributes
- medcat.utils.saving.serializer.logger
- medcat.utils.saving.serializer.__SPECIALITY_NAMES_CUI
- medcat.utils.saving.serializer.__SPECIALITY_NAMES_NAME
- medcat.utils.saving.serializer.__SPECIALITY_NAMES_OTHER
- medcat.utils.saving.serializer.ONE2MANY
- medcat.utils.saving.serializer.SPECIALITY_NAMES
- class medcat.utils.saving.serializer.JsonSetSerializer(folder, name)
JSON serializer with set comprehension.
This serializer allows serializing and deserializing sets through JSON
- Parameters:
folder (str) –
name (str) –
- __init__(folder, name)
- Parameters:
folder (str) –
name (str) –
- Return type:
None
- write(d)
Write the specified dictionary to this serializer’s file.
- Parameters:
d (dict) – The dict to write on file.
- Return type:
None
- read()
Read the JSON file specified by this serializer.
- Returns:
dict – The dict represented by this JSON file.
- Return type:
dict
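Plain JSON has no set type, so a set-aware serializer needs some convention for marking sets on write and restoring them on read. The following is a minimal sketch of one such convention using only the standard library. It is not the MedCAT implementation: the marker key `SET_MARKER`, the helpers `write_json` and `read_json`, and the sample data are all hypothetical.

```python
import json
import os
import tempfile

# Hypothetical marker key; the actual on-disk format used by MedCAT may differ.
SET_MARKER = "==SET=="


class SetEncoder(json.JSONEncoder):
    """Encode sets as a one-key dict wrapping a list."""

    def default(self, obj):
        # default() is only called for objects json cannot encode natively.
        if isinstance(obj, set):
            return {SET_MARKER: list(obj)}
        return super().default(obj)


def set_decoder(dct):
    """Turn the one-key marker dict back into a set; leave other dicts alone."""
    if len(dct) == 1 and SET_MARKER in dct:
        return set(dct[SET_MARKER])
    return dct


def write_json(path, d):
    with open(path, "w") as f:
        json.dump(d, f, cls=SetEncoder)


def read_json(path):
    with open(path) as f:
        return json.load(f, object_hook=set_decoder)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.json")
    # Made-up data: names mapped to sets of made-up identifiers.
    original = {"name_a": {"C001"}, "name_b": {"C002", "C003"}}
    write_json(path, original)
    restored = read_json(path)
```

After the round trip, `restored` compares equal to `original` and its values are sets again rather than lists.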
- class medcat.utils.saving.serializer.CDBSerializer(main_path, json_path=None)
A (potentially) semi-JSON based serializer for CDB.
The parts that take up the most space within a CDB can be saved in JSON files. These are the following attributes of a CDB:
name2cuis
name2cuis2status
snames
cui2names
cui2snames
cui2type_ids
name_isupper
addl_info
These are specified at the top of the module (in SPECIALITY_NAMES).
The rest of the information (i.e. the config and other less memory-intensive parts) will still be saved using dill, as before.
Objects of this class can be used for both serializing and deserializing. If the json_path parameter is passed, the JSON (de)serialization will be performed.
- Parameters:
main_path (str) – The path for the main part (i.e. the config and other less memory-intensive parts)
json_path (str, optional) – The path to the directory for the JSON files. Defaults to None.
- __init__(main_path, json_path=None)
- Parameters:
main_path (str) –
json_path (Optional[str]) –
- Return type:
None
- serialize(cdb, overwrite=False)
Used to dump a CDB to a file or multiple files.
If json_path was specified to the constructor, this will serialize some of the parts that take up more memory into JSON files in said directory. In that case, the rest of the info is saved into the main_path passed to the constructor. Otherwise, everything is saved to the main_path using dill.dump, as before.
- Parameters:
cdb (CDB) – The context database (CDB)
overwrite (bool) – Whether to allow overwriting existing files. Defaults to False.
- Raises:
ValueError – If the file(s) already exist(s) and overwrite is False
- Return type:
None
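The overwrite behaviour described for serialize can be pictured as a guard that runs before anything is written. The sketch below is an illustration under stated assumptions, not the library's code: `check_can_write` and the file name `cdb.dat` are hypothetical, and only the standard library is used.

```python
import os
import tempfile


def check_can_write(paths, overwrite=False):
    """Raise ValueError if any target already exists and overwriting is not allowed."""
    existing = [p for p in paths if os.path.exists(p)]
    if existing and not overwrite:
        raise ValueError(f"File(s) already exist: {existing}")


with tempfile.TemporaryDirectory() as folder:
    main_path = os.path.join(folder, "cdb.dat")  # hypothetical file name

    check_can_write([main_path])  # nothing exists yet, so this passes
    with open(main_path, "wb") as f:
        f.write(b"placeholder")

    try:
        check_can_write([main_path])  # the file now exists
        raised = False
    except ValueError:
        raised = True

    check_can_write([main_path], overwrite=True)  # explicitly allowed
```

Checking every target path before writing any of them avoids leaving a half-written set of files when only one of them collides.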
- deserialize(cdb_cls)
Deserializes the JSON in the specified file into a CDB.
If the json_path was specified to the constructor, the JSON serialized files are used. Otherwise, everything is loaded from the main_path file.
- Parameters:
cdb_cls – CDB class.
- Returns:
CDB – The resulting CDB.
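The split between a main file and a directory of JSON files can be sketched as follows. This is a simplified stand-in and not the MedCAT implementation: `ToyCDB`, `LARGE_ATTRS`, and the free functions `serialize` and `deserialize` are hypothetical, and the standard-library pickle module is used in place of dill so that the example has no third-party dependencies.

```python
import json
import os
import pickle
import tempfile

# Hypothetical subset of the attributes that would be written as JSON.
LARGE_ATTRS = ("cui2names", "name_isupper")


class ToyCDB:
    """Stand-in for a CDB with two large dicts and a small config."""

    def __init__(self):
        self.cui2names = {}
        self.name_isupper = {}
        self.config = {}


def serialize(cdb, main_path, json_path=None):
    state = dict(vars(cdb))  # shallow copy so the object itself is untouched
    if json_path is not None:
        os.makedirs(json_path, exist_ok=True)
        for attr in LARGE_ATTRS:
            # Each large attribute goes to its own JSON file.
            with open(os.path.join(json_path, attr + ".json"), "w") as f:
                json.dump(state.pop(attr), f)
    # Whatever is left goes to the main file (pickle here, dill in MedCAT).
    with open(main_path, "wb") as f:
        pickle.dump(state, f)


def deserialize(cdb_cls, main_path, json_path=None):
    with open(main_path, "rb") as f:
        state = pickle.load(f)
    if json_path is not None:
        for attr in LARGE_ATTRS:
            with open(os.path.join(json_path, attr + ".json")) as f:
                state[attr] = json.load(f)
    cdb = cdb_cls()
    cdb.__dict__.update(state)
    return cdb


with tempfile.TemporaryDirectory() as folder:
    main_path = os.path.join(folder, "cdb.dat")
    json_path = os.path.join(folder, "json")

    cdb = ToyCDB()
    cdb.cui2names = {"C001": ["name_a", "name_b"]}  # made-up data
    cdb.name_isupper = {"name_a": False}
    cdb.config = {"version": 1}

    serialize(cdb, main_path, json_path)
    json_files = sorted(os.listdir(json_path))
    restored = deserialize(ToyCDB, main_path, json_path)
```

Passing the class rather than an instance to `deserialize` mirrors the documented `cdb_cls` parameter. Calling both functions without `json_path` keeps everything in the single main file.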