:py:mod:`medcat.utils.saving.serializer`
========================================

.. py:module:: medcat.utils.saving.serializer

.. autoapi-nested-parse::

   This module is responsible for the (new) methods of saving and loading parts of MedCAT.

   The idea is to move away from saving MedCAT files using dill/pickle
   and to save and load them in some other way.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.utils.saving.serializer.JsonSetSerializer
   medcat.utils.saving.serializer.CDBSerializer


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.utils.saving.serializer.logger
   medcat.utils.saving.serializer.__SPECIALITY_NAMES_CUI
   medcat.utils.saving.serializer.__SPECIALITY_NAMES_NAME
   medcat.utils.saving.serializer.__SPECIALITY_NAMES_OTHER
   medcat.utils.saving.serializer.ONE2MANY
   medcat.utils.saving.serializer.SPECIALITY_NAMES


.. py:data:: logger

.. py:data:: __SPECIALITY_NAMES_CUI

.. py:data:: __SPECIALITY_NAMES_NAME

.. py:data:: __SPECIALITY_NAMES_OTHER

.. py:data:: ONE2MANY

.. py:data:: SPECIALITY_NAMES

.. py:class:: JsonSetSerializer(folder, name)

   JSON serializer with support for sets.

   This serializer allows serializing and deserializing sets through JSON.

   .. py:method:: __init__(folder, name)

   .. py:method:: write(d)

      Write the specified dictionary to this serializer's file.

      :param d: The dict to write to the file.
      :type d: dict

   .. py:method:: read()

      Read the JSON file specified by this serializer.

      :Returns: **dict** -- The dict represented by this JSON file.


.. py:class:: CDBSerializer(main_path, json_path = None)

   A (potentially) semi-JSON based serializer for CDB.

   The parts that take up the most space within a CDB can be saved in
   JSON files. These are the following attributes of a CDB:

   - name2cuis
   - name2cuis2status
   - snames
   - cui2names
   - cui2snames
   - cui2type_ids
   - name_isupper
   - addl_info

   These are specified at the top of the module (in `SPECIALITY_NAMES`).
   The rest of the information (i.e. config and other less memory-intensive
   parts) is still saved using dill, as before.

   Objects of this class can be used for both serializing and
   deserializing.

   If the `json_path` parameter is passed, the JSON (de)serialization
   is performed.

   :param main_path: The path for the main part (i.e. config and other less memory-intensive parts)
   :type main_path: str
   :param json_path: The path to the directory for the JSON files. Defaults to None.
   :type json_path: str, optional

   .. py:method:: __init__(main_path, json_path = None)

   .. py:method:: serialize(cdb, overwrite = False)

      Used to dump a CDB to a file or to multiple files.

      If `json_path` was passed to the constructor, this serializes
      some of the parts that take up more memory into JSON files in that
      directory. In that case, the rest of the info is saved to the
      `main_path` passed to the constructor.

      Otherwise, everything is saved to `main_path` using `dill.dump`,
      as before.

      :param cdb: The context database (CDB)
      :type cdb: CDB
      :param overwrite: Whether to allow overwriting existing files. Defaults to False.
      :type overwrite: bool

      :raises ValueError: If the file(s) exist(s) and `overwrite` is `False`

   .. py:method:: deserialize(cdb_cls)

      Deserializes the JSON in the specified file into a CDB.

      If `json_path` was passed to the constructor, the JSON
      serialized files are used. Otherwise, everything is loaded from the
      `main_path` file.

      :param cdb_cls: CDB class.

      :Returns: **CDB** -- The resulting CDB.
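
Plain JSON has no set type, so a serializer that round-trips sets has to encode them in some other form. The following is a minimal, standard-library-only sketch of one way to do that. It is not MedCAT's implementation: the ``SET_MARKER`` key, the helper names, the file name and the sample data are all hypothetical, and the encoding that ``JsonSetSerializer`` actually uses may differ.

```python
import json
import os
import tempfile

# Hypothetical marker key; the encoding MedCAT actually uses may differ.
SET_MARKER = "==SET=="


class SetEncoder(json.JSONEncoder):
    """Encode sets as a tagged dict so they survive a JSON round trip."""

    def default(self, obj):
        if isinstance(obj, set):
            # Sorted for stable output; assumes the elements are comparable.
            return {SET_MARKER: sorted(obj)}
        return super().default(obj)


def set_decoder(dct):
    """object_hook that turns tagged dicts back into sets."""
    if dct.keys() == {SET_MARKER}:
        return set(dct[SET_MARKER])
    return dct


def write_json(path, d):
    """Write a dict (possibly containing sets) to a JSON file."""
    with open(path, "w") as f:
        json.dump(d, f, cls=SetEncoder)


def read_json(path):
    """Read a JSON file, restoring any tagged sets."""
    with open(path) as f:
        return json.load(f, object_hook=set_decoder)


# Illustrative data only: a mapping from made-up identifiers to sets of names.
original = {"C001": {"fever", "pyrexia"}, "C002": {"cough"}}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.json")
    write_json(path, original)
    restored = read_json(path)
```

After the round trip, ``restored`` compares equal to ``original`` and its values are ``set`` objects again rather than lists. A tagged dict is used here because it keeps ordinary lists and sets distinguishable in the file; the cost is that a genuine dict whose only key is the marker would be misread as a set.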