:py:mod:`medcat.utils.memory_optimiser`
=======================================

.. py:module:: medcat.utils.memory_optimiser


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.utils.memory_optimiser._KeysView
   medcat.utils.memory_optimiser._ItemsView
   medcat.utils.memory_optimiser._ValuesView
   medcat.utils.memory_optimiser.DelegatingDict
   medcat.utils.memory_optimiser.DelegatingValueSet
   medcat.utils.memory_optimiser.DelegatingDictEncoder
   medcat.utils.memory_optimiser.DelegatingDictDecoder
   medcat.utils.memory_optimiser.DelegatingValueSetEncoder
   medcat.utils.memory_optimiser.DelegatingValueSetDecoder


Functions
~~~~~~~~~

.. autoapisummary::

   medcat.utils.memory_optimiser.attempt_fix_after_load
   medcat.utils.memory_optimiser.attempt_fix_snames_after_load
   medcat.utils.memory_optimiser._optimise
   medcat.utils.memory_optimiser._optimise_snames
   medcat.utils.memory_optimiser.perform_optimisation
   medcat.utils.memory_optimiser._attempt_fix_after_load
   medcat.utils.memory_optimiser._unoptimise
   medcat.utils.memory_optimiser._unoptimise_snames
   medcat.utils.memory_optimiser.unoptimise_cdb
   medcat.utils.memory_optimiser.map_to_many


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.utils.memory_optimiser.CUI_DICT_NAMES_TO_COMBINE
   medcat.utils.memory_optimiser.ONE2MANY
   medcat.utils.memory_optimiser.NAME_DICT_NAMES_TO_COMBINE
   medcat.utils.memory_optimiser.NAME2MANY
   medcat.utils.memory_optimiser.DELEGATING_DICT_IDENTIFIER
   medcat.utils.memory_optimiser.DELEGATING_SET_IDENTIFIER
   medcat.utils.memory_optimiser.CUIS_PART
   medcat.utils.memory_optimiser.NAMES_PART
   medcat.utils.memory_optimiser.SNAMES_PART


.. py:data:: CUI_DICT_NAMES_TO_COMBINE
   :value: ['cui2names', 'cui2snames', 'cui2context_vectors', 'cui2count_train', 'cui2tags',...

.. py:data:: ONE2MANY
   :value: 'cui2many'

.. py:data:: NAME_DICT_NAMES_TO_COMBINE
   :value: ['cui2names', 'name2cuis2status', 'cui2preferred_name']

.. py:data:: NAME2MANY
   :value: 'name2many'

.. py:data:: DELEGATING_DICT_IDENTIFIER
   :value: '==DELEGATING_DICT=='

.. py:data:: DELEGATING_SET_IDENTIFIER
   :value: '==DELEGATING_SET=='

.. py:data:: CUIS_PART
   :value: 'CUIS'

.. py:data:: NAMES_PART
   :value: 'NAMES'

.. py:data:: SNAMES_PART
   :value: 'snames'

.. py:class:: _KeysView(keys, parent)

   .. py:method:: __init__(keys, parent)

   .. py:method:: __iter__()

   .. py:method:: __len__()


.. py:class:: _ItemsView(parent)

   .. py:method:: __init__(parent)

   .. py:method:: __iter__()

   .. py:method:: __len__()


.. py:class:: _ValuesView(parent)

   .. py:method:: __init__(parent)

   .. py:method:: __iter__()

   .. py:method:: __len__()


.. py:class:: DelegatingDict(delegate, nr, nr_of_overall_items = 8)

   .. py:method:: __init__(delegate, nr, nr_of_overall_items = 8)

   .. py:method:: _generate_empty_entry()

   .. py:method:: __getitem__(key)

   .. py:method:: get(key, default)

   .. py:method:: __setitem__(key, value)

   .. py:method:: __contains__(key)

   .. py:method:: keys()

   .. py:method:: items()

   .. py:method:: values()

   .. py:method:: __iter__()

   .. py:method:: __len__()

   .. py:method:: to_dict()

   .. py:method:: __eq__(__value)

      Return self==value.

   .. py:method:: __hash__()

      Return hash(self).

   .. py:method:: __delitem__(key)

   .. py:method:: pop(key, default = None)


.. py:class:: DelegatingValueSet(delegate)

   .. py:method:: __init__(delegate)

   .. py:method:: update(other)

   .. py:method:: __contains__(value)

   .. py:method:: to_dict()


.. py:class:: DelegatingDictEncoder

   Bases: :py:obj:`medcat.utils.saving.coding.PartEncoder`

   Base class for protocol classes.

   Protocol classes are defined as::

       class Proto(Protocol):
           def meth(self) -> int:
               ...

   Such classes are primarily used with static type checkers that recognize
   structural subtyping (static duck-typing), for example::

       class C:
           def meth(self) -> int:
               return 0

       def func(x: Proto) -> int:
           return x.meth()

       func(C())  # Passes static type check

   See PEP 544 for details.

   Protocol classes decorated with @typing.runtime_checkable act as
   simple-minded runtime protocols that check only the presence of given
   attributes, ignoring their type signatures.

   Protocol classes can be generic, they are defined as::

       class GenProto(Protocol[T]):
           def meth(self) -> T:
               ...

   .. py:method:: try_encode(obj)

      Try to encode an object.

      :param obj: The object to encode.
      :type obj: object

      :raises UnsuitableObject: If the object is unsuitable for encoding.

      :Returns: **Any** -- The encoded object


.. py:class:: DelegatingDictDecoder

   Bases: :py:obj:`medcat.utils.saving.coding.PartDecoder`

   .. py:method:: try_decode(dct)

      Try to decode the dictionary.

      :param dct: The dict to decode.
      :type dct: dict

      :Returns: **Union[dict, Any]** -- The dict if unable to decode, the decoded object otherwise


.. py:class:: DelegatingValueSetEncoder

   Bases: :py:obj:`medcat.utils.saving.coding.PartEncoder`

   .. py:method:: try_encode(obj)

      Try to encode an object.

      :param obj: The object to encode.
      :type obj: object

      :raises UnsuitableObject: If the object is unsuitable for encoding.

      :Returns: **Any** -- The encoded object


.. py:class:: DelegatingValueSetDecoder

   Bases: :py:obj:`medcat.utils.saving.coding.PartDecoder`

   .. py:method:: try_decode(dct)

      Try to decode the dictionary.

      :param dct: The dict to decode.
      :type dct: dict

      :Returns: **Union[dict, Any]** -- The dict if unable to decode, the decoded object otherwise


.. py:function:: attempt_fix_after_load(cdb)

.. py:function:: attempt_fix_snames_after_load(cdb, snames_attr_name = 'snames')

.. py:function:: _optimise(cdb, to_many_name, dict_names_to_combine)

.. py:function:: _optimise_snames(cdb, cui2snames = 'cui2snames', snames_attr = 'snames')

   Optimise the snames part of a CDB.

   :param cdb: The CDB to optimise snames on.
   :type cdb: CDB
   :param cui2snames: The cui2snames dict name to delegate to. Defaults to 'cui2snames'.
   :type cui2snames: str
   :param snames_attr: The `snames` attribute name. Defaults to 'snames'.
   :type snames_attr: str


.. py:function:: perform_optimisation(cdb, optimise_cuis = True, optimise_names = False, optimise_snames = True)

   Attempts to optimise the memory footprint of the CDB.

   This can perform optimisation for cui2<...> and name2<...> dicts.
   However, by default, only cui2many optimisation will be done.
   This is because at the time of writing, there were not enough name2<...>
   dicts to be able to benefit from the optimisation.

   Does so by unifying the following dicts:

       cui2names (Dict[str, Set[str]]):
           From CUI to all names assigned to it. Mainly (possibly only) used for subsetting.
       cui2snames (Dict[str, Set[str]]):
           From CUI to all sub-names assigned to it. Only used for subsetting.
       cui2context_vectors (Dict[str, Dict[str, np.array]]):
           From CUI to a dictionary of different kinds of context vectors. Normally you would have here
           a short and a long context vector - they are calculated separately.
       cui2count_train (Dict[str, int]):
           From CUI to the number of training examples seen.
       cui2tags (Dict[str, List[str]]):
           From CUI to a list of tags. This can be used to tag concepts for grouping or any other purpose.
       cui2type_ids (Dict[str, Set[str]]):
           From CUI to type id (e.g. TUI in UMLS).
       cui2preferred_name (Dict[str, str]):
           From CUI to the preferred name for this concept.
       cui2average_confidence (Dict[str, str]):
           Used for dynamic thresholding. Holds the average confidence for this CUI given the training examples.

       name2cuis (Dict[str, List[str]]):
           Map from concept name to CUIs - one name can map to multiple CUIs.
       name2cuis2status (Dict[str, Dict[str, str]]):
           The status for a given name and CUI pair - each name can be:
           P - Preferred, A - Automatic (e.g. let medcat decide), N - Not common.
       name2count_train (Dict[str, str]):
           Counts how often a name appeared during training.

   It can also delegate the `snames` set to use the various sets in `cui2snames` instead.

   They will all be included in one dict with CUI keys and a list of values for each pre-existing dict.

   :param cdb: The CDB to modify.
   :type cdb: CDB
   :param optimise_cuis: Whether to optimise cui2<...> dicts. Defaults to True.
   :type optimise_cuis: bool
   :param optimise_names: Whether to optimise name2<...> dicts. Defaults to False.
   :type optimise_names: bool
   :param optimise_snames: Whether to optimise `snames` set. Defaults to True.
   :type optimise_snames: bool


.. py:function:: _attempt_fix_after_load(cdb, one2many_name, dict_names)

.. py:function:: _unoptimise(cdb, to_many_name, dict_names_to_combine)

.. py:function:: _unoptimise_snames(cdb, cui2snames = 'cui2snames', snames_attr = 'snames')

.. py:function:: unoptimise_cdb(cdb)

   This undoes all the (potential) memory optimisations done in `perform_optimisation`.

   This method relies on `CDB._memory_optimised_parts` to be up to date.

   :param cdb: The CDB to work on.
   :type cdb: CDB


.. py:function:: map_to_many(dicts)
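
The ``perform_optimisation`` docstring describes folding several cui2<...> dicts into one dict with CUI keys and a list of values, one per pre-existing dict. The sketch below illustrates that general idea only. It does not use medcat, and the names ``combine`` and ``SlotView`` are hypothetical. The constructor arguments loosely echo the ``DelegatingDict(delegate, nr, nr_of_overall_items)`` signature listed above, but the behaviour shown here is an assumption, not the library's implementation.

```python
from typing import Any, Dict, Iterator, List


def combine(dicts: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Fold several dicts that share keys into one dict of per-key lists.

    Slot i of each list holds the value from dicts[i]. None marks a
    missing entry (an assumption made for this sketch).
    """
    combined: Dict[str, List[Any]] = {}
    for slot, source in enumerate(dicts):
        for key, value in source.items():
            entry = combined.setdefault(key, [None] * len(dicts))
            entry[slot] = value
    return combined


class SlotView:
    """Hypothetical dict-like view over one slot of the combined dict."""

    def __init__(self, delegate: Dict[str, List[Any]], slot: int,
                 nr_of_slots: int) -> None:
        self.delegate = delegate
        self.slot = slot
        self.nr_of_slots = nr_of_slots

    def __getitem__(self, key: str) -> Any:
        value = self.delegate[key][self.slot]
        if value is None:
            raise KeyError(key)
        return value

    def __setitem__(self, key: str, value: Any) -> None:
        # Create an empty entry for unseen keys, then fill only our slot.
        entry = self.delegate.setdefault(key, [None] * self.nr_of_slots)
        entry[self.slot] = value

    def __contains__(self, key: str) -> bool:
        entry = self.delegate.get(key)
        return entry is not None and entry[self.slot] is not None

    def keys(self) -> Iterator[str]:
        return (k for k, v in self.delegate.items()
                if v[self.slot] is not None)


# Two toy dicts standing in for cui2names and cui2count_train.
cui2names = {"C1": {"fever"}, "C2": {"cough"}}
cui2count_train = {"C1": 12}

cui2many = combine([cui2names, cui2count_train])
names = SlotView(cui2many, 0, 2)
counts = SlotView(cui2many, 1, 2)

had_c2_before = "C2" in counts  # no training count stored for C2 yet
counts["C2"] = 3                # writes through to the shared dict
print(counts["C1"], had_c2_before, cui2many["C2"])
```

Each key string and its hash-table entry is stored once in the combined dict rather than once per original dict, which is where a saving of this kind would come from. The views keep the familiar per-dict access pattern for calling code.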