medcat.utils.memory_optimiser
Module Contents
Classes
- _KeysView
- _ItemsView
- _ValuesView
- DelegatingDict
- DelegatingValueSet
- DelegatingDictEncoder
- DelegatingDictDecoder
- DelegatingValueSetEncoder
- DelegatingValueSetDecoder
Functions
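- attempt_fix_after_load
- attempt_fix_snames_after_load
- _optimise
- _optimise_snames: Optimise the snames part of a CDB.
- perform_optimisation: Attempts to optimise the memory footprint of the CDB.
- _attempt_fix_after_load
- _unoptimise
- _unoptimise_snames
- unoptimise_cdb: This undoes all the (potential) memory optimisations done in perform_optimisation.
- map_to_many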
Attributes
- medcat.utils.memory_optimiser.CUI_DICT_NAMES_TO_COMBINE = ['cui2names', 'cui2snames', 'cui2context_vectors', 'cui2count_train', 'cui2tags',...
- medcat.utils.memory_optimiser.ONE2MANY = 'cui2many'
- medcat.utils.memory_optimiser.NAME_DICT_NAMES_TO_COMBINE = ['cui2names', 'name2cuis2status', 'cui2preferred_name']
- medcat.utils.memory_optimiser.NAME2MANY = 'name2many'
- medcat.utils.memory_optimiser.DELEGATING_DICT_IDENTIFIER = '==DELEGATING_DICT=='
- medcat.utils.memory_optimiser.DELEGATING_SET_IDENTIFIER = '==DELEGATING_SET=='
- medcat.utils.memory_optimiser.CUIS_PART = 'CUIS'
- medcat.utils.memory_optimiser.NAMES_PART = 'NAMES'
- medcat.utils.memory_optimiser.SNAMES_PART = 'snames'
- class medcat.utils.memory_optimiser._KeysView(keys, parent)
- Parameters:
keys (KeysView) –
parent (DelegatingDict) –
- __init__(keys, parent)
- Parameters:
keys (KeysView) –
parent (DelegatingDict) –
- __iter__()
- Return type:
Iterator[Any]
- __len__()
- Return type:
int
- class medcat.utils.memory_optimiser._ItemsView(parent)
- Parameters:
parent (DelegatingDict) –
- __init__(parent)
- Parameters:
parent (DelegatingDict) –
- Return type:
None
- __iter__()
- Return type:
Iterator[Any]
- __len__()
- Return type:
int
- class medcat.utils.memory_optimiser._ValuesView(parent)
- Parameters:
parent (DelegatingDict) –
- __init__(parent)
- Parameters:
parent (DelegatingDict) –
- Return type:
None
- __iter__()
- Return type:
Iterator[Any]
- __len__()
- Return type:
int
- class medcat.utils.memory_optimiser.DelegatingDict(delegate, nr, nr_of_overall_items=8)
- Parameters:
delegate (Dict[str, List[Any]]) –
nr (int) –
nr_of_overall_items (int) –
- __init__(delegate, nr, nr_of_overall_items=8)
- Parameters:
delegate (Dict[str, List[Any]]) –
nr (int) –
nr_of_overall_items (int) –
- Return type:
None
- _generate_empty_entry()
- Return type:
List[Any]
- __getitem__(key)
- Parameters:
key (str) –
- Return type:
Any
- get(key, default)
- Parameters:
key (str) –
default (Any) –
- Return type:
Any
- __setitem__(key, value)
- Parameters:
key (str) –
value (Any) –
- Return type:
None
- __contains__(key)
- Parameters:
key (str) –
- Return type:
bool
- items()
- Return type:
_ItemsView
- values()
- Return type:
_ValuesView
- __iter__()
- Return type:
Iterator[str]
- __len__()
- Return type:
int
- to_dict()
- Return type:
dict
- __eq__(__value)
Return self==value.
- Parameters:
__value (object) –
- Return type:
bool
- __hash__()
Return hash(self).
- Return type:
int
- __delitem__(key)
- Parameters:
key (str) –
- Return type:
None
- pop(key, default=None)
- Parameters:
key (str) –
default (Optional[Any]) –
- Return type:
Any
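The constructor arguments suggest that a DelegatingDict behaves like a view onto one position (nr) of the value lists held in a shared dict (delegate). The following is a minimal sketch of that idea, not MedCAT's implementation: the class name ColumnView and the use of None as the "no value" marker are assumptions made for illustration.

```python
from typing import Any, Dict, Iterator, List


class ColumnView:
    """Hypothetical stand-in: exposes slot `nr` of each list in `delegate`."""

    def __init__(self, delegate: Dict[str, List[Any]], nr: int,
                 nr_of_overall_items: int = 8) -> None:
        self.delegate = delegate
        self.nr = nr
        self.nr_of_overall_items = nr_of_overall_items

    def __getitem__(self, key: str) -> Any:
        value = self.delegate[key][self.nr]
        if value is None:  # assumption: None marks "no value in this slot"
            raise KeyError(key)
        return value

    def __setitem__(self, key: str, value: Any) -> None:
        # Create a row of empty slots the first time a key is seen.
        row = self.delegate.setdefault(key, [None] * self.nr_of_overall_items)
        row[self.nr] = value

    def __contains__(self, key: object) -> bool:
        return key in self.delegate and self.delegate[key][self.nr] is not None

    def __iter__(self) -> Iterator[str]:
        return (k for k, row in self.delegate.items() if row[self.nr] is not None)

    def __len__(self) -> int:
        return sum(1 for _ in self)


# Two views share one backing dict; each writes to its own slot.
shared: Dict[str, List[Any]] = {}
names = ColumnView(shared, nr=0, nr_of_overall_items=2)
counts = ColumnView(shared, nr=1, nr_of_overall_items=2)
names["C01"] = {"fever"}
counts["C01"] = 3
names["C02"] = {"cough"}
```

Each view only reports the keys whose slot has been filled, so callers can keep treating it as an ordinary per-attribute dict while the data lives in one place.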
- class medcat.utils.memory_optimiser.DelegatingValueSet(delegate)
- Parameters:
delegate (Dict[str, Set[str]]) –
- __init__(delegate)
- Parameters:
delegate (Dict[str, Set[str]]) –
- Return type:
None
- update(other)
- Parameters:
other (Any) –
- Return type:
None
- __contains__(value)
- Parameters:
value (str) –
- Return type:
bool
- to_dict()
- Return type:
dict
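Given its delegate type (Dict[str, Set[str]]), a DelegatingValueSet plausibly answers membership queries by looking through the sets stored as values of another dict rather than holding a copy of every element. The sketch below illustrates that pattern under this assumption; the class name ValuesMembership is hypothetical and the real class may behave differently.

```python
from typing import Dict, Set


class ValuesMembership:
    """Hypothetical stand-in for a set backed by the values of a dict of sets."""

    def __init__(self, delegate: Dict[str, Set[str]]) -> None:
        self.delegate = delegate

    def __contains__(self, value: str) -> bool:
        # Linear in the number of keys: trades lookup speed for memory.
        return any(value in names for names in self.delegate.values())


cui2snames = {"C01": {"heart", "heart attack"}, "C02": {"cough"}}
snames = ValuesMembership(cui2snames)
```

Because nothing is copied, changes to the underlying dict are visible through the wrapper immediately; the cost is that a lookup scans the per-CUI sets instead of hashing into one large set.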
- class medcat.utils.memory_optimiser.DelegatingDictEncoder
Bases:
medcat.utils.saving.coding.PartEncoder
Base class for protocol classes.
Protocol classes are defined as:
    class Proto(Protocol):
        def meth(self) -> int:
            ...
Such classes are primarily used with static type checkers that recognize structural subtyping (static duck-typing), for example:
    class C:
        def meth(self) -> int:
            return 0

    def func(x: Proto) -> int:
        return x.meth()

    func(C())  # Passes static type check
See PEP 544 for details. Protocol classes decorated with @typing.runtime_checkable act as simple-minded runtime protocols that check only the presence of given attributes, ignoring their type signatures. Protocol classes can be generic, they are defined as:
    class GenProto(Protocol[T]):
        def meth(self) -> T:
            ...
- try_encode(obj)
Try to encode an object.
- Parameters:
obj (object) – The object to encode
- Raises:
UnsuitableObject – If the object is unsuitable for encoding.
- Returns:
Any – The encoded object
- class medcat.utils.memory_optimiser.DelegatingDictDecoder
Bases:
medcat.utils.saving.coding.PartDecoder
- try_decode(dct)
Try to decode the dictionary.
- Parameters:
dct (dict) – The dict to decode.
- Returns:
Union[dict, Any] – The dict if unable to decode, the decoded object otherwise
- Return type:
Union[dict, medcat.utils.saving.coding.EncodeableObject]
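The encoder and decoder contracts described above (raise UnsuitableObject when an object cannot be encoded; return the dict unchanged when it cannot be decoded) fit naturally with the hooks of Python's json module. The following sketch shows that contract in isolation. Everything in it is illustrative: MARKER, View, the payload layout, and the wiring through `default` and `object_hook` are assumptions, and UnsuitableObject is redefined locally rather than imported from MedCAT.

```python
import json
from typing import Any, Dict

MARKER = "==SKETCH_VIEW=="  # hypothetical, analogous to DELEGATING_DICT_IDENTIFIER


class UnsuitableObject(ValueError):
    """Local stand-in for the exception named in the docs above."""


class View:
    """Hypothetical object to serialise: a dict plus the slot it exposes."""

    def __init__(self, delegate: Dict[str, list], nr: int) -> None:
        self.delegate, self.nr = delegate, nr


def try_encode(obj: object) -> Any:
    if not isinstance(obj, View):
        raise UnsuitableObject(f"cannot encode {type(obj).__name__}")
    # Wrap the payload under a marker key so the decoder can recognise it.
    return {MARKER: {"delegate": obj.delegate, "nr": obj.nr}}


def try_decode(dct: dict) -> Any:
    if MARKER not in dct:
        return dct  # not ours: hand the dict back unchanged
    payload = dct[MARKER]
    return View(payload["delegate"], payload["nr"])


text = json.dumps({"cui2names": View({"C01": ["fever", 3]}, 0)}, default=try_encode)
loaded = json.loads(text, object_hook=try_decode)
```

Returning the input dict from the decoder when the marker is absent is what lets several decoders be tried in turn on the same data without interfering with ordinary dicts.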
- class medcat.utils.memory_optimiser.DelegatingValueSetEncoder
Bases:
medcat.utils.saving.coding.PartEncoder
- try_encode(obj)
Try to encode an object.
- Parameters:
obj (object) – The object to encode
- Raises:
UnsuitableObject – If the object is unsuitable for encoding.
- Returns:
Any – The encoded object
- class medcat.utils.memory_optimiser.DelegatingValueSetDecoder
Bases:
medcat.utils.saving.coding.PartDecoder
- try_decode(dct)
Try to decode the dictionary.
- Parameters:
dct (dict) – The dict to decode.
- Returns:
Union[dict, Any] – The dict if unable to decode, the decoded object otherwise
- Return type:
Union[dict, medcat.utils.saving.coding.EncodeableObject]
- medcat.utils.memory_optimiser.attempt_fix_after_load(cdb)
- Parameters:
cdb (medcat.cdb.CDB) –
- medcat.utils.memory_optimiser.attempt_fix_snames_after_load(cdb, snames_attr_name='snames')
- Parameters:
cdb (medcat.cdb.CDB) –
snames_attr_name (str) –
- medcat.utils.memory_optimiser._optimise(cdb, to_many_name, dict_names_to_combine)
- Parameters:
cdb (medcat.cdb.CDB) –
to_many_name (str) –
dict_names_to_combine (List[str]) –
- Return type:
None
- medcat.utils.memory_optimiser._optimise_snames(cdb, cui2snames='cui2snames', snames_attr='snames')
Optimise the snames part of a CDB.
- Parameters:
cdb (CDB) – The CDB to optimise snames on.
cui2snames (str) – The cui2snames dict name to delegate to. Defaults to ‘cui2snames’.
snames_attr (str) – The snames attribute name. Defaults to ‘snames’.
- Return type:
None
- medcat.utils.memory_optimiser.perform_optimisation(cdb, optimise_cuis=True, optimise_names=False, optimise_snames=True)
Attempts to optimise the memory footprint of the CDB.
This can perform optimisation for cui2<…> and name2<…> dicts. However, by default, only cui2many optimisation will be done. This is because at the time of writing, there were not enough name2<…> dicts to be able to benefit from the optimisation.
Does so by unifying the following dicts:
- cui2names (Dict[str, Set[str]]):
From cui to all names assigned to it. Mainly (perhaps only) used for subsetting.
- cui2snames (Dict[str, Set[str]]):
From cui to all sub-names assigned to it. Only used for subsetting.
- cui2context_vectors (Dict[str, Dict[str, np.array]]):
From cui to a dictionary of different kinds of context vectors. Normally this holds a short and a long context vector, which are calculated separately.
- cui2count_train (Dict[str, int]):
From CUI to the number of training examples seen.
- cui2tags (Dict[str, List[str]]):
From CUI to a list of tags. This can be used to tag concepts for grouping or any other purpose.
- cui2type_ids (Dict[str, Set[str]]):
From CUI to type id (e.g. TUI in UMLS).
- cui2preferred_name (Dict[str, str]):
From CUI to the preferred name for this concept.
- cui2average_confidence (Dict[str, str]):
Used for dynamic thresholding. Holds the average confidence for this CUI given the training examples.
- name2cuis (Dict[str, List[str]]):
Map from concept name to CUIs - one name can map to multiple CUIs.
- name2cuis2status (Dict[str, Dict[str, str]]):
The status for a given name and CUI pair. Each name can be:
P - Preferred, A - Automatic (e.g. let MedCAT decide), N - Not common.
- name2count_train (Dict[str, str]):
Counts how often a name appeared during training.
All of the selected dicts are combined into a single dict with CUI keys, where each value is a list holding one entry per pre-existing dict.
The snames set can also be delegated, so that lookups use the various sets in cui2snames instead.
- Parameters:
cdb (CDB) – The CDB to modify.
optimise_cuis (bool) – Whether to optimise cui2<…> dicts. Defaults to True.
optimise_names (bool) – Whether to optimise name2<…> dicts. Defaults to False.
optimise_snames (bool) – Whether to optimise snames set. Defaults to True.
- Return type:
None
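To get a feel for why unifying dicts can reduce memory, and why the benefit depends on how many dicts share the same keys, the container overhead of both layouts can be compared directly. This is a rough sketch on synthetic data, assuming CPython; it measures only the dict and list containers (sys.getsizeof does not follow references), exact numbers vary between Python versions, and it says nothing about the savings on a real CDB.

```python
import sys
from typing import Any, Dict, List

N_KEYS, N_DICTS = 1000, 8
keys = [f"C{i:07d}" for i in range(N_KEYS)]

# Eight separate dicts sharing the same keys (values are placeholders).
separate: List[Dict[str, Any]] = [{k: None for k in keys} for _ in range(N_DICTS)]

# One dict holding a list with one slot per original dict.
combined: Dict[str, List[Any]] = {k: [d[k] for d in separate] for k in keys}

# Container overhead only: the stored values themselves are not counted.
separate_bytes = sum(sys.getsizeof(d) for d in separate)
combined_bytes = sys.getsizeof(combined) + sum(
    sys.getsizeof(row) for row in combined.values()
)

print(f"separate: {separate_bytes} B, combined: {combined_bytes} B")
```

Every per-key list carries its own fixed overhead, so with only two or three dicts the combined layout may not come out ahead. That is consistent with the note above that there were not enough name2<…> dicts to benefit.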
- medcat.utils.memory_optimiser._attempt_fix_after_load(cdb, one2many_name, dict_names)
- Parameters:
cdb (medcat.cdb.CDB) –
one2many_name (str) –
dict_names (List[str]) –
- medcat.utils.memory_optimiser._unoptimise(cdb, to_many_name, dict_names_to_combine)
- Parameters:
cdb (medcat.cdb.CDB) –
to_many_name (str) –
dict_names_to_combine (List[str]) –
- medcat.utils.memory_optimiser._unoptimise_snames(cdb, cui2snames='cui2snames', snames_attr='snames')
- Parameters:
cdb (medcat.cdb.CDB) –
cui2snames (str) –
snames_attr (str) –
- Return type:
None
- medcat.utils.memory_optimiser.unoptimise_cdb(cdb)
This undoes all the (potential) memory optimisations done in perform_optimisation.
This method relies on CDB._memory_optimised_parts to be up to date.
- Parameters:
cdb (CDB) – The CDB to work on.
- medcat.utils.memory_optimiser.map_to_many(dicts)
- Parameters:
dicts (List[Dict[str, Any]]) –
- Return type:
Tuple[Dict[str, List[Any]], List[DelegatingDict]]
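The signature of map_to_many (a list of dicts in, a one-to-many dict out) can be illustrated with plain dicts. The sketch below shows one way to merge several dicts into a single key-to-list mapping and to split it back again, which is roughly the round trip that perform_optimisation and unoptimise_cdb describe. The function names combine and split and the use of None for a missing entry are assumptions, and unlike map_to_many this sketch does not return DelegatingDict views.

```python
from typing import Any, Dict, List


def combine(dicts: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Merge dicts into key -> [value from dict 0, value from dict 1, ...]."""
    one2many: Dict[str, List[Any]] = {}
    for nr, dct in enumerate(dicts):
        for key, value in dct.items():
            row = one2many.setdefault(key, [None] * len(dicts))
            row[nr] = value
    return one2many


def split(one2many: Dict[str, List[Any]], nr_of_dicts: int) -> List[Dict[str, Any]]:
    """Rebuild the separate dicts, skipping slots that were never filled."""
    dicts: List[Dict[str, Any]] = [{} for _ in range(nr_of_dicts)]
    for key, row in one2many.items():
        for nr, value in enumerate(row):
            if value is not None:  # assumption: None marks a missing entry
                dicts[nr][key] = value
    return dicts


cui2names = {"C01": {"fever"}, "C02": {"cough"}}
cui2count_train = {"C01": 3}
one2many = combine([cui2names, cui2count_train])
restored = split(one2many, 2)
```

Using None as the placeholder means a legitimately stored None cannot be told apart from a missing entry; a dedicated sentinel object would avoid that ambiguity in this sketch.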