medcat.utils.memory_optimiser

Module Contents

Classes

_KeysView

_ItemsView

_ValuesView

DelegatingDict

DelegatingValueSet

DelegatingDictEncoder

Base class for protocol classes.

DelegatingDictDecoder

Base class for protocol classes.

DelegatingValueSetEncoder

Base class for protocol classes.

DelegatingValueSetDecoder

Base class for protocol classes.

Functions

attempt_fix_after_load(cdb)

attempt_fix_snames_after_load(cdb[, snames_attr_name])

_optimise(cdb, to_many_name, dict_names_to_combine)

_optimise_snames(cdb[, cui2snames, snames_attr])

Optimise the snames part of a CDB.

perform_optimisation(cdb[, optimise_cuis, ...])

Attempts to optimise the memory footprint of the CDB.

_attempt_fix_after_load(cdb, one2many_name, dict_names)

_unoptimise(cdb, to_many_name, dict_names_to_combine)

_unoptimise_snames(cdb[, cui2snames, snames_attr])

unoptimise_cdb(cdb)

This undoes all the (potential) memory optimisations done in perform_optimisation.

map_to_many(dicts)

Attributes

CUI_DICT_NAMES_TO_COMBINE

ONE2MANY

NAME_DICT_NAMES_TO_COMBINE

NAME2MANY

DELEGATING_DICT_IDENTIFIER

DELEGATING_SET_IDENTIFIER

CUIS_PART

NAMES_PART

SNAMES_PART

medcat.utils.memory_optimiser.CUI_DICT_NAMES_TO_COMBINE = ['cui2names', 'cui2snames', 'cui2context_vectors', 'cui2count_train', 'cui2tags',...
medcat.utils.memory_optimiser.ONE2MANY = 'cui2many'
medcat.utils.memory_optimiser.NAME_DICT_NAMES_TO_COMBINE = ['cui2names', 'name2cuis2status', 'cui2preferred_name']
medcat.utils.memory_optimiser.NAME2MANY = 'name2many'
medcat.utils.memory_optimiser.DELEGATING_DICT_IDENTIFIER = '==DELEGATING_DICT=='
medcat.utils.memory_optimiser.DELEGATING_SET_IDENTIFIER = '==DELEGATING_SET=='
medcat.utils.memory_optimiser.CUIS_PART = 'CUIS'
medcat.utils.memory_optimiser.NAMES_PART = 'NAMES'
medcat.utils.memory_optimiser.SNAMES_PART = 'snames'
class medcat.utils.memory_optimiser._KeysView(keys, parent)
__init__(keys, parent)
__iter__()
Return type:

Iterator[Any]

__len__()
Return type:

int

class medcat.utils.memory_optimiser._ItemsView(parent)
Parameters:

parent (DelegatingDict) –

__init__(parent)
Parameters:

parent (DelegatingDict) –

Return type:

None

__iter__()
Return type:

Iterator[Any]

__len__()
Return type:

int

class medcat.utils.memory_optimiser._ValuesView(parent)
Parameters:

parent (DelegatingDict) –

__init__(parent)
Parameters:

parent (DelegatingDict) –

Return type:

None

__iter__()
Return type:

Iterator[Any]

__len__()
Return type:

int

class medcat.utils.memory_optimiser.DelegatingDict(delegate, nr, nr_of_overall_items=8)
Parameters:
  • delegate (Dict[str, List[Any]]) –

  • nr (int) –

  • nr_of_overall_items (int) –

__init__(delegate, nr, nr_of_overall_items=8)
Parameters:
  • delegate (Dict[str, List[Any]]) –

  • nr (int) –

  • nr_of_overall_items (int) –

Return type:

None

_generate_empty_entry()
Return type:

List[Any]

__getitem__(key)
Parameters:

key (str) –

Return type:

Any

get(key, default)
Parameters:
  • key (str) –

  • default (Any) –

Return type:

Any

__setitem__(key, value)
Parameters:
  • key (str) –

  • value (Any) –

Return type:

None

__contains__(key)
Parameters:

key (str) –

Return type:

bool

keys()
Return type:

_KeysView

items()
Return type:

_ItemsView

values()
Return type:

_ValuesView

__iter__()
Return type:

Iterator[str]

__len__()
Return type:

int

to_dict()
Return type:

dict

__eq__(__value)

Return self==value.

Parameters:

__value (object) –

Return type:

bool

__hash__()

Return hash(self).

Return type:

int

__delitem__(key)
Parameters:

key (str) –

Return type:

None

pop(key, default=None)
Parameters:
  • key (str) –

  • default (Optional[Any]) –

Return type:

Any

class medcat.utils.memory_optimiser.DelegatingValueSet(delegate)
Parameters:

delegate (Dict[str, Set[str]]) –

__init__(delegate)
Parameters:

delegate (Dict[str, Set[str]]) –

Return type:

None

update(other)
Parameters:

other (Any) –

Return type:

None

__contains__(value)
Parameters:

value (str) –

Return type:

bool

to_dict()
Return type:

dict
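Judging by its constructor, a DelegatingValueSet answers membership questions from a Dict[str, Set[str]] instead of holding its own copy of every value. The sketch below illustrates that trade-off under stated assumptions: ValueSetView is a hypothetical name, and the linear scan over all sets is one plausible way to implement __contains__, not necessarily the library's.

```python
from typing import Dict, Set


class ValueSetView:
    """Hypothetical stand-in: set-like membership backed by a dict of sets."""

    def __init__(self, delegate: Dict[str, Set[str]]) -> None:
        self.delegate = delegate

    def __contains__(self, value: object) -> bool:
        # No separate set is stored; every lookup scans the per-key sets.
        return any(value in values for values in self.delegate.values())


cui2snames = {"C01": {"heart", "heart attack"}, "C02": {"lung"}}
snames = ValueSetView(cui2snames)

found_before = "kidney" in snames
cui2snames["C03"] = {"kidney"}  # changes to the delegate are visible at once
found_after = "kidney" in snames
```

The design trades lookup speed for memory: nothing is duplicated, but a membership test is linear in the number of keys rather than constant time.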

class medcat.utils.memory_optimiser.DelegatingDictEncoder

Bases: medcat.utils.saving.coding.PartEncoder

Base class for protocol classes.

Protocol classes are defined as:

class Proto(Protocol):
    def meth(self) -> int:
        ...

Such classes are primarily used with static type checkers that recognize structural subtyping (static duck-typing), for example:

class C:
    def meth(self) -> int:
        return 0

def func(x: Proto) -> int:
    return x.meth()

func(C())  # Passes static type check

See PEP 544 for details. Protocol classes decorated with @typing.runtime_checkable act as simple-minded runtime protocols that check only the presence of given attributes, ignoring their type signatures. Protocol classes can be generic, they are defined as:

class GenProto(Protocol[T]):
    def meth(self) -> T:
        ...

try_encode(obj)

Try to encode an object

Parameters:

obj (object) – The object to encode

Raises:

UnsuitableObject – If the object is unsuitable for encoding.

Returns:

Any – The encoded object

class medcat.utils.memory_optimiser.DelegatingDictDecoder

Bases: medcat.utils.saving.coding.PartDecoder

try_decode(dct)

Try to decode the dictionary.

Parameters:

dct (dict) – The dict to decode.

Returns:

Union[dict, Any] – The dict if unable to decode, the decoded object otherwise

Return type:

Union[dict, medcat.utils.saving.coding.EncodeableObject]
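To make the try_encode / try_decode contract concrete, here is a sketch built on the standard json module. It is an illustration only: the SlotView class, the on-disk layout (a marker key plus an "nr" field), and the use of TypeError in place of UnsuitableObject are assumptions, and the real encoders plug into medcat.utils.saving.coding rather than directly into json.

```python
import json
from typing import Any, Dict, List

MARKER = "==DELEGATING_DICT=="  # same value as DELEGATING_DICT_IDENTIFIER


class SlotView:
    """Hypothetical minimal view: one slot of a shared store."""

    def __init__(self, delegate: Dict[str, List[Any]], nr: int) -> None:
        self.delegate = delegate
        self.nr = nr


def try_encode(obj: Any) -> Any:
    """Encode a view as a marked dict; refuse anything else."""
    if isinstance(obj, SlotView):
        return {MARKER: obj.delegate, "nr": obj.nr}  # assumed layout
    raise TypeError(f"Cannot encode {type(obj).__name__}")


def try_decode(dct: dict) -> Any:
    """Return the decoded view if the marker is present, else the dict itself."""
    if MARKER in dct:
        return SlotView(dct[MARKER], dct["nr"])
    return dct


view = SlotView({"C01": ["fever", 3]}, nr=1)
text = json.dumps({"cui2count_train": view}, default=try_encode)
loaded = json.loads(text, object_hook=try_decode)
restored = loaded["cui2count_train"]
```

With a layout like this, each decoded view would carry its own copy of the delegate, so views that shared one store before saving no longer share it after loading. That may be the kind of situation the attempt_fix_after_load functions address, though the source does not say so.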

class medcat.utils.memory_optimiser.DelegatingValueSetEncoder

Bases: medcat.utils.saving.coding.PartEncoder

try_encode(obj)

Try to encode an object

Parameters:

obj (object) – The object to encode

Raises:

UnsuitableObject – If the object is unsuitable for encoding.

Returns:

Any – The encoded object

class medcat.utils.memory_optimiser.DelegatingValueSetDecoder

Bases: medcat.utils.saving.coding.PartDecoder

try_decode(dct)

Try to decode the dictionary.

Parameters:

dct (dict) – The dict to decode.

Returns:

Union[dict, Any] – The dict if unable to decode, the decoded object otherwise

Return type:

Union[dict, medcat.utils.saving.coding.EncodeableObject]

medcat.utils.memory_optimiser.attempt_fix_after_load(cdb)
Parameters:

cdb (medcat.cdb.CDB) –

medcat.utils.memory_optimiser.attempt_fix_snames_after_load(cdb, snames_attr_name='snames')
Parameters:
  • cdb (medcat.cdb.CDB) –

  • snames_attr_name (str) –

medcat.utils.memory_optimiser._optimise(cdb, to_many_name, dict_names_to_combine)
Parameters:
  • cdb (medcat.cdb.CDB) –

  • to_many_name (str) –

  • dict_names_to_combine (List[str]) –

Return type:

None

medcat.utils.memory_optimiser._optimise_snames(cdb, cui2snames='cui2snames', snames_attr='snames')

Optimise the snames part of a CDB.

Parameters:
  • cdb (CDB) – The CDB to optimise snames on.

  • cui2snames (str) – The cui2snames dict name to delegate to. Defaults to ‘cui2snames’.

  • snames_attr (str) – The snames attribute name. Defaults to ‘snames’.

Return type:

None

medcat.utils.memory_optimiser.perform_optimisation(cdb, optimise_cuis=True, optimise_names=False, optimise_snames=True)

Attempts to optimise the memory footprint of the CDB.

This can optimise both cui2<…> and name2<…> dicts. By default, however, only the cui2<…> dicts are combined (into cui2many), because at the time of writing there were too few name2<…> dicts to benefit from the optimisation.

Does so by unifying the following dicts:

cui2names (Dict[str, Set[str]]):

From cui to all names assigned to it. Used mainly (perhaps exclusively) for subsetting.

cui2snames (Dict[str, Set[str]]):

From cui to all sub-names assigned to it. Only used for subsetting.

cui2context_vectors (Dict[str, Dict[str, np.array]]):

From cui to a dictionary of different kinds of context vectors. Normally this holds a short and a long context vector, which are calculated separately.

cui2count_train (Dict[str, int]):

From CUI to the number of training examples seen.

cui2tags (Dict[str, List[str]]):

From CUI to a list of tags. This can be used to tag concepts for grouping or any other purpose.

cui2type_ids (Dict[str, Set[str]]):

From CUI to type id (e.g. TUI in UMLS).

cui2preferred_name (Dict[str, str]):

From CUI to the preferred name for this concept.

cui2average_confidence (Dict[str, str]):

Used for dynamic thresholding. Holds the average confidence for this CUI given the training examples.

name2cuis (Dict[str, List[str]]):

Map from concept name to CUIs - one name can map to multiple CUIs.

name2cuis2status (Dict[str, Dict[str, str]]):

The status of a given name and CUI pair. Each status can be: P - Preferred, A - Automatic (e.g. let medcat decide), N - Not common.

name2count_train (Dict[str, str]):

Counts how often a name appeared during training.

The dicts above are merged into a single dict with CUI keys, where each value is a list holding one entry per pre-existing dict.

The snames set can also be replaced by a view that delegates to the various sets in cui2snames.

Parameters:
  • cdb (CDB) – The CDB to modify.

  • optimise_cuis (bool) – Whether to optimise cui2<…> dicts. Defaults to True.

  • optimise_names (bool) – Whether to optimise name2<…> dicts. Defaults to False.

  • optimise_snames (bool) – Whether to optimise snames set. Defaults to True.

Return type:

None
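The remark that there were too few name2<…> dicts to benefit can be made tangible by measuring container overhead for the two layouts. The sketch below is an assumption-laden illustration, not a benchmark of MedCAT: it uses sys.getsizeof, which reports shallow sizes on CPython, and it ignores the stored values because they are the same objects in both layouts. The function name container_overhead and the key format are made up for the example.

```python
import sys
from typing import Tuple


def container_overhead(n_dicts: int, n_keys: int = 1000) -> Tuple[int, int]:
    """Return (separate, combined) container sizes in bytes for both layouts."""
    keys = [f"C{i:07d}" for i in range(n_keys)]

    # Layout 1: n_dicts separate dicts, each holding every key once.
    separate = [{key: None for key in keys} for _ in range(n_dicts)]
    separate_size = sum(sys.getsizeof(d) for d in separate)

    # Layout 2: one dict whose values are lists with one slot per original dict.
    combined = {key: [None] * n_dicts for key in keys}
    combined_size = sys.getsizeof(combined) + sum(
        sys.getsizeof(slots) for slots in combined.values()
    )
    return separate_size, combined_size


separate_8, combined_8 = container_overhead(8)
separate_2, combined_2 = container_overhead(2)
print(f"8 dicts: separate={separate_8} combined={combined_8}")
print(f"2 dicts: separate={separate_2} combined={combined_2}")
```

On CPython the combined layout should come out smaller when eight dicts are merged and larger when only two are, because each key pays a fixed per-list cost that only amortises over enough slots. The exact byte counts depend on the interpreter version.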

medcat.utils.memory_optimiser._attempt_fix_after_load(cdb, one2many_name, dict_names)
Parameters:
  • cdb (medcat.cdb.CDB) –

  • one2many_name (str) –

  • dict_names (List[str]) –

medcat.utils.memory_optimiser._unoptimise(cdb, to_many_name, dict_names_to_combine)
Parameters:
  • cdb (medcat.cdb.CDB) –

  • to_many_name (str) –

  • dict_names_to_combine (List[str]) –

medcat.utils.memory_optimiser._unoptimise_snames(cdb, cui2snames='cui2snames', snames_attr='snames')
Parameters:
  • cdb (medcat.cdb.CDB) –

  • cui2snames (str) –

  • snames_attr (str) –

Return type:

None

medcat.utils.memory_optimiser.unoptimise_cdb(cdb)

This undoes all the (potential) memory optimisations done in perform_optimisation.

This function relies on CDB._memory_optimised_parts being up to date.

Parameters:

cdb (CDB) – The CDB to work on.
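Conceptually, undoing the optimisation means turning one combined dict of lists back into several ordinary dicts. The sketch below shows that inverse step on plain data. It is not the library's code: split_store is a hypothetical helper, and treating None as "no value in this slot" is an assumption.

```python
from typing import Any, Dict, List


def split_store(store: Dict[str, List[Any]],
                names: List[str]) -> Dict[str, Dict[str, Any]]:
    """Rebuild one plain dict per slot from a combined key -> list store."""
    separate: Dict[str, Dict[str, Any]] = {name: {} for name in names}
    for key, slots in store.items():
        for name, value in zip(names, slots):
            if value is not None:  # assumption: None means the slot is empty
                separate[name][key] = value
    return separate


store = {"C01": [{"fever"}, 3], "C02": [{"cough"}, None]}
restored = split_store(store, ["cui2names", "cui2count_train"])
```

Here restored["cui2count_train"] has no entry for "C02", matching what the original separate dict would have contained.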

medcat.utils.memory_optimiser.map_to_many(dicts)
Parameters:

dicts (List[Dict[str, Any]]) –

Return type:

Tuple[Dict[str, List[Any]], List[DelegatingDict]]
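The return type indicates that map_to_many produces one combined dict plus one DelegatingDict per input. The forward half of that transformation can be sketched on plain dicts as follows. The helper name combine is hypothetical, it returns only the combined store (not the per-dict views), and filling absent entries with None is an assumption.

```python
from typing import Any, Dict, List


def combine(dicts: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Merge several dicts into one key -> list store, one slot per input dict."""
    store: Dict[str, List[Any]] = {}
    for nr, source in enumerate(dicts):
        for key, value in source.items():
            if key not in store:
                # First time this key is seen: allocate all slots as empty.
                store[key] = [None] * len(dicts)
            store[key][nr] = value
    return store


cui2names = {"C01": {"fever"}, "C02": {"cough"}}
cui2count_train = {"C01": 3}
combined = combine([cui2names, cui2count_train])
```

Each key is stored once in the combined dict regardless of how many input dicts contain it, and the slot index records which input a value came from.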