`medcat.utils.regression.targeting`

Module Contents

Classes

`TranslationLayer`	The translation layer for translating between CUIs, names, type_ids, and child CUIs.
`TargetPlaceholder`	A class describing the options for a specific placeholder.
`PhraseChanger`	The phrase changer.
`TargetedPhraseChanger`	The target phrase changer.
`FinalTarget`	The final target.
`OptionSet`	The targeting option set.

Attributes

`logger`

medcat.utils.regression.targeting.logger

class medcat.utils.regression.targeting.TranslationLayer(cui2names, name2cuis, cui2type_ids, cui2children, cui2preferred_names, separator, whitespace=' ')

The translation layer for translating:

- CUIs to names
- names to CUIs
- CUIs to type_ids
- CUIs to child CUIs

The idea is to decouple these translations from the CDB instance in case something changes there.

Parameters:

cui2names (Dict[str, Set[str]]) – The map from CUI to names
name2cuis (Dict[str, List[str]]) – The map from name to CUIs
cui2type_ids (Dict[str, Set[str]]) – The map from CUI to type_ids
cui2children (Dict[str, Set[str]]) – The map from CUI to child CUIs
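cui2preferred_names (Dict[str, str]) – The map from CUI to preferred name
separator (str) – The separator used within names in the CDB (generally ~)
whitespace (str) – The whitespace that replaces the separator in preprocessed names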

__init__(cui2names, name2cuis, cui2type_ids, cui2children, cui2preferred_names, separator, whitespace=' ')

Parameters:

cui2names (Dict[str, Set[str]]) –
name2cuis (Dict[str, List[str]]) –
cui2type_ids (Dict[str, Set[str]]) –
cui2children (Dict[str, Set[str]]) –
cui2preferred_names (Dict[str, str]) –
separator (str) –
whitespace (str) –

Return type:

None

get_names_of(cui, only_prefnames)

Get the preprocessed names of a CUI.

This method preprocesses the names by replacing the separator (generally ~) with the appropriate whitespace (generally a single space).

If the concept is not in the underlying CDB, an empty list is returned.

Parameters:

cui (str) – The concept in question.
only_prefnames (bool) – Whether to only return a preferred name.

Returns:

List[str] – The list of names.

Return type:

List[str]
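A minimal stand-alone sketch of the behaviour described above, using plain dicts in place of the CDB. The function name and signature are hypothetical (this is not the MedCAT API); it assumes that when `only_prefnames` is set and no preferred name is recorded, the first name is used instead, and it sorts names only so that its output is deterministic.

```python
from typing import Dict, List, Set


def get_names_of(cui: str, cui2names: Dict[str, Set[str]],
                 cui2preferred_names: Dict[str, str], only_prefnames: bool,
                 separator: str = "~", whitespace: str = " ") -> List[str]:
    if cui not in cui2names:
        # concept not in the (stand-in) CDB: return an empty list
        return []
    names = sorted(cui2names[cui])  # sorted only to make this sketch deterministic
    if only_prefnames:
        # assumption: fall back to the first name when no preferred name is recorded
        names = [cui2preferred_names.get(cui, names[0])] if names else []
    # swap the CDB separator (e.g. "~") for ordinary whitespace
    return [name.replace(separator, whitespace) for name in names]
```

For example, `get_names_of("C01", {"C01": {"heart~attack"}}, {}, False)` gives `["heart attack"]`.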

get_preferred_name(cui)

Get the preferred name of a concept.

If no preferred name is found, an arbitrary “first” name is selected instead.

Parameters: cui (str) – The concept ID.
Returns: str – The preferred name.
Return type: str

get_first_name(cui)

Get the preprocessed first name of the given concept (which “first” name is chosen is potentially arbitrary).

If the concept does not exist, the CUI itself is returned.

PS: The “first” name may not be consistent across runs since it relies on set order.

Parameters: cui (str) – The concept ID.
Returns: str – The first name.
Return type: str
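The fallback chain of get_preferred_name and get_first_name can be sketched as follows. These are hypothetical stand-alone functions over plain dicts, not the MedCAT implementation; `next(iter(...))` illustrates why the “first” name depends on set iteration order.

```python
from typing import Dict, Set


def get_first_name(cui: str, cui2names: Dict[str, Set[str]],
                   separator: str = "~", whitespace: str = " ") -> str:
    names = cui2names.get(cui)
    if not names:
        # unknown concept (or no names): fall back to the CUI itself
        return cui
    # "first" in set iteration order, which is arbitrary and may vary between runs
    return next(iter(names)).replace(separator, whitespace)


def get_preferred_name(cui: str, cui2names: Dict[str, Set[str]],
                       cui2preferred_names: Dict[str, str]) -> str:
    preferred = cui2preferred_names.get(cui)
    if preferred is not None:
        return preferred
    # no preferred name recorded: use the arbitrary first name instead
    return get_first_name(cui, cui2names)
```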

get_direct_children(cui)

Get the direct children of a concept.

This means only the children, but not grandchildren.

If the underlying CDB doesn’t list children for this CUI, an empty list is returned.

Parameters: cui (str) – The concept in question.
Returns: List[str] – The (potentially empty) list of direct children.
Return type: List[str]

get_direct_parents(cui)

Get the direct parent(s) of a concept.

PS: This method can be CPU-heavy, since it relies on running through all the parent-children relationships; the child->parent(s) relationship isn’t normally kept track of.

Parameters: cui (str) – The concept in question.
Returns: List[str] – The (potentially empty) list of direct parents.
Return type: List[str]
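The cost noted above comes from having to scan every parent->children entry. A minimal sketch over a plain `cui2children` dict (the helper names are hypothetical) shows both the linear scan and, as one possible alternative when many lookups are needed, inverting the map once up front:

```python
from typing import Dict, List, Set


def get_direct_parents(cui: str, cui2children: Dict[str, Set[str]]) -> List[str]:
    # linear scan over every parent -> children entry: O(total relationships)
    return [parent for parent, children in cui2children.items() if cui in children]


def build_child2parents(cui2children: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    # one pass to build the reverse map, after which each lookup is a dict access
    child2parents: Dict[str, Set[str]] = {}
    for parent, children in cui2children.items():
        for child in children:
            child2parents.setdefault(child, set()).add(parent)
    return child2parents
```

The inverted map trades memory for speed, which is why it only pays off when the parents of many concepts are needed.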

get_children_of(found_cuis, cui, depth=1)

Get the children of the specified CUI that appear among the listed CUIs (if there are any).

Parameters:

found_cuis (Iterable[str]) – The list of CUIs to look in
cui (str) – The target parent CUI
depth (int) – The depth to which the search is carried out

Returns:

List[str] – The list of children found

Return type:

List[str]
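A sketch of a depth-limited, breadth-first search for this method. It assumes `depth` counts generations (1 = direct children only, 2 = also grandchildren) and uses a plain `cui2children` dict; the function is hypothetical, not the MedCAT implementation.

```python
from typing import Dict, Iterable, List, Set


def get_children_of(found_cuis: Iterable[str], cui: str,
                    cui2children: Dict[str, Set[str]], depth: int = 1) -> List[str]:
    found = set(found_cuis)
    seen = {cui}  # guards against cycles and duplicate paths
    frontier = [cui]
    matches: List[str] = []
    for _ in range(depth):
        next_frontier: List[str] = []
        for parent in frontier:
            for child in sorted(cui2children.get(parent, set())):
                if child in seen:
                    continue
                seen.add(child)
                next_frontier.append(child)
                if child in found:
                    matches.append(child)  # only report children present in found_cuis
        frontier = next_frontier
    return matches
```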

classmethod from_CDB(cdb)

Construct a TranslationLayer object from a concept database (CDB).

This translation layer will refer to the same dicts that the CDB refers to. While there is no obvious reason these should be modified, it’s something to keep in mind.

Parameters: cdb (CDB) – The CDB
Returns: TranslationLayer – The resulting TranslationLayer
Return type: TranslationLayer

class medcat.utils.regression.targeting.TargetPlaceholder

Bases: pydantic.BaseModel

A class describing the options for a specific placeholder.

placeholder: str

target_cuis: List[str]

onlyprefnames: bool = False

class medcat.utils.regression.targeting.PhraseChanger

Bases: pydantic.BaseModel

The phrase changer.

This class is used as a preprocessor for phrases with multiple placeholders. It allows swapping in the rest of the placeholders while leaving in the one that’s being tested for.

preprocess_placeholders: List[Tuple[str, str]]

__call__(phrase)

Parameters: phrase (str) –
Return type: str

classmethod empty()

Gets the empty phrase changer.

That is, a phrase changer that makes no changes to the phrase.

Returns: PhraseChanger – The empty phrase changer.
Return type: PhraseChanger
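A minimal sketch of what such a preprocessor might do, assuming each `(placeholder, replacement)` tuple in preprocess_placeholders is applied as a plain string replacement. `PhraseChangerSketch` is a hypothetical stand-in that uses a dataclass instead of pydantic.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PhraseChangerSketch:
    # assumption: each tuple is (placeholder, text to swap in for it)
    preprocess_placeholders: List[Tuple[str, str]] = field(default_factory=list)

    def __call__(self, phrase: str) -> str:
        # swap in the other placeholders; the one under test is left untouched
        for placeholder, replacement in self.preprocess_placeholders:
            phrase = phrase.replace(placeholder, replacement)
        return phrase

    @classmethod
    def empty(cls) -> "PhraseChangerSketch":
        # no replacements, so the phrase is returned unchanged
        return cls()
```

For example, a changer built with `[("{MEDICATION}", "aspirin")]` turns `"Took {MEDICATION} for {DIAGNOSIS}."` into `"Took aspirin for {DIAGNOSIS}."`, leaving `{DIAGNOSIS}` as the placeholder under test.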

class medcat.utils.regression.targeting.TargetedPhraseChanger

Bases: pydantic.BaseModel

The target phrase changer.

It includes the phrase changer (for preprocessing) along with the relevant concept and the placeholder it will replace.

changer: PhraseChanger

placeholder: str

cui: str

onlyprefnames: bool

class medcat.utils.regression.targeting.FinalTarget

Bases: pydantic.BaseModel

The final target.

This involves the final phrase (which potentially has other placeholders already replaced in it), the placeholder to be replaced, and the CUI and specific name being used.

placeholder: str

cui: str

name: str

final_phrase: str
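As an illustration, assuming the tested name is substituted for the remaining placeholder in final_phrase, the resulting text and the character span where the concept is expected could be computed as below. `FinalTargetSketch` and its `render` method are hypothetical and not part of the documented API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FinalTargetSketch:
    placeholder: str
    cui: str
    name: str
    final_phrase: str

    def render(self) -> Tuple[str, int, int]:
        # raises ValueError if the placeholder is missing from the phrase
        start = self.final_phrase.index(self.placeholder)
        text = self.final_phrase.replace(self.placeholder, self.name, 1)
        # span of the substituted name in the rendered text
        return text, start, start + len(self.name)
```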

class medcat.utils.regression.targeting.OptionSet

Bases: pydantic.BaseModel

The targeting option set.

This describes all the target placeholders and concepts needed.

options: List[TargetPlaceholder]

allow_any_combinations: bool = False

classmethod from_dict(section)

Construct an OptionSet instance from a dict.

The assumed structure is:

    {
        'placeholders': [
            {'placeholder': <e.g. '{DIAGNOSIS}'>, 'cuis': <list of CUIs>, 'prefname-only': 'true'},
            <potentially more>
        ],
        'any-combination': <True or False>
    }

The prefname-only key is optional.

Parameters:

section (Dict[str, Any]) – The dict to parse

Raises:

ProblematicOptionSetException – If the placeholders have differing numbers of CUIs when any combination is not allowed
ProblematicOptionSetException – If placeholders is not a list
ProblematicOptionSetException – If multiple placeholders share the same placeholder string

Returns:

OptionSet – The resulting OptionSet

Return type:

OptionSet
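A hypothetical parser for the structure above, written with dataclasses rather than pydantic. It assumes 'cuis' holds a list (a single string is also accepted), that 'prefname-only' and 'any-combination' may be given either as booleans or as 'true'/'false' strings, and that without any combination every placeholder must list the same number of CUIs. It sketches the documented checks; it is not MedCAT's code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


class ProblematicOptionSetException(ValueError):
    """Stand-in for the exception documented in this module."""


@dataclass
class PlaceholderSketch:
    placeholder: str
    target_cuis: List[str]
    onlyprefnames: bool = False


def _as_bool(value: Any) -> bool:
    # accept real booleans as well as 'true'/'false' strings
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


def option_set_from_dict(section: Dict[str, Any]) -> Tuple[List[PlaceholderSketch], bool]:
    placeholders = section["placeholders"]
    if not isinstance(placeholders, list):
        raise ProblematicOptionSetException("'placeholders' must be a list")
    allow_any = _as_bool(section.get("any-combination", False))
    options: List[PlaceholderSketch] = []
    seen = set()
    for part in placeholders:
        name = part["placeholder"]
        if name in seen:
            raise ProblematicOptionSetException(f"Duplicate placeholder: {name}")
        seen.add(name)
        cuis = part["cuis"]
        if isinstance(cuis, str):
            cuis = [cuis]  # tolerate a single CUI given as a string
        prefname_only = _as_bool(part.get("prefname-only", False))  # optional key
        options.append(PlaceholderSketch(name, list(cuis), prefname_only))
    # assumption: without any-combination, CUIs are paired by position, so counts must match
    if not allow_any and len({len(opt.target_cuis) for opt in options}) > 1:
        raise ProblematicOptionSetException(
            "All placeholders need the same number of CUIs unless any-combination is set")
    return options, allow_any
```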

to_dict()

Convert the OptionSet to a dict.

Returns: dict – The dict representation
Return type: dict

_get_all_combinations(cur_opts, other_opts, translation)

Parameters:

cur_opts (TargetPlaceholder) –
other_opts (List[TargetPlaceholder]) –
translation (TranslationLayer) –

Return type:

Iterator[Tuple[PhraseChanger, str]]

estimate_num_of_subcases()

Get the number of distinct subcases.

This counts only the subcases that can be determined without knowledge of the underlying CDB. That is, it does not account for the number of names per CUI, only for what is described in the option set itself.

If any combination is allowed, then the answer is the number of combinations of target concepts across all options. If any combination is not allowed, then the answer is simply the number of target concepts for a single option (they should all have the same number).

Returns: int – The estimated number of subcases.
Return type: int
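A sketch of the estimate, assuming “the number of combinations” means the product of the per-option CUI counts. The function and its signature are hypothetical; it takes the CUI counts directly rather than an OptionSet.

```python
from functools import reduce
from operator import mul
from typing import List


def estimate_num_of_subcases(cuis_per_option: List[int],
                             allow_any_combinations: bool) -> int:
    if allow_any_combinations:
        # assumption: every CUI of each placeholder may pair with every CUI of the others
        return reduce(mul, cuis_per_option, 1)
    # otherwise all options list the same number of CUIs, so any one of them will do
    return cuis_per_option[0]
```

With two placeholders listing 2 and 3 CUIs, this gives 6 subcases when any combination is allowed; with two placeholders listing 4 CUIs each and no any-combination, it gives 4.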

get_preprocessors_and_targets(translation)

Get the targeted phrase changers.

Parameters: translation (TranslationLayer) – The translation layer.
Yields: Iterator[TargetedPhraseChanger] – The targeted phrase changers.
Return type: Iterator[TargetedPhraseChanger]

exception medcat.utils.regression.targeting.ProblematicOptionSetException(*args)

Bases: ValueError

Inappropriate argument value (of correct type).

Parameters: args (object) –

__init__(*args)

Initialize self. See help(type(self)) for accurate signature.

Parameters: args (object) –
Return type: None
