medcat.utils.regression.targeting

Module Contents

Classes

TranslationLayer

The translation layer for translating between CUIs, names, type_ids, and child CUIs.

TargetPlaceholder

A class describing the options for a specific placeholder.

PhraseChanger

The phrase changer.

TargetedPhraseChanger

The target phrase changer.

FinalTarget

The final target.

OptionSet

The targeting option set.

Attributes

logger

medcat.utils.regression.targeting.logger
class medcat.utils.regression.targeting.TranslationLayer(cui2names, name2cuis, cui2type_ids, cui2children, cui2preferred_names, separator, whitespace=' ')

The translation layer for translating:
  • CUIs to names
  • names to CUIs
  • type_ids to CUIs
  • CUIs to child CUIs

The idea is to decouple these translations from the CDB instance in case something changes there.

Parameters:
  • cui2names (Dict[str, Set[str]]) – The map from CUI to names

  • name2cuis (Dict[str, List[str]]) – The map from name to CUIs

  • cui2type_ids (Dict[str, Set[str]]) – The map from CUI to type_ids

  • cui2children (Dict[str, Set[str]]) – The map from CUI to child CUIs

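  • cui2preferred_names (Dict[str, str]) – The map from CUI to preferred name

  • separator (str) – The separator used in the stored names (generally ~)

  • whitespace (str) – The whitespace that replaces the separator (' ' by default)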

__init__(cui2names, name2cuis, cui2type_ids, cui2children, cui2preferred_names, separator, whitespace=' ')
Parameters:
  • cui2names (Dict[str, Set[str]]) –

  • name2cuis (Dict[str, List[str]]) –

  • cui2type_ids (Dict[str, Set[str]]) –

  • cui2children (Dict[str, Set[str]]) –

  • cui2preferred_names (Dict[str, str]) –

  • separator (str) –

  • whitespace (str) –

Return type:

None

get_names_of(cui, only_prefnames)

Get the preprocessed names of a CUI.

This method preprocesses the names by replacing the separator (generally ~) with the appropriate whitespace (` `).

If the concept is not in the underlying CDB, an empty list is returned.

Parameters:
  • cui (str) – The concept in question.

  • only_prefnames (bool) – Whether to only return a preferred name.

Returns:

List[str] – The list of names.

Return type:

List[str]
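As a rough illustration of the preprocessing described above, the following standalone sketch applies the separator replacement to a plain dict. It does not use MedCAT itself: the function `names_of`, the sample CUIs, and the sorting of the result (added only to make the output deterministic) are assumptions made for this example.

```python
from typing import Dict, List, Set


def names_of(cui2names: Dict[str, Set[str]], cui: str,
             separator: str = "~", whitespace: str = " ") -> List[str]:
    """Return the names of `cui` with the separator swapped for whitespace."""
    # An unknown concept yields an empty list rather than an error.
    raw_names = cui2names.get(cui, set())
    # Sorting is an assumption of this sketch, used to get a stable order.
    return sorted(name.replace(separator, whitespace) for name in raw_names)


# Hypothetical sample data
cui2names = {"C001": {"kidney~failure", "renal~failure"}}

print(names_of(cui2names, "C001"))  # ['kidney failure', 'renal failure']
print(names_of(cui2names, "C999"))  # []
```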

get_preferred_name(cui)

Get the preferred name of a concept.

If no preferred name is found, an arbitrary ‘first’ name is selected instead (see get_first_name).

Parameters:

cui (str) – The concept ID.

Returns:

str – The preferred name.

Return type:

str

get_first_name(cui)

Get the preprocessed, potentially arbitrary, first name of the given concept.

If the concept does not exist, the CUI itself is returned.

PS: The “first” name may not be consistent across runs since it relies on set order.

Parameters:

cui (str) – The concept ID.

Returns:

str – The first name.

Return type:

str
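The fallback behaviour described for get_preferred_name and get_first_name can be sketched on plain dicts as follows. This is a minimal illustration, not the library's code: the functions `first_name` and `preferred_name` and the sample data are hypothetical.

```python
from typing import Dict, Set


def first_name(cui2names: Dict[str, Set[str]], cui: str,
               separator: str = "~", whitespace: str = " ") -> str:
    """Return an arbitrary name of `cui`, or the CUI itself if it is unknown."""
    names = cui2names.get(cui)
    if not names:
        return cui
    # Set iteration order is arbitrary, so this pick may differ between runs.
    return next(iter(names)).replace(separator, whitespace)


def preferred_name(cui2preferred_names: Dict[str, str],
                   cui2names: Dict[str, Set[str]], cui: str) -> str:
    """Return the recorded preferred name, else fall back to a 'first' name."""
    preferred = cui2preferred_names.get(cui)
    if preferred:
        return preferred
    return first_name(cui2names, cui)


# Hypothetical sample data
cui2names = {"C001": {"kidney~failure", "renal~failure"}, "C002": {"fever"}}
cui2preferred_names = {"C001": "Kidney failure"}

print(preferred_name(cui2preferred_names, cui2names, "C001"))  # Kidney failure
print(preferred_name(cui2preferred_names, cui2names, "C002"))  # fever
print(first_name(cui2names, "C999"))                           # C999
```

For a concept with several names, the value returned by `first_name` is one of them but not a predictable one, which is the caveat the PS above points out.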

get_direct_children(cui)

Get the direct children of a concept.

This means only the children, but not grandchildren.

If the underlying CDB doesn’t list children for this CUI, an empty list is returned.

Parameters:

cui (str) – The concept in question.

Returns:

List[str] – The (potentially empty) list of direct children.

Return type:

List[str]

get_direct_parents(cui)

Get the direct parent(s) of a concept.

PS: This method can be quite CPU-heavy, since it has to run through all the parent-children relationships; the child->parent(s) relationship isn’t normally kept track of.

Parameters:

cui (str) – The concept in question.

Returns:

List[str] – The direct parent(s) of the concept.

Return type:

List[str]
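To show why a parent lookup is costly when only the parent->children map is stored, here is a minimal sketch that scans such a map. The function `direct_parents` and the sample data are hypothetical, and the sorted result is an assumption made for a stable output.

```python
from typing import Dict, List, Set


def direct_parents(cui2children: Dict[str, Set[str]], cui: str) -> List[str]:
    """Find every CUI that lists `cui` among its direct children."""
    # Each parent entry must be inspected, so the cost grows with the
    # number of entries in the map rather than with the number of parents.
    return sorted(parent for parent, children in cui2children.items()
                  if cui in children)


# Hypothetical sample data: C010 and C020 both list C100 as a child.
cui2children = {
    "C010": {"C100", "C101"},
    "C020": {"C100"},
    "C100": {"C200"},
}

print(direct_parents(cui2children, "C100"))  # ['C010', 'C020']
print(direct_parents(cui2children, "C010"))  # []
```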

get_children_of(found_cuis, cui, depth=1)

Get the children of the specified CUI in the listed CUIs (if they exist).

Parameters:
  • found_cuis (Iterable[str]) – The list of CUIs to look in

  • cui (str) – The target parent CUI

  • depth (int) – The depth to carry out the search for

Returns:

List[str] – The list of children found

Return type:

List[str]
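One way to read the depth parameter is as a bound on how many levels of the hierarchy are searched. The sketch below implements that reading on a plain dict. It is an assumption about the behaviour, not the library's implementation, and `children_in` and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def children_in(cui2children: Dict[str, Set[str]], found_cuis: Iterable[str],
                cui: str, depth: int = 1) -> List[str]:
    """Collect descendants of `cui` (up to `depth` levels) present in `found_cuis`."""
    found = set(found_cuis)
    # Sorted only so that this sketch produces a stable order.
    children = sorted(cui2children.get(cui, set()))
    result = [child for child in children if child in found]
    if depth > 1:
        # Descend one level further for every direct child.
        for child in children:
            result.extend(children_in(cui2children, found, child, depth - 1))
    return result


# Hypothetical hierarchy: A -> {B, C}, B -> {D}
cui2children = {"A": {"B", "C"}, "B": {"D"}}
found_cuis = ["B", "D", "X"]

print(children_in(cui2children, found_cuis, "A"))           # ['B']
print(children_in(cui2children, found_cuis, "A", depth=2))  # ['B', 'D']
```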

classmethod from_CDB(cdb)

Construct a TranslationLayer object from a concept database (CDB).

This translation layer will refer to the same dicts that the CDB refers to. While there is no obvious reason these should be modified, it’s something to keep in mind.

Parameters:

cdb (CDB) – The CDB

Returns:

TranslationLayer – The resulting TranslationLayer

Return type:

TranslationLayer
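The note about shared dicts can be illustrated with plain Python objects. The class `LayerSketch` below is a hypothetical stand-in that, like the behaviour described above, keeps a reference to the mapping it receives instead of copying it.

```python
from typing import Dict, Set


class LayerSketch:
    """Hypothetical stand-in that stores the given mapping without copying."""

    def __init__(self, cui2names: Dict[str, Set[str]]) -> None:
        self.cui2names = cui2names


# Hypothetical sample data standing in for a CDB's mapping
source_map = {"C001": {"kidney~failure"}}
layer = LayerSketch(source_map)

# A later change to the source mapping is visible through the layer.
source_map["C002"] = {"fever"}

print(layer.cui2names is source_map)  # True
print("C002" in layer.cui2names)      # True
```

If isolation from later changes matters, passing a copy of each mapping is one option, at the cost of extra memory.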

class medcat.utils.regression.targeting.TargetPlaceholder

Bases: pydantic.BaseModel

A class describing the options for a specific placeholder.

placeholder: str
target_cuis: List[str]
onlyprefnames: bool = False
class medcat.utils.regression.targeting.PhraseChanger

Bases: pydantic.BaseModel

The phrase changer.

This class is used as a preprocessor for phrases with multiple placeholders. It allows swapping in the rest of the placeholders while leaving in the one that’s being tested for.

preprocess_placeholders: List[Tuple[str, str]]
__call__(phrase)
Parameters:

phrase (str) –

Return type:

str

classmethod empty()

Gets the empty phrase changer.

That is a phrase changer that makes no changes to the phrase.

Returns:

PhraseChanger – The empty phrase changer.

Return type:

PhraseChanger
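A minimal sketch of the idea, assuming each entry in preprocess_placeholders is a (placeholder, replacement) pair applied by plain string replacement. The function `apply_changes` and the sample phrase are hypothetical and are not taken from the library.

```python
from typing import List, Tuple


def apply_changes(phrase: str, pairs: List[Tuple[str, str]]) -> str:
    """Swap in each replacement while leaving other placeholders untouched."""
    for placeholder, replacement in pairs:
        phrase = phrase.replace(placeholder, replacement)
    return phrase


phrase = "Patient with {DIAGNOSIS} was given {DRUG}"

# Fill in {DRUG} so that only {DIAGNOSIS} remains to be tested.
print(apply_changes(phrase, [("{DRUG}", "aspirin")]))
# Patient with {DIAGNOSIS} was given aspirin

# An empty list of pairs leaves the phrase unchanged.
print(apply_changes(phrase, []) == phrase)  # True
```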

class medcat.utils.regression.targeting.TargetedPhraseChanger

Bases: pydantic.BaseModel

The target phrase changer.

It includes the phrase changer (for preprocessing) along with the relevant concept and the placeholder it will replace.

changer: PhraseChanger
placeholder: str
cui: str
onlyprefnames: bool
class medcat.utils.regression.targeting.FinalTarget

Bases: pydantic.BaseModel

The final target.

This involves the final phrase (which potentially has other placeholders replaced in it), the placeholder to be replaced, and the CUI and specific name being used.

placeholder: str
cui: str
name: str
final_phrase: str
class medcat.utils.regression.targeting.OptionSet

Bases: pydantic.BaseModel

The targeting option set.

This describes all the target placeholders and concepts needed.

options: List[TargetPlaceholder]
allow_any_combinations: bool = False
classmethod from_dict(section)

Construct an OptionSet instance from a dict.

The assumed structure is:

    {
        'placeholders': [
            {
                'placeholder': <e.g. '{DIAGNOSIS}'>,
                'cuis': <the CUI>,
                'prefname-only': 'true'
            },
            <potentially more>
        ],
        'any-combination': <True or False>
    }

The prefname-only key is optional.

Parameters:

section (Dict[str, Any]) – The dict to parse

Raises:
Returns:

OptionSet – The resulting OptionSet

Return type:

OptionSet
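For orientation, the sketch below builds a dict in the structure assumed by from_dict and reads it with a small standalone parser. The parser is hypothetical and does not reproduce the library's validation; in particular, accepting a single CUI string or a list for 'cuis' and treating 'prefname-only' as the string 'true' are assumptions of this example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class PlaceholderSketch:
    """Hypothetical stand-in mirroring the TargetPlaceholder fields."""
    placeholder: str
    target_cuis: List[str]
    onlyprefnames: bool = False


def parse_section(section: Dict[str, Any]) -> Tuple[List[PlaceholderSketch], bool]:
    """Read the placeholders and the any-combination flag from `section`."""
    parsed = []
    for entry in section["placeholders"]:
        cuis = entry["cuis"]
        if isinstance(cuis, str):  # assumption: a lone CUI may be given as a string
            cuis = [cuis]
        # 'prefname-only' is optional and defaults to False here.
        only_pref = str(entry.get("prefname-only", False)).lower() == "true"
        parsed.append(PlaceholderSketch(entry["placeholder"], list(cuis), only_pref))
    return parsed, bool(section.get("any-combination", False))


# Hypothetical section with made-up CUIs
section = {
    "placeholders": [
        {"placeholder": "{DIAGNOSIS}", "cuis": ["C001", "C002"], "prefname-only": "true"},
        {"placeholder": "{DRUG}", "cuis": ["C100", "C101"]},
    ],
    "any-combination": False,
}

options, any_combination = parse_section(section)
print(options[0])       # PlaceholderSketch(placeholder='{DIAGNOSIS}', target_cuis=['C001', 'C002'], onlyprefnames=True)
print(any_combination)  # False
```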

to_dict()

Convert the OptionSet to a dict.

Returns:

dict – The dict representation

Return type:

dict

_get_all_combinations(cur_opts, other_opts, translation)
Parameters:
Return type:

Iterator[Tuple[PhraseChanger, str]]

estimate_num_of_subcases()

Get the number of distinct subcases.

This includes only the subcases that can be calculated without knowledge of the underlying CDB. I.e. it ignores the number of names involved per CUI and only takes into account what is described in the option set itself.

If any combination is allowed, then the answer is the combination of the number of target concepts per option. If any combination is not allowed, then the answer is simply the number of target concepts for an option (they should all have the same number).

Returns:

int – The estimated number of subcases.

Return type:

int
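The two cases described above can be sketched as follows, reading "combination" as the product of the per-option counts. This is one interpretation of the description, not the library's code, and the real method may apply further factors. The function `estimate_subcases` is hypothetical.

```python
from math import prod
from typing import List


def estimate_subcases(cuis_per_option: List[int], allow_any_combinations: bool) -> int:
    """Estimate the subcase count from the number of target CUIs per option."""
    if allow_any_combinations:
        # Every CUI of one option may be paired with every CUI of the others.
        return prod(cuis_per_option)
    # Otherwise all options are expected to have the same number of CUIs.
    return cuis_per_option[0]


print(estimate_subcases([2, 3], allow_any_combinations=True))   # 6
print(estimate_subcases([2, 2], allow_any_combinations=False))  # 2
```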

get_preprocessors_and_targets(translation)

Get the targeted phrase changers.

Parameters:

translation (TranslationLayer) – The translation layer.

Yields:

Iterator[TargetedPhraseChanger] – The targeted phrase changers.

Return type:

Iterator[TargetedPhraseChanger]

exception medcat.utils.regression.targeting.ProblematicOptionSetException(*args)

Bases: ValueError

Inappropriate argument value (of correct type).

Parameters:

args (object) –

__init__(*args)

Initialize self. See help(type(self)) for accurate signature.

Parameters:

args (object) –

Return type:

None