medcat.utils.regression.targeting
Module Contents
Classes
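TranslationLayer | The translation layer for translating CUIs to names, names to CUIs, type_ids to CUIs, and CUIs to child CUIs.
TargetPlaceholder | A class describing the options for a specific placeholder.
PhraseChanger | The phrase changer.
TargetedPhraseChanger | The target phrase changer.
FinalTarget | The final target.
OptionSet | The targeting option set.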
Attributes
- medcat.utils.regression.targeting.logger
- class medcat.utils.regression.targeting.TranslationLayer(cui2names, name2cuis, cui2type_ids, cui2children, cui2preferred_names, separator, whitespace=' ')
The translation layer for translating CUIs to names, names to CUIs, type_ids to CUIs, and CUIs to child CUIs.
The idea is to decouple these translations from the CDB instance in case something changes there.
- Parameters:
cui2names (Dict[str, Set[str]]) – The map from CUI to names
name2cuis (Dict[str, List[str]]) – The map from name to CUIs
cui2type_ids (Dict[str, Set[str]]) – The map from CUI to type_ids
cui2children (Dict[str, Set[str]]) – The map from CUI to child CUIs
cui2preferred_names (Dict[str, str]) –
separator (str) –
whitespace (str) –
- __init__(cui2names, name2cuis, cui2type_ids, cui2children, cui2preferred_names, separator, whitespace=' ')
- Parameters:
cui2names (Dict[str, Set[str]]) –
name2cuis (Dict[str, List[str]]) –
cui2type_ids (Dict[str, Set[str]]) –
cui2children (Dict[str, Set[str]]) –
cui2preferred_names (Dict[str, str]) –
separator (str) –
whitespace (str) –
- Return type:
None
- get_names_of(cui, only_prefnames)
Get the preprocessed names of a CUI.
This method preprocesses the names by replacing the separator (generally ~) with the appropriate whitespace (generally a single space).
If the concept is not in the underlying CDB, an empty list is returned.
- Parameters:
cui (str) – The concept in question.
only_prefnames (bool) – Whether to only return a preferred name.
- Returns:
List[str] – The list of names.
- Return type:
List[str]
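The preprocessing described for get_names_of can be sketched in plain Python without MedCAT. The helper `names_of`, the sample CUIs, and the sample names below are hypothetical; only the behaviour (replace the separator with whitespace, return an empty list for an unknown concept) follows the description above. Sorting is added here purely to make the output deterministic and is not claimed to be what the library does.

```python
from typing import Dict, List, Set


def names_of(cui2names: Dict[str, Set[str]], cui: str,
             separator: str = "~", whitespace: str = " ") -> List[str]:
    """Hypothetical stand-in for the described preprocessing step."""
    # Unknown concepts yield an empty collection rather than an error.
    raw_names = cui2names.get(cui, set())
    # Swap the CDB-style separator for ordinary whitespace.
    return sorted(name.replace(separator, whitespace) for name in raw_names)


# Made-up data for illustration only.
cui2names = {"C0001": {"kidney~failure", "renal~failure"}}
known = names_of(cui2names, "C0001")
unknown = names_of(cui2names, "C9999")
```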
- get_preferred_name(cui)
Get the preferred name of a concept.
If no preferred name is found, an arbitrary “first” name is selected.
- Parameters:
cui (str) – The concept ID.
- Returns:
str – The preferred name.
- Return type:
str
- get_first_name(cui)
Get the preprocessed, potentially arbitrary, first name of the given concept.
If the concept does not exist, the CUI itself is returned.
PS: The “first” name may not be consistent across runs since it relies on set order.
- Parameters:
cui (str) – The concept ID.
- Returns:
str – The first name.
- Return type:
str
- get_direct_children(cui)
Get the direct children of a concept.
This means only the children, but not grandchildren.
If the underlying CDB doesn’t list children for this CUI, an empty list is returned.
- Parameters:
cui (str) – The concept in question.
- Returns:
List[str] – The (potentially empty) list of direct children.
- Return type:
List[str]
- get_direct_parents(cui)
Get the direct parent(s) of a concept.
PS: This method can be quite CPU-heavy, since the child->parent(s) relationship isn’t normally kept track of and the method has to run through all the parent-children relationships.
- Parameters:
cui (str) – The concept in question.
- Returns:
List[str] – The (potentially empty) list of direct parents.
- Return type:
List[str]
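The cost mentioned in the note on get_direct_parents comes from having only a parent-to-children map available. A minimal sketch of such a reverse lookup, assuming a `cui2children` dict shaped like the constructor parameter, is shown here. The function name `direct_parents` and the sample CUIs are hypothetical.

```python
from typing import Dict, List, Set


def direct_parents(cui2children: Dict[str, Set[str]], cui: str) -> List[str]:
    """Hypothetical reverse lookup over a parent -> children map."""
    # Every parent entry is inspected on each call, because there is no
    # child -> parent index to consult.
    return sorted(parent for parent, children in cui2children.items()
                  if cui in children)


# Made-up hierarchy for illustration only.
cui2children = {"P1": {"C1", "C2"}, "P2": {"C2"}, "C1": {"G1"}}
parents_of_c2 = direct_parents(cui2children, "C2")
parents_of_p1 = direct_parents(cui2children, "P1")
```

If many lookups are needed, building an inverted child-to-parents dict once would trade memory for speed; whether the library does this is not stated here.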
- get_children_of(found_cuis, cui, depth=1)
Get the children of the specified CUI in the listed CUIs (if they exist).
- Parameters:
found_cuis (Iterable[str]) – The list of CUIs to look in
cui (str) – The target parent CUI
depth (int) – The depth to carry out the search for
- Returns:
List[str] – The list of children found
- Return type:
List[str]
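One way to read get_children_of is as a depth-limited walk down the hierarchy that keeps only the descendants present in found_cuis. The sketch below follows that reading. The assumption that depth=1 means direct children only, the function name `children_in`, and the sample data are all hypothetical rather than taken from the library.

```python
from typing import Dict, Iterable, List, Set


def children_in(cui2children: Dict[str, Set[str]], found_cuis: Iterable[str],
                cui: str, depth: int = 1) -> List[str]:
    """Hypothetical depth-limited search for descendants among found CUIs."""
    found = set(found_cuis)
    matches: List[str] = []
    seen: Set[str] = set()
    frontier = [cui]
    # Each loop iteration descends one level in the hierarchy.
    for _ in range(depth):
        next_frontier: List[str] = []
        for parent in frontier:
            for child in sorted(cui2children.get(parent, set())):
                if child in seen:
                    continue
                seen.add(child)
                next_frontier.append(child)
                if child in found:
                    matches.append(child)
        frontier = next_frontier
    return matches


# Made-up hierarchy for illustration only.
cui2children = {"P": {"C1", "C2"}, "C1": {"G1"}}
found = ["C2", "G1", "X"]
depth_one = children_in(cui2children, found, "P", depth=1)
depth_two = children_in(cui2children, found, "P", depth=2)
```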
- classmethod from_CDB(cdb)
Construct a TranslationLayer object from a concept database (CDB).
This translation layer will refer to the same dicts that the CDB refers to. While there is no obvious reason these should be modified, it’s something to keep in mind.
- Parameters:
cdb (CDB) – The CDB
- Returns:
TranslationLayer – The resulting TranslationLayer
- Return type:
TranslationLayer
- class medcat.utils.regression.targeting.TargetPlaceholder(/, **data)
Bases: pydantic.BaseModel
A class describing the options for a specific placeholder.
- Parameters:
data (Any) –
- placeholder: str
- target_cuis: List[str]
- onlyprefnames: bool = False
- class medcat.utils.regression.targeting.PhraseChanger(/, **data)
Bases: pydantic.BaseModel
The phrase changer.
This class is used as a preprocessor for phrases with multiple placeholders. It allows swapping in the rest of the placeholders while leaving in the one that’s being tested for.
- Parameters:
data (Any) –
- preprocess_placeholders: List[Tuple[str, str]]
- __call__(phrase)
- Parameters:
phrase (str) –
- Return type:
str
- classmethod empty()
Gets the empty phrase changer.
That is, a phrase changer that makes no changes to the phrase.
- Returns:
PhraseChanger – The empty phrase changer.
- Return type:
PhraseChanger
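The behaviour described for PhraseChanger can be illustrated with a small stand-in. This is a sketch only: it uses a dataclass instead of a pydantic model so that it runs without extra dependencies, it assumes each tuple in preprocess_placeholders is a (placeholder, replacement) pair, and the class name `SketchPhraseChanger`, the phrase, and the placeholder names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SketchPhraseChanger:
    """Hypothetical stand-in; the documented class is a pydantic model."""
    preprocess_placeholders: List[Tuple[str, str]] = field(default_factory=list)

    def __call__(self, phrase: str) -> str:
        # Assumption: each pair is (placeholder, replacement text).
        for placeholder, replacement in self.preprocess_placeholders:
            phrase = phrase.replace(placeholder, replacement)
        return phrase

    @classmethod
    def empty(cls) -> "SketchPhraseChanger":
        # No pairs to swap in, so the phrase passes through untouched.
        return cls()


phrase = "Patient with {DIAGNOSIS} was given {DRUG}."
# Fill in every placeholder except the one under test ({DIAGNOSIS}).
changer = SketchPhraseChanger([("{DRUG}", "aspirin")])
changed = changer(phrase)
unchanged = SketchPhraseChanger.empty()(phrase)
```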
- class medcat.utils.regression.targeting.TargetedPhraseChanger(/, **data)
Bases: pydantic.BaseModel
The target phrase changer.
It includes the phrase changer (for preprocessing) along with the relevant concept and the placeholder it will replace.
- Parameters:
data (Any) –
- changer: PhraseChanger
- placeholder: str
- cui: str
- onlyprefnames: bool
- class medcat.utils.regression.targeting.FinalTarget(/, **data)
Bases: pydantic.BaseModel
The final target.
This involves the final phrase (which potentially has other placeholders replaced in it), the placeholder to be replaced, and the CUI and specific name being used.
- Parameters:
data (Any) –
- placeholder: str
- cui: str
- name: str
- final_phrase: str
- class medcat.utils.regression.targeting.OptionSet(/, **data)
Bases: pydantic.BaseModel
The targeting option set.
This describes all the target placeholders and concepts needed.
- Parameters:
data (Any) –
- options: List[TargetPlaceholder]
- allow_any_combinations: bool = False
- classmethod from_dict(section)
Construct an OptionSet instance from a dict.
The assumed structure is:
    {
        'placeholders': [
            {'placeholder': <e.g. '{DIAGNOSIS}'>, 'cuis': <the CUI>, 'prefname-only': 'true'},
            <potentially more>
        ],
        'any-combination': <True or False>
    }
The prefname-only key is optional.
- Parameters:
section (Dict[str, Any]) – The dict to parse
- Raises:
ProblematicOptionSetException – If incorrect number of CUIs when not allowing any combination
ProblematicOptionSetException – If placeholders not a list
ProblematicOptionSetException – If multiple placeholders share the same placeholder
- Returns:
OptionSet – The resulting OptionSet
- Return type:
OptionSet
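A concrete section dict in the shape described for from_dict, together with a rough check of the documented error conditions, might look like the following. This is a sketch under assumptions: 'cuis' is taken to be a list of CUIs, "incorrect number of CUIs" is read as "the placeholders list differing numbers of CUIs", and the helper `check_section` and all CUIs are hypothetical. The library itself raises ProblematicOptionSetException instead of returning messages.

```python
from typing import Any, Dict, List


def check_section(section: Dict[str, Any]) -> List[str]:
    """Hypothetical validator mirroring the documented error conditions."""
    problems: List[str] = []
    placeholders = section.get("placeholders")
    if not isinstance(placeholders, list):
        return ["placeholders is not a list"]
    names = [entry["placeholder"] for entry in placeholders]
    if len(set(names)) != len(names):
        problems.append("duplicate placeholder")
    # Assumption: without any-combination, every option needs the same
    # number of CUIs so they can be paired up one-to-one.
    counts = {len(entry["cuis"]) for entry in placeholders}
    if not section.get("any-combination", False) and len(counts) > 1:
        problems.append("differing numbers of CUIs")
    return problems


# Made-up CUIs for illustration only.
section = {
    "placeholders": [
        {"placeholder": "{DIAGNOSIS}", "cuis": ["C01", "C02"],
         "prefname-only": "true"},
        {"placeholder": "{DRUG}", "cuis": ["C11", "C12"]},
    ],
    "any-combination": False,
}
ok = check_section(section)
bad = check_section({"placeholders": [
    {"placeholder": "{A}", "cuis": ["C1"]},
    {"placeholder": "{A}", "cuis": ["C2", "C3"]},
], "any-combination": False})
```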
- to_dict()
Convert the OptionSet to a dict.
- Returns:
dict – The dict representation
- Return type:
dict
- _get_all_combinations(cur_opts, other_opts, translation)
- Parameters:
cur_opts (TargetPlaceholder) –
other_opts (List[TargetPlaceholder]) –
translation (TranslationLayer) –
- Return type:
Iterator[Tuple[PhraseChanger, str]]
- estimate_num_of_subcases()
Get the number of distinct subcases.
This includes ones that can be calculated without the knowledge of the underlying CDB. I.e. it doesn’t care for the number of names involved per CUI but only takes into account what is described in the option set itself.
If any combination is allowed, then the answer is the combination of the number of target concepts per option. If any combination is not allowed, then the answer is simply the number of target concepts for an option (they should all have the same number).
- Returns:
int – The estimated number of subcases.
- Return type:
int
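The arithmetic described for estimate_num_of_subcases can be sketched as below. Reading "the combination of the number of target concepts per option" as their product is an assumption, as is the function name `estimate_subcases`. The library may apply further factors that this description does not mention.

```python
from math import prod
from typing import List


def estimate_subcases(cuis_per_option: List[int], allow_any: bool) -> int:
    """Hypothetical estimate following the description above."""
    if allow_any:
        # Assumption: "combination" means every CUI of one option can be
        # paired with every CUI of the others, i.e. a product.
        return prod(cuis_per_option)
    # Otherwise all options are expected to list the same number of CUIs.
    return cuis_per_option[0]


any_combo = estimate_subcases([2, 3], allow_any=True)
lockstep = estimate_subcases([3, 3], allow_any=False)
```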
- get_preprocessors_and_targets(translation)
Get the targeted phrase changers.
- Parameters:
translation (TranslationLayer) – The translation layer.
- Yields:
Iterator[TargetedPhraseChanger] – The target phrase changers.
- Return type:
Iterator[TargetedPhraseChanger]
- exception medcat.utils.regression.targeting.ProblematicOptionSetException(*args)
Bases: ValueError
Inappropriate argument value (of correct type).
- Parameters:
args (object) –
- __init__(*args)
Initialize self. See help(type(self)) for accurate signature.
- Parameters:
args (object) –
- Return type:
None