`medcat.utils.regression.checking`

Module Contents

Classes

`RegressionCase`	A regression case that has a name, defines options, filters and phrases.
`MetaData`	The metadata for the regression suite.
`RegressionSuite`	The regression checker.

Functions

`get_ontology_and_version`(model_card)	Attempt to get ontology (and its version) from a model card dict.
`fix_np_float64`(d)	Fix numpy.float64 in dictionary for yaml saving purposes.

Attributes

`logger`
`UNKNOWN_METADATA`

medcat.utils.regression.checking.logger

class medcat.utils.regression.checking.RegressionCase(/, **data)

Bases: pydantic.BaseModel

A regression case that has a name, defines options, filters and phrases.

Parameters:: data (Any) –

name: str

options: medcat.utils.regression.targeting.OptionSet

phrases: List[str]

report: medcat.utils.regression.results.ResultDescriptor

check_specific_for_phrase(cat, target, translation)

Checks whether the specified model can identify the specific target within the specified phrase.

Parameters:

cat (CAT) – The model
target (FinalTarget) – The final target configuration
translation (TranslationLayer) – The translation layer

Raises:

MalformedRegressionCaseException – If there are too many placeholders in phrase.

Returns:

Tuple[Finding, Optional[str]] – The extent to which the target was (or wasn’t) identified

Return type:

Tuple[medcat.utils.regression.results.Finding, Optional[str]]

estimate_num_of_diff_subcases()

Return type:: int

get_distinct_cases(translation, edit_distance, use_diacritics)

Gets the various distinct sub-case iterators.

The sub-cases are those that can be determined without the translation layer. However, the translation layer is included here since it streamlines the operation.

Parameters:

translation (TranslationLayer) – The translation layer.
edit_distance (Tuple[int, int, int]) – The edit distance(s) to try.
use_diacritics (bool) – Whether to use diacritics for edit distance.

Yields:

Iterator[Iterator[FinalTarget]] – The iterator of iterators of different sub cases.

Return type:

Iterator[Iterator[medcat.utils.regression.targeting.FinalTarget]]

_get_subcases(phrase, changer, translation, edit_distance, use_diacritics)

Parameters:

phrase (str) –
changer (medcat.utils.regression.targeting.TargetedPhraseChanger) –
translation (medcat.utils.regression.targeting.TranslationLayer) –
edit_distance (Tuple[int, int, int]) –
use_diacritics (bool) –

Return type:

Iterator[medcat.utils.regression.targeting.FinalTarget]

to_dict()

Converts the RegressionCase to a dict for serialisation.

Returns:: dict – The dict representation
Return type:: dict

classmethod from_dict(name, in_dict)

Construct the regression case from a dict.

The expected structure:

    {
        'targeting': {
            [
                'placeholder': '[DIAGNOSIS]'  # the placeholder to be replaced
                'cuis': ['cui1', 'cui2']
                'prefname-only': 'false',  # optional
            ]
        },
        'phrases': ['phrase %s']  # possibly multiple
    }

Parameters:

name (str) – The name of the case
in_dict (dict) – The dict describing the case

Raises:

ValueError – If the input dict does not have the ‘targeting’ section
ValueError – If there are no phrases defined

Returns:

RegressionCase – The constructed regression cases.

Return type:

RegressionCase
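
As a rough illustration of the two documented failure modes, the sketch below checks a case dict before it would be passed to RegressionCase.from_dict. It does not call medcat: check_case_dict is a hypothetical helper, and the inner layout of the 'targeting' section (a plain list of placeholder entries) and the example phrase are assumptions based on the structure above, not a confirmed schema.

```python
from typing import Any, Dict


def check_case_dict(name: str, in_dict: Dict[str, Any]) -> None:
    """Hypothetical pre-check mirroring the two documented ValueErrors.

    This is NOT medcat's implementation; it only restates the documented
    contract: a case needs a 'targeting' section and at least one phrase.
    """
    if 'targeting' not in in_dict:
        raise ValueError(f"Case '{name}' has no 'targeting' section")
    if not in_dict.get('phrases'):
        raise ValueError(f"Case '{name}' defines no phrases")


# Assumed layout: 'targeting' holds a list of placeholder entries.
example_case = {
    'targeting': [
        {'placeholder': '[DIAGNOSIS]', 'cuis': ['cui1', 'cui2']},
    ],
    'phrases': ['The patient was diagnosed with [DIAGNOSIS]'],
}

check_case_dict('example', example_case)  # passes silently
```

Running a check like this before loading a hand-written suite can surface a missing section early, with the case name in the message.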

__hash__()

Return hash(self).

Return type:: int

__eq__(other)

Return self==value.

Parameters:: other (Any) –
Return type:: bool

medcat.utils.regression.checking.UNKNOWN_METADATA = 'Unknown'

medcat.utils.regression.checking.get_ontology_and_version(model_card)

Attempt to get ontology (and its version) from a model card dict.

If no ontology is found, ‘Unknown’ is returned. The version is always returned as the first source ontology. That is, unless the specified location does not exist in the model card, in which case ‘Unknown’ is returned.

The ontology is assumed to be described at: model_card['Source Ontology'][0] (or model_card['Source Ontology'] if it is a string instead of a list)
The ontology version is read from: model_card['Source Ontology'][0] (or model_card['Source Ontology'] if it is a string instead of a list)

Currently, only SNOMED-CT, UMLS and ICD are supported / found.

Parameters:: model_card (dict) – The input model card.
Returns:: Tuple[str, str] – The ontology (if found) or ‘Unknown’; and the version (if found) or ‘Unknown’
Return type:: Tuple[str, str]
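
The lookup described above can be sketched without medcat. The helper below, read_source_ontology, is hypothetical and covers only the first step: fetching the raw 'Source Ontology' entry in either documented shape and falling back to 'Unknown'. How the real function derives the ontology name and version from that entry is not described here, so the sketch does not attempt it.

```python
from typing import Any, Dict

UNKNOWN = 'Unknown'  # mirrors the documented UNKNOWN_METADATA value


def read_source_ontology(model_card: Dict[str, Any]) -> str:
    """Hypothetical helper: fetch the raw 'Source Ontology' entry.

    Handles both documented shapes (list or plain string) and falls
    back to 'Unknown' when the entry is missing or empty.
    """
    entry = model_card.get('Source Ontology')
    if isinstance(entry, str):
        return entry or UNKNOWN
    if isinstance(entry, (list, tuple)) and entry:
        return str(entry[0])
    return UNKNOWN


# The example strings are placeholders, not real model card values.
print(read_source_ontology({'Source Ontology': ['first entry', 'second entry']}))
print(read_source_ontology({'Source Ontology': 'single entry'}))
print(read_source_ontology({}))
```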

class medcat.utils.regression.checking.MetaData(/, **data)

Bases: pydantic.BaseModel

The metadata for the regression suite.

This should define which ontology (e.g. UMLS or SNOMED) as well as which version was used when generating the regression suite.

The metadata may contain further information as well; this may include the annotator(s) involved when converting from MCT export or other relevant data.

Parameters:: data (Any) –

ontology: str

ontology_version: str

extra: dict

regr_suite_creation_date: str

classmethod from_modelcard(model_card)

Generate a MetaData object from a model card.

This involves reading ontology info and version from the model card.

It must be noted that the model card should be provided as a dict not a string.

Parameters:: model_card (dict) – The CAT modelcard
Returns:: MetaData – The resulting MetaData
Return type:: MetaData

classmethod unknown()

Return type:: MetaData

medcat.utils.regression.checking.fix_np_float64(d)

Fix numpy.float64 in dictionary for yaml saving purposes.

Objects of this type cannot be cleanly serialized to YAML, so they are converted to the corresponding Python floats.

The changes are made in place, both in the dictionary itself and, recursively, in any nested dictionaries.

Parameters:: d (dict) – The input dict
Return type:: None
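
A minimal sketch of this kind of in-place, recursive conversion follows. It is a stand-in, not medcat's code: to_plain_floats and Float64Like are hypothetical names. The sketch relies on the fact that numpy.float64 is a subclass of Python's float, so it converts any float-subclass value to a plain float and can be run without numpy installed.

```python
from typing import Any, Dict


def to_plain_floats(d: Dict[Any, Any]) -> None:
    """Hypothetical stand-in: replace float-subclass values with plain floats.

    Mutates `d` in place and recurses into nested dicts. numpy.float64
    subclasses float, so it would be caught by the isinstance check.
    """
    for key, value in d.items():
        if isinstance(value, dict):
            to_plain_floats(value)
        elif isinstance(value, float) and type(value) is not float:
            # Re-assigning an existing key is safe while iterating.
            d[key] = float(value)


class Float64Like(float):
    """Stand-in for numpy.float64, used so the sketch needs no numpy."""


data = {'score': Float64Like(0.5), 'nested': {'f1': Float64Like(0.25)}, 'n': 3}
to_plain_floats(data)
print(type(data['score']), type(data['nested']['f1']), data['n'])
```

Note that the function returns None, matching the documented return type; callers keep using the dict they passed in.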

class medcat.utils.regression.checking.RegressionSuite(cases, metadata, name)

The regression checker. This is used to check a bunch of regression cases at once against a model.

Parameters:

cases (List[RegressionCase]) – The list of regression cases
metadata (MetaData) – The metadata for the regression suite
use_report (bool) – Whether or not to use the report functionality (defaults to False)
name (str) –

__init__(cases, metadata, name)

Parameters:

cases (List[RegressionCase]) –
metadata (MetaData) –
name (str) –

Return type:

None

get_all_distinct_cases(translation, edit_distance, use_diacritics)

Gets all the distinct cases for this regression suite.

While distinct cases can be determined without the translation layer, including it here simplifies the process.

Parameters:

translation (TranslationLayer) – The translation layer.
edit_distance (Tuple[int, int, int]) – The edit distance(s) to try. Defaults to (0, 0, 0).
use_diacritics (bool) – Whether to use diacritics for edit distance.

Yields:

Iterator[Tuple[RegressionCase, Iterator[FinalTarget]]] – The generator of the regression case along with its corresponding sub-cases.

Return type:

Iterator[Tuple[RegressionCase, Iterator[medcat.utils.regression.targeting.FinalTarget]]]

estimate_total_distinct_cases()

Return type:: int

iter_subcases(translation, show_progress=True, edit_distance=(0, 0, 0), use_diacritics=False)

Iterate over all the sub-cases.

Each sub-case presents a unique target (phrase, concept, name) on the corresponding regression case.

Parameters:

translation (TranslationLayer) – The translation layer.
show_progress (bool) – Whether to show progress. Defaults to True.
edit_distance (Tuple[int, int, int]) – The edit distance(s) to try. Defaults to (0, 0, 0).
use_diacritics (bool) – Whether to use diacritics for edit distance.

Yields:

Iterator[Tuple[RegressionCase, FinalTarget]] – The generator of the regression case along with each of the final target sub-cases.

Return type:

Iterator[Tuple[RegressionCase, medcat.utils.regression.targeting.FinalTarget]]
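
The return types suggest that iter_subcases yields a flattened view of what get_all_distinct_cases groups per case. That relationship is an inference from the signatures, not something this reference states. The sketch below shows the flattening with a hypothetical helper, flatten_subcases, and plain strings standing in for RegressionCase and FinalTarget objects.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Case = TypeVar('Case')
Target = TypeVar('Target')


def flatten_subcases(
    grouped: Iterable[Tuple[Case, Iterable[Target]]],
) -> Iterator[Tuple[Case, Target]]:
    """Hypothetical helper: turn (case, targets) groups into (case, target) pairs."""
    for case, targets in grouped:
        for target in targets:
            yield case, target


# Plain strings stand in for RegressionCase and FinalTarget objects.
grouped = [
    ('case-a', iter(['target-1', 'target-2'])),
    ('case-b', iter(['target-3'])),
]
pairs = list(flatten_subcases(grouped))
print(pairs)
```

Because both layers are lazy iterators, nothing is materialised until the caller consumes the pairs, which matters when a suite expands into many sub-cases.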

check_model(cat, translation, edit_distance=(0, 0, 0), use_diacritics=False)

Checks the model and generates a report.

Parameters:

cat (CAT) – The model to check against
translation (TranslationLayer) – The translation layer
edit_distance (Tuple[int, int, int]) – The edit distance of the names. Defaults to (0, 0, 0).
use_diacritics (bool) – Whether to use diacritics for edit distance.

Returns:

MultiDescriptor – A report description

Return type:

medcat.utils.regression.results.MultiDescriptor

__str__()

Return str(self).

Return type:: str

__repr__()

Return repr(self).

Return type:: str

to_dict()

Converts the RegressionSuite to a dict for serialisation.

Returns:: dict – The dict representation
Return type:: dict

to_yaml()

Converts the RegressionSuite to a YAML string.

Returns:: str – The YAML representation
Return type:: str

__eq__(other)

Return self==value.

Parameters:: other (object) –
Return type:: bool

classmethod from_dict(in_dict, name)

Construct a RegressionSuite from a dict.

Most of the parsing is handled in RegressionCase.from_dict. This just assumes that each key in the dict is a name and each value describes a RegressionCase.

Parameters:

in_dict (dict) – The input dict.
name (str) – The name of the regression suite.

Returns:

RegressionSuite – The built regression suite

Return type:

RegressionSuite

classmethod from_yaml(file_name)

Constructs a RegressionSuite from a YAML file.

The from_dict method is used for the construction from the dict.

Parameters:: file_name (str) – The file name
Returns:: RegressionSuite – The constructed regression suite
Return type:: RegressionSuite

classmethod from_mct_export(file_name)

Parameters:: file_name (str) –
Return type:: RegressionSuite

exception medcat.utils.regression.checking.MalformedRegressionCaseException(*args)

Bases: ValueError

Inappropriate argument value (of correct type).

Parameters:: args (object) –

__init__(*args)

Initialize self. See help(type(self)) for accurate signature.

Parameters:: args (object) –
Return type:: None
