medcat.utils.regression.checking

Module Contents

Classes

RegressionCase

A regression case that has a name, defines options, filters and phrases.

MetaData

The metadata for the regression suite.

RegressionChecker

The regression checker.

Functions

get_ontology_and_version(model_card)

Attempt to get ontology (and its version) from a model card dict.

fix_np_float64(d)

Fix numpy.float64 values in a dictionary for YAML saving purposes.

Attributes

logger

UNKNOWN_METADATA

medcat.utils.regression.checking.logger
class medcat.utils.regression.checking.RegressionCase

Bases: pydantic.BaseModel

A regression case that has a name, defines options, filters and phrases.

name: str
options: medcat.utils.regression.targeting.FilterOptions
filters: List[medcat.utils.regression.targeting.TypedFilter]
phrases: List[str]
report: medcat.utils.regression.results.ResultDescriptor
get_all_targets(in_set, translation)

Get all applicable targets for this regression case.

Parameters:
  • in_set (Iterator[Tuple[str, str]]) – The input generator / iterator

  • translation (TranslationLayer) – The translation layer

Yields:

Iterator[Tuple[str, str]] – The output generator

Return type:

Iterator[Tuple[str, str]]

check_specific_for_phrase(cat, cui, name, phrase, translation)

Checks whether the specified model identifies the specific target when it appears in the specified phrase.

Parameters:
  • cat (CAT) – The model

  • cui (str) – The target CUI

  • name (str) – The target name

  • phrase (str) – The phrase to check

  • translation (TranslationLayer) – The translation layer

Returns:

bool – Whether or not the target was correctly identified

Return type:

bool
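The check can be pictured as follows. This is a minimal sketch, not the library's implementation. It assumes the phrase is a template with a single `%s` placeholder for the target name (as in the `'phrase %s'` example under `from_dict`), and that the model output is a dict shaped like the one returned by `CAT.get_entities`, i.e. `{'entities': {<id>: {'cui': ...}, ...}}`. The function name, the `get_entities` callable and the toy model are hypothetical stand-ins.

```python
from typing import Any, Callable, Dict


def check_specific_for_phrase_sketch(
    get_entities: Callable[[str], Dict[str, Any]],
    cui: str,
    name: str,
    phrase: str,
) -> bool:
    """Hypothetical stand-in for RegressionCase.check_specific_for_phrase."""
    # Substitute the target name into the phrase template (assumes one '%s').
    text = phrase % name
    # Assumed output shape: {'entities': {id: {'cui': ..., ...}, ...}}
    entities = get_entities(text).get('entities', {})
    # The target counts as identified if any detected entity carries its CUI.
    return any(ent.get('cui') == cui for ent in entities.values())


def fake_model(text: str) -> Dict[str, Any]:
    # Toy "model": recognises the word 'fever' anywhere in the text.
    if 'fever' in text:
        return {'entities': {0: {'cui': 'CUI-1', 'source_value': 'fever'}}}
    return {'entities': {}}


print(check_specific_for_phrase_sketch(fake_model, 'CUI-1', 'fever', 'Patient has %s.'))  # True
```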

_get_all_cuis_names_types()
Return type:

Tuple[Set[str], Set[str], Set[str]]

get_all_subcases(translation)

Get all subcases for this case. That is, all combinations of targets with their appropriate phrases.

Parameters:

translation (TranslationLayer) – The translation layer

Yields:

Iterator[Tuple[str, str, str]] – The generator for the target info and the phrase

Return type:

Iterator[Tuple[str, str, str]]
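The "all combinations" wording can be illustrated with a Cartesian product. In this sketch the targets are passed in directly as `(cui, name)` pairs; in the real method they are presumably derived from the case's filters via the translation layer, which is omitted here. The function name is hypothetical.

```python
from itertools import product
from typing import Iterable, Iterator, List, Tuple


def get_all_subcases_sketch(
    targets: Iterable[Tuple[str, str]],
    phrases: List[str],
) -> Iterator[Tuple[str, str, str]]:
    """Yield (cui, name, phrase) for every target/phrase combination."""
    # product() pairs every target with every phrase (targets vary slowest).
    for (cui, name), phrase in product(targets, phrases):
        yield cui, name, phrase


subcases = list(get_all_subcases_sketch(
    [('CUI-1', 'fever'), ('CUI-1', 'pyrexia')],
    ['Patient has %s.', 'No sign of %s.'],
))
print(len(subcases))  # 4
```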

_get_specific_cui_and_name()
Return type:

Iterator[Tuple[str, str]]

check_case(cat, translation)

Check the regression case against a model, i.e. check all its applicable targets.

Parameters:
  • cat (CAT) – The model

  • translation (TranslationLayer) – The translation layer

Returns:

Tuple[int, int] – Number of successes and number of failures

Return type:

Tuple[int, int]
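Conceptually, the returned pair is a tally over the case's subcases. A minimal sketch, assuming a hypothetical `check` callable that plays the role of `check_specific_for_phrase` with the model and translation layer already bound:

```python
from typing import Callable, Iterable, Tuple


def check_case_sketch(
    subcases: Iterable[Tuple[str, str, str]],
    check: Callable[[str, str, str], bool],
) -> Tuple[int, int]:
    """Count successes and failures over (cui, name, phrase) subcases."""
    successes, failures = 0, 0
    for cui, name, phrase in subcases:
        if check(cui, name, phrase):
            successes += 1
        else:
            failures += 1
    return successes, failures


# Toy check: pretend only names shorter than 6 characters are found.
result = check_case_sketch(
    [('CUI-1', 'fever', '%s'), ('CUI-1', 'pyrexia', '%s')],
    lambda cui, name, phrase: len(name) < 6,
)
print(result)  # (1, 1)
```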

to_dict()

Converts the RegressionCase to a dict for serialisation.

Returns:

dict – The dict representation

Return type:

dict

classmethod from_dict(name, in_dict)

Construct the regression case from a dict.

The expected structure:

{
    'targeting': {
        'strategy': 'ALL',  # optional
        'prefname-only': 'false',  # optional
        'filters': {
            <filter type>: <filter values>,  # possibly multiple
        }
    },
    'phrases': ['phrase %s']  # possibly multiple
}

Parsing of the different parts is delegated to other methods within the relevant classes. Delegates include: FilterOptions, TypedFilter

Parameters:
  • name (str) – The name of the case

  • in_dict (dict) – The dict describing the case

Raises:
  • ValueError – If the input dict does not have the ‘targeting’ section

  • ValueError – If the ‘targeting’ section does not have a ‘filters’ section

  • ValueError – If there are no phrases defined

Returns:

RegressionCase – The constructed regression case

Return type:

RegressionCase
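The three documented `ValueError` conditions can be mirrored by a small validation step, shown here together with an example input in the expected structure. This is a hypothetical sketch, not the library's parser: `validate_case_dict` is an invented name, and the `'cui'` filter key is only illustrative of a filter type.

```python
from typing import Any, Dict


def validate_case_dict(name: str, in_dict: Dict[str, Any]) -> None:
    """Hypothetical check mirroring the errors documented for from_dict."""
    if 'targeting' not in in_dict:
        raise ValueError(f"Case {name!r}: no 'targeting' section")
    if 'filters' not in in_dict['targeting']:
        raise ValueError(f"Case {name!r}: 'targeting' has no 'filters' section")
    if not in_dict.get('phrases'):
        raise ValueError(f"Case {name!r}: no phrases defined")


example_case = {
    'targeting': {
        'strategy': 'ALL',         # optional
        'prefname-only': 'false',  # optional
        'filters': {
            'cui': ['CUI-1'],      # filter type -> values (illustrative)
        },
    },
    'phrases': ['The patient has %s.'],
}
validate_case_dict('fever-case', example_case)  # passes silently
```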

__hash__()
Return type:

int

__eq__(other)
Parameters:

other (Any) –

Return type:

bool

medcat.utils.regression.checking.UNKNOWN_METADATA = 'Unknown'
medcat.utils.regression.checking.get_ontology_and_version(model_card)

Attempt to get ontology (and its version) from a model card dict.

If no ontology is found, ‘Unknown’ is returned for the ontology. The version is always returned as the first source ontology entry itself, unless the specified location does not exist in the model card, in which case ‘Unknown’ is returned.

The ontology is assumed to be described at:

model_card[‘Source Ontology’][0] (or model_card[‘Source Ontology’] if it’s a string instead of a list)

The ontology version is read from:

model_card[‘Source Ontology’][0] (or model_card[‘Source Ontology’] if it’s a string instead of a list)

Currently, only SNOMED-CT, UMLS and ICD are supported / found.

Parameters:

model_card (dict) – The input model card.

Returns:

Tuple[str, str] – The ontology (if found) or ‘Unknown’; and the version (if found) or ‘Unknown’

Return type:

Tuple[str, str]
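The lookup described above can be sketched as follows. This is a hypothetical re-implementation, not the library's code: in particular, it assumes the ontology is recognised by a case-insensitive substring match on the first source ontology entry, checked in the order SNOMED, UMLS, ICD.

```python
from typing import Any, Dict, Tuple

UNKNOWN = 'Unknown'


def get_ontology_and_version_sketch(model_card: Dict[str, Any]) -> Tuple[str, str]:
    """Hypothetical re-implementation of the documented lookup."""
    source = model_card.get('Source Ontology')
    # Accept either a non-empty list (use its first entry) or a plain string.
    if isinstance(source, list) and source:
        first = source[0]
    elif isinstance(source, str):
        first = source
    else:
        return UNKNOWN, UNKNOWN
    # Assumption: the ontology is recognised by a case-insensitive substring match.
    upper = first.upper()
    for marker, ontology in (('SNOMED', 'SNOMED-CT'), ('UMLS', 'UMLS'), ('ICD', 'ICD')):
        if marker in upper:
            return ontology, first
    # Unrecognised ontology; the entry itself still serves as the version.
    return UNKNOWN, first


print(get_ontology_and_version_sketch({'Source Ontology': ['SNOMED-CT_2021_04']}))
```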

class medcat.utils.regression.checking.MetaData

Bases: pydantic.BaseModel

The metadata for the regression suite.

This should define which ontology (e.g. UMLS or SNOMED) as well as which version was used when generating the regression suite.

The metadata may contain further information as well; this may include the annotator(s) involved when converting from an MCT export, or other relevant data.

ontology: str
ontology_version: str
extra: dict
regr_suite_creation_date: str
classmethod from_modelcard(model_card)

Generate a MetaData object from a model card.

This involves reading ontology info and version from the model card.

Note that the model card should be provided as a dict, not as a string.

Parameters:

model_card (dict) – The CAT modelcard

Returns:

MetaData – The resulting MetaData

Return type:

MetaData

classmethod unknown()
Return type:

MetaData
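A rough picture of the fields and of `unknown()`, assuming `unknown()` fills both ontology fields with UNKNOWN_METADATA (‘Unknown’). The real class is a pydantic model; this plain dataclass, its name, and its defaults (an empty `extra` dict and a creation date of "now" as an ISO string) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict

UNKNOWN = 'Unknown'


@dataclass
class MetaDataSketch:
    """Plain-dataclass stand-in for the pydantic MetaData model."""
    ontology: str
    ontology_version: str
    extra: Dict[str, Any] = field(default_factory=dict)
    # Assumption: the creation date defaults to "now" as an ISO string.
    regr_suite_creation_date: str = field(
        default_factory=lambda: datetime.now().isoformat())

    @classmethod
    def unknown(cls) -> 'MetaDataSketch':
        # Both ontology fields fall back to the 'Unknown' placeholder.
        return cls(ontology=UNKNOWN, ontology_version=UNKNOWN)


meta = MetaDataSketch.unknown()
print(meta.ontology, meta.ontology_version)  # Unknown Unknown
```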

medcat.utils.regression.checking.fix_np_float64(d)

Fix numpy.float64 values in a dictionary for YAML saving purposes.

Objects of this type cannot be cleanly serialised with YAML, so they need to be converted to the corresponding plain floats.

The changes are made in place, within the dictionary itself as well as, recursively, within any nested dictionaries.

Parameters:

d (dict) – The input dict

Return type:

None
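The recursive, in-place conversion can be sketched as below. This hypothetical version is slightly broader than the name suggests: it converts any subclass of Python's `float`. Because `numpy.float64` inherits from `float`, it is covered without importing NumPy; the test stand-in class is likewise hypothetical.

```python
from typing import Any, Dict


def fix_float_subclasses(d: Dict[Any, Any]) -> None:
    """Replace float subclasses (e.g. numpy.float64) with plain floats, in place."""
    for key, value in d.items():
        if isinstance(value, dict):
            # Recurse into nested dictionaries.
            fix_float_subclasses(value)
        elif isinstance(value, float) and type(value) is not float:
            # numpy.float64 subclasses float, so this catches it without NumPy.
            # Reassigning an existing key does not change the dict's size,
            # so it is safe during iteration.
            d[key] = float(value)


class FakeFloat64(float):
    """Stand-in for numpy.float64 (which also subclasses float)."""


stats = {'precision': FakeFloat64(0.5), 'nested': {'recall': FakeFloat64(0.25)}}
fix_float_subclasses(stats)
print(type(stats['nested']['recall']))  # <class 'float'>
```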

class medcat.utils.regression.checking.RegressionChecker(cases, metadata)

The regression checker. This is used to check a bunch of regression cases at once against a model.

Parameters:
  • cases (List[RegressionCase]) – The list of regression cases

  • metadata (MetaData) – The metadata for the regression suite

  • use_report (bool) – Whether or not to use the report functionality (defaults to False)

__init__(cases, metadata)
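Parameters:
  • cases (List[RegressionCase]) – The list of regression cases

  • metadata (MetaData) – The metadata for the regression suite
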
Return type:

None

get_all_subcases(translation)

Get all subcases (i.e. regression case, target info and phrase) for this checker.

Parameters:

translation (TranslationLayer) – The translation layer

Yields:

Iterator[Tuple[RegressionCase, str, str, str]] – The generator for all the cases

Return type:

Iterator[Tuple[RegressionCase, str, str, str]]
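At the checker level, the per-case subcases are flattened and tagged with the case they came from. A hypothetical sketch: here cases are represented by their names, and `subcases_of` stands in for `RegressionCase.get_all_subcases(translation)`.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

Case = TypeVar('Case')


def get_all_checker_subcases_sketch(
    cases: Iterable[Case],
    subcases_of: Callable[[Case], Iterable[Tuple[str, str, str]]],
) -> Iterator[Tuple[Case, str, str, str]]:
    """Flatten per-case subcases into (case, cui, name, phrase) tuples."""
    for case in cases:
        for cui, name, phrase in subcases_of(case):
            # Prefix each subcase with the case it came from.
            yield case, cui, name, phrase


toy = {
    'case-a': [('CUI-1', 'fever', '%s')],
    'case-b': [('CUI-2', 'cough', '%s'), ('CUI-2', 'tussis', '%s')],
}
# Iterating the dict yields its keys (the case names).
flat = list(get_all_checker_subcases_sketch(toy, toy.__getitem__))
print(len(flat))  # 3
```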

check_model(cat, translation, total=None)

Checks the model and generates a report.

Parameters:
  • cat (CAT) – The model to check against

  • translation (TranslationLayer) – The translation layer

  • total (Optional[int]) – The total number of (sub)cases expected (for a progress bar)

Returns:

MultiDescriptor – A report description

Return type:

medcat.utils.regression.results.MultiDescriptor

__str__()

Return str(self).

Return type:

str

__repr__()

Return repr(self).

Return type:

str

to_dict()

Converts the RegressionChecker to a dict for serialisation.

Returns:

dict – The dict representation

Return type:

dict

to_yaml()

Convert the RegressionChecker to a YAML string.

Returns:

str – The YAML representation

Return type:

str

__eq__(other)

Return self==value.

Parameters:

other (object) –

Return type:

bool

classmethod from_dict(in_dict)

Construct a RegressionChecker from a dict.

Most of the parsing is handled in RegressionCase.from_dict. This method just assumes that each key in the dict is a name and each value describes a RegressionCase.

Parameters:

in_dict (dict) – The input dict

Returns:

RegressionChecker – The built regression checker

Return type:

RegressionChecker
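The key-to-case mapping can be sketched as follows. In this hypothetical version, `parse_case` stands in for `RegressionCase.from_dict(name, in_dict)`; how the suite's metadata is stored in the dict is not described here, so the sketch ignores it.

```python
from typing import Any, Callable, Dict, List, TypeVar

Case = TypeVar('Case')


def checker_cases_from_dict_sketch(
    in_dict: Dict[str, Dict[str, Any]],
    parse_case: Callable[[str, Dict[str, Any]], Case],
) -> List[Case]:
    """Build one case per top-level key; the key is the case name."""
    return [parse_case(name, case_dict) for name, case_dict in in_dict.items()]


suite = {
    'fever-case': {'targeting': {'filters': {}}, 'phrases': ['Patient has %s.']},
    'cough-case': {'targeting': {'filters': {}}, 'phrases': ['Persistent %s.']},
}
# Toy parser that just returns the case name.
names = checker_cases_from_dict_sketch(suite, lambda name, d: name)
print(names)  # ['fever-case', 'cough-case']
```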

classmethod from_yaml(file_name)

Constructs a RegressionChecker from a YAML file.

The from_dict method is used to construct the checker from the loaded dict.

Parameters:

file_name (str) – The file name

Returns:

RegressionChecker – The constructed regression checker

Return type:

RegressionChecker