medcat.utils.regression.converting

Module Contents

Classes

ContextSelector

Describes how the context of a concept is found.

PerWordContextSelector

Context selector that selects a number of words

PerSentenceSelector

Context selector that selects a sentence as context.

UniqueNamePreserver

Used to preserve unique names in a set.

Functions

get_matching_case(cases, filters)

Get a case that matches a set of filters (if one exists) from within a list.

medcat_export_json_to_regression_yml(mct_export_file)

Extract regression test cases from a MedCATtrainer export JSON.

Attributes

logger

medcat.utils.regression.converting.logger
class medcat.utils.regression.converting.ContextSelector

Bases: abc.ABC

Describes how the context of a concept is found. A sub-class should be used as this one has no implementation.

_splitter(text)
Parameters:

text (str) –

Return type:

List[str]

make_replace_safe(text)

Make the text replace-safe. That is, escape every ‘%’ as ‘%%’ so that %-style string replacement applies only to the inserted placeholder and not to percent signs already present in the text.

Parameters:

text (str) – The text to use

Returns:

str – The replace-safe text

Return type:

str
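
The escaping described above can be illustrated with a minimal standalone sketch. It is not the library's code: the function below is a hypothetical re-implementation based only on the description, and the sample text is made up.

```python
def make_replace_safe(text: str) -> str:
    """Escape literal percent signs so %-formatting treats them as plain text."""
    return text.replace("%", "%%")


# Hypothetical sample text that already contains a percent sign.
raw = "Dose reduced by 50% after "

# Escape the existing text first, then append the single '%s' placeholder.
template = make_replace_safe(raw) + "%s"

# Only the placeholder is substituted; '%%' collapses back to '%'.
filled = template % "nausea"
print(filled)
```

Without the escaping step, the `%` in the original text would be read as the start of a format specifier and the substitution would fail or produce the wrong result.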

abstract get_context(text, start, end, leave_concept=False)

Get the context of a concept within a larger body of text. The concept is specified by its start and end indices.

Parameters:
  • text (str) – The larger text

  • start (int) – The starting index

  • end (int) – The ending index

  • leave_concept (bool) – Whether to leave the concept or replace it by ‘%s’. Defaults to False

Returns:

str – The selected context

Return type:

str

class medcat.utils.regression.converting.PerWordContextSelector(words_before, words_after)

Bases: ContextSelector

Context selector that selects a number of words from either side of the concept, regardless of punctuation.

Parameters:
  • words_before (int) – Number of words to select from before the concept

  • words_after (int) – Number of words to select from after the concept

__init__(words_before, words_after)
Parameters:
  • words_before (int) –

  • words_after (int) –

Return type:

None

get_context(text, start, end, leave_concept=False)

Get the context of a concept within a larger body of text. The concept is specified by its start and end indices.

Parameters:
  • text (str) – The larger text

  • start (int) – The starting index

  • end (int) – The ending index

  • leave_concept (bool) – Whether to leave the concept or replace it by ‘%s’. Defaults to False

Returns:

str – The selected context

Return type:

str
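
As a rough illustration of per-word context selection, the sketch below takes a fixed number of whitespace-separated words from each side of the concept. It is a standalone approximation, not the library's implementation: the function name, the whitespace splitting, and the choice to escape ‘%’ only when the placeholder is inserted are all assumptions.

```python
from typing import List


def per_word_context(text: str, start: int, end: int,
                     words_before: int, words_after: int,
                     leave_concept: bool = False) -> str:
    """Select words around text[start:end], ignoring punctuation (sketch)."""
    before: List[str] = text[:start].split()
    after: List[str] = text[end:].split()
    # Guard against 0, because before[-0:] would return the whole list.
    kept_before = before[-words_before:] if words_before > 0 else []
    kept_after = after[:words_after] if words_after > 0 else []
    if leave_concept:
        concept = text[start:end]
    else:
        concept = "%s"
        # Assumption: escape '%' only when a placeholder is inserted.
        kept_before = [w.replace("%", "%%") for w in kept_before]
        kept_after = [w.replace("%", "%%") for w in kept_after]
    return " ".join(kept_before + [concept] + kept_after)


# Hypothetical example text and concept span.
text = "The patient was diagnosed with diabetes mellitus last year in clinic"
start = text.index("diabetes mellitus")
end = start + len("diabetes mellitus")

template = per_word_context(text, start, end, words_before=2, words_after=2)
verbatim = per_word_context(text, start, end, 2, 2, leave_concept=True)
print(template)  # diagnosed with %s last year
print(verbatim)  # diagnosed with diabetes mellitus last year
```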

class medcat.utils.regression.converting.PerSentenceSelector

Bases: ContextSelector

Context selector that selects a sentence as context. A sentence is considered to end at one or more “.”, “?” or “!” characters.

stoppers = '\\.+|\\?+|!+'
get_context(text, start, end, leave_concept=False)

Get the context of a concept within a larger body of text. The concept is specified by its start and end indices.

Parameters:
  • text (str) – The larger text

  • start (int) – The starting index

  • end (int) – The ending index

  • leave_concept (bool) – Whether to leave the concept or replace it by ‘%s’. Defaults to False

Returns:

str – The selected context

Return type:

str
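
The sentence-based selection can be sketched with the same pattern as the `stoppers` attribute. This is a standalone approximation, not the library's implementation: the function name, the decision to keep the closing stopper, and the whitespace trimming are assumptions.

```python
import re

# Same pattern as the documented `stoppers` attribute: runs of '.', '?' or '!'.
STOPPERS = r"\.+|\?+|!+"


def per_sentence_context(text: str, start: int, end: int,
                         leave_concept: bool = False) -> str:
    """Return the sentence surrounding text[start:end] (sketch)."""
    # The sentence starts after the last stopper that precedes the concept.
    sent_start = 0
    for match in re.finditer(STOPPERS, text[:start]):
        sent_start = match.end()
    # It ends at the first stopper after the concept (kept here by assumption).
    nxt = re.search(STOPPERS, text[end:])
    sent_end = end + nxt.end() if nxt else len(text)
    before, after = text[sent_start:start], text[end:sent_end]
    if leave_concept:
        concept = text[start:end]
    else:
        concept = "%s"
        before = before.replace("%", "%%")
        after = after.replace("%", "%%")
    return (before + concept + after).strip()


# Hypothetical example text and concept span.
text = "No fever today. She reports chest pain since Monday! Follow up soon."
start = text.index("chest pain")
end = start + len("chest pain")

template = per_sentence_context(text, start, end)
verbatim = per_sentence_context(text, start, end, leave_concept=True)
print(template)  # She reports %s since Monday!
print(verbatim)  # She reports chest pain since Monday!
```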

class medcat.utils.regression.converting.UniqueNamePreserver

Used to preserve unique names in a set.

__init__()
Return type:

None

name2nrgen(name, nr)

The method to generate name and copy-number combinations.

Parameters:
  • name (str) – The base name

  • nr (int) – The number of the copy

Returns:

str – The combined name

Return type:

str

get_unique_name(orig_name, dupe_nr=0)

Get a unique name whose duplicate number is at least as high as the one specified.

Parameters:
  • orig_name (str) – The original / base name

  • dupe_nr (int) – The number of the copy to start from. Defaults to 0.

Returns:

str – The unique name

Return type:

str
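
The idea of keeping names unique by appending a copy number can be sketched as follows. This is a hypothetical stand-in, not the library's class: the class name, the `name-nr` format, and the rule that copy number 0 yields the bare name are all assumptions.

```python
from typing import Set


class UniqueNames:
    """Hands out names that have not been used before (sketch)."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()

    def combine(self, name: str, nr: int) -> str:
        # Assumed format for illustration; the real format may differ.
        return f"{name}-{nr}"

    def get_unique_name(self, orig_name: str, dupe_nr: int = 0) -> str:
        # Assumption: copy number 0 means the original name itself.
        candidate = orig_name if dupe_nr == 0 else self.combine(orig_name, dupe_nr)
        # Increase the copy number until the candidate is unused.
        while candidate in self.seen:
            dupe_nr += 1
            candidate = self.combine(orig_name, dupe_nr)
        self.seen.add(candidate)
        return candidate


names = UniqueNames()
first = names.get_unique_name("case")
second = names.get_unique_name("case")
third = names.get_unique_name("case")
print(first, second, third)  # case case-1 case-2
```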

medcat.utils.regression.converting.get_matching_case(cases, filters)

Get a case that matches a set of filters (if one exists) from within a list.

Parameters:
  • cases – The list of regression cases to search

  • filters – The set of filters a case must match

Returns:

Optional[RegressionCase] – The regression case (if found) or None

Return type:

Optional[medcat.utils.regression.checking.RegressionCase]

medcat.utils.regression.converting.medcat_export_json_to_regression_yml(mct_export_file, cont_sel=PerSentenceSelector(), model_card=None)

Extract regression test cases from a MedCATtrainer export JSON. This is done based on the context selector specified.

Parameters:
  • mct_export_file (str) – The MCT export file path

  • cont_sel (ContextSelector) – The context selector. Defaults to PerSentenceSelector().

  • model_card (Optional[dict]) – The optional model card for generating metadata

Returns:

str – Extracted regression cases in YAML form

Return type:

str
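
A possible way to call this function, based only on the signatures documented above, is sketched below. The helper name, the file paths, and the word counts are hypothetical, and the import is done lazily so that the snippet loads even where MedCAT is not installed.

```python
from pathlib import Path


def export_to_regression_yaml(export_path: str, out_path: str) -> None:
    """Convert an MCT export file into a regression YAML file (sketch)."""
    # Lazy import: requires MedCAT to be installed when the function is called.
    from medcat.utils.regression.converting import (
        PerWordContextSelector,
        medcat_export_json_to_regression_yml,
    )

    # Five words on each side is an arbitrary choice for illustration.
    selector = PerWordContextSelector(words_before=5, words_after=5)
    yaml_text = medcat_export_json_to_regression_yml(export_path, cont_sel=selector)
    Path(out_path).write_text(yaml_text, encoding="utf-8")


# Hypothetical usage:
# export_to_regression_yaml("mct_export.json", "regression_cases.yml")
```

Omitting `cont_sel` would fall back to the documented default, `PerSentenceSelector()`.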