`medcat.utils.regression.converting`

Module Contents

Classes

`ContextSelector`	Describes how the context of a concept is found.
`PerWordContextSelector`	Context selector that selects a number of words from either side of the concept.
`PerSentenceSelector`	Context selector that selects a sentence as context.
`UniqueNamePreserver`	Used to preserve unique names in a set.

Functions

`get_matching_case`(cases, filters)	Get a case that matches a set of filters (if one exists) from within a list.
`medcat_export_json_to_regression_yml`(mct_export_file)	Extract regression test cases from a MedCATtrainer JSON export.

Attributes

logger

medcat.utils.regression.converting.logger

class medcat.utils.regression.converting.ContextSelector

Bases: abc.ABC

Describes how the context of a concept is found. This base class does not implement get_context, so a subclass should be used.

_splitter(text)

Parameters: text (str)
Return type: List[str]

make_replace_safe(text)

Make the text replace-safe. That is, escape every ‘%’ as ‘%%’ so that %-style string formatting applies only to the inserted placeholder and leaves the rest of the text unchanged.

Parameters: text (str) – The text to use
Returns: str – The replace-safe text
Return type: str
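
The escaping described above can be illustrated with a minimal standalone sketch. The function `make_replace_safe_sketch` is a hypothetical stand-in written for this example, not the library's implementation; it assumes only the behaviour stated in the description.

```python
def make_replace_safe_sketch(text: str) -> str:
    """Hypothetical stand-in: escape every '%' so %-formatting is safe."""
    return text.replace("%", "%%")


# Escape the surrounding text first, then add the single '%s' placeholder.
before = make_replace_safe_sketch("Dose reduced by 50% after ")
after = make_replace_safe_sketch(" improved.")
template = before + "%s" + after

# Only the inserted '%s' is treated as a placeholder by the % operator;
# the escaped '%%' collapses back to a literal '%'.
filled = template % "kidney failure"
```

Without the escaping step, the literal ‘%’ in the source text would be interpreted as the start of a format specifier when the template is filled in.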

abstract get_context(text, start, end, leave_concept=False)

Get the context of a concept within a larger body of text. The concept is specified by its start and end indices.

Parameters:

text (str) – The larger text
start (int) – The starting index
end (int) – The ending index
leave_concept (bool) – Whether to leave the concept or replace it by ‘%s’. Defaults to False

Returns:

str – The selected context

Return type:

str
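
As a rough illustration of how a subclass might fill in get_context, the sketch below defines a stand-in abstract base class and one concrete selector. Both `ContextSelectorSketch` and `CharWindowSelector` are hypothetical names invented for this example; they mirror the documented signature but are not part of medcat.

```python
from abc import ABC, abstractmethod


class ContextSelectorSketch(ABC):
    """Hypothetical stand-in mirroring the documented interface."""

    def make_replace_safe(self, text: str) -> str:
        # Escape '%' so that only an inserted '%s' acts as a placeholder.
        return text.replace("%", "%%")

    @abstractmethod
    def get_context(self, text: str, start: int, end: int,
                    leave_concept: bool = False) -> str:
        """Return the context around text[start:end]."""


class CharWindowSelector(ContextSelectorSketch):
    """Hypothetical subclass: keeps a fixed number of characters per side."""

    def __init__(self, chars: int) -> None:
        self.chars = chars

    def get_context(self, text: str, start: int, end: int,
                    leave_concept: bool = False) -> str:
        # Keep the concept itself or swap it for a '%s' placeholder.
        concept = text[start:end] if leave_concept else "%s"
        before = self.make_replace_safe(text[max(0, start - self.chars):start])
        after = self.make_replace_safe(text[end:end + self.chars])
        return before + concept + after


text = "History of diabetes since 2010."
start = text.index("diabetes")
end = start + len("diabetes")
context = CharWindowSelector(chars=6).get_context(text, start, end)
```

Because get_context is abstract, instantiating the base class directly raises a TypeError; only subclasses that implement it can be created.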

class medcat.utils.regression.converting.PerWordContextSelector(words_before, words_after)

Bases: ContextSelector

Context selector that selects a number of words from either side of the concept, regardless of punctuation.

Parameters:

words_before (int) – Number of words to select before the concept
words_after (int) – Number of words to select after the concept

__init__(words_before, words_after)

Parameters:

words_before (int) – Number of words to select before the concept
words_after (int) – Number of words to select after the concept

Return type:

None

get_context(text, start, end, leave_concept=False)

Get the context of a concept within a larger body of text. The concept is specified by its start and end indices.

Parameters:

text (str) – The larger text
start (int) – The starting index
end (int) – The ending index
leave_concept (bool) – Whether to leave the concept or replace it by ‘%s’. Defaults to False

Returns:

str – The selected context

Return type:

str
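
The per-word selection described for this class can be sketched in plain Python as follows. This is not the library's code: `per_word_context_sketch` is a hypothetical function, and it assumes that words are split on whitespace (the behaviour of `_splitter` is not documented here) and that the pieces are rejoined with single spaces.

```python
def per_word_context_sketch(text: str, start: int, end: int,
                            words_before: int, words_after: int,
                            leave_concept: bool = False) -> str:
    """Hypothetical sketch: keep N words on each side of text[start:end]."""
    # Keep the concept itself or swap it for a '%s' placeholder.
    concept = text[start:end] if leave_concept else "%s"
    # Assumption: whitespace splitting, so punctuation stays attached to words.
    before_words = text[:start].replace("%", "%%").split()
    after_words = text[end:].replace("%", "%%").split()
    # A slice of [-0:] would return the whole list, so handle zero explicitly.
    picked_before = before_words[-words_before:] if words_before > 0 else []
    picked_after = after_words[:words_after]
    return " ".join(picked_before + [concept] + picked_after)


text = "The patient has severe kidney failure and was admitted yesterday."
start = text.index("kidney failure")
end = start + len("kidney failure")
context = per_word_context_sketch(text, start, end, words_before=2, words_after=2)
```

Under these assumptions the selected words ignore sentence boundaries entirely, which matches the "regardless of punctuation" wording above.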

class medcat.utils.regression.converting.PerSentenceSelector

Bases: ContextSelector

Context selector that selects a sentence as context. A sentence is considered to end at “.”, “?” or “!”.

stoppers = '\\.+|\\?+|!+'

get_context(text, start, end, leave_concept=False)

Get the context of a concept within a larger body of text. The concept is specified by its start and end indices.

Parameters:

text (str) – The larger text
start (int) – The starting index
end (int) – The ending index
leave_concept (bool) – Whether to leave the concept or replace it by ‘%s’. Defaults to False

Returns:

str – The selected context

Return type:

str
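
A minimal sketch of sentence-based selection, using the same pattern as the stoppers attribute, might look like the following. `per_sentence_context_sketch` is a hypothetical function rather than the library's implementation; whether the terminating punctuation is kept and whether surrounding whitespace is stripped are assumptions made for this example.

```python
import re

# Same pattern as the documented `stoppers` attribute, as a raw string.
STOPPERS = r"\.+|\?+|!+"


def per_sentence_context_sketch(text: str, start: int, end: int,
                                leave_concept: bool = False) -> str:
    """Hypothetical sketch: return the sentence containing text[start:end]."""
    # The sentence starts right after the last stopper before the concept.
    sentence_start = 0
    for match in re.finditer(STOPPERS, text[:start]):
        sentence_start = match.end()
    # The sentence ends with the first stopper after the concept, if any.
    # Assumption: the stopper itself is kept as part of the context.
    match_after = re.search(STOPPERS, text[end:])
    sentence_end = end + match_after.end() if match_after else len(text)
    concept = text[start:end] if leave_concept else "%s"
    before = text[sentence_start:start].replace("%", "%%")
    after = text[end:sentence_end].replace("%", "%%")
    return (before + concept + after).strip()


text = "He was admitted on Monday. Tests showed kidney failure! He was discharged."
start = text.index("kidney failure")
end = start + len("kidney failure")
context = per_sentence_context_sketch(text, start, end)
```

Because the pattern matches runs of one or more stoppers, an ellipsis such as “...” is treated as a single sentence boundary in this sketch.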

class medcat.utils.regression.converting.UniqueNamePreserver

Used to preserve unique names in a set.

__init__()

Return type: None

name2nrgen(name, nr)

The method to generate name and copy-number combinations.

Parameters:

name (str) – The base name
nr (int) – The number of the copy

Returns:

str – The combined name

Return type:

str

get_unique_name(orig_name, dupe_nr=0)

Get a unique name whose duplicate number is at least as high as the one specified.

Parameters:

orig_name (str) – The original / base name
dupe_nr (int) – The number of the copy to start from. Defaults to 0.

Returns:

str – The unique name

Return type:

str
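
The interplay between name2nrgen and get_unique_name can be sketched as below. `UniqueNamePreserverSketch` is a hypothetical stand-in, not the library class. Two details are assumptions made for this example and are not stated in the reference: the combined name uses the form `name-nr`, and a duplicate number of 0 means the base name is tried unchanged.

```python
class UniqueNamePreserverSketch:
    """Hypothetical stand-in that hands out names not seen before."""

    def __init__(self) -> None:
        self.unique_names: set = set()

    def name2nrgen(self, name: str, nr: int) -> str:
        # Assumption: the copy number is appended after a hyphen.
        return f"{name}-{nr}"

    def get_unique_name(self, orig_name: str, dupe_nr: int = 0) -> str:
        nr = dupe_nr
        while True:
            # Assumption: copy number 0 stands for the unmodified base name.
            candidate = orig_name if nr == 0 else self.name2nrgen(orig_name, nr)
            if candidate not in self.unique_names:
                self.unique_names.add(candidate)
                return candidate
            # Name already taken, so try the next copy number.
            nr += 1


preserver = UniqueNamePreserverSketch()
names = [preserver.get_unique_name("case") for _ in range(3)]
other = preserver.get_unique_name("other")
```

Each returned name is recorded, so repeated requests for the same base name yield increasing copy numbers while unrelated names are returned unchanged.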

medcat.utils.regression.converting.get_matching_case(cases, filters)

Get a case that matches a set of filters (if one exists) from within a list.

Parameters:

cases (List[RegressionCase]) – The list to look in
filters (List[TypedFilter]) – The filters to compare to

Returns:

Optional[RegressionCase] – The regression case (if found) or None

Return type:

Optional[medcat.utils.regression.checking.RegressionCase]

medcat.utils.regression.converting.medcat_export_json_to_regression_yml(mct_export_file, cont_sel=PerSentenceSelector(), model_card=None)

Extract regression test cases from a MedCATtrainer JSON export. The context of each case is chosen by the specified context selector.

Parameters:

mct_export_file (str) – The MCT export file path
cont_sel (ContextSelector) – The context selector. Defaults to PerSentenceSelector().
model_card (Optional[dict]) – The optional model card for generating metadata

Returns:

str – Extracted regression cases in YAML form

Return type:

str
