medcat.utils.regression.converting
Module Contents
Classes
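ContextSelector | Describes how the context of a concept is found.
PerWordContextSelector | Context selector that selects a number of words from either side of the concept.
PerSentenceSelector | Context selector that selects a sentence as context.
UniqueNamePreserver | Used to preserve unique names in a set.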
Functions
get_matching_case | Get a case that matches a set of filters (if one exists) from within a list.
medcat_export_json_to_regression_yml | Extract regression test cases from a MedCATtrainer export JSON.
Attributes
- medcat.utils.regression.converting.logger
- class medcat.utils.regression.converting.ContextSelector
Bases:
abc.ABC
Describes how the context of a concept is found. This class is abstract and has no implementation of its own, so one of its subclasses should be used.
- _splitter(text)
- Parameters:
text (str) –
- Return type:
List[str]
- make_replace_safe(text)
Make the text replace-safe. That is, escape every ‘%’ as ‘%%’ so that Python’s %-formatting can be applied to the text and affects only the inserted ‘%s’ placeholder.
- Parameters:
text (str) – The text to use
- Returns:
str – The replace-safe text
- Return type:
str
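As a rough illustration of the escaping described above, the following stand-alone sketch doubles each ‘%’ so that a single ‘%s’ placeholder can be filled in later. It is not MedCAT’s actual code: the function body is an assumption based on the description, and the example strings are made up.

```python
def make_replace_safe(text: str) -> str:
    """Escape every '%' as '%%' (assumed behaviour, based on the description)."""
    return text.replace("%", "%%")


# Escape the surrounding text first, then add the one placeholder that stays live.
before, after = "Dose cut by 50% after ", " was diagnosed"
template = make_replace_safe(before) + "%s" + make_replace_safe(after)

# The literal '%' survives the substitution because it was doubled beforehand.
filled = template % "diabetes"
print(template)
print(filled)
```

Without the escaping step, the `%` operator would try to read the ‘%’ in “50%” as a format specifier and fail.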
- abstract get_context(text, start, end, leave_concept=False)
Get the context of a concept within a larger body of text. The concept is specified by its start and end indices.
- Parameters:
text (str) – The larger text
start (int) – The starting index
end (int) – The ending index
leave_concept (bool) – Whether to leave the concept or replace it by ‘%s’. Defaults to False
- Returns:
str – The selected context
- Return type:
str
- class medcat.utils.regression.converting.PerWordContextSelector(words_before, words_after)
Bases:
ContextSelector
Context selector that selects a number of words from either side of the concept, regardless of punctuation.
- Parameters:
words_before (int) – Number of words to select from before the concept
words_after (int) – Number of words to select from after the concept
- __init__(words_before, words_after)
- Parameters:
words_before (int) –
words_after (int) –
- Return type:
None
- get_context(text, start, end, leave_concept=False)
Get the context of a concept within a larger body of text. The concept is specified by its start and end indices.
- Parameters:
text (str) – The larger text
start (int) – The starting index
end (int) – The ending index
leave_concept (bool) – Whether to leave the concept or replace it by ‘%s’. Defaults to False
- Returns:
str – The selected context
- Return type:
str
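The per-word selection can be pictured with the stand-alone sketch below. It is an approximation under stated assumptions, not the library’s implementation: `per_word_context` is a hypothetical helper, words are assumed to be separated by whitespace only, and the surrounding text is ‘%’-escaped whenever the concept is replaced by ‘%s’.

```python
def per_word_context(text, start, end, words_before, words_after, leave_concept=False):
    """Hypothetical helper: keep N words on each side of text[start:end]."""
    # Split on whitespace only, so punctuation stays attached to its word.
    before = text[:start].split()
    after = text[end:].split()
    # Guard against before[-0:], which would return the whole list.
    picked_before = before[-words_before:] if words_before > 0 else []
    picked_after = after[:words_after]
    if leave_concept:
        concept = text[start:end]
    else:
        concept = "%s"
        # Escape '%' so that only the placeholder is treated as a format spec.
        picked_before = [w.replace("%", "%%") for w in picked_before]
        picked_after = [w.replace("%", "%%") for w in picked_after]
    return " ".join(picked_before + [concept] + picked_after)


text = "The patient was diagnosed with diabetes in 2010. He recovered."
start = text.index("diabetes")
end = start + len("diabetes")
print(per_word_context(text, start, end, 2, 2))
print(per_word_context(text, start, end, 2, 2, leave_concept=True))
```

Because the split ignores punctuation, the selected words may cross a sentence boundary, which matches the “regardless of punctuation” behaviour described for this selector.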
- class medcat.utils.regression.converting.PerSentenceSelector
Bases:
ContextSelector
Context selector that selects a sentence as context. Sentences are said to end with either “.”, “?” or “!”.
- stoppers = '\\.+|\\?+|!+'
- get_context(text, start, end, leave_concept=False)
Get the context of a concept within a larger body of text. The concept is specified by its start and end indices.
- Parameters:
text (str) – The larger text
start (int) – The starting index
end (int) – The ending index
leave_concept (bool) – Whether to leave the concept or replace it by ‘%s’. Defaults to False
- Returns:
str – The selected context
- Return type:
str
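A minimal sketch of sentence-based selection, reusing the pattern shown in the `stoppers` attribute. The helper `per_sentence_context` is hypothetical, and details such as keeping the closing stopper and stripping outer whitespace are assumptions rather than documented behaviour.

```python
import re

# Same pattern as the `stoppers` attribute: runs of '.', '?' or '!'.
STOPPERS = re.compile(r"\.+|\?+|!+")


def per_sentence_context(text, start, end, leave_concept=False):
    """Hypothetical helper: return the sentence around text[start:end]."""
    # Sentence start: just after the last stopper that ends before the concept.
    sent_start = 0
    for match in STOPPERS.finditer(text, 0, start):
        sent_start = match.end()
    # Sentence end: the first stopper after the concept (kept in the output).
    match = STOPPERS.search(text, end)
    sent_end = match.end() if match else len(text)
    if leave_concept:
        return text[sent_start:sent_end].strip()
    # Escape '%' around the concept so that only '%s' acts as a placeholder.
    before = text[sent_start:start].replace("%", "%%")
    after = text[end:sent_end].replace("%", "%%")
    return (before + "%s" + after).strip()


text = "He was well. The patient has diabetes! Follow-up in May."
start = text.index("diabetes")
end = start + len("diabetes")
print(per_sentence_context(text, start, end))
print(per_sentence_context(text, start, end, leave_concept=True))
```

A pattern this simple also treats the dot in an abbreviation or decimal number as a sentence end, so contexts around text such as “2.5 mg” may come out shorter than expected.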
- class medcat.utils.regression.converting.UniqueNamePreserver
Used to preserve unique names in a set
- __init__()
- Return type:
None
- name2nrgen(name, nr)
The method to generate name and copy-number combinations.
- Parameters:
name (str) – The base name
nr (int) – The number of the copy
- Returns:
str – The combined name
- Return type:
str
- get_unique_name(orig_name, dupe_nr=0)
Get a unique name whose duplicate number is at least as high as the one specified.
- Parameters:
orig_name (str) – The original / base name
dupe_nr (int) – The number of the copy to start from. Defaults to 0.
- Returns:
str – The unique name
- Return type:
str
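The interplay between `name2nrgen` and `get_unique_name` might look roughly like the sketch below. `UniqueNameSketch` is a hypothetical stand-in, not the real class: the `name-nr` format and the rule that copy number 0 means “the base name unchanged” are both assumptions.

```python
class UniqueNameSketch:
    """Hypothetical stand-in for UniqueNamePreserver."""

    def __init__(self) -> None:
        self.used = set()  # names handed out so far

    def name2nrgen(self, name: str, nr: int) -> str:
        # Assumed format; the real separator may differ.
        return f"{name}-{nr}"

    def get_unique_name(self, orig_name: str, dupe_nr: int = 0) -> str:
        nr = dupe_nr
        while True:
            # Copy number 0 is assumed to mean "use the base name unchanged".
            candidate = orig_name if nr == 0 else self.name2nrgen(orig_name, nr)
            if candidate not in self.used:
                self.used.add(candidate)
                return candidate
            nr += 1  # name taken: try the next copy number


names = UniqueNameSketch()
first = names.get_unique_name("diabetes")
second = names.get_unique_name("diabetes")
third = names.get_unique_name("diabetes")
print(first, second, third)
```

Each call records the name it returns, so asking for the same base name again yields the next free copy number.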
- medcat.utils.regression.converting.get_matching_case(cases, filters)
Get a case that matches a set of filters (if one exists) from within a list.
- Parameters:
cases (List[RegressionCase]) – The list to look in
filters (List[TypedFilter]) – The filters to compare to
- Returns:
Optional[RegressionCase] – The regression case (if found) or None
- Return type:
Optional[RegressionCase]
- medcat.utils.regression.converting.medcat_export_json_to_regression_yml(mct_export_file, cont_sel=PerSentenceSelector(), model_card=None)
Extract regression test cases from a MedCATtrainer export JSON. This is done based on the context selector specified.
- Parameters:
mct_export_file (str) – The MCT export file path
cont_sel (ContextSelector) – The context selector. Defaults to PerSentenceSelector().
model_card (Optional[dict]) – The optional model card for generating metadata
- Returns:
str – Extracted regression cases in YAML form
- Return type:
str
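A possible way to call the converter, based only on the signatures listed on this page. The wrapper `export_to_regression_yaml`, its file path arguments and the choice of three context words are illustrative assumptions; running it requires MedCAT to be installed and a real MedCATtrainer export file.

```python
def export_to_regression_yaml(export_path: str, yaml_path: str) -> None:
    """Hypothetical wrapper: convert an MCT export and save the YAML to disk."""
    # Imported here so that defining the wrapper does not itself need MedCAT.
    from medcat.utils.regression.converting import (
        PerWordContextSelector,
        medcat_export_json_to_regression_yml,
    )

    # Three words of context on either side; the numbers are arbitrary.
    selector = PerWordContextSelector(3, 3)
    yaml_text = medcat_export_json_to_regression_yml(export_path, cont_sel=selector)

    # The function returns the YAML as a string, so writing it out is up to the caller.
    with open(yaml_path, "w", encoding="utf-8") as out_file:
        out_file.write(yaml_text)
```

Leaving out `cont_sel` falls back to the documented default, `PerSentenceSelector()`.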