medcat.utils.regression.category_separation
Module Contents
Classes
- CategoryDescription: A descriptor for a category.
- Category: The category base class.
- AllPartsCategory: Represents a category which only fits a regression case if it matches all parts of the category description.
- AnyPartOfCategory: Represents a category which fits a regression case that matches any part of its category description.
- SeparationObserver: Keeps track of which case is separated into which category/categories.
- StrategyType: Describes the types of strategies one can employ for separation.
- SeparatorStrategy: The strategy according to which the separation takes place.
- SeparateToFirst: Separator strategy that separates each case to its first match.
- SeparateToAll: A separator strategy that allows separation to all matching categories.
- RegressionCheckerSeparator: Regression checker separator.
Functions
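- get_random_str(length=8)
- get_strategy(strategy_type): Get the separator strategy from the strategy type.
- get_separator(categories, strategy_type, overflow_category=False): Get the regression checker separator for the list of categories and the specified strategy.
- get_description(cat_description): Get the description from its dict representation.
- get_category(cat_name, cat_description): Get the category of the specified name from the dict.
- read_categories(yaml_file): Read categories from a YAML file.
- separate_categories(category_yaml, strategy_type, regression_suite_yaml, target_file_prefix, overwrite=False, overflow_category=False): Separate categories based on simple input.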
Attributes
- medcat.utils.regression.category_separation.logger
- class medcat.utils.regression.category_separation.CategoryDescription
Bases:
pydantic.BaseModel
A descriptor for a category.
- Parameters:
target_cuis (Set[str]) – The set of target CUIs
target_names (Set[str]) – The set of target names
target_tuis (Set[str]) – The set of target type IDs
allow_everything (bool) – Whether to match any CUI, name, or TUI. Defaults to False.
- target_cuis: Set[str]
- target_names: Set[str]
- target_tuis: Set[str]
- allow_everything: bool = False
- _get_required_filter(case, target_filter)
- Parameters:
target_filter (medcat.utils.regression.checking.FilterType) –
- Return type:
Optional[medcat.utils.regression.checking.TypedFilter]
- _has_specific_from(case, targets, target_filter)
- Parameters:
targets (Set[str]) –
target_filter (medcat.utils.regression.checking.FilterType) –
- has_cui_from(case)
Check if the description has a CUI from the specified regression case.
- Parameters:
case (RegressionCase) – The regression case to check
- Returns:
bool – True if the description has a CUI from the regression case
- Return type:
bool
- has_name_from(case)
Check if the description has a name from the specified regression case.
- Parameters:
case (RegressionCase) – The regression case to check
- Returns:
bool – True if the description has a name from the regression case
- Return type:
bool
- has_tui_from(case)
Check if the description has a type ID (TUI) from the specified regression case.
- Parameters:
case (RegressionCase) – The regression case to check
- Returns:
bool – True if the description has a type ID (TUI) from the regression case
- Return type:
bool
- __hash__()
- Return type:
int
- __eq__(other)
- Parameters:
other (Any) –
- Return type:
bool
- classmethod anything_goes()
- Return type:
CategoryDescription
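The matching behaviour described for has_cui_from, has_name_from and has_tui_from can be pictured with a minimal, self-contained sketch. It does not use MedCAT: SimpleCase and SimpleDescription are hypothetical stand-ins, and the assumption that a match means a non-empty set intersection is an illustration only, not the library's actual filter logic.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set


@dataclass(frozen=True)
class SimpleCase:
    """Hypothetical stand-in for a regression case (not MedCAT's RegressionCase)."""
    cuis: FrozenSet[str] = frozenset()
    names: FrozenSet[str] = frozenset()
    tuis: FrozenSet[str] = frozenset()


@dataclass
class SimpleDescription:
    """Simplified analogue of CategoryDescription built on plain sets."""
    target_cuis: Set[str] = field(default_factory=set)
    target_names: Set[str] = field(default_factory=set)
    target_tuis: Set[str] = field(default_factory=set)
    allow_everything: bool = False

    def _has_any(self, targets: Set[str], values: FrozenSet[str]) -> bool:
        # Assumption: a "match" is any overlap between targets and case values.
        return self.allow_everything or bool(targets & values)

    def has_cui_from(self, case: SimpleCase) -> bool:
        return self._has_any(self.target_cuis, case.cuis)

    def has_name_from(self, case: SimpleCase) -> bool:
        return self._has_any(self.target_names, case.names)

    def has_tui_from(self, case: SimpleCase) -> bool:
        return self._has_any(self.target_tuis, case.tuis)

    @classmethod
    def anything_goes(cls) -> "SimpleDescription":
        # Mirrors the idea of a description that matches every case.
        return cls(allow_everything=True)


descr = SimpleDescription(target_cuis={"C0011849"}, target_tuis={"T047"})
case = SimpleCase(cuis=frozenset({"C0011849"}), names=frozenset({"diabetes"}))
cui_match = descr.has_cui_from(case)
name_match = descr.has_name_from(case)
```

In this sketch the description matches the case on its CUI but not on its name, while a description created through anything_goes() would match on every part.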
- class medcat.utils.regression.category_separation.Category(name)
Bases:
abc.ABC
The category base class.
A category defines which regression cases fit in it.
- Parameters:
name (str) – The name of the category
- __init__(name)
- Parameters:
name (str) –
- Return type:
None
- abstract fits(case)
Check if a particular regression case fits in this category.
- Parameters:
case (RegressionCase) – The regression case.
- Returns:
bool – Whether the case is in this category.
- Return type:
bool
- class medcat.utils.regression.category_separation.AllPartsCategory(name, descr)
Bases:
Category
Represents a category which only fits a regression case if it matches all parts of the category description.
That is, in order for a regression case to match, it would need to match a CUI, a name and a TUI specified in the category description.
- Parameters:
name (str) – The name of the category
descr (CategoryDescription) – The description of the category
- __init__(name, descr)
- Parameters:
name (str) –
descr (CategoryDescription) –
- Return type:
None
- fits(case)
Check if a particular regression case fits in this category.
- Parameters:
case (RegressionCase) – The regression case.
- Returns:
bool – Whether the case is in this category.
- Return type:
bool
- __eq__(__o)
Return self==value.
- Parameters:
__o (object) –
- Return type:
bool
- __hash__()
Return hash(self).
- Return type:
int
- __str__()
Return str(self).
- Return type:
str
- __repr__()
Return repr(self).
- Return type:
str
- class medcat.utils.regression.category_separation.AnyPartOfCategory(name, descr)
Bases:
Category
Represents a category which fits a regression case that matches any part of its category description.
That is, any case that matches either a CUI, a name or a TUI within the category description, will fit.
- Parameters:
name (str) – The name of the category
descr (CategoryDescription) – The description of the category
- __init__(name, descr)
- Parameters:
name (str) –
descr (CategoryDescription) –
- Return type:
None
- fits(case)
Check if a particular regression case fits in this category.
- Parameters:
case (RegressionCase) – The regression case.
- Returns:
bool – Whether the case is in this category.
- Return type:
bool
- __eq__(__o)
Return self==value.
- Parameters:
__o (object) –
- Return type:
bool
- __hash__()
Return hash(self).
- Return type:
int
- __str__()
Return str(self).
- Return type:
str
- __repr__()
Return repr(self).
- Return type:
str
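The difference between AllPartsCategory and AnyPartOfCategory comes down to combining the three part checks (CUI, name, TUI) with a logical "all" versus a logical "any". The sketch below illustrates only that distinction; PartMatches and both functions are hypothetical names and are not part of MedCAT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartMatches:
    """Hypothetical record of which description parts a case matched."""
    cui: bool
    name: bool
    tui: bool


def fits_all_parts(matches: PartMatches) -> bool:
    # Analogue of AllPartsCategory: CUI, name and TUI must all match.
    return all((matches.cui, matches.name, matches.tui))


def fits_any_part(matches: PartMatches) -> bool:
    # Analogue of AnyPartOfCategory: one matching part is enough.
    return any((matches.cui, matches.name, matches.tui))


only_cui = PartMatches(cui=True, name=False, tui=False)
everything = PartMatches(cui=True, name=True, tui=True)
```

A case that matches only a CUI would fit the "any part" category but not the "all parts" one; a case matching all three parts fits both.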
- class medcat.utils.regression.category_separation.SeparationObserver
Keeps track of which case is separated into which category/categories.
It also keeps track of which cases have been observed as separated and into which category.
- __init__()
- Return type:
None
- observe(case, category)
Observe the specified regression case in the specified category.
- Parameters:
case (RegressionCase) – The regression case to observe
category (Category) – The category to link the case to
- Return type:
None
- has_observed(case)
Check if the case has already been observed.
- Parameters:
case (RegressionCase) – The case to check
- Returns:
bool – True if the case had been observed, False otherwise
- Return type:
bool
- reset()
Allows resetting the state of the observer.
- Return type:
None
- class medcat.utils.regression.category_separation.StrategyType
Bases:
enum.Enum
Describes the types of strategies one can employ for separation.
- FIRST
- ALL
- class medcat.utils.regression.category_separation.SeparatorStrategy(observer)
Bases:
abc.ABC
The strategy according to which the separation takes place.
The separation strategy relies on the mutable separation observer instance.
- Parameters:
observer (SeparationObserver) –
- __init__(observer)
- Parameters:
observer (SeparationObserver) –
- Return type:
None
- abstract can_separate(case)
Check if the separator strategy can separate the specified regression case.
- Parameters:
case (RegressionCase) – The regression case to check
- Returns:
bool – True if the strategy allows separation, False otherwise
- Return type:
bool
- abstract separate(case, category)
Separate the regression case.
- Parameters:
case (RegressionCase) – The regression case to separate
category (Category) – The category to separate to
- Return type:
None
- reset()
Allows resetting the state of the separator strategy.
- Return type:
None
- class medcat.utils.regression.category_separation.SeparateToFirst(observer)
Bases:
SeparatorStrategy
Separator strategy that separates each case to its first match.
That is to say, any subsequently matching categories are ignored. This means that no regression case gets duplicated. It also means that the total number of cases across all categories equals the initial number of cases, provided every case is placed in some category.
- Parameters:
observer (SeparationObserver) –
- can_separate(case)
Check if the separator strategy can separate the specified regression case.
- Parameters:
case (RegressionCase) – The regression case to check
- Returns:
bool – True if the strategy allows separation, False otherwise
- Return type:
bool
- separate(case, category)
Separate the regression case.
- Parameters:
case (RegressionCase) – The regression case to separate
category (Category) – The category to separate to
- Return type:
None
- class medcat.utils.regression.category_separation.SeparateToAll(observer)
Bases:
SeparatorStrategy
A separator strategy that allows separation to all matching categories.
This means that when one regression case fits into multiple categories, it will be saved in each such category. That is, some cases may be duplicated.
- Parameters:
observer (SeparationObserver) –
- can_separate(case)
Check if the separator strategy can separate the specified regression case.
- Parameters:
case (RegressionCase) – The regression case to check
- Returns:
bool – True if the strategy allows separation, False otherwise
- Return type:
bool
- separate(case, category)
Separate the regression case.
- Parameters:
case (RegressionCase) – The regression case to separate
category (Category) – The category to separate to
- Return type:
None
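The contrast between SeparateToFirst and SeparateToAll can be shown with a small stand-alone sketch. Everything in it is hypothetical: Observer, ToFirst, ToAll and run are simplified analogues written for illustration, cases are plain strings, and categories are plain predicates. It assumes that the "first match" strategy refuses any case the observer has already seen.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Observer:
    """Hypothetical stand-in for SeparationObserver: category name -> cases."""

    def __init__(self) -> None:
        self.separated: Dict[str, List[str]] = defaultdict(list)

    def observe(self, case: str, category: str) -> None:
        self.separated[category].append(case)

    def has_observed(self, case: str) -> bool:
        return any(case in cases for cases in self.separated.values())


class ToFirst:
    """Analogue of SeparateToFirst: a case is placed at most once."""

    def __init__(self, observer: Observer) -> None:
        self.observer = observer

    def can_separate(self, case: str) -> bool:
        return not self.observer.has_observed(case)

    def separate(self, case: str, category: str) -> None:
        self.observer.observe(case, category)


class ToAll(ToFirst):
    """Analogue of SeparateToAll: every matching category receives the case."""

    def can_separate(self, case: str) -> bool:
        return True


def run(strategy: ToFirst, cases: List[str],
        categories: Dict[str, Callable[[str], bool]]) -> Dict[str, List[str]]:
    # Offer each case to each category in order; the strategy decides
    # whether a further placement is still allowed.
    for case in cases:
        for name, fits in categories.items():
            if strategy.can_separate(case) and fits(case):
                strategy.separate(case, name)
    return dict(strategy.observer.separated)


categories = {"short": lambda c: len(c) <= 5, "has_a": lambda c: "a" in c}
cases = ["ab", "banana", "xyz"]
first_result = run(ToFirst(Observer()), cases, categories)
all_result = run(ToAll(Observer()), cases, categories)
```

With the "first" analogue the three input cases yield three placements in total; with the "all" analogue the case "ab" lands in both categories, so the total grows to four.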
- medcat.utils.regression.category_separation.get_random_str(length=8)
- class medcat.utils.regression.category_separation.RegressionCheckerSeparator
Bases:
pydantic.BaseModel
Regression checker separator.
It is able to separate cases in a regression checker into multiple different sets of regression cases based on the given list of categories and the specified strategy.
- Parameters:
categories (List[Category]) – The categories to separate into
strategy (SeparatorStrategy) – The strategy for separation
overflow_category (bool) – Whether to use an overflow category for cases that don’t fit in other categories. Defaults to False.
- strategy: SeparatorStrategy
- overflow_category: bool = False
- find_categories_for(case)
Find the categories for a specific regression case.
- Parameters:
case (RegressionCase) – The regression case to check
- Raises:
ValueError – If no category found.
- separate(checker)
Separate the specified regression checker into multiple sets of cases.
Each case may be associated with no, one, or multiple categories. The specifics depend on overflow_category and strategy.
- Parameters:
checker (RegressionChecker) – The input regression checker
- Return type:
None
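How a case can end up in no, one, or several categories, and what an overflow category changes, can be sketched without MedCAT. The function below is hypothetical: it places each case in every matching category (the "all" behaviour), and the overflow name "overflow" is an assumption made for this example, not the name the library uses.

```python
from typing import Callable, Dict, List


def separate_with_overflow(
    cases: List[int],
    categories: Dict[str, Callable[[int], bool]],
    overflow_category: bool = False,
    overflow_name: str = "overflow",  # hypothetical name, for illustration only
) -> Dict[str, List[int]]:
    result: Dict[str, List[int]] = {}
    for case in cases:
        # Collect every category whose predicate accepts the case.
        matched = [name for name, fits in categories.items() if fits(case)]
        if not matched and overflow_category:
            # Unmatched cases are kept only when overflow is enabled.
            matched = [overflow_name]
        for name in matched:
            result.setdefault(name, []).append(case)
    return result


categories = {"even": lambda x: x % 2 == 0, "small": lambda x: x < 3}
without_overflow = separate_with_overflow([1, 2, 3, 10], categories)
with_overflow = separate_with_overflow([1, 2, 3, 10], categories, overflow_category=True)
```

In this sketch the case 3 matches neither category, so it is dropped from the first result and appears only under the overflow key in the second.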
- save(prefix, metadata, overwrite=False)
Save the results of the separation in different files.
This needs to be called after the separate method has been called.
Each separated category (that has any cases registered to it) will be saved in a separate file with the specified prefix and the category name.
- Parameters:
prefix (str) – The prefix for the saved file(s)
metadata (MetaData) – The metadata for the regression suite
overwrite (bool) – Whether to overwrite file(s) if/when needed. Defaults to False.
- Raises:
ValueError – If the method is called before separation or no separation was done
ValueError – If a file already exists and is not allowed to be overwritten
- Return type:
None
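The overwrite guard described for save can be illustrated with a stand-alone helper. This is only a sketch: save_category is a hypothetical function, and the "<prefix>_<category name>.yml" file name pattern is an assumption, since the documentation states only that the prefix and the category name are used.

```python
import os
import tempfile


def save_category(prefix: str, category_name: str, content: str,
                  overwrite: bool = False) -> str:
    """Hypothetical helper; the file name pattern below is an assumption."""
    path = f"{prefix}_{category_name}.yml"
    if os.path.exists(path) and not overwrite:
        # Same kind of guard as documented: refuse to clobber existing files.
        raise ValueError(f"File already exists: {path}")
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return path


with tempfile.TemporaryDirectory() as tmp_dir:
    prefix = os.path.join(tmp_dir, "suite")
    first_path = save_category(prefix, "cat1", "first")
    try:
        save_category(prefix, "cat1", "second")
        second_write_refused = False
    except ValueError:
        second_write_refused = True
    save_category(prefix, "cat1", "second", overwrite=True)
    with open(first_path, encoding="utf-8") as f:
        final_content = f.read()
```

Writing to an existing path fails unless overwrite=True is passed, in which case the earlier content is replaced.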
- medcat.utils.regression.category_separation.get_strategy(strategy_type)
Get the separator strategy from the strategy type.
- Parameters:
strategy_type (StrategyType) – The type of strategy
- Raises:
ValueError – If an unknown strategy is provided
- Returns:
SeparatorStrategy – The resulting separator strategy
- Return type:
SeparatorStrategy
- medcat.utils.regression.category_separation.get_separator(categories, strategy_type, overflow_category=False)
Get the regression checker separator for the list of categories and the specified strategy.
- Parameters:
categories (List[Category]) – The list of categories to include
strategy_type (StrategyType) – The strategy for separation
overflow_category (bool) – Whether to use an overflow category for items that don’t go in other categories. Defaults to False.
- Returns:
RegressionCheckerSeparator – The resulting separator
- Return type:
RegressionCheckerSeparator
- medcat.utils.regression.category_separation.get_description(cat_description)
Get the description from its dict representation.
The dict is expected to have the following keys: ‘cuis’, ‘tuis’, and ‘names’. Each one should have a list of strings as its value.
- Parameters:
cat_description (dict) – The dict representation
- Returns:
CategoryDescription – The resulting category description
- Return type:
CategoryDescription
- medcat.utils.regression.category_separation.get_category(cat_name, cat_description)
Get the category of the specified name from the dict.
- The dict is expected to be in the form:
    type: <category type>  # either any or all
    cuis: []  # list of CUIs in category
    names: []  # list of names in category
    tuis: []  # list of type IDs in category
- Parameters:
cat_name (str) – The name of the category
cat_description (dict) – The dict describing the category
- Raises:
ValueError – If an unknown type is specified.
- Returns:
Category – The resulting category
- Return type:
Category
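The dict form expected by get_category can be pictured with a small parser sketch. It does not call MedCAT: parse_category_dict is a hypothetical function that returns plain Python values instead of Category objects, and comparing the type against exactly "any" and "all" is an assumption based on the comment in the dict form.

```python
from typing import Any, Dict, Set, Tuple


def parse_category_dict(
    cat_name: str, cat_description: Dict[str, Any]
) -> Tuple[str, str, Dict[str, Set[str]]]:
    """Hypothetical parser mirroring the documented dict form."""
    kind = cat_description["type"]
    if kind not in ("any", "all"):
        # Mirrors the documented ValueError for an unknown category type.
        raise ValueError(f"Unknown category type: {kind!r}")
    # Lists of strings become sets, as in a category description.
    targets = {key: set(cat_description[key]) for key in ("cuis", "names", "tuis")}
    return cat_name, kind, targets


example = {"type": "any", "cuis": ["C1", "C1"], "names": [], "tuis": ["T1"]}
parsed = parse_category_dict("my-category", example)

try:
    parse_category_dict("bad", {"type": "some", "cuis": [], "names": [], "tuis": []})
    unknown_type_rejected = False
except ValueError:
    unknown_type_rejected = True
```

Duplicate list entries collapse when converted to sets, and an unrecognised type is rejected with a ValueError.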
- medcat.utils.regression.category_separation.read_categories(yaml_file)
Read categories from a YAML file.
The yaml is assumed to be in the format:
    categories:
      category-name:
        type: <category type>
        cuis: [<target cui 1>, <target cui 2>, …]
        names: [<target name 1>, <target name 2>, …]
        tuis: [<target tui 1>, <target tui 2>, …]
      other-category-name:
        …  # and so on
- Parameters:
yaml_file (str) – The yaml file location
- Returns:
List[Category] – The resulting categories
- Return type:
List[Category]
- medcat.utils.regression.category_separation.separate_categories(category_yaml, strategy_type, regression_suite_yaml, target_file_prefix, overwrite=False, overflow_category=False)
Separate categories based on simple input.
The categories are read from the provided file and the regression suite from its corresponding yaml. The separated regression suites are saved in accordance with the defined prefix.
- Parameters:
category_yaml (str) – The name of the YAML file describing the categories
strategy_type (StrategyType) – The strategy for separation
regression_suite_yaml (str) – The regression suite YAML
target_file_prefix (str) – The target file prefix
overwrite (bool) – Whether to overwrite file(s) if/when needed. Defaults to False.
overflow_category (bool) – Whether to use an overflow category for items that don’t go in other categories. Defaults to False.
- Return type:
None