`medcat.utils.regression.category_separation`

Module Contents

Classes

`CategoryDescription`	A descriptor for a category.
`Category`	The category base class.
`AllPartsCategory`	Represents a category which only fits a regression case if it matches all parts of the category description.
`AnyPartOfCategory`	Represents a category which fits a regression case that matches any part of its category description.
`SeparationObserver`	Keeps track of which case is separated into which category/categories.
`StrategyType`	Describes the types of strategies one can employ for separation.
`SeparatorStrategy`	The strategy according to which the separation takes place.
`SeparateToFirst`	Separator strategy that separates each case to its first match.
`SeparateToAll`	A separator strategy that allows separation to all matching categories.
`RegressionCheckerSeparator`	Regression checker separator.

Functions

`get_random_str`([length])
`get_strategy`(strategy_type)	Get the separator strategy from the strategy type.
`get_separator`(categories, strategy_type[, ...])	Get the regression checker separator for the list of categories and the specified strategy.
`get_description`(cat_description)	Get the description from its dict representation.
`get_category`(cat_name, cat_description)	Get the category of the specified name from the dict.
`read_categories`(yaml_file)	Read categories from a YAML file.
`separate_categories`(category_yaml, strategy_type, ...)	Separate categories based on simple input.

Attributes

logger

medcat.utils.regression.category_separation.logger

class medcat.utils.regression.category_separation.CategoryDescription

Bases: pydantic.BaseModel

A descriptor for a category.

Parameters:

target_cuis (Set[str]) – The set of target CUIs
target_names (Set[str]) – The set of target names
target_tuis (Set[str]) – The set of target type IDs
allow_everything (bool) – Whether to match any CUI, name, or TUI. Defaults to False

target_cuis: Set[str]

target_names: Set[str]

target_tuis: Set[str]

allow_everything: bool = False

_get_required_filter(case, target_filter)

Parameters:

case (medcat.utils.regression.checking.RegressionCase) –
target_filter (medcat.utils.regression.checking.FilterType) –

Return type:

Optional[medcat.utils.regression.checking.TypedFilter]

_has_specific_from(case, targets, target_filter)

Parameters:

case (medcat.utils.regression.checking.RegressionCase) –
targets (Set[str]) –
target_filter (medcat.utils.regression.checking.FilterType) –

has_cui_from(case)

Check if the description has a CUI from the specified regression case.

Parameters:: case (RegressionCase) – The regression case to check
Returns:: bool – True if the description has a CUI from the regression case
Return type:: bool

has_name_from(case)

Check if the description has a name from the specified regression case.

Parameters:: case (RegressionCase) – The regression case to check
Returns:: bool – True if the description has a name from the regression case
Return type:: bool

has_tui_from(case)

Check if the description has a type ID (TUI) from the specified regression case.

Parameters:: case (RegressionCase) – The regression case to check
Returns:: bool – True if the description has a type ID (TUI) from the regression case
Return type:: bool
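
The three has_*_from checks can be pictured as set-overlap tests. The sketch below is not MedCAT's implementation: SketchCase and SketchDescription are hypothetical stand-ins, and it assumes that a "match" means any shared element between the description's targets and the values a case carries, with allow_everything short-circuiting to True.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set


@dataclass(frozen=True)
class SketchCase:
    """Hypothetical stand-in for a regression case: the values it targets."""
    cuis: FrozenSet[str] = frozenset()
    names: FrozenSet[str] = frozenset()
    tuis: FrozenSet[str] = frozenset()


@dataclass
class SketchDescription:
    """Hypothetical stand-in that mirrors the documented fields."""
    target_cuis: Set[str] = field(default_factory=set)
    target_names: Set[str] = field(default_factory=set)
    target_tuis: Set[str] = field(default_factory=set)
    allow_everything: bool = False

    def _overlaps(self, targets: Set[str], values: FrozenSet[str]) -> bool:
        # Assumed semantics: any shared element counts, or allow_everything is set.
        return self.allow_everything or bool(targets & values)

    def has_cui_from(self, case: SketchCase) -> bool:
        return self._overlaps(self.target_cuis, case.cuis)

    def has_name_from(self, case: SketchCase) -> bool:
        return self._overlaps(self.target_names, case.names)

    def has_tui_from(self, case: SketchCase) -> bool:
        return self._overlaps(self.target_tuis, case.tuis)


descr = SketchDescription(target_cuis={"C0011849"}, target_tuis={"T047"})
case = SketchCase(cuis=frozenset({"C0011849", "C0020538"}))
print(descr.has_cui_from(case), descr.has_tui_from(case))  # True False
```
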

__hash__()

Return type:: int

__eq__(other)

Parameters:: other (Any) –
Return type:: bool

classmethod anything_goes()

Return type:: CategoryDescription

class medcat.utils.regression.category_separation.Category(name)

Bases: abc.ABC

The category base class.

A category defines which regression cases fit in it.

Parameters:: name (str) – The name of the category

__init__(name)

Parameters:: name (str) –
Return type:: None

abstract fits(case)

Check if a particular regression case fits in this category.

Parameters:: case (RegressionCase) – The regression case.
Returns:: bool – Whether the case is in this category.
Return type:: bool

class medcat.utils.regression.category_separation.AllPartsCategory(name, descr)

Bases: Category

Represents a category which only fits a regression case if it matches all parts of the category description.

That is, in order for a regression case to match, it would need to match a CUI, a name and a TUI specified in the category description.

Parameters:

name (str) – The name of the category
descr (CategoryDescription) – The description of the category

__init__(name, descr)

Parameters:

name (str) –
descr (CategoryDescription) –

Return type:

None

fits(case)

Check if a particular regression case fits in this category.

Parameters:: case (RegressionCase) – The regression case.
Returns:: bool – Whether the case is in this category.
Return type:: bool

__eq__(__o)

Return self==value.

Parameters:: __o (object) –
Return type:: bool

__hash__()

Return hash(self).

Return type:: int

__str__()

Return str(self).

Return type:: str

__repr__()

Return repr(self).

Return type:: str

class medcat.utils.regression.category_separation.AnyPartOfCategory(name, descr)

Bases: Category

Represents a category which fits a regression case that matches any part of its category description.

That is, any case that matches a CUI, a name, or a TUI within the category description will fit.

Parameters:

name (str) – The name of the category
descr (CategoryDescription) – The description of the category
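
The difference between the "all parts" and "any part" rules comes down to all() versus any() over the three per-part checks. The following is a minimal sketch under that assumption; the dict layout with "cuis", "names" and "tuis" keys and the helper names are hypothetical and are not part of the module.

```python
from typing import Dict, Set

Parts = Dict[str, Set[str]]  # hypothetical layout: keys "cuis", "names", "tuis"
KEYS = ("cuis", "names", "tuis")


def part_matches(descr: Parts, case: Parts) -> Dict[str, bool]:
    """For each part, report whether the description and the case overlap."""
    return {key: bool(descr[key] & case.get(key, set())) for key in KEYS}


def fits_all_parts(descr: Parts, case: Parts) -> bool:
    # "All parts" rule: a CUI, a name and a TUI must each match.
    return all(part_matches(descr, case).values())


def fits_any_part(descr: Parts, case: Parts) -> bool:
    # "Any part" rule: one matching CUI, name or TUI is enough.
    return any(part_matches(descr, case).values())


descr = {"cuis": {"C1"}, "names": {"diabetes"}, "tuis": {"T047"}}
partial_case = {"cuis": {"C1"}, "names": {"fever"}, "tuis": set()}
full_case = {"cuis": {"C1", "C2"}, "names": {"diabetes"}, "tuis": {"T047"}}

print(fits_any_part(descr, partial_case), fits_all_parts(descr, partial_case))  # True False
print(fits_all_parts(descr, full_case))  # True
```
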

__init__(name, descr)

Parameters:

name (str) –
descr (CategoryDescription) –

Return type:

None

fits(case)

Check if a particular regression case fits in this category.

Parameters:: case (RegressionCase) – The regression case.
Returns:: bool – Whether the case is in this category.
Return type:: bool

__eq__(__o)

Return self==value.

Parameters:: __o (object) –
Return type:: bool

__hash__()

Return hash(self).

Return type:: int

__str__()

Return str(self).

Return type:: str

__repr__()

Return repr(self).

Return type:: str

class medcat.utils.regression.category_separation.SeparationObserver

Keeps track of which case is separated into which category/categories.

It also keeps track of which cases have been observed as separated and into which category.

__init__()

Return type:: None

observe(case, category)

Observe the specified regression case in the specified category.

Parameters:

case (RegressionCase) – The regression case to observe
category (Category) – The category to link the case to

Return type:

None

has_observed(case)

Check if the case has already been observed.

Parameters:: case (RegressionCase) – The case to check
Returns:: bool – True if the case had been observed, False otherwise
Return type:: bool

reset()

Allows resetting the state of the observer.

Return type:: None
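
As an illustration of the bookkeeping described above, an observer can be sketched as a mapping from each case to the categories it was placed in. SketchObserver and its attribute name are hypothetical; the real class may store its state differently.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Set


class SketchObserver:
    """Hypothetical stand-in: remembers which categories each case went to."""

    def __init__(self) -> None:
        self.reset()

    def observe(self, case: Hashable, category: str) -> None:
        # Record that this case was separated into this category.
        self.categories_by_case[case].add(category)

    def has_observed(self, case: Hashable) -> bool:
        # A membership test does not create a new defaultdict entry.
        return case in self.categories_by_case

    def reset(self) -> None:
        # Forget everything observed so far.
        self.categories_by_case: DefaultDict[Hashable, Set[str]] = defaultdict(set)


observer = SketchObserver()
observer.observe("case-1", "cardiology")
print(observer.has_observed("case-1"), observer.has_observed("case-2"))  # True False
```
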

class medcat.utils.regression.category_separation.StrategyType

Bases: enum.Enum

Describes the types of strategies one can employ for separation.

FIRST

ALL

class medcat.utils.regression.category_separation.SeparatorStrategy(observer)

Bases: abc.ABC

The strategy according to which the separation takes place.

The separation strategy relies on the mutable separation observer instance.

Parameters:: observer (SeparationObserver) –

__init__(observer)

Parameters:: observer (SeparationObserver) –
Return type:: None

abstract can_separate(case)

Check if the separator strategy can separate the specified regression case

Parameters:: case (RegressionCase) – The regression case to check
Returns:: bool – True if the strategy allows separation, False otherwise
Return type:: bool

abstract separate(case, category)

Separate the regression case

Parameters:

case (RegressionCase) – The regression case to separate
category (Category) – The category to separate to

Return type:

None

reset()

Allows resetting the state of the separator strategy.

Return type:: None

class medcat.utils.regression.category_separation.SeparateToFirst(observer)

Bases: SeparatorStrategy

Separator strategy that separates each case to its first match.

That is to say, any subsequently matching categories are ignored. This means that no regression case gets duplicated. It also means that the number of cases in all categories will be the same as the initial number of cases.

Parameters:: observer (SeparationObserver) –

can_separate(case)

Check if the separator strategy can separate the specified regression case

Parameters:: case (RegressionCase) – The regression case to check
Returns:: bool – True if the strategy allows separation, False otherwise
Return type:: bool

separate(case, category)

Separate the regression case

Parameters:

case (RegressionCase) – The regression case to separate
category (Category) – The category to separate to

Return type:

None

class medcat.utils.regression.category_separation.SeparateToAll(observer)

Bases: SeparatorStrategy

A separator strategy that allows separation to all matching categories.

This means that when one regression case fits into multiple categories, it will be saved in each such category. That is, some cases may be duplicated.

Parameters:: observer (SeparationObserver) –

can_separate(case)

Check if the separator strategy can separate the specified regression case

Parameters:: case (RegressionCase) – The regression case to check
Returns:: bool – True if the strategy allows separation, False otherwise
Return type:: bool

separate(case, category)

Separate the regression case

Parameters:

case (RegressionCase) – The regression case to separate
category (Category) – The category to separate to

Return type:

None
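
To make the contrast between the two strategies concrete, here is a small sketch built around the same can_separate/separate pair. It is an assumption-laden illustration rather than the library's code: SketchFirst, SketchAll and run are hypothetical names, cases are plain integers, and categories are predicates tried in dict order.

```python
from typing import Callable, Dict, List


class SketchFirst:
    """Hypothetical 'first match' strategy: a case is separated at most once."""

    def __init__(self) -> None:
        self.observed: Dict[int, List[str]] = {}

    def can_separate(self, case: int) -> bool:
        return case not in self.observed

    def separate(self, case: int, category: str) -> None:
        self.observed.setdefault(case, []).append(category)


class SketchAll(SketchFirst):
    """Hypothetical 'all matches' strategy: separation is always allowed."""

    def can_separate(self, case: int) -> bool:
        return True


def run(strategy: SketchFirst, cases: List[int],
        categories: Dict[str, Callable[[int], bool]]) -> Dict[int, List[str]]:
    # Try every category in order; the strategy decides whether to keep going.
    for case in cases:
        for name, fits in categories.items():
            if strategy.can_separate(case) and fits(case):
                strategy.separate(case, name)
    return strategy.observed


categories = {"even": lambda n: n % 2 == 0, "small": lambda n: n < 3}
first = run(SketchFirst(), [1, 2, 3, 4], categories)
everything = run(SketchAll(), [1, 2, 3, 4], categories)
print(first)       # {1: ['small'], 2: ['even'], 4: ['even']}
print(everything)  # {1: ['small'], 2: ['even', 'small'], 4: ['even']}
```

In this sketch the case 2 lands only in "even" under the first-match strategy but in both categories under the all-matches strategy, while the case 3 fits nowhere and is not recorded at all.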

medcat.utils.regression.category_separation.get_random_str(length=8)

class medcat.utils.regression.category_separation.RegressionCheckerSeparator

Bases: pydantic.BaseModel

Regression checker separator.

It is able to separate cases in a regression checker into multiple different sets of regression cases based on the given list of categories and the specified strategy.

Parameters:

categories (List[Category]) – The categories to separate into
strategy (SeparatorStrategy) – The strategy for separation
overflow_category (bool) – Whether to use an overflow category for cases that don’t fit in other categories. Defaults to False.

class Config

arbitrary_types_allowed = True

categories: List[Category]

strategy: SeparatorStrategy

overflow_category: bool = False

_attempt_category_for(cat, case)

Parameters:

cat (Category) –
case (medcat.utils.regression.checking.RegressionCase) –

find_categories_for(case)

Find the categories for a specific regression case

Parameters:: case (RegressionCase) – The regression case to check
Raises:: ValueError – If no category found.

separate(checker)

Separate the specified regression checker into multiple sets of cases.

Each case may be associated with no, one, or multiple categories. The specifics depend on overflow_category and strategy.

Parameters:: checker (RegressionChecker) – The input regression checker
Return type:: None
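
The role of the overflow category can be sketched as follows. This is only an illustration under stated assumptions: sketch_separate is a hypothetical helper, it places each case in every matching category, and the literal name "overflow" is made up for the example (the name used by the library is not documented here).

```python
from typing import Callable, Dict, List


def sketch_separate(cases: List[int],
                    categories: Dict[str, Callable[[int], bool]],
                    overflow_category: bool = False) -> Dict[str, List[int]]:
    """Group cases by matching category, optionally collecting the leftovers."""
    separated: Dict[str, List[int]] = {name: [] for name in categories}
    for case in cases:
        matched = [name for name, fits in categories.items() if fits(case)]
        if not matched and overflow_category:
            matched = ["overflow"]  # hypothetical name for the catch-all category
        for name in matched:
            separated.setdefault(name, []).append(case)
    return separated


categories = {"even": lambda n: n % 2 == 0}
print(sketch_separate([1, 2, 3], categories))
# {'even': [2]}
print(sketch_separate([1, 2, 3], categories, overflow_category=True))
# {'even': [2], 'overflow': [1, 3]}
```
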

save(prefix, metadata, overwrite=False)

Save the results of the separation in different files.

This needs to be called after the separate method has been called.

Each separated category (that has any cases registered to it) will be saved in a separate file with the specified prefix and the category name.

Parameters:

prefix (str) – The prefix for the saved file(s)
metadata (MetaData) – The metadata for the regression suite
overwrite (bool) – Whether to overwrite file(s) if/when needed. Defaults to False.

Raises:

ValueError – If the method is called before separation or no separation was done
ValueError – If a file already exists and is not allowed to be overwritten

Return type:

None
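
The saving rules above (one file per non-empty category, an error when nothing was separated, and an error when a file exists and overwriting is not allowed) can be sketched like this. The helper sketch_save, the "<prefix>_<name>.txt" naming and the plain-text file contents are all assumptions made for illustration; the metadata argument is left out entirely.

```python
import os
from typing import Dict, List


def sketch_save(prefix: str, separated: Dict[str, List[str]],
                overwrite: bool = False) -> List[str]:
    """Write one file per non-empty category and return the paths written."""
    non_empty = {name: cases for name, cases in separated.items() if cases}
    if not non_empty:
        raise ValueError("Nothing to save: no cases have been separated")
    paths = []
    for name, cases in non_empty.items():
        path = f"{prefix}_{name}.txt"  # hypothetical naming scheme
        if os.path.exists(path) and not overwrite:
            raise ValueError(f"File already exists: {path}")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("\n".join(cases))
        paths.append(path)
    return paths
```
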

medcat.utils.regression.category_separation.get_strategy(strategy_type)

Get the separator strategy from the strategy type.

Parameters:: strategy_type (StrategyType) – The type of strategy
Raises:: ValueError – If an unknown strategy is provided
Returns:: SeparatorStrategy – The resulting separator strategy
Return type:: SeparatorStrategy
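
A lookup from strategy type to strategy, with a ValueError for anything unknown, might look like the sketch below. SketchStrategyType and sketch_get_strategy are hypothetical stand-ins; the enum values and the returned labels are invented for the example, whereas the real function returns a SeparatorStrategy.

```python
from enum import Enum, auto


class SketchStrategyType(Enum):
    """Hypothetical mirror of the two documented members."""
    FIRST = auto()
    ALL = auto()


def sketch_get_strategy(strategy_type: SketchStrategyType) -> str:
    """Map a strategy type to a label standing in for the strategy object."""
    labels = {
        SketchStrategyType.FIRST: "separate to first match",
        SketchStrategyType.ALL: "separate to all matches",
    }
    if strategy_type not in labels:
        raise ValueError(f"Unknown strategy type: {strategy_type!r}")
    return labels[strategy_type]


# Enum members can also be looked up by name, e.g. from a command-line argument.
print(sketch_get_strategy(SketchStrategyType["ALL"]))  # separate to all matches
```
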

medcat.utils.regression.category_separation.get_separator(categories, strategy_type, overflow_category=False)

Get the regression checker separator for the list of categories and the specified strategy.

Parameters:

categories (List[Category]) – The list of categories to include
strategy_type (StrategyType) – The strategy for separation
overflow_category (bool) – Whether to use an overflow category for items that don’t go in other categories. Defaults to False.

Returns:

RegressionCheckerSeparator – The resulting separator

Return type:

RegressionCheckerSeparator

medcat.utils.regression.category_separation.get_description(cat_description)

Get the description from its dict representation.

The dict is expected to have the following keys: ‘cuis’, ‘tuis’, and ‘names’. Each one should have a list of strings as its value.

Parameters:: cat_description (dict) – The dict representation
Returns:: CategoryDescription – The resulting category description
Return type:: CategoryDescription
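
The conversion from the dict form to the set-based fields can be sketched as below. sketch_get_description is a hypothetical helper that returns a plain dict keyed by the documented field names; the real function returns a CategoryDescription.

```python
from typing import Dict, List, Set


def sketch_get_description(cat_description: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Turn the 'cuis'/'names'/'tuis' lists into the corresponding target sets."""
    return {
        "target_cuis": set(cat_description["cuis"]),
        "target_names": set(cat_description["names"]),
        "target_tuis": set(cat_description["tuis"]),
    }


description = sketch_get_description(
    {"cuis": ["C0011849", "C0011849"], "names": ["diabetes"], "tuis": []}
)
print(description["target_cuis"])  # {'C0011849'}
```

Converting to sets drops duplicate entries, which is why the repeated CUI appears only once.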

medcat.utils.regression.category_separation.get_category(cat_name, cat_description)

Get the category of the specified name from the dict.

The dict is expected to be in the form:

    type: <category type> # either any or all
    cuis: [] # list of CUIs in category
    names: [] # list of names in category
    tuis: [] # list of type IDs in category

Parameters:

cat_name (str) – The name of the category
cat_description (dict) – The dict describing the category

Raises:

ValueError – If an unknown type is specified.

Returns:

Category – The resulting category

Return type:

Category

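
How the type key of the dict form selects between the "any" and "all" behaviour, and how an unknown type leads to a ValueError, can be sketched as follows. sketch_get_category is a hypothetical helper: it returns the name together with a predicate instead of a Category object, it treats the type as case-sensitive, and it represents a case as a dict of sets, all of which are assumptions.

```python
from typing import Callable, Dict, Set, Tuple

Case = Dict[str, Set[str]]  # hypothetical case layout: keys "cuis", "names", "tuis"


def sketch_get_category(cat_name: str, cat_description: dict) -> Tuple[str, Callable[[Case], bool]]:
    """Build a (name, fits) pair from the dict form of a category."""
    cat_type = cat_description["type"]
    if cat_type == "any":
        combine = any
    elif cat_type == "all":
        combine = all
    else:
        raise ValueError(f"Unknown category type: {cat_type!r}")
    targets = {key: set(cat_description[key]) for key in ("cuis", "names", "tuis")}

    def fits(case: Case) -> bool:
        # Each part matches when the targets and the case share an element.
        return combine(bool(targets[key] & case.get(key, set())) for key in targets)

    return cat_name, fits


name, fits = sketch_get_category(
    "cardio", {"type": "any", "cuis": ["C1"], "names": [], "tuis": ["T047"]}
)
print(name, fits({"cuis": {"C1"}}), fits({"names": {"fever"}}))  # cardio True False
```
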