:py:mod:`medcat.utils.regression.category_separation`
=====================================================

.. py:module:: medcat.utils.regression.category_separation


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.utils.regression.category_separation.CategoryDescription
   medcat.utils.regression.category_separation.Category
   medcat.utils.regression.category_separation.AllPartsCategory
   medcat.utils.regression.category_separation.AnyPartOfCategory
   medcat.utils.regression.category_separation.SeparationObserver
   medcat.utils.regression.category_separation.StrategyType
   medcat.utils.regression.category_separation.SeparatorStrategy
   medcat.utils.regression.category_separation.SeparateToFirst
   medcat.utils.regression.category_separation.SeparateToAll
   medcat.utils.regression.category_separation.RegressionCheckerSeparator


Functions
~~~~~~~~~

.. autoapisummary::

   medcat.utils.regression.category_separation.get_random_str
   medcat.utils.regression.category_separation.get_strategy
   medcat.utils.regression.category_separation.get_separator
   medcat.utils.regression.category_separation.get_description
   medcat.utils.regression.category_separation.get_category
   medcat.utils.regression.category_separation.read_categories
   medcat.utils.regression.category_separation.separate_categories


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.utils.regression.category_separation.logger


.. py:data:: logger


.. py:class:: CategoryDescription

   Bases: :py:obj:`pydantic.BaseModel`

   A descriptor for a category.

   :param target_cuis: The set of target CUIs
   :type target_cuis: Set[str]
   :param target_names: The set of target names
   :type target_names: Set[str]
   :param target_tuis: The set of target type IDs
   :type target_tuis: Set[str]
   :param allow_everything: Matches any CUI/NAME/TUI. Defaults to False
   :type allow_everything: bool

   .. py:attribute:: target_cuis
      :type: Set[str]

   .. py:attribute:: target_names
      :type: Set[str]

   .. py:attribute:: target_tuis
      :type: Set[str]

   .. py:attribute:: allow_everything
      :type: bool
      :value: False

   .. py:method:: _get_required_filter(case, target_filter)

   .. py:method:: _has_specific_from(case, targets, target_filter)

   .. py:method:: has_cui_from(case)

      Check if the description has a CUI from the specified regression case.

      :param case: The regression case to check
      :type case: RegressionCase

      :Returns: **bool** -- True if the description has a CUI from the regression case

   .. py:method:: has_name_from(case)

      Check if the description has a name from the specified regression case.

      :param case: The regression case to check
      :type case: RegressionCase

      :Returns: **bool** -- True if the description has a name from the regression case

   .. py:method:: has_tui_from(case)

      Check if the description has a target ID/TUI from the specified regression case.

      :param case: The regression case to check
      :type case: RegressionCase

      :Returns: **bool** -- True if the description has a target ID/TUI from the regression case

   .. py:method:: __hash__()

   .. py:method:: __eq__(other)

   .. py:method:: anything_goes()
      :classmethod:


.. py:class:: Category(name)

   Bases: :py:obj:`abc.ABC`

   The category base class.

   A category defines which regression cases fit in it.

   :param name: The name of the category
   :type name: str

   .. py:method:: __init__(name)

   .. py:method:: fits(case)
      :abstractmethod:

      Check if a particular regression case fits in this category.

      :param case: The regression case.
      :type case: RegressionCase

      :Returns: **bool** -- Whether the case is in this category.


.. py:class:: AllPartsCategory(name, descr)

   Bases: :py:obj:`Category`

   Represents a category which only fits a regression case if it matches all parts of the category description.

   That is, in order for a regression case to match, it would need to match a CUI, a name and a TUI specified in the category description.

   :param name: The name of the category
   :type name: str
   :param descr: The description of the category
   :type descr: CategoryDescription

   .. py:method:: __init__(name, descr)

   .. py:method:: fits(case)

      Check if a particular regression case fits in this category.

      :param case: The regression case.
      :type case: RegressionCase

      :Returns: **bool** -- Whether the case is in this category.

   .. py:method:: __eq__(__o)

      Return self==value.

   .. py:method:: __hash__()

      Return hash(self).

   .. py:method:: __str__()

      Return str(self).

   .. py:method:: __repr__()

      Return repr(self).


.. py:class:: AnyPartOfCategory(name, descr)

   Bases: :py:obj:`Category`

   Represents a category which fits a regression case that matches any part of its category description.

   That is, any case that matches either a CUI, a name or a TUI within the category description will fit.

   :param name: The name of the category
   :type name: str
   :param descr: The description of the category
   :type descr: CategoryDescription

   .. py:method:: __init__(name, descr)

   .. py:method:: fits(case)

      Check if a particular regression case fits in this category.

      :param case: The regression case.
      :type case: RegressionCase

      :Returns: **bool** -- Whether the case is in this category.

   .. py:method:: __eq__(__o)

      Return self==value.

   .. py:method:: __hash__()

      Return hash(self).

   .. py:method:: __str__()

      Return str(self).

   .. py:method:: __repr__()

      Return repr(self).


.. py:class:: SeparationObserver

   Keeps track of which case is separated into which category/categories.

   It also keeps track of which cases have been observed as separated and into which category.

   .. py:method:: __init__()

   .. py:method:: observe(case, category)

      Observe the specified regression case in the specified category.

      :param case: The regression case to observe
      :type case: RegressionCase
      :param category: The category to link the case to
      :type category: Category

   .. py:method:: has_observed(case)

      Check if the case has already been observed.

      :param case: The case to check
      :type case: RegressionCase

      :Returns: **bool** -- True if the case has been observed, False otherwise

   .. py:method:: reset()

      Allows resetting the state of the observer.


.. py:class:: StrategyType

   Bases: :py:obj:`enum.Enum`

   Describes the types of strategies one can employ for separation.

   .. py:attribute:: FIRST

   .. py:attribute:: ALL


.. py:class:: SeparatorStrategy(observer)

   Bases: :py:obj:`abc.ABC`

   The strategy according to which the separation takes place.

   The separation strategy relies on the mutable separation observer instance.

   .. py:method:: __init__(observer)

   .. py:method:: can_separate(case)
      :abstractmethod:

      Check if the separator strategy can separate the specified regression case

      :param case: The regression case to check
      :type case: RegressionCase

      :Returns: **bool** -- True if the strategy allows separation, False otherwise

   .. py:method:: separate(case, category)
      :abstractmethod:

      Separate the regression case

      :param case: The regression case to separate
      :type case: RegressionCase
      :param category: The category to separate to
      :type category: Category

   .. py:method:: reset()

      Allows resetting the state of the separator strategy.


.. py:class:: SeparateToFirst(observer)

   Bases: :py:obj:`SeparatorStrategy`

   Separator strategy that separates each case to its first match.

   That is to say, any subsequently matching categories are ignored. This means that no regression case gets duplicated. It also means that the number of cases in all categories will be the same as the initial number of cases.

   .. py:method:: can_separate(case)

      Check if the separator strategy can separate the specified regression case

      :param case: The regression case to check
      :type case: RegressionCase

      :Returns: **bool** -- True if the strategy allows separation, False otherwise

   .. py:method:: separate(case, category)

      Separate the regression case

      :param case: The regression case to separate
      :type case: RegressionCase
      :param category: The category to separate to
      :type category: Category


.. py:class:: SeparateToAll(observer)

   Bases: :py:obj:`SeparatorStrategy`

   A separator strategy that allows separation to all matching categories.

   This means that when one regression case fits into multiple categories, it will be saved in each such category. I.e. some cases may be duplicated.

   .. py:method:: can_separate(case)

      Check if the separator strategy can separate the specified regression case

      :param case: The regression case to check
      :type case: RegressionCase

      :Returns: **bool** -- True if the strategy allows separation, False otherwise

   .. py:method:: separate(case, category)

      Separate the regression case

      :param case: The regression case to separate
      :type case: RegressionCase
      :param category: The category to separate to
      :type category: Category


.. py:function:: get_random_str(length=8)


.. py:class:: RegressionCheckerSeparator

   Bases: :py:obj:`pydantic.BaseModel`

   Regression checker separator.

   It is able to separate cases in a regression checker into multiple different sets of regression cases based on the given list of categories and the specified strategy.

   :param categories: The categories to separate into
   :type categories: List[Category]
   :param strategy: The strategy for separation
   :type strategy: SeparatorStrategy
   :param overflow_category: Whether to use an overflow category for cases that don't fit in other categories. Defaults to False.
   :type overflow_category: bool

   .. py:class:: Config

      .. py:attribute:: arbitrary_types_allowed
         :value: True

   .. py:attribute:: categories
      :type: List[Category]

   .. py:attribute:: strategy
      :type: SeparatorStrategy

   .. py:attribute:: overflow_category
      :type: bool
      :value: False

   .. py:method:: _attempt_category_for(cat, case)

   .. py:method:: find_categories_for(case)

      Find the categories for a specific regression case

      :param case: The regression case to check
      :type case: RegressionCase

      :raises ValueError: If no category found.

   .. py:method:: separate(checker)

      Separate the specified regression checker into multiple sets of cases.

      Each case may be associated with either no, one, or multiple categories. The specifics depend on `overflow_category` and `strategy`.

      :param checker: The input regression checker
      :type checker: RegressionChecker

   .. py:method:: save(prefix, metadata, overwrite = False)

      Save the results of the separation in different files.

      This needs to be called after the `separate` method has been called.

      Each separated category (that has any cases registered to it) will be saved in a separate file with the specified prefix and the category name.

      :param prefix: The prefix for the saved file(s)
      :type prefix: str
      :param metadata: The metadata for the regression suite
      :type metadata: MetaData
      :param overwrite: Whether to overwrite file(s) if/when needed. Defaults to False.
      :type overwrite: bool

      :raises ValueError: If the method is called before separation or no separation was done
      :raises ValueError: If a file already exists and is not allowed to be overwritten


.. py:function:: get_strategy(strategy_type)

   Get the separator strategy from the strategy type.

   :param strategy_type: The type of strategy
   :type strategy_type: StrategyType

   :raises ValueError: If an unknown strategy is provided

   :Returns: **SeparatorStrategy** -- The resulting separator strategy


.. py:function:: get_separator(categories, strategy_type, overflow_category = False)

   Get the regression checker separator for the list of categories and the specified strategy.

   :param categories: The list of categories to include
   :type categories: List[Category]
   :param strategy_type: The strategy for separation
   :type strategy_type: StrategyType
   :param overflow_category: Whether to use an overflow category for items that don't go in other categories. Defaults to False.
   :type overflow_category: bool

   :Returns: **RegressionCheckerSeparator** -- The resulting separator


.. py:function:: get_description(cat_description)

   Get the description from its dict representation.

   The dict is expected to have the following keys: 'cuis', 'tuis', and 'names'. Each one should have a list of strings as its value.

   :param cat_description: The dict representation
   :type cat_description: dict

   :Returns: **CategoryDescription** -- The resulting category description


.. py:function:: get_category(cat_name, cat_description)

   Get the category of the specified name from the dict.

   The dict is expected to be in the form::

      type: # either any or all
      cuis: [] # list of CUIs in category
      names: [] # list of names in category
      tuis: [] # list of type IDs in category

   :param cat_name: The name of the category
   :type cat_name: str
   :param cat_description: The dict describing the category
   :type cat_description: dict

   :raises ValueError: If an unknown type is specified.

   :Returns: **Category** -- The resulting category


.. py:function:: read_categories(yaml_file)

   Read categories from a YAML file.

   The yaml is assumed to be in the format::

      categories:
        category-name:
          type:
          cuis: [, , ...]
          names: [, , ...]
          tuis: [, , ...]
        other-category-name:
          ... # and so on

   :param yaml_file: The yaml file location
   :type yaml_file: str

   :Returns: **List[Category]** -- The resulting categories


.. py:function:: separate_categories(category_yaml, strategy_type, regression_suite_yaml, target_file_prefix, overwrite = False, overflow_category = False)

   Separate categories based on simple input.

   The categories are read from the provided file and the regression suite from its corresponding yaml. The separated regression suites are saved in accordance with the defined prefix.

   :param category_yaml: The name of the YAML file describing the categories
   :type category_yaml: str
   :param strategy_type: The strategy for separation
   :type strategy_type: StrategyType
   :param regression_suite_yaml: The regression suite YAML
   :type regression_suite_yaml: str
   :param target_file_prefix: The target file prefix
   :type target_file_prefix: str
   :param overwrite: Whether to overwrite file(s) if/when needed. Defaults to False.
   :type overwrite: bool
   :param overflow_category: Whether to use an overflow category for items that don't go in other categories. Defaults to False.
   :type overflow_category: bool
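

The difference between the "any"/"all" category types and the ``FIRST``/``ALL`` strategies described on this page can be illustrated with a small stand-alone sketch. It does not import or reproduce medcat's implementation: ``Case``, ``Description`` and ``separate_sketch`` are hypothetical stand-ins, and cases that fit no category are simply dropped here (roughly the situation when no overflow category is used).

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Case:
    """Hypothetical stand-in for a regression case (not medcat's RegressionCase)."""
    name: str
    cuis: FrozenSet[str]
    names: FrozenSet[str]
    tuis: FrozenSet[str]


@dataclass
class Description:
    """Hypothetical stand-in mirroring the three target sets of a category description."""
    target_cuis: Set[str] = field(default_factory=set)
    target_names: Set[str] = field(default_factory=set)
    target_tuis: Set[str] = field(default_factory=set)

    def matched_parts(self, case: Case) -> List[bool]:
        # One flag per part: does the case share a CUI, a name, a TUI?
        return [
            bool(self.target_cuis & case.cuis),
            bool(self.target_names & case.names),
            bool(self.target_tuis & case.tuis),
        ]


# A category here is (name, description, require_all_parts).
CategorySpec = Tuple[str, Description, bool]


def separate_sketch(cases: List[Case], categories: List[CategorySpec],
                    to_all: bool) -> Dict[str, List[str]]:
    """Assign case names to categories.

    to_all=False mimics a "first match only" strategy;
    to_all=True mimics a "every matching category" strategy.
    """
    result: Dict[str, List[str]] = {name: [] for name, _, _ in categories}
    for case in cases:
        for name, descr, require_all in categories:
            parts = descr.matched_parts(case)
            fits = all(parts) if require_all else any(parts)
            if fits:
                result[name].append(case.name)
                if not to_all:
                    break  # first match wins, later categories are ignored
    return result


cases = [
    Case("case-1", frozenset({"C1"}), frozenset({"heart attack"}), frozenset({"T047"})),
    Case("case-2", frozenset({"C2"}), frozenset({"fever"}), frozenset({"T184"})),
]
categories = [
    ("disorders", Description(target_tuis={"T047"}), False),  # "any" type
    ("cardiac", Description(target_cuis={"C1"}), False),      # "any" type
]

first = separate_sketch(cases, categories, to_all=False)
everything = separate_sketch(cases, categories, to_all=True)
print(first)
print(everything)
```

With the first-match behaviour, ``case-1`` lands only in ``disorders``; with the match-all behaviour it is duplicated into ``cardiac`` as well. ``case-2`` fits neither category in this sketch and is left out of both results.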