:py:mod:`medcat.utils.regression.results`
=========================================

.. py:module:: medcat.utils.regression.results


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.utils.regression.results.Finding
   medcat.utils.regression.results.FindingDeterminer
   medcat.utils.regression.results.Strictness
   medcat.utils.regression.results.SingleResultDescriptor
   medcat.utils.regression.results.ResultDescriptor
   medcat.utils.regression.results.MultiDescriptor


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.utils.regression.results.STRICTNESS_MATRIX


.. py:class:: Finding

   Bases: :py:obj:`enum.Enum`

   Describes whether or how the finding was verified.

   The idea is that we know where we expect the entity to be recognised,
   and the enum constants describe how the recognition compared to the
   expectation.

   In essence, we want to know the relative positions of the two pairs of
   numbers (character positions):

   - Expected Start, Expected End
   - Recognised Start, Recognised End

   We can model this as 4 numbers on the number line, and we want to know
   their position relative to each other.
   For example, if the expected positions are marked with ``*`` and the
   recognised positions with ``#``, we may have something like:

   ``___*__#_______#*______________``

   This would indicate that a partial, but smaller, span was recognised.

   .. py:attribute:: IDENTICAL

      The CUI and the span recognised are identical to what was expected.

   .. py:attribute:: BIGGER_SPAN_RIGHT

      The CUI is the same, but the recognised span is longer on the right.

      If we use the notation from the class doc string, e.g.
      ``_*#__*__#``

   .. py:attribute:: BIGGER_SPAN_LEFT

      The CUI is the same, but the recognised span is longer on the left.

      If we use the notation from the class doc string, e.g.
      ``_#_*__*#_``

   .. py:attribute:: BIGGER_SPAN_BOTH

      The CUI is the same, but the recognised span is longer on both sides.

      If we use the notation from the class doc string, e.g.
      ``_#__*__*__#_``

   .. py:attribute:: SMALLER_SPAN

      The CUI is the same, but the recognised span is smaller.

      If we use the notation from the class doc string, e.g.

      - ``_*_#_#_*_`` (neither start nor end match)
      - ``_*#_#_*__`` (start matches, but end is before expected)
      - ``_*__#_#*_`` (end matches, but start is after expected)

   .. py:attribute:: PARTIAL_OVERLAP

      The CUI is the same, but the span overlaps partially.

      If we use the notation from the class doc string, e.g.

      - ``_*_#__*_#_`` (starts between expected start and end, but ends beyond)
      - ``_#_*_#_*__`` (starts before expected start, but ends between expected start and end)

   .. py:attribute:: FOUND_DIR_PARENT

      The recognised CUI is a parent of the expected CUI but the span is an exact match.

   .. py:attribute:: FOUND_DIR_GRANDPARENT

      The recognised CUI is a grandparent of the expected CUI but the span is an exact match.

   .. py:attribute:: FOUND_ANY_CHILD

      The recognised CUI is a child of the expected CUI but the span is an exact match.

   .. py:attribute:: FOUND_CHILD_PARTIAL

      The recognised CUI is a child yet the match is only partial (smaller/bigger/partial).

   .. py:attribute:: FOUND_OTHER

      Found another CUI in the same span.

   .. py:attribute:: FAIL

      The concept was not recognised in any meaningful way.

   .. py:method:: has_correct_cui()

      Whether the finding found the correct concept.

      :Returns: **bool** -- Whether the correct concept was found.

   .. py:method:: determine(exp_cui, exp_start, exp_end, tl, found_entities, strict_only = False, check_children = True, check_parent = True, check_grandparent = True)
      :classmethod:

      Determine the finding type based on the input.

      :param exp_cui: Expected CUI.
      :type exp_cui: str
      :param exp_start: Expected span start.
      :type exp_start: int
      :param exp_end: Expected span end.
      :type exp_end: int
      :param tl: The translation layer.
      :type tl: TranslationLayer
      :param found_entities: The entities found by the model.
      :type found_entities: Dict[str, Dict[str, Any]]
      :param strict_only: Whether to use a strict-only mode (either identical or fail).
                          Defaults to False.
      :type strict_only: bool
      :param check_children: Whether to check the children. Defaults to True.
      :type check_children: bool
      :param check_parent: Whether to check for parent(s). Defaults to True.
      :type check_parent: bool
      :param check_grandparent: Whether to check for grandparent(s). Defaults to True.
      :type check_grandparent: bool

      :Returns: **Tuple['Finding', Optional[str]]** -- The type of finding determined, and the alternative.


.. py:class:: FindingDeterminer(exp_cui, exp_start, exp_end, tl, found_entities, strict_only = False, check_children = True, check_parent = True, check_grandparent = True)

   A helper class to determine the type of finding.

   This is mostly useful to split the responsibilities of looking at
   children/parents as well as to keep track of the already-checked
   children to avoid infinite recursion (which could happen in, e.g.,
   a SNOMED model).

   :param exp_cui: The expected CUI.
   :type exp_cui: str
   :param exp_start: The expected span start.
   :type exp_start: int
   :param exp_end: The expected span end.
   :type exp_end: int
   :param tl: The translation layer.
   :type tl: TranslationLayer
   :param found_entities: The entities found by the model.
   :type found_entities: Dict[str, Dict[str, Any]]
   :param strict_only: Whether to use strict-only mode (either identical or fail).
                       Defaults to False.
   :type strict_only: bool
   :param check_children: Whether or not to check the children. Defaults to True.
   :type check_children: bool
   :param check_parent: Whether to check for parent(s). Defaults to True.
   :type check_parent: bool
   :param check_grandparent: Whether to check for grandparent(s). Defaults to True.
   :type check_grandparent: bool

   .. py:method:: __init__(exp_cui, exp_start, exp_end, tl, found_entities, strict_only = False, check_children = True, check_parent = True, check_grandparent = True)

   .. py:method:: _determine_raw(start, end)

      Determines the raw SPAN-ONLY finding.

      I.e. this assumes the concept is appropriate.
      It will return None if there is no overlapping span.

      :param start: The start of the span.
      :type start: int
      :param end: The end of the span.
      :type end: int

      :raises MalformedFinding: If the start is greater than the end.
      :raises MalformedFinding: If the expected start is greater than the expected end.

      :Returns: **Optional[Finding]** -- The finding, if a match is found.

   .. py:method:: _get_strict()

   .. py:method:: _check_parents()

   .. py:method:: _check_children()

   .. py:method:: _descr_cui(cui)

   .. py:method:: _find_diff_cui()

   .. py:method:: determine()

      Determine the finding based on the given information.

      First, the strict check is done (either identical or not).
      Then, parents are checked (if required).
      After that, children are checked (if required).

      :Returns: **Tuple[Finding, Optional[str]]** -- The appropriate finding, and the alternative (if applicable).

   .. py:method:: _determine()


.. py:class:: Strictness

   Bases: :py:obj:`enum.Enum`

   The total strictness on which to judge the results.

   .. py:attribute:: STRICTEST

      The strictest option which only allows identical findings.

   .. py:attribute:: STRICT

      A strict option which allows identical or children.

   .. py:attribute:: NORMAL

      Normal strictness also allows partial overlaps on target concept and children.

   .. py:attribute:: LENIENT

      Lenient strictness also allows parents and grandparents.

   .. py:attribute:: ANYTHING

      Anything strictness allows ANY finding.

      This would generally only be relevant when disabling examples for results descriptors.


.. py:data:: STRICTNESS_MATRIX
   :type: Dict[Strictness, Set[Finding]]


.. py:class:: SingleResultDescriptor(/, **data)

   Bases: :py:obj:`pydantic.BaseModel`

   The result descriptor.

   This class is responsible for keeping track of all the findings
   (i.e. how many were found to be identical) as well as the examples
   of the finding on a per-target basis for further analysis.

   .. py:attribute:: name
      :type: str

      The name of the part that was checked

   .. py:attribute:: findings
      :type: Dict[Finding, int]

      The description of failures

   .. py:attribute:: examples
      :type: List[Tuple[medcat.utils.regression.targeting.FinalTarget, Tuple[Finding, Optional[str]]]]
      :value: []

      The examples of non-perfect alignment.

   .. py:method:: report_success(target, found)

      Report a test case and its successfulness.

      :param target: The target configuration
      :type target: FinalTarget
      :param found: Whether or not the check was successful
      :type found: Tuple[Finding, Optional[str]]

   .. py:method:: get_report()

      Get the report associated with this descriptor

      :Returns: **str** -- The report string

   .. py:method:: model_dump(**kwargs)

      Usage docs: https://docs.pydantic.dev/2.10/concepts/serialization/#modelmodel_dump

      Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

      :param mode: The mode in which ``to_python`` should run.
                   If mode is 'json', the output will only contain JSON serializable types.
                   If mode is 'python', the output may contain non-JSON-serializable Python objects.
      :param include: A set of fields to include in the output.
      :param exclude: A set of fields to exclude from the output.
      :param context: Additional context to pass to the serializer.
      :param by_alias: Whether to use the field's alias in the dictionary key if defined.
      :param exclude_unset: Whether to exclude fields that have not been explicitly set.
      :param exclude_defaults: Whether to exclude fields that are set to their default value.
      :param exclude_none: Whether to exclude fields that have a value of ``None``.
      :param round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T].
      :param warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors,
                       "error" raises a ``pydantic_core.PydanticSerializationError``.
      :param serialize_as_any: Whether to serialize fields with duck-typing serialization behavior.

      :Returns: **A dictionary representation of the model.**

   .. py:method:: json(**kwargs)


.. py:class:: ResultDescriptor(/, **data)

   Bases: :py:obj:`SingleResultDescriptor`

   The overarching result descriptor that handles multiple phrases.

   This class keeps track of the results on a per-phrase basis and
   can be used to get the overall report and/or iterate over examples.

   .. py:attribute:: per_phrase_results
      :type: Dict[str, SingleResultDescriptor]

   .. py:method:: report(target, finding)

      Report a test case and its successfulness

      :param target: The final target configuration
      :type target: FinalTarget
      :param finding: To what extent the concept was recognised
      :type finding: Tuple[Finding, Optional[str]]

   .. py:method:: iter_examples(strictness_threshold)

      Iterate suitable examples.

      Examples are included based on the given strictness threshold.

      Any finding that is assumed to be "correct enough" according to the
      strictness matrix for this threshold will be withheld from examples.

      In simpler terms, if the finding is NOT in the strictness matrix for
      this strictness, the example is recorded.

      NOTE: To disable example keeping, set the threshold to Strictness.ANYTHING.

      :param strictness_threshold: The strictness threshold.
      :type strictness_threshold: Strictness

      :Yields: *Iterable[Tuple[FinalTarget, Tuple[Finding, Optional[str]]]]* -- The placeholder, phrase, finding, CUI, and name.

   .. py:method:: get_report(phrases_separately = False)

      Get the report associated with this descriptor

      :param phrases_separately: Whether to output descriptor for each phrase separately
      :type phrases_separately: bool

      :Returns: **str** -- The report string

   .. py:method:: model_dump(**kwargs)

      Usage docs: https://docs.pydantic.dev/2.10/concepts/serialization/#modelmodel_dump

      Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

      :param mode: The mode in which ``to_python`` should run.
                   If mode is 'json', the output will only contain JSON serializable types.
                   If mode is 'python', the output may contain non-JSON-serializable Python objects.
      :param include: A set of fields to include in the output.
      :param exclude: A set of fields to exclude from the output.
      :param context: Additional context to pass to the serializer.
      :param by_alias: Whether to use the field's alias in the dictionary key if defined.
      :param exclude_unset: Whether to exclude fields that have not been explicitly set.
      :param exclude_defaults: Whether to exclude fields that are set to their default value.
      :param exclude_none: Whether to exclude fields that have a value of ``None``.
      :param round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T].
      :param warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors,
                       "error" raises a ``pydantic_core.PydanticSerializationError``.
      :param serialize_as_any: Whether to serialize fields with duck-typing serialization behavior.

      :Returns: **A dictionary representation of the model.**


.. py:class:: MultiDescriptor(/, **data)

   Bases: :py:obj:`pydantic.BaseModel`

   The descriptor of results over multiple different results (parts).

   The idea is that this would likely be used with a regression suite
   and it would incorporate all the different regression cases it describes.

   .. py:property:: findings
      :type: Dict[Finding, int]

      The total findings.

      :Returns: **Dict[Finding, int]** -- The total number of successes.

   .. py:attribute:: name
      :type: str

      The name of the collection being checked

   .. py:attribute:: parts
      :type: List[ResultDescriptor]
      :value: []

      The parts kept track of

   .. py:method:: iter_examples(strictness_threshold)

      Iterate over all relevant examples.

      Only examples that are not in the strictness matrix for the specified
      threshold will be used.

      :param strictness_threshold: The threshold of avoidance.
      :type strictness_threshold: Strictness

      :Yields: *Iterable[Tuple[FinalTarget, Tuple[Finding, Optional[str]]]]* -- The examples

   .. py:method:: _get_part_report(part, allowed_findings, total_findings, hide_empty, examples_strictness, phrases_separately, phrase_max_len)

   .. py:method:: calculate_report(phrases_separately = False, hide_empty = False, examples_strictness = Strictness.STRICTEST, strictness = Strictness.NORMAL, phrase_max_len = 80)

      Calculate some of the major parts of the report.

      :param phrases_separately: Whether to include per-phrase information
      :type phrases_separately: bool
      :param hide_empty: Whether to hide empty cases
      :type hide_empty: bool
      :param examples_strictness: What level of strictness to show for examples. Set to None to disable examples.
                                  Defaults to Strictness.STRICTEST.
      :type examples_strictness: Optional[Strictness]
      :param strictness: The strictness of the success / fail overview.
                         Defaults to Strictness.NORMAL.
      :type strictness: Strictness
      :param phrase_max_len: The maximum length of the phrase in examples. Defaults to 80.
      :type phrase_max_len: int

      :Returns: **Tuple[int, int, int, int, str]** -- The total number of examples, the total successes, the total failures, the delegated part, and the number of empty

   .. py:method:: get_report(phrases_separately, hide_empty = False, examples_strictness = Strictness.STRICTEST, strictness = Strictness.NORMAL, phrase_max_len = 80)

      Get the report associated with this descriptor

      :param phrases_separately: Whether to include per-phrase information
      :type phrases_separately: bool
      :param hide_empty: Whether to hide empty cases
      :type hide_empty: bool
      :param examples_strictness: What level of strictness to show for examples. Set to None to disable examples.
                                  Defaults to Strictness.STRICTEST.
      :type examples_strictness: Optional[Strictness]
      :param strictness: The strictness of the success / fail overview.
                         Defaults to Strictness.NORMAL.
      :type strictness: Strictness
      :param phrase_max_len: The maximum length of the phrase in examples. Defaults to 80.
      :type phrase_max_len: int

      :Returns: **str** -- The report string

   .. py:method:: model_dump(**kwargs)

      Usage docs: https://docs.pydantic.dev/2.10/concepts/serialization/#modelmodel_dump

      Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

      :param mode: The mode in which ``to_python`` should run.
                   If mode is 'json', the output will only contain JSON serializable types.
                   If mode is 'python', the output may contain non-JSON-serializable Python objects.
      :param include: A set of fields to include in the output.
      :param exclude: A set of fields to exclude from the output.
      :param context: Additional context to pass to the serializer.
      :param by_alias: Whether to use the field's alias in the dictionary key if defined.
      :param exclude_unset: Whether to exclude fields that have not been explicitly set.
      :param exclude_defaults: Whether to exclude fields that are set to their default value.
      :param exclude_none: Whether to exclude fields that have a value of ``None``.
      :param round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T].
      :param warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors,
                       "error" raises a ``pydantic_core.PydanticSerializationError``.
      :param serialize_as_any: Whether to serialize fields with duck-typing serialization behavior.

      :Returns: **A dictionary representation of the model.**


.. py:exception:: MalformedFinding(*args)

   Bases: :py:obj:`ValueError`

   Inappropriate argument value (of correct type).

   .. py:method:: __init__(*args)

      Initialize self. See help(type(self)) for accurate signature.
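
The span comparison described for ``Finding`` (the relative positions of the expected and recognised start/end on the number line) can be illustrated with a minimal, self-contained sketch. This is not MedCAT's implementation: ``SpanFinding`` and ``classify_span`` are hypothetical names introduced for illustration only, the sketch raises a plain ``ValueError`` where the module documents ``MalformedFinding``, and treating spans that merely touch as non-overlapping is an assumption.

```python
from enum import Enum, auto
from typing import Optional


class SpanFinding(Enum):
    """Hypothetical stand-in for the span-related members of ``Finding``."""
    IDENTICAL = auto()
    BIGGER_SPAN_RIGHT = auto()
    BIGGER_SPAN_LEFT = auto()
    BIGGER_SPAN_BOTH = auto()
    SMALLER_SPAN = auto()
    PARTIAL_OVERLAP = auto()


def classify_span(exp_start: int, exp_end: int,
                  start: int, end: int) -> Optional[SpanFinding]:
    """Compare a recognised span to the expected span (illustration only).

    Returns None when the two spans do not overlap.
    """
    if exp_start > exp_end or start > end:
        # The documented module raises MalformedFinding (a ValueError subclass).
        raise ValueError("span start must not be greater than span end")
    # Assumption: spans that only touch at a boundary do not overlap.
    if end <= exp_start or start >= exp_end:
        return None
    if (start, end) == (exp_start, exp_end):
        return SpanFinding.IDENTICAL
    if start == exp_start and end > exp_end:
        return SpanFinding.BIGGER_SPAN_RIGHT
    if start < exp_start and end == exp_end:
        return SpanFinding.BIGGER_SPAN_LEFT
    if start < exp_start and end > exp_end:
        return SpanFinding.BIGGER_SPAN_BOTH
    if start >= exp_start and end <= exp_end:
        return SpanFinding.SMALLER_SPAN
    # One edge lies inside the expected span, the other outside it.
    return SpanFinding.PARTIAL_OVERLAP


# The example from the ``Finding`` class doc string:
# expected span 3..15, recognised span 6..14.
print(classify_span(3, 15, 6, 14))
```

In this sketch the CUI is assumed to match already, which mirrors the span-only scope documented for ``_determine_raw``; parent, grandparent and child lookups are deliberately left out.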