:py:mod:`medcat.utils.regression.checking`
==========================================

.. py:module:: medcat.utils.regression.checking


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.utils.regression.checking.RegressionCase
   medcat.utils.regression.checking.MetaData
   medcat.utils.regression.checking.RegressionSuite


Functions
~~~~~~~~~

.. autoapisummary::

   medcat.utils.regression.checking.get_ontology_and_version
   medcat.utils.regression.checking.fix_np_float64


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.utils.regression.checking.logger
   medcat.utils.regression.checking.UNKNOWN_METADATA


.. py:data:: logger


.. py:class:: RegressionCase(/, **data)

   Bases: :py:obj:`pydantic.BaseModel`

   A regression case that has a name, defines options, filters and phrases.

   .. py:attribute:: name
      :type: str


   .. py:attribute:: options
      :type: medcat.utils.regression.targeting.OptionSet


   .. py:attribute:: phrases
      :type: List[str]


   .. py:attribute:: report
      :type: medcat.utils.regression.results.ResultDescriptor


   .. py:method:: check_specific_for_phrase(cat, target, translation)

      Checks whether the specific target, along with the specified phrase,
      can be identified using the specified model.

      :param cat: The model
      :type cat: CAT
      :param target: The final target configuration
      :type target: FinalTarget
      :param translation: The translation layer
      :type translation: TranslationLayer

      :raises MalformedRegressionCaseException: If there are too many placeholders in the phrase.

      :Returns: **Tuple[Finding, Optional[str]]** -- The extent to which the target was (or wasn't) identified


   .. py:method:: estimate_num_of_diff_subcases()


   .. py:method:: get_distinct_cases(translation, edit_distance, use_diacritics)

      Gets the various distinct sub-case iterators.

      The sub-cases are those that can be determined without the translation layer.
      However, the translation layer is included here since it streamlines the operation.

      :param translation: The translation layer.
      :type translation: TranslationLayer
      :param edit_distance: The edit distance(s) to try.
      :type edit_distance: Tuple[int, int, int]
      :param use_diacritics: Whether to use diacritics for edit distance.
      :type use_diacritics: bool

      :Yields: *Iterator[Iterator[FinalTarget]]* -- The iterator of iterators of different sub-cases.


   .. py:method:: _get_subcases(phrase, changer, translation, edit_distance, use_diacritics)


   .. py:method:: to_dict()

      Converts the RegressionCase to a dict for serialisation.

      :Returns: **dict** -- The dict representation


   .. py:method:: from_dict(name, in_dict)
      :classmethod:

      Construct the regression case from a dict.

      The expected structure::

         {
             'targeting': {
                 [
                     'placeholder': '[DIAGNOSIS]'  # the placeholder to be replaced
                     'cuis': ['cui1', 'cui2']
                     'prefname-only': 'false',  # optional
                 ]
             },
             'phrases': ['phrase %s']  # possibly multiple
         }

      :param name: The name of the case
      :type name: str
      :param in_dict: The dict describing the case
      :type in_dict: dict

      :raises ValueError: If the input dict does not have the 'targeting' section
      :raises ValueError: If there are no phrases defined

      :Returns: **RegressionCase** -- The constructed regression case.


   .. py:method:: __hash__()

      Return hash(self).


   .. py:method:: __eq__(other)

      Return self==value.



.. py:data:: UNKNOWN_METADATA
   :value: 'Unknown'



.. py:function:: get_ontology_and_version(model_card)

   Attempt to get the ontology (and its version) from a model card dict.

   If no ontology is found, 'Unknown' is returned.
   The version is always returned as the (full) first source ontology entry,
   unless the specified location does not exist in the model card,
   in which case 'Unknown' is returned.

   The ontology is assumed to be described at
   ``model_card['Source Ontology'][0]`` (or ``model_card['Source Ontology']``
   if it's a string instead of a list).

   The ontology version is read from the same location:
   ``model_card['Source Ontology'][0]`` (or ``model_card['Source Ontology']``
   if it's a string instead of a list).

   Currently, only SNOMED-CT, UMLS and ICD are supported / found.
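
   The lookup described above can be sketched as a small, self-contained
   function. This is a hypothetical re-implementation written only from this
   description, not the library's source: the name
   ``get_ontology_and_version_sketch``, the keyword-to-label mapping, the
   case-insensitive substring matching and the treatment of an empty list are
   all assumptions.

   .. code-block:: python

      from typing import Dict, Tuple

      UNKNOWN = 'Unknown'
      # ASSUMPTION: keyword searched for in the source ontology -> label returned
      KEYWORD_TO_ONTOLOGY: Dict[str, str] = {'SNOMED': 'SNOMED-CT', 'UMLS': 'UMLS', 'ICD': 'ICD'}


      def get_ontology_and_version_sketch(model_card: dict) -> Tuple[str, str]:
          """Hypothetical sketch of the ontology/version lookup described above."""
          source = model_card.get('Source Ontology')
          if isinstance(source, list) and source:
              first = str(source[0])  # only the first source ontology is used
          elif isinstance(source, str):
              first = source  # a plain string is used directly
          else:
              return UNKNOWN, UNKNOWN  # expected location missing from the model card
          for keyword, ontology in KEYWORD_TO_ONTOLOGY.items():
              if keyword in first.upper():
                  # the version is the whole first source ontology entry
                  return ontology, first
          return UNKNOWN, first  # unsupported ontology, version still reported


      print(get_ontology_and_version_sketch({'Source Ontology': ['UMLS2022AA']}))
      # ('UMLS', 'UMLS2022AA')
      print(get_ontology_and_version_sketch({'Source Ontology': 'SNOMED-CT example'}))
      # ('SNOMED-CT', 'SNOMED-CT example')
      print(get_ontology_and_version_sketch({}))
      # ('Unknown', 'Unknown')

   The model card values in the usage lines are illustrative only.
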

   :param model_card: The input model card.
   :type model_card: dict

   :Returns: **Tuple[str, str]** -- The ontology (if found) or 'Unknown'; and the version (if found) or 'Unknown'



.. py:class:: MetaData(/, **data)

   Bases: :py:obj:`pydantic.BaseModel`

   The metadata for the regression suite.

   This should define which ontology (e.g. UMLS or SNOMED) as well as
   which version was used when generating the regression suite.

   The metadata may contain further information as well. This may include
   the annotator(s) involved when converting from an MCT export, or other relevant data.

   .. py:attribute:: ontology
      :type: str


   .. py:attribute:: ontology_version
      :type: str


   .. py:attribute:: extra
      :type: dict


   .. py:attribute:: regr_suite_creation_date
      :type: str


   .. py:method:: from_modelcard(model_card)
      :classmethod:

      Generate a MetaData object from a model card.

      This involves reading the ontology info and version from the model card.

      Note that the model card must be provided as a dict, not a string.

      :param model_card: The CAT model card
      :type model_card: dict

      :Returns: **MetaData** -- The resulting MetaData


   .. py:method:: unknown()
      :classmethod:



.. py:function:: fix_np_float64(d)

   Fix ``numpy.float64`` values in a dictionary for YAML saving purposes.

   These types of objects cannot be cleanly serialised using YAML,
   so they need to be converted to the corresponding Python floats.

   The changes are made in place, within the dictionary itself as well as
   within any nested dictionaries, recursively.

   :param d: The input dict
   :type d: dict



.. py:class:: RegressionSuite(cases, metadata, name)

   The regression checker.

   This is used to check a bunch of regression cases at once against a model.

   :param cases: The list of regression cases
   :type cases: List[RegressionCase]
   :param metadata: The metadata for the regression suite
   :type metadata: MetaData
   :param name: The name of the regression suite
   :type name: str

   .. py:method:: __init__(cases, metadata, name)


   .. py:method:: get_all_distinct_cases(translation, edit_distance, use_diacritics)

      Gets all the distinct cases for this regression suite.

      While distinct cases can be determined without the translation layer,
      including it here simplifies the process.

      :param translation: The translation layer.
      :type translation: TranslationLayer
      :param edit_distance: The edit distance(s) to try.
      :type edit_distance: Tuple[int, int, int]
      :param use_diacritics: Whether to use diacritics for edit distance.
      :type use_diacritics: bool

      :Yields: *Iterator[Tuple[RegressionCase, Iterator[FinalTarget]]]* -- The generator of the regression case along with its corresponding sub-cases.


   .. py:method:: estimate_total_distinct_cases()


   .. py:method:: iter_subcases(translation, show_progress = True, edit_distance = (0, 0, 0), use_diacritics = False)

      Iterate over all the sub-cases.

      Each sub-case presents a unique target (phrase, concept, name) on
      the corresponding regression case.

      :param translation: The translation layer.
      :type translation: TranslationLayer
      :param show_progress: Whether to show progress. Defaults to True.
      :type show_progress: bool
      :param edit_distance: The edit distance(s) to try. Defaults to (0, 0, 0).
      :type edit_distance: Tuple[int, int, int]
      :param use_diacritics: Whether to use diacritics for edit distance.
      :type use_diacritics: bool

      :Yields: *Iterator[Tuple[RegressionCase, FinalTarget]]* -- The generator of the regression case along with each of the final target sub-cases.


   .. py:method:: check_model(cat, translation, edit_distance = (0, 0, 0), use_diacritics = False)

      Checks the model and generates a report.

      :param cat: The model to check against
      :type cat: CAT
      :param translation: The translation layer
      :type translation: TranslationLayer
      :param edit_distance: The edit distance of the names. Defaults to (0, 0, 0).
      :type edit_distance: Tuple[int, int, int]
      :param use_diacritics: Whether to use diacritics for edit distance.
      :type use_diacritics: bool

      :Returns: **MultiDescriptor** -- A report description


   .. py:method:: __str__()

      Return str(self).


   .. py:method:: __repr__()

      Return repr(self).


   .. py:method:: to_dict()

      Converts the RegressionSuite to a dict for serialisation.

      :Returns: **dict** -- The dict representation


   .. py:method:: to_yaml()

      Converts the RegressionSuite to a YAML string.

      :Returns: **str** -- The YAML representation


   .. py:method:: __eq__(other)

      Return self==value.


   .. py:method:: from_dict(in_dict, name)
      :classmethod:

      Construct a RegressionSuite from a dict.

      Most of the parsing is handled in ``RegressionCase.from_dict``.
      This method just assumes that each key in the dict is a name
      and each value describes a RegressionCase.

      :param in_dict: The input dict.
      :type in_dict: dict
      :param name: The name of the regression suite.
      :type name: str

      :Returns: **RegressionSuite** -- The built regression suite


   .. py:method:: from_yaml(file_name)
      :classmethod:

      Constructs a RegressionSuite from a YAML file.

      The ``from_dict`` method is used for the construction from the dict.

      :param file_name: The file name
      :type file_name: str

      :Returns: **RegressionSuite** -- The constructed regression suite


   .. py:method:: from_mct_export(file_name)
      :classmethod:



.. py:exception:: MalformedRegressionCaseException(*args)

   Bases: :py:obj:`ValueError`

   Inappropriate argument value (of correct type).

   .. py:method:: __init__(*args)

      Initialize self. See help(type(self)) for accurate signature.
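

The in-place, recursive conversion that ``fix_np_float64`` is documented to
perform can be sketched as follows. This is a hypothetical, self-contained
sketch, not the library's implementation: the name ``fix_np_float64_sketch``
is made up, and, like the description, it only descends into nested
dictionaries (not lists). It assumes NumPy is installed.

.. code-block:: python

   import numpy as np


   def fix_np_float64_sketch(d: dict) -> None:
       """Hypothetical sketch: replace numpy.float64 values with floats, in place."""
       for key, value in d.items():
           if isinstance(value, np.float64):
               # reassigning an existing key is safe while iterating over the dict
               d[key] = float(value)
           elif isinstance(value, dict):
               # recurse so nested dictionaries are fixed in place as well
               fix_np_float64_sketch(value)


   report = {'precision': np.float64(0.75), 'nested': {'recall': np.float64(0.5)}, 'name': 'case-1'}
   fix_np_float64_sketch(report)
   print(type(report['precision']).__name__, type(report['nested']['recall']).__name__)
   # float float

Mutating the dictionary in place, rather than returning a copy, mirrors the
documented behaviour and avoids duplicating potentially large report data
before it is dumped to YAML.
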