:py:mod:`medcat.utils.ner.deid`
===============================

.. py:module:: medcat.utils.ner.deid

.. autoapi-nested-parse::

   De-identification model.

   This describes a wrapper around the regular CAT model.
   The idea is to simplify the use of a DeId-specific model.

   It tackles two use cases:

   1) Creation of a DeId model
   2) Loading and use of a DeId model

   For use case 1, instead of::

      cat = CAT(cdb=ner.cdb, addl_ner=ner)

   you can use::

      deid = DeIdModel.create(ner)

   For use case 2, instead of::

      cat = CAT.load_model_pack(model_pack_path)
      anon_text = deid_text(cat, text)

   you can use::

      deid = DeIdModel.load_model_pack(model_pack_path)
      anon_text = deid.deid_text(text)

   Or, if structured output is desired::

      deid = DeIdModel.load_model_pack(model_pack_path)
      anon_doc = deid(text)  # the spaCy document

   The wrapper also exposes some CAT parts directly:

   - config
   - cdb


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.utils.ner.deid.DeIdModel


Functions
~~~~~~~~~

.. autoapisummary::

   medcat.utils.ner.deid.match_rules
   medcat.utils.ner.deid.merge_all_preds
   medcat.utils.ner.deid.merge_preds


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.utils.ner.deid.logger


.. py:data:: logger

   
.. py:class:: DeIdModel(cat)


   Bases: :py:obj:`medcat.utils.ner.model.NerModel`

   The DeID model.

   This wraps a CAT instance and simplifies its use as a
   de-identification model.

   It provides methods for creating one from a TransformersNER
   as well as loading from a model pack (along with some validation).

   It also exposes some useful parts of the CAT it wraps such as
   the config and the concept database.

   .. py:method:: __init__(cat)


   .. py:method:: train(json_path = None, *args, **kwargs)

      Train the underlying transformers NER model.

      All the extra arguments are passed to the TransformersNER train method.

      :param json_path: The JSON file path to read the training data from.
      :type json_path: Union[str, list, None]
      :param train_nr: The number of the NER object in cat._addl_train to train. Defaults to 0.
      :type train_nr: int
      :param \*args: Additional arguments for `TransformersNER.train`.
      :param \*\*kwargs: Additional keyword arguments for `TransformersNER.train`.

      :Returns: **Tuple[Any, Any, Any]** -- df, examples, dataset


   .. py:method:: eval(json_path, *args, **kwargs)

      Evaluate the underlying transformers NER model.

      All the extra arguments are passed to the TransformersNER eval method.

      :param json_path: The JSON file path to read the evaluation data from.
      :type json_path: Union[str, list, None]
      :param train_nr: The number of the NER object in cat._addl_train to evaluate. Defaults to 0.
      :type train_nr: int
      :param \*args: Additional arguments for `TransformersNER.eval`.
      :param \*\*kwargs: Additional keyword arguments for `TransformersNER.eval`.

      :Returns: **Tuple[Any, Any, Any]** -- df, examples, dataset


   .. py:method:: deid_text(text, redact = False)

      De-identify text and optionally redact the identified information.

      If redaction is enabled, identifiable entities are replaced
      with stars (e.g. `*****`). Otherwise, each entity is replaced
      with its CUI, i.e. the type of information that was hidden
      (e.g. [PATIENT]).

      :param text: The text to deidentify.
      :type text: str
      :param redact: Whether to redact the information.
      :type redact: bool

      :Returns: **str** -- The deidentified text.
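
      The replacement step can be illustrated with a short, self-contained
      sketch. It assumes entities are dicts with `start`, `end` and `cui`
      keys and that a non-redacted entity becomes its CUI in square
      brackets; `replace_entities` is a hypothetical helper, not part of
      MedCAT. Entities are processed right to left so that each substitution
      leaves the character offsets of the remaining entities valid.

      .. code-block:: python

         from typing import Dict, List


         def replace_entities(text: str, entities: List[Dict], redact: bool = False) -> str:
             """Hypothetical sketch of the substitution behind ``deid_text``.

             Assumes each entity dict has 'start', 'end' and 'cui' keys.
             """
             new_text = text
             # Work from the end of the text backwards so that replacing one span
             # does not shift the offsets of spans that have not been handled yet.
             for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
                 if redact:
                     # One star per redacted character.
                     replacement = "*" * (ent["end"] - ent["start"])
                 else:
                     # Assumption: show the entity type (here, its CUI) in brackets.
                     replacement = f"[{ent['cui']}]"
                 new_text = new_text[: ent["start"]] + replacement + new_text[ent["end"]:]
             return new_text


         text = "Patient John Smith was seen on 01/02/2020."
         ents = [
             {"start": 8, "end": 18, "cui": "PATIENT"},
             {"start": 31, "end": 41, "cui": "DATE"},
         ]
         print(replace_entities(text, ents))
         # → Patient [PATIENT] was seen on [DATE].
         print(replace_entities(text, ents, redact=True))
         # → Patient ********** was seen on **********.

      Processing in reverse start order avoids having to track a running
      offset while replacements change the length of the text.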


   .. py:method:: deid_multi_texts(texts, redact = False, addl_info = ['cui2icd10', 'cui2ontologies', 'cui2snomed'], n_process = None, batch_size = None)

      De-identify multiple texts, optionally using multiple processes.

      :param texts: The texts to be de-identified.
      :type texts: Union[Iterable[str], Iterable[Tuple]]
      :param redact: Whether to redact the information.
      :type redact: bool
      :param addl_info: Additional info. Defaults to ['cui2icd10', 'cui2ontologies', 'cui2snomed'].
      :type addl_info: List[str], optional
      :param n_process: Number of processes. Defaults to None.
      :type n_process: Optional[int], optional
      :param batch_size: The size of a batch. Defaults to None.
      :type batch_size: Optional[int], optional

      :raises ValueError: In case of unsupported input.

      :Returns: **List[str]** -- List of deidentified documents.
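
      The two documented input forms and the `ValueError` can be illustrated
      with a sketch of per-item text extraction. It assumes that tuple
      inputs are `(identifier, text)` pairs; `extract_texts` is a
      hypothetical helper, not part of MedCAT.

      .. code-block:: python

         from typing import Iterable, List, Tuple, Union


         def extract_texts(texts: Iterable[Union[str, Tuple[object, str]]]) -> List[str]:
             """Hypothetical helper: pull the raw text out of each input item.

             Assumes tuple inputs are (identifier, text) pairs.
             """
             out: List[str] = []
             for item in texts:
                 if isinstance(item, str):
                     out.append(item)
                 elif isinstance(item, tuple) and len(item) == 2 and isinstance(item[1], str):
                     out.append(item[1])
                 else:
                     # Mirrors the documented behaviour of raising on unsupported input.
                     raise ValueError(f"Unsupported input type: {type(item).__name__}")
             return out


         print(extract_texts(["first note", "second note"]))
         print(extract_texts([("doc-1", "first note"), ("doc-2", "second note")]))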


   .. py:method:: load_model_pack(model_pack_path, config = None)
      :classmethod:

      Load DeId model from model pack.

      The method first loads the CAT instance.

      It then makes sure that the model pack corresponds to a
      valid DeId model.

      :param model_pack_path: The model pack path.
      :type model_pack_path: str
      :param config: Config for the DeId model pack (primarily for the stride of the overlap window).

      :raises ValueError: If the model pack does not correspond to a DeId model.

      :Returns: **DeIdModel** -- The resulting DeId model.


   .. py:method:: _is_deid_model(cat)
      :classmethod:


   .. py:method:: _get_reason_not_deid(cat)
      :classmethod:


.. py:function:: match_rules(rules, texts, cui2preferred_name)

   Match a set of rules (pattern/CUI pairs) against texts to produce post-processing labels.

   Uses the CUI-to-preferred-name mapping of a CAT DeID model to fill in pretty names.

   :param rules: List of tuples of pattern and cui
   :type rules: List[Tuple[str, str]]
   :param texts: List of texts to match rules on
   :type texts: List[str]
   :param cui2preferred_name: Dictionary of CUI to preferred name, typically `cat.cdb.cui2preferred_name`.
   :type cui2preferred_name: Dict[str, str]

   :Returns: **List[List[Dict]]** -- One list of rule-match predictions per input text.

   .. rubric:: Examples

   >>> cat = CAT.load_model_pack(model_pack_path)
   >>> rules = [
   ...     ('(123) 456-7890', '134'),
   ...     ('1234567890', '134'),
   ...     ('123.456.7890', '134'),
   ... ]
   >>> texts = [
   ...     'My phone number is (123) 456-7890',
   ...     'My phone number is 1234567890',
   ...     'My phone number is 123.456.7890',
   ... ]
   >>> matches = match_rules(rules, texts, cat.cdb.cui2preferred_name)
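
   A minimal sketch of rule matching, assuming each pattern is a regular
   expression applied with `re.finditer` and that every match becomes a
   prediction dict with the keys used in the `merge_preds` examples
   (`cui`, `start`, `end`, `acc`, `pretty_name`). `match_rules_sketch` and
   the extra `source_value` key are hypothetical. Under this assumption,
   literal characters such as `(` or `.` must be escaped in the patterns.

   .. code-block:: python

      import re
      from typing import Dict, List, Tuple


      def match_rules_sketch(rules: List[Tuple[str, str]], texts: List[str],
                             cui2preferred_name: Dict[str, str]) -> List[List[Dict]]:
          """Hypothetical sketch: apply (regex, CUI) rules to each text."""
          matches_per_text: List[List[Dict]] = []
          for text in texts:
              matches: List[Dict] = []
              for pattern, cui in rules:
                  # Every non-overlapping regex match becomes one prediction dict.
                  for m in re.finditer(pattern, text):
                      matches.append({
                          "cui": cui,
                          "start": m.start(),
                          "end": m.end(),
                          "acc": 1.0,  # rule matches are treated as fully confident
                          "pretty_name": cui2preferred_name.get(cui, cui),
                          "source_value": m.group(),
                      })
              matches_per_text.append(matches)
          return matches_per_text


      rules = [(r"\(\d{3}\) \d{3}-\d{4}", "134"), (r"\b\d{10}\b", "134")]
      texts = ["Call (123) 456-7890 today", "My phone number is 1234567890"]
      for matches in match_rules_sketch(rules, texts, {"134": "Phone Number"}):
          print([(m["start"], m["end"], m["source_value"]) for m in matches])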


.. py:function:: merge_all_preds(model_preds_by_text, rule_matches_per_text, accept_preds = True)

   Convenience method to merge rule-based matches with DeID model predictions for multiple texts.

   :param model_preds_by_text: Per-text lists of model predictions, e.g.
                               `[list(m['entities'].values()) for m in model_preds]`,
                               where `model_preds` are the outputs of `cat.get_entities()`.
   :type model_preds_by_text: List[Dict]
   :param rule_matches_per_text: Per-text lists of predictions from the output of
                                 running `match_rules`.
   :type rule_matches_per_text: List[Dict]
   :param accept_preds: If True (the default), the label predicted by the model
                        (from `model_preds_by_text`) is used instead of the rule
                        match wherever the two overlap.
   :type accept_preds: bool

   :Returns: **List[List[Dict]]** -- One list of merged predictions per text.
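
   The per-text pairing can be sketched as follows. The sketch builds
   `model_preds_by_text` with the list comprehension described above from
   hypothetical stand-ins for `cat.get_entities()` output, then merges each
   text's two prediction lists with a caller-supplied function.
   `merge_all_preds_sketch`, its length check and the `concat_by_start`
   stand-in merge are assumptions for illustration only.

   .. code-block:: python

      from typing import Callable, Dict, List

      # Hypothetical stand-ins for the output of ``cat.get_entities(text)``:
      # dicts whose 'entities' value maps entity ids to entity dicts.
      model_outputs = [
          {"entities": {0: {"cui": "134", "start": 19, "end": 29, "acc": 0.9}}},
          {"entities": {}},
      ]

      # The conversion described for ``model_preds_by_text``.
      model_preds_by_text = [list(m["entities"].values()) for m in model_outputs]


      def merge_all_preds_sketch(
          model_preds_by_text: List[List[Dict]],
          rule_matches_per_text: List[List[Dict]],
          merge_fn: Callable[[List[Dict], List[Dict], bool], List[Dict]],
          accept_preds: bool = True,
      ) -> List[List[Dict]]:
          """Hypothetical sketch: merge model and rule predictions text by text."""
          if len(model_preds_by_text) != len(rule_matches_per_text):
              raise ValueError("Expected one list of predictions per text from each source")
          return [merge_fn(preds, rules, accept_preds)
                  for preds, rules in zip(model_preds_by_text, rule_matches_per_text)]


      def concat_by_start(preds: List[Dict], rules: List[Dict], accept_preds: bool) -> List[Dict]:
          # Stand-in merge that ignores overlaps (and accept_preds); a real
          # merge would resolve overlapping spans.
          return sorted(preds + rules, key=lambda p: p["start"])


      rule_matches_per_text = [[], [{"cui": "134", "start": 5, "end": 19, "acc": 1.0}]]
      merged = merge_all_preds_sketch(model_preds_by_text, rule_matches_per_text, concat_by_start)
      print([[(p["start"], p["end"]) for p in preds] for preds in merged])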


.. py:function:: merge_preds(model_preds, rule_matches, accept_preds = True)

   Merge rule-based matches with DeID model predictions for a single text.

   :param model_preds: Predictions for one text from `cat.get_entities()`,
                       e.g. `list(cat.get_entities(text)['entities'].values())`.
   :type model_preds: List[Dict]
   :param rule_matches: Predictions from the output of running `match_rules` on the same text.
   :type rule_matches: List[Dict]
   :param accept_preds: If True (the default), the label predicted by the model
                        (from `model_preds`) is used instead of the rule match
                        wherever the two overlap.
   :type accept_preds: bool

   :Returns: **List[Dict]** -- The merged list of predictions.

   .. rubric:: Examples

   >>> # predictions for one text from `cat.get_entities()`
   >>> model_preds = [
   ...     {'cui': '134', 'start': 10, 'end': 20, 'acc': 1.0,
   ...      'pretty_name': 'Phone Number'},
   ...     {'cui': '134', 'start': 25, 'end': 35, 'acc': 1.0,
   ...      'pretty_name': 'Phone Number'},
   ... ]
   >>> # predictions for the same text from `match_rules`
   >>> rule_matches = [
   ...     {'cui': '134', 'start': 10, 'end': 20, 'acc': 1.0,
   ...      'pretty_name': 'Phone Number'},
   ...     {'cui': '134', 'start': 25, 'end': 35, 'acc': 1.0,
   ...      'pretty_name': 'Phone Number'},
   ... ]
   >>> merged_preds = merge_preds(model_preds, rule_matches)
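
   The overlap rule described for `accept_preds` can be sketched as below.
   The sketch assumes predictions carry `start`/`end` character offsets
   forming half-open spans, that "overlap" means sharing at least one
   character, and that the preferred source keeps all its spans while the
   other contributes only non-overlapping ones. `merge_preds_sketch` and
   `spans_overlap` are hypothetical names, not MedCAT's implementation.

   .. code-block:: python

      from typing import Dict, List


      def spans_overlap(a: Dict, b: Dict) -> bool:
          """Half-open spans [start, end) overlap if each starts before the other ends."""
          return a["start"] < b["end"] and b["start"] < a["end"]


      def merge_preds_sketch(model_preds: List[Dict], rule_matches: List[Dict],
                             accept_preds: bool = True) -> List[Dict]:
          """Hypothetical sketch of overlap-aware merging for a single text."""
          # The preferred source keeps all of its spans; the other source only
          # contributes spans that do not overlap any preferred span.
          if accept_preds:
              preferred, other = model_preds, rule_matches
          else:
              preferred, other = rule_matches, model_preds
          extra = [o for o in other if not any(spans_overlap(o, p) for p in preferred)]
          return sorted(preferred + extra, key=lambda p: p["start"])


      model_preds = [{"cui": "PATIENT", "start": 0, "end": 10}]
      rule_matches = [
          {"cui": "134", "start": 5, "end": 12},   # overlaps the model span
          {"cui": "134", "start": 20, "end": 30},  # no overlap
      ]
      print([p["cui"] for p in merge_preds_sketch(model_preds, rule_matches)])
      # → ['PATIENT', '134']
      print([p["cui"] for p in merge_preds_sketch(model_preds, rule_matches, accept_preds=False)])
      # → ['134', '134']

   Sorting the merged list by `start` keeps the output in document order,
   which is convenient for later right-to-left text substitution.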