:py:mod:`medcat.meta_cat`
=========================

.. py:module:: medcat.meta_cat


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.meta_cat.MetaCAT


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.meta_cat.logger


.. py:data:: logger

   
.. py:class:: MetaCAT(tokenizer = None, embeddings = None, config = None)


   Bases: :py:obj:`medcat.pipeline.pipe_runner.PipeRunner`

   The MetaCAT class is used for training 'Meta-Annotation' models, i.e. annotations of clinical
   concept annotations. These are also known as properties or attributes of recognised entities
   in similar tools such as MetaMap and cTAKES.

   This is a flexible, model-agnostic class that can learn any meta-annotation task, i.e. any
   multi-class classification task for recognised terms.

   :param tokenizer: The Hugging Face tokenizer instance. This can be a pre-trained tokenizer instance from
                     a BERT-style model, or trained from scratch for the Bi-LSTM (w. attention) model that
                     is currently used in most deployments.
   :type tokenizer: TokenizerWrapperBase
   :param embeddings: Embedding matrix mapping each (sub)word input id to an n-dimensional (sub)word embedding.
   :type embeddings: Tensor, numpy.ndarray
   :param config: the configuration for MetaCAT. Param descriptions available in ConfigMetaCAT docs.
   :type config: ConfigMetaCAT

   .. py:attribute:: name
      :value: 'meta_cat'

      
   .. py:attribute:: _component_lock

      
   .. py:method:: __init__(tokenizer = None, embeddings = None, config = None)


   .. py:method:: get_model(embeddings)

      Get the model

      :param embeddings: The embedding tensor
      :type embeddings: Optional[Tensor]

      :raises ValueError: If the meta model is not LSTM or BERT

      :Returns: **nn.Module** -- The module


   .. py:method:: get_hash()

      A partial hash trying to catch differences between models.

      :Returns: **str** -- The hex hash.


   .. py:method:: train_from_json(json_path, save_dir_path = None, data_oversampled = None)

      Train or continue training a model given a json_path containing a MedCATtrainer export. It will
      continue training if an existing model is loaded or start new training if the model is blank/new.

      :param json_path: Path/Paths to a MedCATtrainer export containing the meta_annotations we want to train for.
      :type json_path: Union[str, list]
      :param save_dir_path: Directory where the best model is saved during training when ``auto_save_model``
                            is enabled in the config; a save path must be set in that case. Defaults to `None`.
      :type save_dir_path: Optional[str]
      :param data_oversampled: If oversampling is performed, the oversampled data is passed in this parameter.
      :type data_oversampled: Optional[list]

      :Returns: **Dict** -- The resulting report.
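
      A minimal usage sketch, assuming an already constructed ``MetaCAT`` instance is passed in
      as ``meta_cat``. The helper name ``train_and_save`` is hypothetical; it only calls
      ``train_from_json`` and ``save`` with the signatures documented on this page.

      .. code-block:: python

         from typing import Any, Dict, List, Union


         def train_and_save(meta_cat: Any,
                            json_path: Union[str, List[str]],
                            save_dir_path: str) -> Dict:
             """Train on one or more MedCATtrainer exports, then persist the model.

             ``meta_cat`` is expected to expose ``train_from_json`` and ``save``
             as documented for MetaCAT; this helper itself is hypothetical.
             """
             # Passing save_dir_path gives training a place to store the best
             # model when auto-saving is enabled in the config.
             report = meta_cat.train_from_json(json_path, save_dir_path=save_dir_path)
             # Explicitly save the final state of all components as well.
             meta_cat.save(save_dir_path)
             return report

      Taking the instance as an argument keeps the sketch independent of how the tokenizer,
      embeddings and config were set up.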


   .. py:method:: train_raw(data_loaded, save_dir_path = None, data_oversampled = None)

      Train or continue training a model given raw data. It will
      continue training if an existing model is loaded or start new training if the model is blank/new.

      The raw data is expected in the following format::

          {'projects':
              [ # list of projects
                  { # project 1
                      'name': '<some name>',
                      # list of documents
                      'documents': [{'name': '<some name>',  # document 1
                                     'text': '<text of the document>',
                                     # list of annotations
                                     'annotations': [{'start': -1,  # annotation 1
                                                      'end': 1,
                                                      'cui': 'cui',
                                                      'value': '<text value>'}, ...],
                                     }, ...]
                  }, ...
              ]
          }
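
      As an illustration of the format above, the following sketch builds a one-document
      payload and checks its shape. It uses only the keys listed above; a real MedCATtrainer
      export may carry additional fields. The helper names, the placeholder CUI and the
      assumption that ``value`` equals ``text[start:end]`` are illustrative and not taken
      from the library.

      .. code-block:: python

         from typing import Any, Dict, List

         REQUIRED_ANN_KEYS = {"start", "end", "cui", "value"}


         def make_raw_data(text: str, term: str, cui: str) -> Dict[str, Any]:
             """Build a single-project, single-document payload for one annotated term."""
             start = text.index(term)  # raises ValueError if the term is absent
             annotation = {"start": start, "end": start + len(term),
                           "cui": cui, "value": term}
             document = {"name": "doc-1", "text": text, "annotations": [annotation]}
             return {"projects": [{"name": "demo-project", "documents": [document]}]}


         def check_raw_data(data: Dict[str, Any]) -> List[str]:
             """Return a list of human-readable problems; an empty list means none found."""
             problems: List[str] = []
             for project in data.get("projects", []):
                 for doc in project.get("documents", []):
                     text = doc.get("text", "")
                     for i, ann in enumerate(doc.get("annotations", [])):
                         missing = REQUIRED_ANN_KEYS - ann.keys()
                         if missing:
                             problems.append(f"{doc.get('name')}[{i}]: missing {sorted(missing)}")
                         # Assumption: 'value' is the annotated span of the document text.
                         elif text[ann["start"]:ann["end"]] != ann["value"]:
                             problems.append(f"{doc.get('name')}[{i}]: span/value mismatch")
             return problems


         # "C0000000" is a placeholder, not a real concept identifier.
         raw = make_raw_data("Patient denies chest pain.", "chest pain", "C0000000")
         problems = check_raw_data(raw)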

      :param data_loaded: The raw data we want to train for.
      :type data_loaded: Dict
      :param save_dir_path: Directory where the best model is saved during training when ``auto_save_model``
                            is enabled in the config; a save path must be set in that case. Defaults to `None`.
      :type save_dir_path: Optional[str]
      :param data_oversampled: If oversampling is performed, the oversampled data is passed in this parameter.
                               The expected format is:
                               [[['text','of','the','document'], [index of medical entity], "label"],
                               [['text','of','the','document'], [index of medical entity], "label"]]
      :type data_oversampled: Optional[list]

      :Returns: **Dict** -- The resulting report.

      :raises Exception: If no save path is specified, or category name not in data.
      :raises AssertionError: If no tokeniser is set
      :raises FileNotFoundError: If phase_number is set to 2 and model.dat file is not found
      :raises KeyError: If phase_number is set to 2 and model.dat file contains mismatched architecture
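
      To make the ``data_oversampled`` layout concrete, the sketch below balances a small
      sample list by naive random duplication. This is not the oversampling strategy used by
      MedCAT, only one simple way to produce data in the documented
      ``[tokens, [entity index], label]`` shape. The function name, the labels and the reading
      of the middle element as a token index are assumptions.

      .. code-block:: python

         import random
         from collections import defaultdict
         from typing import Dict, List


         def naive_oversample(samples: List[list], seed: int = 0) -> List[list]:
             """Duplicate minority-class samples until every label is equally frequent."""
             if not samples:
                 return []
             rng = random.Random(seed)  # seeded for reproducibility
             by_label: Dict[str, List[list]] = defaultdict(list)
             for sample in samples:
                 by_label[sample[2]].append(sample)  # sample[2] is the label
             target = max(len(group) for group in by_label.values())
             result = list(samples)
             for group in by_label.values():
                 # Draw with replacement to fill the gap up to the largest class.
                 result.extend(rng.choice(group) for _ in range(target - len(group)))
             return result


         # Hypothetical samples: [tokens, [token index of the entity], label]
         samples = [
             [["patient", "has", "fever"], [2], "Affirmed"],
             [["reports", "cough", "today"], [1], "Affirmed"],
             [["known", "asthma"], [1], "Affirmed"],
             [["denies", "chest", "pain"], [1], "Negated"],
         ]
         data_oversampled = naive_oversample(samples)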


   .. py:method:: eval(json_path)

      Evaluate from json.

      :param json_path: The json file path
      :type json_path: str

      :Returns: **Dict** -- The resulting model dict

      :raises AssertionError: If self.tokenizer is None
      :raises Exception: If the category name does not exist


   .. py:method:: save(save_dir_path)

      Save all components of this class to the given directory.

      :param save_dir_path: Path to the directory where everything will be saved.
      :type save_dir_path: str

      :raises AssertionError: If self.tokenizer is None


   .. py:method:: load(save_dir_path, config_dict = None)
      :classmethod:

      Load a meta_cat object.

      :param save_dir_path: The directory where all was saved.
      :type save_dir_path: str
      :param config_dict: This can be used to overwrite saved parameters for this meta_cat
                          instance. It is needed in certain automated deployment cases. (Default value = None)
      :type config_dict: Optional[Dict], optional

      :Returns: **MetaCAT** -- The MetaCAT instance
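
      A hedged sketch of loading with overrides. The helper ``load_with_overrides`` is
      hypothetical and takes the class as an argument so that it only relies on the ``load``
      signature documented here. The nested layout of the example override dictionary is an
      assumption; consult the ConfigMetaCAT docs for the actual keys.

      .. code-block:: python

         from typing import Any, Dict, Optional


         def load_with_overrides(meta_cat_cls: Any,
                                 save_dir_path: str,
                                 overrides: Optional[Dict] = None) -> Any:
             """Load a saved model, optionally overwriting saved config parameters."""
             # ``load`` is a classmethod, so it is called on the class itself.
             return meta_cat_cls.load(save_dir_path, config_dict=overrides)


         # Assumed key layout, for illustration only.
         cpu_overrides = {"general": {"device": "cpu"}}

      With the real class this would be called as
      ``load_with_overrides(MetaCAT, "<save dir>", cpu_overrides)``.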


   .. py:method:: get_ents(doc)


   .. py:method:: prepare_document(doc, input_ids, offset_mapping, lowercase)

      Prepares document.

      :param doc: The document
      :type doc: Doc
      :param input_ids: Input ids
      :type input_ids: List
      :param offset_mapping: Offset mappings
      :type offset_mapping: List
      :param lowercase: Whether to lowercase the text that replaces the center (concept) tokens
      :type lowercase: bool

      :Returns: * **Dict** -- Entity id to index mapping
                * **List** -- Samples


   .. py:method:: batch_generator(stream, batch_size_chars)
      :staticmethod:

      Generator for batch of documents.

      :param stream: The document stream
      :type stream: Iterable[Doc]
      :param batch_size_chars: Number of characters per batch
      :type batch_size_chars: int

      :Yields: *List[Doc]* -- The batch of documents.
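
      The idea of batching by character count can be sketched as follows. This is an
      illustration under stated assumptions, not the library's implementation: ``TextDoc`` is
      a hypothetical stand-in for a spaCy ``Doc`` that only carries ``text``, and a batch is
      assumed to be emitted as soon as its accumulated length reaches ``batch_size_chars``.

      .. code-block:: python

         from dataclasses import dataclass
         from typing import Iterable, Iterator, List


         @dataclass
         class TextDoc:
             """Hypothetical stand-in for a spaCy Doc; only ``text`` is used."""
             text: str


         def batch_by_chars(stream: Iterable[TextDoc],
                            batch_size_chars: int) -> Iterator[List[TextDoc]]:
             """Group documents into batches of roughly ``batch_size_chars`` characters."""
             batch: List[TextDoc] = []
             char_count = 0
             for doc in stream:
                 batch.append(doc)
                 char_count += len(doc.text)
                 # Emit the batch once the character budget has been reached.
                 if char_count >= batch_size_chars:
                     yield batch
                     batch = []
                     char_count = 0
             # Emit whatever is left over at the end of the stream.
             if batch:
                 yield batch


         docs = [TextDoc("a" * 5), TextDoc("b" * 5), TextDoc("c" * 12), TextDoc("d" * 3)]
         batch_sizes = [len(b) for b in batch_by_chars(docs, batch_size_chars=10)]

      Because documents are never split, a single long document forms a batch of its own and
      may exceed the budget.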


   .. py:method:: pipe(stream, *args, **kwargs)

      Process many documents at once.

      :param stream: Iterable of spacy documents.
      :type stream: Iterable[Union[Doc, FakeDoc]]
      :param \*args: Unused arguments (due to override)
      :param \*\*kwargs: Unused keyword arguments (due to override)

      :Yields: *Doc* -- The document.

      :Returns: **Iterator[Doc]** -- An empty iterator if stream is None or empty.


   .. py:method:: _set_meta_anns(stream, batch_size_chars, config, id2category_value)


   .. py:method:: __call__(doc)

      Process one document, used in the spacy pipeline for sequential
      document processing.

      :param doc: A spacy document
      :type doc: Doc

      :Returns: **Doc** -- The same spacy document.


   .. py:method:: get_model_card(as_dict = False)

      A minimal model card.

      :param as_dict: Return the model card as a dictionary instead of a str. (Default value = False)
      :type as_dict: bool

      :Returns: **Union[str, dict]** -- An indented JSON string, or the same content
                as a dict if ``as_dict`` is True.
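
      The two return forms can be illustrated with a small stand-alone sketch. The function
      ``render_card`` and the card fields are hypothetical; the indentation width and key
      order used by the library are not specified here.

      .. code-block:: python

         import json
         from typing import Dict, Union


         def render_card(card: Dict, as_dict: bool = False) -> Union[str, Dict]:
             """Return the card as a dict, or as an indented JSON string by default."""
             if as_dict:
                 return card
             return json.dumps(card, indent=2, sort_keys=True)


         # Hypothetical card content, for illustration only.
         example_card = {"Category Name": "Status", "Classes": {"Affirmed": 0, "Other": 1}}
         card_text = render_card(example_card)
         card_dict = render_card(example_card, as_dict=True)

      The string form suits logging and display, while the dict form is easier to inspect
      programmatically.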


   .. py:method:: __repr__()

      Return the model card for this MetaCAT instance.

      :Returns: **str** -- The 'Model Card' for this MetaCAT instance. This includes NER+L config and any MetaCATs