:py:mod:`medcat.config_meta_cat`
================================

.. py:module:: medcat.config_meta_cat


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.config_meta_cat.General
   medcat.config_meta_cat.Model
   medcat.config_meta_cat.Train
   medcat.config_meta_cat.ConfigMetaCAT


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.config_meta_cat.logger


.. py:data:: logger

   
.. py:class:: General(/, **data)


   Bases: :py:obj:`medcat.config.MixingConfig`, :py:obj:`medcat.config.BaseModel`

   The General part of the MetaCAT config

   .. py:class:: Config


      .. py:attribute:: extra
         :value: 'allow'

         
      .. py:attribute:: validate_assignment
         :value: True

         
   .. py:attribute:: device
      :type: str
      :value: 'cpu'

      Device used by the module for prediction and training.

      Reference: https://pytorch.org/docs/stable/tensor_attributes.html#torch.device

   .. py:attribute:: disable_component_lock
      :type: bool
      :value: False

      Whether to use the MetaCAT component lock.

      If set to False (the default), a component lock is used that forces usage only on one thread at a time.

      If set to True, the component lock is not used.

   .. py:attribute:: seed
      :type: int
      :value: 13

      The seed for random number generation.

      NOTE: If used alongside RelCAT or additional NER, only one of the seeds will take effect.

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: description
      :type: str
      :value: 'No description'

      Should provide a basic description of this MetaCAT model

   .. py:attribute:: category_name
      :type: medcat.config.Optional[str]

      The category that this MetaCAT model predicts or is trained on.

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: alternative_category_names
      :type: List
      :value: []

      List that stores the possible variations of the category name.

      Example: for Experiencer, the alternative name is Subject, so
      alternative_category_names is set to ['Experiencer','Subject'].

      If the name specified in the self.general.category_name parameter does not match the data, this list ensures that no error is raised and that the category is mapped automatically.

   .. py:attribute:: category_value2id
      :type: Dict

      Map from category values to IDs. If empty, it will be calculated automatically during training.

   .. py:attribute:: alternative_class_names
      :type: List[List]
      :value: [[]]

      List of lists that stores the possible variations of the class names for each class mentioned in self.general.category_value2id.

      Example: for the Presence task, the class names vary across NHS sites.
      To accommodate this, alternative_class_names is populated as: [["Hypothetical (N/A)","Hypothetical"],["Not present (False)","False"],["Present (True)","True"]]

      Each sublist contains the possible variations of the given class.

   .. py:attribute:: vocab_size
      :type: medcat.config.Optional[int]

      Will be set automatically if the tokenizer is provided during meta_cat init

   .. py:attribute:: lowercase
      :type: bool
      :value: True

      If True, all input text will be lowercased.

   .. py:attribute:: cntx_left
      :type: int
      :value: 15

      Number of tokens to take from the left of the concept

   .. py:attribute:: cntx_right
      :type: int
      :value: 10

      Number of tokens to take from the right of the concept

   .. py:attribute:: replace_center
      :type: medcat.config.Optional[Any]

      If set, the center (concept) will be replaced with this string.

   .. py:attribute:: batch_size_eval
      :type: int
      :value: 5000

      Number of annotations to be meta-annotated at once in eval

   .. py:attribute:: annotate_overlapping
      :type: bool
      :value: False

      If set, meta_anns will be calculated for doc._.ents, otherwise for doc.ents.

   .. py:attribute:: tokenizer_name
      :type: str
      :value: 'bbpe'

      Tokenizer name used with MetaCAT.

      Choose from:
          - 'bbpe': Byte-level Byte Pair Encoding Tokenizer
          - 'bert-tokenizer': BERT Tokenizer

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: save_and_reuse_tokens
      :type: bool
      :value: False

      This is a dangerous option; if unsure, ALWAYS set it to False. If set, MetaCAT will try to share the pre-calculated
      context tokens between MetaCAT models when serving. Differences in tokenizer and context size are ignored,
      so make sure that all models in a deployment for which this is turned on use the same tokenizer and context size.

   .. py:attribute:: pipe_batch_size_in_chars
      :type: int
      :value: 20000000

      How many characters are piped at once into the meta_cat class

   .. py:attribute:: span_group
      :type: medcat.config.Optional[str]

      If set, the spaCy span group to which the MetaCAT model will assign annotations.
      Otherwise defaults to doc._.ents or doc.ents, per the annotate_overlapping setting.

   .. py:method:: get_applicable_category_name(available_names)
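
This page gives no description for get_applicable_category_name. Going by the category_name and alternative_category_names attributes above, one plausible reading is that it returns the configured name when the data contains it and otherwise falls back to the first alternative that is present. The standalone function below is a sketch of that assumed behaviour, not the MedCAT implementation; its name and signature are hypothetical.

```python
from typing import Iterable, List, Optional


def resolve_category_name(
    category_name: Optional[str],
    alternative_category_names: List[str],
    available_names: Iterable[str],
) -> Optional[str]:
    """Pick the category name to use from those present in the data.

    Hypothetical stand-in for General.get_applicable_category_name.
    """
    available = set(available_names)
    # Prefer the configured name when the data contains it.
    if category_name in available:
        return category_name
    # Otherwise fall back to the first alternative found in the data.
    for name in alternative_category_names:
        if name in available:
            return name
    # Nothing matched: the caller decides how to handle this.
    return None


# The data labels the category "Subject" rather than "Experiencer".
chosen = resolve_category_name(
    "Experiencer", ["Experiencer", "Subject"], ["Subject", "Presence"]
)
print(chosen)
```

Returning None instead of raising keeps the decision about a missing category with the caller, which matches the statement above that no error is raised when the configured name does not match the data.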


.. py:class:: Model(/, **data)


   Bases: :py:obj:`medcat.config.MixingConfig`, :py:obj:`medcat.config.BaseModel`

   The model part of the MetaCAT config

   .. py:class:: Config


      .. py:attribute:: extra
         :value: 'allow'

         
      .. py:attribute:: validate_assignment
         :value: True

         
   .. py:attribute:: model_name
      :type: str
      :value: 'lstm'

      Model to be used for training or predicting.

      Choose from:
          - 'bert'
          - 'lstm'

      .. note::

         When changing the model, make sure to change the tokenizer accordingly.

         NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: model_variant
      :type: str
      :value: 'bert-base-uncased'

      Applicable only when using BERT:

      Specifies the model variant to be used.

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: model_freeze_layers
      :type: bool
      :value: True

      Applicable only when using BERT:

      Determines the training approach for BERT.

      - If True: BERT layers are frozen and only the fully connected (FC) layer(s) on top are trained.
      - If False: Parameter-efficient fine-tuning will be applied using Low-Rank Adaptation (LoRA).

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: num_layers
      :type: int
      :value: 2

      Number of layers in the model (both LSTM and BERT)

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: input_size
      :type: int
      :value: 300

      Specifies the size of the embedding layer.

      Applicable only for LSTM model and ignored for BERT as BERT's embedding size is predefined.

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: hidden_size
      :type: int
      :value: 300

      Number of neurons in the hidden layer.

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: dropout
      :type: float
      :value: 0.5

      The dropout for the model.

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: phase_number
      :type: int
      :value: 0

      Indicates whether two phase learning is to be used for training.

      1: Phase 1 - Train model on undersampled data

      2: Phase 2 - Continue training on full data

      0: None - 2 phase learning is not performed

      Paper reference - https://ieeexplore.ieee.org/document/7533053

   .. py:attribute:: category_undersample
      :type: str
      :value: ''

      When using 2 phase learning, this category is used to undersample the data

   .. py:attribute:: model_architecture_config
      :type: Dict

      Specifies the architecture for the BERT model.

      If fc2 is set to True, then the 2nd fully connected layer is used

      If fc2 is True and fc3 is set to True, then the 3rd fully connected layer is used

      If lr_scheduler is set to True, then the learning rate scheduler is used with the optimizer

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: num_directions
      :type: int
      :value: 2

      Applicable only for LSTM:

      2 - bidirectional model, 1 - unidirectional

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: nclasses
      :type: int
      :value: 2

      Number of classes that this model will output.

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: padding_idx
      :type: int

      The padding index.

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: emb_grad
      :type: bool
      :value: True

      Applicable only for LSTM:

      If True, the embeddings will also be trained.

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: ignore_cpos
      :type: bool
      :value: False

      If set to True, center positions will be ignored when calculating the representation.
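
The two-phase learning options described under phase_number and category_undersample can be sketched in plain Python. The sketch assumes that undersampling caps every class at the number of samples in the class named by category_undersample; this page does not state the exact rule, so treat that as an assumption. The helper names undersample and select_training_data are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (text, class label)


def undersample(data: List[Sample], category_undersample: str, seed: int = 13) -> List[Sample]:
    """Cap every class at the size of the category_undersample class (assumed rule)."""
    by_label: Dict[str, List[Sample]] = defaultdict(list)
    for sample in data:
        by_label[sample[1]].append(sample)
    if category_undersample not in by_label:
        raise ValueError(f"Class {category_undersample!r} not found in the data")
    cap = len(by_label[category_undersample])
    rng = random.Random(seed)
    reduced: List[Sample] = []
    for samples in by_label.values():
        # Classes at or below the cap are kept whole; larger ones are sampled down.
        reduced.extend(samples if len(samples) <= cap else rng.sample(samples, cap))
    return reduced


def select_training_data(data: List[Sample], phase_number: int, category_undersample: str) -> List[Sample]:
    """Phase 1 trains on undersampled data; phases 0 and 2 use the full data."""
    if phase_number == 1:
        return undersample(data, category_undersample)
    return data


data = [(f"doc {i}", "True") for i in range(6)] + [(f"doc {i}", "False") for i in range(6, 8)]
phase1 = select_training_data(data, phase_number=1, category_undersample="False")
phase2 = select_training_data(data, phase_number=2, category_undersample="False")
print(len(phase1), len(phase2))
```

In this reading, a run with phase_number set to 1 is followed by a second run with phase_number set to 2 that continues training the same model on the full data.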


.. py:class:: Train(/, **data)


   Bases: :py:obj:`medcat.config.MixingConfig`, :py:obj:`medcat.config.BaseModel`

   The train part of the MetaCAT config

   .. py:class:: Config


      .. py:attribute:: extra
         :value: 'allow'

         
      .. py:attribute:: validate_assignment
         :value: True

         
   .. py:attribute:: batch_size
      :type: int
      :value: 100

      
   .. py:attribute:: nepochs
      :type: int
      :value: 50

      
   .. py:attribute:: lr
      :type: float
      :value: 0.001

      
   .. py:attribute:: test_size
      :type: float
      :value: 0.1

      
   .. py:attribute:: shuffle_data
      :type: bool
      :value: True

      Used only during training. If set, the dataset will be shuffled before the train/test split.

   .. py:attribute:: class_weights
      :type: medcat.config.Optional[Any]

      
   .. py:attribute:: compute_class_weights
      :type: bool
      :value: False

      If True and class weights are not provided, the class weights will be calculated based on the data.

   .. py:attribute:: score_average
      :type: str
      :value: 'weighted'

      What to use for averaging F1/P/R across labels

   .. py:attribute:: prerequisites
      :type: dict

      
   .. py:attribute:: cui_filter
      :type: medcat.config.Optional[Any]

      If set, only these CUIs will be used for training.

   .. py:attribute:: auto_save_model
      :type: bool
      :value: True

      Whether the model should be saved during training for best results.

   .. py:attribute:: last_train_on
      :type: medcat.config.Optional[float]

      When the last training run took place.

   .. py:attribute:: metric
      :type: Dict[str, str]

      What metric should be used for choosing the best model

   .. py:attribute:: loss_funct
      :type: str
      :value: 'cross_entropy'

      Loss function for the model.

      Choose from:
          - 'cross_entropy'
          - 'focal_loss'

   .. py:attribute:: gamma
      :type: int
      :value: 2

      Focal Loss hyperparameter that determines the importance the loss gives to hard-to-classify examples.
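
To make the role of gamma concrete, the sketch below computes the standard focal loss for a single example, -(1 - p_t)^gamma * log(p_t), where p_t is the probability the model assigns to the true class. It uses plain Python rather than the MetaCAT training code, and the function names are illustrative; MetaCAT's own loss may differ in details such as class weighting.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Cross-entropy for one example: -log of the true-class probability."""
    return -math.log(probs[target])


def focal_loss(probs: Sequence[float], target: int, gamma: float = 2) -> float:
    """Focal loss for one example: cross-entropy scaled by (1 - p_t) ** gamma."""
    p_t = probs[target]
    return (1.0 - p_t) ** gamma * cross_entropy(probs, target)


easy = [0.9, 0.1]  # confident and correct for class 0
hard = [0.3, 0.7]  # wrong for class 0

# How much more the hard example weighs than the easy one under each loss.
ce_ratio = cross_entropy(hard, 0) / cross_entropy(easy, 0)
fl_ratio = focal_loss(hard, 0) / focal_loss(easy, 0)
print(ce_ratio < fl_ratio)
```

With gamma set to 0 the scaling factor is 1 and the focal loss reduces to cross-entropy; larger values shrink the contribution of examples the model already classifies confidently.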


.. py:class:: ConfigMetaCAT(/, **data)


   Bases: :py:obj:`medcat.config.MixingConfig`, :py:obj:`medcat.config.BaseModel`

   The MetaCAT part of the config

   .. py:class:: Config


      .. py:attribute:: extra
         :value: 'allow'

         
      .. py:attribute:: validate_assignment
         :value: True

         
   .. py:attribute:: general
      :type: General

      
   .. py:attribute:: model
      :type: Model

      
   .. py:attribute:: train
      :type: Train
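
The three-part layout of ConfigMetaCAT (general, model and train) can be illustrated with standard-library dataclasses. The classes below are stand-ins and not the MedCAT classes: they carry only a handful of the attributes and default values documented above, and every name ending in Sketch is hypothetical. With MedCAT installed, the same attribute paths apply to an instance of ConfigMetaCAT from this module.

```python
from dataclasses import dataclass, field


@dataclass
class GeneralSketch:
    # A few of the documented General defaults.
    device: str = "cpu"
    seed: int = 13
    cntx_left: int = 15
    cntx_right: int = 10
    tokenizer_name: str = "bbpe"


@dataclass
class ModelSketch:
    # A few of the documented Model defaults.
    model_name: str = "lstm"
    hidden_size: int = 300
    nclasses: int = 2


@dataclass
class TrainSketch:
    # A few of the documented Train defaults.
    batch_size: int = 100
    nepochs: int = 50
    lr: float = 0.001


@dataclass
class ConfigSketch:
    # Each part gets its own instance, mirroring general/model/train.
    general: GeneralSketch = field(default_factory=GeneralSketch)
    model: ModelSketch = field(default_factory=ModelSketch)
    train: TrainSketch = field(default_factory=TrainSketch)


config = ConfigSketch()
# Widen the left context window.
config.general.cntx_left = 20
# Switch to BERT and change the tokenizer accordingly.
config.model.model_name = "bert"
config.general.tokenizer_name = "bert-tokenizer"
print(config.general.cntx_left, config.model.model_name, config.train.nepochs)
```

Several of the settings touched here, such as model_name and tokenizer_name, are marked above as requiring the pipe to be recreated before the change takes effect.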