:py:mod:`medcat.config_rel_cat`
===============================

.. py:module:: medcat.config_rel_cat


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.config_rel_cat.General
   medcat.config_rel_cat.Model
   medcat.config_rel_cat.Train
   medcat.config_rel_cat.ConfigRelCAT


.. py:class:: General(/, **data)


   Bases: :py:obj:`medcat.config.MixingConfig`, :py:obj:`medcat.config.BaseModel`

   The General part of the RelCAT config

   .. py:attribute:: device
      :type: str
      :value: 'cpu'

      The device to use (CPU or GPU).

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: relation_type_filter_pairs
      :type: List
      :value: []

      Map from category values to IDs. If empty, it is calculated automatically during training.

   .. py:attribute:: vocab_size
      :type: medcat.config.Optional[int]

      
   .. py:attribute:: lowercase
      :type: bool
      :value: True

      If True, all input text is lowercased.

   .. py:attribute:: cntx_left
      :type: int
      :value: 15

      Number of tokens to take from the left of the concept

   .. py:attribute:: cntx_right
      :type: int
      :value: 15

      Number of tokens to take from the right of the concept
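
      As an illustration of how ``cntx_left`` and ``cntx_right`` bound the context around a concept, here is a minimal sketch. It is not MedCAT's own code: the helper ``context_window`` is hypothetical, and it assumes the concept is given as token indices with an exclusive end.

      .. code-block:: python

         from typing import List


         def context_window(tokens: List[str], ent_start: int, ent_end: int,
                            cntx_left: int = 15, cntx_right: int = 15) -> List[str]:
             """Hypothetical helper: return the concept tokens plus their context."""
             # Clamp both edges so the window never runs past the document.
             left = max(0, ent_start - cntx_left)
             right = min(len(tokens), ent_end + cntx_right)
             return tokens[left:right]


         tokens = [f"t{i}" for i in range(100)]
         # A two-token concept at indices 50 and 51.
         window = context_window(tokens, ent_start=50, ent_end=52)

      With the default values, the window holds the two concept tokens plus 15 tokens on each side.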

   .. py:attribute:: window_size
      :type: int
      :value: 300

      Maximum acceptable distance between entities (in characters). Take care when using this, as it can produce sentences that are over 512 tokens (the limit is given by the tokenizer).
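
      The effect of a character window on candidate entity pairs can be sketched as follows. This is an assumption-laden illustration, not the library's implementation: the helper ``pairs_within_window`` is hypothetical, and measuring the distance between the start offsets of the two entities is an assumption.

      .. code-block:: python

         from itertools import combinations
         from typing import List, Tuple

         Span = Tuple[int, int]  # (start_char, end_char)


         def pairs_within_window(entities: List[Span],
                                 window_size: int = 300) -> List[Tuple[Span, Span]]:
             """Hypothetical helper: keep entity pairs that are close enough together."""
             kept = []
             for first, second in combinations(sorted(entities), 2):
                 # Assumed distance measure: difference between start offsets.
                 if second[0] - first[0] <= window_size:
                     kept.append((first, second))
             return kept


         spans = [(0, 8), (120, 131), (450, 460)]
         pairs = pairs_within_window(spans)

      Here only the first two entities are within 300 characters of each other, so a single pair is kept.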

   .. py:attribute:: limit_samples_per_class
      :type: int

      Number of samples per class. This limit is applied to the train samples, so if the train samples are limited to 100 then the test samples would be limited to 20.

   .. py:attribute:: addl_rels_max_sample_size
      :type: int
      :value: 200

      Limit the number of 'Other' samples selected for training/test. This is applied per encountered MedCAT project, as sample_size/num_projects.

   .. py:attribute:: create_addl_rels
      :type: bool
      :value: False

      When processing relations from a MedCAT export/docs, relations labeled as 'Other' are created from all the annotation pairs available.

   .. py:attribute:: create_addl_rels_by_type
      :type: bool
      :value: False

      When creating the 'Other' relation class, actually split this class into subclasses based on concept types

   .. py:attribute:: tokenizer_name
      :type: str
      :value: 'bert'

      The name of the tokenizer used.

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: model_name
      :type: str
      :value: 'bert-base-uncased'

      The name of the model used.

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: log_level
      :type: int

      The log level for RelCAT.

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: max_seq_length
      :type: int
      :value: 512

      The maximum sequence length.

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: tokenizer_special_tokens
      :type: bool
      :value: False

      Tokenizer.

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: annotation_schema_tag_ids
      :type: List
      :value: [30522, 30523, 30524, 30525]

      If a foreign (non-MedCATtrainer) dataset is used, you can insert your own relation entity token delimiters into the tokenizer and copy their token IDs here. You must also resize your tokenizer embeddings and adjust the hidden_size of the model; this depends on the number of tokens you introduce.

      For example: 30522 - [s1], 30523 - [e1], 30524 - [s2], 30525 - [e2], 30526 - [BLANK], 30527 - [ENT1], 30528 - [ENT2], 30529 - [/ENT1], 30530 - [/ENT2].

      Note that the tokenizer special tokens are supposed to come in pairs, for example [s1] and [e1], [s2] and [e2]; [BLANK] is just an example placeholder token.
      If you have more than four tokens here, you need to make sure they are present in the text,
      otherwise the pipeline will throw an error in the get_annotation_schema_tag() function.
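
      The ID layout in the example above can be reproduced with a short sketch. It is not MedCAT's own code: the helper ``assign_tag_ids`` is hypothetical, and it assumes that the base vocabulary is that of ``bert-base-uncased`` (30522 entries) and that new tokens are appended in order after the existing vocabulary.

      .. code-block:: python

         from typing import Dict, List

         BASE_VOCAB_SIZE = 30522  # vocabulary size of bert-base-uncased


         def assign_tag_ids(tags: List[str],
                            base_vocab_size: int = BASE_VOCAB_SIZE) -> Dict[str, int]:
             """Hypothetical helper: give each new delimiter token the next free ID."""
             return {tag: base_vocab_size + offset for offset, tag in enumerate(tags)}


         tag_ids = assign_tag_ids(['[s1]', '[e1]', '[s2]', '[e2]'])
         # These are the values that would be copied into annotation_schema_tag_ids.
         annotation_schema_tag_ids = list(tag_ids.values())

      With a real tokenizer, the IDs should be read back from the tokenizer after the tokens are added rather than computed by hand.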

   .. py:attribute:: tokenizer_relation_annotation_special_tokens_tags
      :type: List[str]
      :value: ['[s1]', '[e1]', '[s2]', '[e2]']

      
   .. py:attribute:: tokenizer_other_special_tokens
      :type: Dict[str, str]

      The special tokens used by the tokenizer. The {PAD} is for the Llama tokenizer.

   .. py:attribute:: labels2idx
      :type: Dict[str, int]

      
   .. py:attribute:: idx2labels
      :type: Dict[int, str]

      
   .. py:attribute:: pin_memory
      :type: bool
      :value: True

      If True the data loader will copy the tensors to the GPU pinned memory

   .. py:attribute:: seed
      :type: int
      :value: 13

      The seed for random number generation.

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: task
      :type: str
      :value: 'train'

      The task for RelCAT.

   .. py:attribute:: language
      :type: str
      :value: 'en'

      Used for the spaCy language setting.

   .. py:method:: convert_keys_to_int(value)
      :classmethod:


   .. py:method:: __setattr__(key, value)

      Implement setattr(self, name, value).


.. py:class:: Model(/, **data)


   Bases: :py:obj:`medcat.config.MixingConfig`, :py:obj:`medcat.config.BaseModel`

   The model part of the RelCAT config

   .. py:class:: Config


      .. py:attribute:: extra
         :value: 'allow'

         
      .. py:attribute:: validate_assignment
         :value: True

         
   .. py:attribute:: input_size
      :type: int
      :value: 300

      
   .. py:attribute:: hidden_size
      :type: int
      :value: 768

      The hidden size.

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: hidden_layers
      :type: int
      :value: 3

      hidden_size * 5, 5 being the number of tokens, default (s1,s2,e1,e2+CLS).

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: model_size
      :type: int
      :value: 5120

      The size of the model.

      NB! For these changes to take effect, the pipe would need to be recreated.

   .. py:attribute:: dropout
      :type: float
      :value: 0.2

      
   .. py:attribute:: num_directions
      :type: int
      :value: 2

      2 - bidirectional model, 1 - unidirectional

   .. py:attribute:: freeze_layers
      :type: bool
      :value: True

      If we update the weights during training

   .. py:attribute:: padding_idx
      :type: int

      
   .. py:attribute:: emb_grad
      :type: bool
      :value: True

      If True the embeddings will also be trained

   .. py:attribute:: ignore_cpos
      :type: bool
      :value: False

      If set to True, center positions will be ignored when calculating the representation.

   .. py:attribute:: llama_use_pooled_output
      :type: bool
      :value: False

      Used only with the Llama model. If set to True, it will add the extra tensor formed from selecting the max of the last hidden layer.


.. py:class:: Train(/, **data)


   Bases: :py:obj:`medcat.config.MixingConfig`, :py:obj:`medcat.config.BaseModel`

   The train part of the RelCAT config

   .. py:class:: Config


      .. py:attribute:: extra
         :value: 'allow'

         
      .. py:attribute:: validate_assignment
         :value: True

         
   .. py:attribute:: nclasses
      :type: int
      :value: 2

      Number of classes that this model will output

   .. py:attribute:: batch_size
      :type: int
      :value: 25

      Batch size.

   .. py:attribute:: nepochs
      :type: int
      :value: 1

      Number of training epochs.

   .. py:attribute:: lr
      :type: float
      :value: 0.0001

      Learning rate

   .. py:attribute:: stratified_batching
      :type: bool
      :value: False

      Train the model with stratified batching

   .. py:attribute:: batching_samples_per_class
      :type: list
      :value: []

      Number of samples per class in each batch.

      Example for batch size 64: [6,6,6,8,8,8,6,8,8]
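
      One way to picture a stratified batch built from such a list is sketched below. This is not the library's implementation: the helper ``build_stratified_batch`` and the toy data are hypothetical, and drawing with replacement (so that small classes can still fill their quota) is an assumption.

      .. code-block:: python

         import random
         from collections import Counter
         from typing import Dict, List


         def build_stratified_batch(samples_by_class: Dict[int, List[str]],
                                    samples_per_class: List[int],
                                    batch_size: int,
                                    rng: random.Random) -> List[str]:
             """Hypothetical helper: draw a fixed number of samples from each class."""
             if sum(samples_per_class) != batch_size:
                 raise ValueError("per-class counts must add up to the batch size")
             batch: List[str] = []
             for class_id, count in enumerate(samples_per_class):
                 pool = samples_by_class[class_id]
                 # Draw with replacement so small classes can still fill their quota.
                 batch.extend(rng.choices(pool, k=count))
             return batch


         samples_per_class = [6, 6, 6, 8, 8, 8, 6, 8, 8]
         # Toy data: class i has i + 3 samples named "c<i>_<j>".
         data = {i: [f"c{i}_{j}" for j in range(i + 3)]
                 for i in range(len(samples_per_class))}
         batch = build_stratified_batch(data, samples_per_class, 64, random.Random(13))
         counts = Counter(sample.split("_")[0] for sample in batch)

      The per-class counts add up to the batch size, which is why the example list sums to 64.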

   .. py:attribute:: batching_minority_limit
      :type: Union[List[int], int]
      :value: 0

      Maximum number of samples the minority class can have.
      Since the minority class elements need to be repeated, this is used to facilitate that.

      Example: batching_samples_per_class - [6,6,6,8,8,8,6,8,8],
      batching_minority_limit - 6

   .. py:attribute:: adam_betas
      :type: Tuple[float, float]
      :value: (0.9, 0.999)

      
   .. py:attribute:: adam_weight_decay
      :type: float
      :value: 0

      
   .. py:attribute:: adam_epsilon
      :type: float
      :value: 1e-08

      
   .. py:attribute:: test_size
      :type: float
      :value: 0.2

      
   .. py:attribute:: gradient_acc_steps
      :type: int
      :value: 1

      
   .. py:attribute:: multistep_milestones
      :type: List[int]
      :value: [2, 4, 6, 8, 12, 15, 18, 20, 22, 24, 26, 30]

      
   .. py:attribute:: multistep_lr_gamma
      :type: float
      :value: 0.8
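
      The names ``multistep_milestones`` and ``multistep_lr_gamma`` suggest a step-wise learning rate decay of the kind offered by PyTorch's ``MultiStepLR`` scheduler. Assuming that is how they are used (this page does not confirm it), the learning rate per epoch can be sketched in plain Python. The helper ``lr_at_epoch`` is hypothetical.

      .. code-block:: python

         from bisect import bisect_right
         from typing import Sequence

         MILESTONES = (2, 4, 6, 8, 12, 15, 18, 20, 22, 24, 26, 30)


         def lr_at_epoch(epoch: int, base_lr: float = 0.0001,
                         milestones: Sequence[int] = MILESTONES,
                         gamma: float = 0.8) -> float:
             """Hypothetical helper: learning rate in effect at a given epoch."""
             # Each milestone already reached multiplies the rate by gamma once.
             return base_lr * gamma ** bisect_right(milestones, epoch)


         schedule = [lr_at_epoch(epoch) for epoch in range(5)]

      Under these assumptions the rate stays at the base value for epochs 0 and 1, and is multiplied by 0.8 at epoch 2 and again at epoch 4.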

      
   .. py:attribute:: max_grad_norm
      :type: float
      :value: 1.0

      
   .. py:attribute:: shuffle_data
      :type: bool
      :value: True

      Used only during training. If set, the dataset will be shuffled before the train/test split.

   .. py:attribute:: class_weights
      :type: Union[List[float], None]

      
   .. py:attribute:: enable_class_weights
      :type: bool
      :value: False

      
   .. py:attribute:: score_average
      :type: str
      :value: 'weighted'

      What to use for averaging F1/P/R across labels
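
      To show what a choice of averaging means in practice, here is a minimal sketch that compares a support-weighted mean with an unweighted one. It assumes the accepted values follow the scikit-learn ``average`` convention (``'weighted'``, ``'macro'``), which this page does not state. The helper ``average_f1`` is hypothetical.

      .. code-block:: python

         from typing import List


         def average_f1(f1_per_label: List[float], support: List[int],
                        average: str = "weighted") -> float:
             """Hypothetical helper: combine per-label F1 scores into one number."""
             if average == "macro":
                 # Every label counts equally, regardless of how often it occurs.
                 return sum(f1_per_label) / len(f1_per_label)
             if average == "weighted":
                 # Each label counts in proportion to its number of true samples.
                 total = sum(f1 * n for f1, n in zip(f1_per_label, support))
                 return total / sum(support)
             raise ValueError(f"unsupported average: {average}")


         weighted = average_f1([0.5, 1.0], [1, 3])
         macro = average_f1([0.5, 1.0], [1, 3], average="macro")

      The weighted score is pulled towards the label with more samples, while the macro score treats both labels equally.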

   .. py:attribute:: auto_save_model
      :type: bool
      :value: True

      Whether the model should be saved during training when the best results are achieved.


.. py:class:: ConfigRelCAT(/, **data)


   Bases: :py:obj:`medcat.config.MixingConfig`, :py:obj:`medcat.config.BaseModel`

   The RelCAT part of the config

   .. py:class:: Config


      .. py:attribute:: extra
         :value: 'allow'

         
      .. py:attribute:: validate_assignment
         :value: True

         
   .. py:attribute:: general
      :type: General

      
   .. py:attribute:: model
      :type: Model

      
   .. py:attribute:: train
      :type: Train

      
   .. py:method:: load(load_path = './')
      :classmethod:

      Load the config from a file.

      :param load_path: Path to RelCAT config. Defaults to "./".
      :type load_path: str

      :Returns: **ConfigRelCAT** -- The loaded config.