:py:mod:`medcat.rel_cat`
========================

.. py:module:: medcat.rel_cat


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.rel_cat.BalancedBatchSampler
   medcat.rel_cat.RelCAT


.. py:class:: BalancedBatchSampler(dataset, classes, batch_size, max_samples, max_minority)


   Bases: :py:obj:`torch.utils.data.Sampler`

   Base class for all Samplers.

   Every Sampler subclass has to provide an :meth:`__iter__` method, providing a
   way to iterate over indices or lists of indices (batches) of dataset elements,
   and may provide a :meth:`__len__` method that returns the length of the returned iterators.

   :param data_source: This argument is not used and will be removed in 2.2.0.
                       You may still have a custom implementation that uses it.
   :type data_source: Dataset

   .. rubric:: Example

   >>> # xdoctest: +SKIP
   >>> class AccedingSequenceLengthSampler(Sampler[int]):
   >>>     def __init__(self, data: List[str]) -> None:
   >>>         self.data = data
   >>>
   >>>     def __len__(self) -> int:
   >>>         return len(self.data)
   >>>
   >>>     def __iter__(self) -> Iterator[int]:
   >>>         sizes = torch.tensor([len(x) for x in self.data])
   >>>         yield from torch.argsort(sizes).tolist()
   >>>
   >>> class AccedingSequenceLengthBatchSampler(Sampler[List[int]]):
   >>>     def __init__(self, data: List[str], batch_size: int) -> None:
   >>>         self.data = data
   >>>         self.batch_size = batch_size
   >>>
   >>>     def __len__(self) -> int:
   >>>         return (len(self.data) + self.batch_size - 1) // self.batch_size
   >>>
   >>>     def __iter__(self) -> Iterator[List[int]]:
   >>>         sizes = torch.tensor([len(x) for x in self.data])
   >>>         for batch in torch.chunk(torch.argsort(sizes), len(self)):
   >>>             yield batch.tolist()

   .. note:: The :meth:`__len__` method isn't strictly required by
             :class:`~torch.utils.data.DataLoader`, but is expected in any
             calculation involving the length of a :class:`~torch.utils.data.DataLoader`.

   .. py:method:: __init__(dataset, classes, batch_size, max_samples, max_minority)


   .. py:method:: __len__()


   .. py:method:: __iter__()
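
The signature of ``BalancedBatchSampler`` suggests a sampler that yields class-balanced batches, but its exact behaviour is not documented here. The following is a minimal, pure-Python sketch of that general idea, not MedCAT's implementation: ``SimpleBalancedBatches``, its ``labels`` and ``seed`` parameters, and the oversampling strategy are all assumptions made for illustration. A real sampler would subclass ``torch.utils.data.Sampler`` and be passed to a ``DataLoader`` through ``batch_sampler``.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence


class SimpleBalancedBatches:
    """Illustrative only: yields index batches with equal counts per class.

    This is NOT MedCAT's BalancedBatchSampler. It only shows the
    __iter__/__len__ protocol that a batch sampler has to provide.
    """

    def __init__(self, labels: Sequence[int], batch_size: int, seed: int = 0) -> None:
        # Group dataset indices by their class label.
        self.by_class: Dict[int, List[int]] = defaultdict(list)
        for idx, label in enumerate(labels):
            self.by_class[label].append(idx)
        # Each class contributes the same number of indices to every batch.
        self.per_class = max(1, batch_size // len(self.by_class))
        # Enough batches to cover the largest class roughly once.
        largest = max(len(indices) for indices in self.by_class.values())
        self.num_batches = (largest + self.per_class - 1) // self.per_class
        self.rng = random.Random(seed)

    def __len__(self) -> int:
        return self.num_batches

    def __iter__(self) -> Iterator[List[int]]:
        for _ in range(self.num_batches):
            batch: List[int] = []
            for indices in self.by_class.values():
                # Sampling with replacement oversamples minority classes.
                batch.extend(self.rng.choices(indices, k=self.per_class))
            self.rng.shuffle(batch)
            yield batch


# Eight samples of class 0 and two of class 1 (hypothetical labels).
labels = [0] * 8 + [1] * 2
batches = list(SimpleBalancedBatches(labels, batch_size=4))
```

Sampling with replacement keeps every batch balanced even when one class has very few examples, at the cost of repeating minority samples within an epoch.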


.. py:class:: RelCAT(cdb, config = ConfigRelCAT(), task='train', init_model=False)


   Bases: :py:obj:`medcat.pipeline.pipe_runner.PipeRunner`

   The RelCAT class is used for training 'Relation-Annotation' models, i.e., models that annotate
   relations between clinical concepts.

   :param cdb: The concept database (CDB), used when creating relation datasets.
   :type cdb: CDB
   :param tokenizer: The Huggingface tokenizer instance. This can be a pre-trained tokenizer instance from
                     a BERT-style model. For now, only BERT models are supported.
   :type tokenizer: TokenizerWrapperBERT
   :param config: The configuration for RelCAT. Parameter descriptions are available in the ConfigRelCAT class docs.
   :type config: ConfigRelCAT
   :param task: The task this model is supposed to handle. Defaults to "train".
   :type task: str, optional
   :param init_model: Whether to load the default model. Defaults to False.
   :type init_model: bool, optional

   .. py:attribute:: name
      :value: 'rel_cat'

      
   .. py:attribute:: log

      
   .. py:method:: __init__(cdb, config = ConfigRelCAT(), task='train', init_model=False)


   .. py:method:: save(save_path = './')


   .. py:method:: load(load_path = './')
      :classmethod:


   .. py:method:: __call__(doc)


   .. py:method:: _create_test_train_datasets(data, split_sets = False)


   .. py:method:: train(export_data_path = '', train_csv_path = '', test_csv_path = '', checkpoint_path = './')


   .. py:method:: evaluate_(output_logits, labels, ignore_idx)


   .. py:method:: evaluate_results(data_loader, pad_id)


   .. py:method:: pipe(stream, *args, **kwargs)


   .. py:method:: predict_text_with_anns(text, annotations)

      Creates a spaCy doc from the text and annotation input, then predicts using ``self.__call__``.

      :param text: The input text that the annotation offsets refer to.
      :type text: str
      :param annotations: The entities from an NER step of your choosing, given as a list of
                          dicts in the following format (the ``cui`` key is optional)::

                              [
                                  {
                                      "cui": "202099003",
                                      "value": "discoid lateral meniscus",
                                      "start": 294,
                                      "end": 318
                                  },
                                  {
                                      "cui": "202099003",
                                      "value": "Discoid lateral meniscus",
                                      "start": 1905,
                                      "end": 1929
                                  }
                              ]

      :type annotations: List[Dict]

      :Returns: **Doc** -- The spaCy doc with the relations.
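
The annotation format expected by ``predict_text_with_anns`` can be built programmatically. The sketch below is one possible way to do that; ``make_annotations`` is a hypothetical helper, not part of MedCAT. It assumes that ``end`` is an exclusive character offset, which matches the example above (318 - 294 equals the length of the mention), and that a trained ``RelCAT`` instance named ``rel_cat`` is available for the final, commented-out call.

```python
from typing import Dict, List, Optional


def make_annotations(text: str, mention: str, cui: Optional[str] = None) -> List[Dict]:
    """Hypothetical helper: one annotation dict per occurrence of `mention`.

    Matching is case-insensitive; it assumes lower() preserves string
    length, which holds for ASCII text.
    """
    annotations: List[Dict] = []
    lowered, needle = text.lower(), mention.lower()
    start = lowered.find(needle)
    while start != -1:
        end = start + len(needle)  # exclusive end offset (assumption)
        ann = {"value": text[start:end], "start": start, "end": end}
        if cui is not None:
            ann["cui"] = cui  # optional key
        annotations.append(ann)
        start = lowered.find(needle, end)
    return annotations


text = "Discoid lateral meniscus was noted. The discoid lateral meniscus was repaired."
annotations = make_annotations(text, "discoid lateral meniscus", cui="202099003")

# Sanity check: every offset pair must slice back to the annotated value.
offsets_ok = all(text[a["start"]:a["end"]] == a["value"] for a in annotations)

# With a trained model loaded (assumed, not shown here):
# doc = rel_cat.predict_text_with_anns(text=text, annotations=annotations)
```

Checking that each ``start``/``end`` pair slices back to ``value`` before calling the model helps catch offset drift, for example when the text was normalised after the NER step.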