:py:mod:`medcat.utils.relation_extraction.tokenizer`
====================================================

.. py:module:: medcat.utils.relation_extraction.tokenizer


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.utils.relation_extraction.tokenizer.BaseTokenizerWrapper_RelationExtraction


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.utils.relation_extraction.tokenizer.logger


.. py:data:: logger

   
.. py:class:: BaseTokenizerWrapper_RelationExtraction(hf_tokenizers=None, max_seq_length = None, add_special_tokens = False)


   Bases: :py:obj:`transformers.PreTrainedTokenizerFast`

   Wrapper around a HuggingFace fast tokenizer (passed as ``hf_tokenizers``) for use in relation extraction.

   Through :py:class:`transformers.PreTrainedTokenizerFast`, this class inherits from
   :py:class:`transformers.PreTrainedTokenizerBase`. These parent classes provide the shared methods for tokenization
   and special tokens, for downloading, caching and loading pretrained tokenizers, and for adding tokens to the
   vocabulary. Added tokens are handled in a unified way on top of all tokenizers, so the vocabulary augmentation
   methods of the different underlying vocabulary structures (BPE, SentencePiece, ...) do not need separate handling.
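
   The wrapping idea can be sketched as follows. This is a simplified, hypothetical illustration, not MedCAT's
   implementation: ``ToyTokenizer`` and ``SketchTokenizerWrapper`` are made-up names, the sketch does not subclass
   ``PreTrainedTokenizerFast``, and it assumes that ``get_size``, ``token_to_id`` and ``get_pad_id`` forward to the
   wrapped tokenizer's vocabulary, ``convert_tokens_to_ids`` and ``pad_token_id`` respectively.

   .. code-block:: python

      from typing import Dict, List, Optional, Union


      class ToyTokenizer:
          """Hypothetical stand-in for a HuggingFace fast tokenizer.

          Only the members used by the sketch are provided; their names
          (get_vocab, convert_tokens_to_ids, pad_token_id) mirror those of
          HuggingFace tokenizers.
          """

          def __init__(self, vocab: Dict[str, int], pad_token: str = "[PAD]") -> None:
              self._vocab = dict(vocab)
              self.pad_token_id = self._vocab[pad_token]

          def get_vocab(self) -> Dict[str, int]:
              return dict(self._vocab)

          def convert_tokens_to_ids(self, tokens: Union[str, List[str]]) -> Union[int, List[int]]:
              # Simplification: unknown tokens map to the id of "[UNK]".
              unk_id = self._vocab["[UNK]"]
              if isinstance(tokens, str):
                  return self._vocab.get(tokens, unk_id)
              return [self._vocab.get(t, unk_id) for t in tokens]


      class SketchTokenizerWrapper:
          """Hypothetical wrapper that forwards lookups to the wrapped tokenizer."""

          name = "sketch_tokenizer_wrapper"

          def __init__(self, hf_tokenizers=None, max_seq_length: Optional[int] = None,
                       add_special_tokens: bool = False) -> None:
              self.hf_tokenizers = hf_tokenizers
              self.max_seq_length = max_seq_length
              self.add_special_tokens = add_special_tokens

          def get_size(self) -> int:
              # Number of entries in the wrapped tokenizer's vocabulary.
              return len(self.hf_tokenizers.get_vocab())

          def token_to_id(self, token: str) -> int:
              return self.hf_tokenizers.convert_tokens_to_ids(token)

          def get_pad_id(self) -> int:
              return self.hf_tokenizers.pad_token_id


      vocab = {"[PAD]": 0, "[UNK]": 1, "aspirin": 2, "treats": 3, "headache": 4}
      wrapper = SketchTokenizerWrapper(ToyTokenizer(vocab), max_seq_length=8)
      print(wrapper.get_size(), wrapper.token_to_id("aspirin"), wrapper.get_pad_id())
      # prints: 5 2 0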

   .. py:attribute:: name
      :value: 'base_tokenizer_wrapper_rel'

      
   .. py:method:: __init__(hf_tokenizers=None, max_seq_length = None, add_special_tokens = False)


   .. py:method:: get_size()


   .. py:method:: token_to_id(token)


   .. py:method:: get_pad_id()


   .. py:method:: __call__(text, truncation = True)

      Tokenize one sequence or a batch of sequences and prepare the result for the model.

      :param text: The sequence to encode, as a single string, or a batch of sequences, as a list of strings.
      :type text: ``str`` or ``List[str]``
      :param truncation: Whether to truncate sequences that exceed the maximum length. Defaults to ``True``.
      :type truncation: ``bool``, *optional*
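
      The sketch below illustrates how a call with this signature might dispatch on the type of ``text`` (one
      string or a list of strings) and apply ``truncation``. It is a hypothetical, self-contained illustration,
      not MedCAT's implementation: the whitespace tokenizer, the ``max_seq_length`` cut-off and the
      ``tokens``/``input_ids`` output keys are assumptions chosen for the example.

      .. code-block:: python

         from typing import Dict, List, Optional, Union

         # Hypothetical output layout for one encoded sequence.
         Encoding = Dict[str, list]


         def encode_one(text: str, vocab: Dict[str, int],
                        max_seq_length: Optional[int] = None,
                        truncation: bool = True) -> Encoding:
             """Encode one sequence with a toy whitespace tokenizer (an assumption)."""
             tokens = text.split()
             if truncation and max_seq_length is not None:
                 # Keep at most max_seq_length tokens.
                 tokens = tokens[:max_seq_length]
             unk_id = vocab["[UNK]"]
             return {"tokens": tokens, "input_ids": [vocab.get(t, unk_id) for t in tokens]}


         def call(text: Union[str, List[str]], vocab: Dict[str, int],
                  max_seq_length: Optional[int] = None,
                  truncation: bool = True) -> Union[Encoding, List[Encoding]]:
             """Encode a single string, or each string of a batch (list)."""
             if isinstance(text, str):
                 return encode_one(text, vocab, max_seq_length, truncation)
             if isinstance(text, list):
                 return [encode_one(t, vocab, max_seq_length, truncation) for t in text]
             raise TypeError(f"Unsupported input type: {type(text).__name__}")


         vocab = {"[PAD]": 0, "[UNK]": 1, "aspirin": 2, "treats": 3, "headache": 4}
         print(call("aspirin treats headache", vocab, max_seq_length=2))
         # prints: {'tokens': ['aspirin', 'treats'], 'input_ids': [2, 3]}

      Returning a single encoding for a string and a list of encodings for a list keeps the one-sequence case
      free of an extra level of nesting.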


   .. py:method:: save(dir_path)


   .. py:method:: load(tokenizer_path, relcat_config, **kwargs)
      :classmethod: