:py:mod:`medcat.utils.relation_extraction.bert.tokenizer`
=========================================================

.. py:module:: medcat.utils.relation_extraction.bert.tokenizer


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.utils.relation_extraction.bert.tokenizer.TokenizerWrapperBERT_RelationExtraction


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.utils.relation_extraction.bert.tokenizer.logger


.. py:data:: logger


.. py:class:: TokenizerWrapperBERT_RelationExtraction(hf_tokenizers=None, max_seq_length = None, add_special_tokens = False)

   Bases: :py:obj:`medcat.utils.relation_extraction.tokenizer.BaseTokenizerWrapper_RelationExtraction`

   Wrapper around a HuggingFace BERT tokenizer so that it works with the RelCAT models.

   :param hf_tokenizers: A HuggingFace fast BERT tokenizer.
   :type hf_tokenizers: `transformers.models.bert.tokenization_bert_fast.PreTrainedTokenizerFast`

   The class ultimately derives from HuggingFace's ``PreTrainedTokenizerFast``, the base class for all
   fast tokenizers (wrapping the HuggingFace tokenizers library), which in turn inherits from
   :py:class:`~transformers.PreTrainedTokenizerBase`. That base class handles the shared methods for
   tokenization and special tokens, downloading, caching and loading pretrained tokenizers, and adding
   tokens to the vocabulary. It also tracks added tokens in a unified way across all tokenizers, so the
   vocabulary-augmentation methods of the various underlying dictionary structures (BPE, SentencePiece,
   ...) do not have to be handled individually.

   .. py:attribute:: name
      :value: 'bert-tokenizer'

      Identifier of this tokenizer wrapper. The class body assigns ``name`` twice (first
      ``'tokenizer_wrapper_bert_rel'``, then ``'bert-tokenizer'``); because class bodies execute top to
      bottom, the later assignment is the effective value.

   .. py:attribute:: pretrained_model_name_or_path
      :value: 'bert-base-uncased'

   .. py:method:: load(tokenizer_path, relcat_config, **kwargs)
      :classmethod:
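
The ``load`` classmethod of ``TokenizerWrapperBERT_RelationExtraction`` carries no description. The
following is a minimal, hypothetical sketch of how a loader could choose between a saved tokenizer and
the ``pretrained_model_name_or_path`` default; it is not MedCAT's implementation. It assumes that a saved
tokenizer lives in a sub-directory of ``tokenizer_path`` named after the class's ``name`` attribute.
``resolve_tokenizer_source``, ``NAME`` and ``PRETRAINED_MODEL_NAME_OR_PATH`` are illustrative names
invented here, and the actual tokenizer loading is deliberately left out so the sketch runs offline.

.. code-block:: python

   import os
   from typing import Optional

   # Hypothetical module-level defaults mirroring the class attributes documented above.
   NAME = "bert-tokenizer"
   PRETRAINED_MODEL_NAME_OR_PATH = "bert-base-uncased"


   def resolve_tokenizer_source(tokenizer_path: Optional[str]) -> str:
       """Hypothetical helper: decide where a tokenizer would be loaded from.

       Assumption (not taken from MedCAT): a saved tokenizer is stored in
       ``<tokenizer_path>/<NAME>``; otherwise the default model name is used.
       """
       if tokenizer_path:
           candidate = os.path.join(tokenizer_path, NAME)
           if os.path.isdir(candidate):
               return candidate  # a previously saved tokenizer directory exists
       return PRETRAINED_MODEL_NAME_OR_PATH  # fall back to the default pretrained model name


   print(resolve_tokenizer_source(None))  # bert-base-uncased

Checking ``os.path.isdir`` keeps the fallback explicit: a ``tokenizer_path`` that does not contain a
saved tokenizer resolves to the default model name instead of raising an error.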