medcat.utils.relation_extraction.bert.tokenizer

Module Contents

Classes

TokenizerWrapperBERT_RelationExtraction

Wrapper around a Hugging Face BERT tokenizer so that it works with the RelCAT models.

Attributes

logger

medcat.utils.relation_extraction.bert.tokenizer.logger
class medcat.utils.relation_extraction.bert.tokenizer.TokenizerWrapperBERT_RelationExtraction(hf_tokenizers=None, max_seq_length=None, add_special_tokens=False)

Bases: medcat.utils.relation_extraction.tokenizer.BaseTokenizerWrapper_RelationExtraction

Base class for all fast tokenizers (wrapping HuggingFace tokenizers library).

Inherits from transformers.tokenization_utils_base.PreTrainedTokenizerBase.

Handles all the shared methods for tokenization and special tokens, the methods for downloading, caching, and loading pretrained tokenizers, and the methods for adding tokens to the vocabulary.

This class also contains the added tokens in a unified way on top of all tokenizers so we don’t have to handle the specific vocabulary augmentation methods of the various underlying dictionary structures (BPE, sentencepiece…).

Parameters:
  • max_seq_length (Optional[int])
  • add_special_tokens (Optional[bool])

name = 'tokenizer_wrapper_bert_rel'

Wrapper around a Hugging Face BERT tokenizer so that it works with the RelCAT models.

Parameters:
  • hf_tokenizers (transformers.models.bert.tokenization_bert_fast.PreTrainedTokenizerFast) – A Hugging Face fast BERT tokenizer.

name = 'bert-tokenizer'
pretrained_model_name_or_path = 'bert-base-uncased'
classmethod load(tokenizer_path, relcat_config, **kwargs)
Parameters:
  • tokenizer_path
  • relcat_config
  • **kwargs

Return type: TokenizerWrapperBERT_RelationExtraction
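
This page does not describe how load chooses where to read the tokenizer from, so the following is only a sketch of one plausible resolution rule, not the MedCAT implementation. It assumes that saved tokenizer files sit in a subdirectory of tokenizer_path named after the name attribute, and that an empty tokenizer_path falls back to pretrained_model_name_or_path. The helper resolve_tokenizer_source and the path "models/relcat" are hypothetical names introduced for illustration.

```python
import os

# Values copied from the class attributes documented above.
NAME = "bert-tokenizer"
PRETRAINED_MODEL_NAME_OR_PATH = "bert-base-uncased"


def resolve_tokenizer_source(tokenizer_path: str,
                             name: str = NAME,
                             default: str = PRETRAINED_MODEL_NAME_OR_PATH) -> str:
    """Return the location a loader would read the tokenizer from.

    Assumption (not stated on this page): saved files sit in a
    subdirectory named after `name`, and an empty path means
    "fall back to the pretrained model identifier".
    """
    if tokenizer_path:
        # Hypothetical layout: <tokenizer_path>/<name>
        return os.path.join(tokenizer_path, name)
    # No saved tokenizer given: use the default model identifier.
    return default


saved_source = resolve_tokenizer_source("models/relcat")
fallback_source = resolve_tokenizer_source("")
print(saved_source)
print(fallback_source)
```

Under these assumptions, the resolved string is the kind of value that could be handed to transformers' BertTokenizerFast.from_pretrained, with the resulting tokenizer stored as hf_tokenizers on the wrapper.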