:py:mod:`medcat.tokenizers.meta_cat_tokenizers`
===============================================

.. py:module:: medcat.tokenizers.meta_cat_tokenizers


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.tokenizers.meta_cat_tokenizers.TokenizerWrapperBase
   medcat.tokenizers.meta_cat_tokenizers.TokenizerWrapperBPE
   medcat.tokenizers.meta_cat_tokenizers.TokenizerWrapperBERT


.. py:class:: TokenizerWrapperBase(hf_tokenizer = None)

   Bases: :py:obj:`abc.ABC`

   Helper class that provides a standard way to create an ABC using
   inheritance.

   .. py:attribute:: name
      :type: str

   .. py:method:: __init__(hf_tokenizer = None)

   .. py:method:: __call__(text: str) -> Dict
                  __call__(text: List[str]) -> List[Dict]

   .. py:method:: save(dir_path)
      :abstractmethod:

   .. py:method:: load(dir_path, model_variant = '', **kwargs)
      :classmethod:
      :abstractmethod:

   .. py:method:: get_size()
      :abstractmethod:

   .. py:method:: token_to_id(token)
      :abstractmethod:

   .. py:method:: get_pad_id()
      :abstractmethod:

   .. py:method:: ensure_tokenizer()


.. py:class:: TokenizerWrapperBPE(hf_tokenizers = None)

   Bases: :py:obj:`TokenizerWrapperBase`

   Wrapper around a Hugging Face tokenizer so that it works with the
   MetaCAT models.

   :param hf_tokenizers: A Hugging Face byte-level BPE (BBPE) tokenizer.
   :type hf_tokenizers: tokenizers.ByteLevelBPETokenizer

   .. py:attribute:: name
      :value: 'bbpe'

   .. py:method:: __init__(hf_tokenizers = None)

   .. py:method:: __call__(text: str) -> Dict
                  __call__(text: List[str]) -> List[Dict]

      Tokenize a single text or a list of texts.

      :param text: Text or texts to be tokenized.
      :type text: Union[str, List[str]]

      :returns: A dictionary (for a single text) or a list of dictionaries
                (for a list of texts) containing ``offset_mapping``,
                ``input_ids`` and ``tokens`` for the corresponding input.
      :rtype: Union[Dict, List[Dict]]

      :raises Exception: If the input is neither a string nor a list of
                         strings.

   .. py:method:: save(dir_path)

   .. py:method:: load(dir_path, model_variant = '', **kwargs)
      :classmethod:

   .. py:method:: get_size()

   .. py:method:: token_to_id(token)

   .. py:method:: get_pad_id()


.. py:class:: TokenizerWrapperBERT(hf_tokenizers = None)

   Bases: :py:obj:`TokenizerWrapperBase`

   Wrapper around a Hugging Face BERT tokenizer so that it works with the
   MetaCAT models.

   :param hf_tokenizers: A Hugging Face fast BERT tokenizer.
   :type hf_tokenizers: transformers.models.bert.tokenization_bert_fast.BertTokenizerFast

   .. py:attribute:: name
      :value: 'bert-tokenizer'

   .. py:method:: __init__(hf_tokenizers = None)

   .. py:method:: __call__(text: str) -> Dict
                  __call__(text: List[str]) -> List[Dict]

   .. py:method:: save(dir_path)

   .. py:method:: load(dir_path, model_variant = '', **kwargs)
      :classmethod:

   .. py:method:: get_size()

   .. py:method:: token_to_id(token)

   .. py:method:: get_pad_id()
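
The wrapper contract documented above (an overloaded ``__call__`` plus ``save``, ``load``, ``get_size``, ``token_to_id`` and ``get_pad_id``) can be illustrated with a small self-contained sketch. This is not MedCAT code and does not import MedCAT: ``ToyWrapperBase``, ``ToyWhitespaceWrapper``, the ``vocab.json`` file name and the ``<pad>``/``<unk>`` entries are hypothetical names chosen for illustration, and whitespace splitting stands in for a real Hugging Face tokenizer. Only the method names and the three output keys are taken from the reference above.

```python
import json
import os
import re
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, List, Union


class ToyWrapperBase(ABC):
    """Hypothetical stand-in that mirrors the abstract methods listed above."""

    name: str

    @abstractmethod
    def save(self, dir_path): ...

    @classmethod
    @abstractmethod
    def load(cls, dir_path, model_variant='', **kwargs): ...

    @abstractmethod
    def get_size(self): ...

    @abstractmethod
    def token_to_id(self, token): ...

    @abstractmethod
    def get_pad_id(self): ...


class ToyWhitespaceWrapper(ToyWrapperBase):
    """Toy wrapper: splits on whitespace instead of using a real tokenizer."""

    name = 'toy-whitespace'  # hypothetical value, not a MedCAT tokenizer name

    def __init__(self, vocab=None):
        self.vocab = vocab if vocab is not None else {'<pad>': 0, '<unk>': 1}

    def _encode(self, text: str) -> Dict:
        matches = list(re.finditer(r'\S+', text))
        tokens = [m.group() for m in matches]
        return {
            'offset_mapping': [(m.start(), m.end()) for m in matches],
            'input_ids': [self.token_to_id(t) for t in tokens],
            'tokens': tokens,
        }

    def __call__(self, text: Union[str, List[str]]) -> Union[Dict, List[Dict]]:
        # A single string gives one dict; a list of strings gives a list of dicts.
        if isinstance(text, str):
            return self._encode(text)
        if isinstance(text, list):
            return [self._encode(t) for t in text]
        raise Exception('Unsupported input type: {}'.format(type(text)))

    def save(self, dir_path):
        with open(os.path.join(dir_path, 'vocab.json'), 'w') as f:
            json.dump(self.vocab, f)

    @classmethod
    def load(cls, dir_path, model_variant='', **kwargs):
        with open(os.path.join(dir_path, 'vocab.json')) as f:
            return cls(vocab=json.load(f))

    def get_size(self):
        return len(self.vocab)

    def token_to_id(self, token):
        # Unknown tokens fall back to the '<unk>' id.
        return self.vocab.get(token, self.vocab['<unk>'])

    def get_pad_id(self):
        return self.vocab['<pad>']


tok = ToyWhitespaceWrapper({'<pad>': 0, '<unk>': 1, 'no': 2, 'fever': 3})
single = tok('no fever today')       # one dict
batch = tok(['no fever', 'fever'])   # list of two dicts

# Round trip through save/load using a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    tok.save(tmp_dir)
    restored = ToyWhitespaceWrapper.load(tmp_dir)
```

In this sketch the same object accepts either a string or a list of strings, which is the behaviour the two ``__call__`` signatures describe; any other input type raises an exception, as documented for ``TokenizerWrapperBPE``.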