medcat.tokenizers.meta_cat_tokenizers
Module Contents
Classes
TokenizerWrapperBase – Helper class that provides a standard way to create an ABC using inheritance.
TokenizerWrapperBPE – Wrapper around a huggingface tokenizer so that it works with the MetaCAT models.
TokenizerWrapperBERT – Wrapper around a huggingface BERT tokenizer so that it works with the MetaCAT models.
- class medcat.tokenizers.meta_cat_tokenizers.TokenizerWrapperBase(hf_tokenizer=None)
Bases:
abc.ABC
Helper class that provides a standard way to create an ABC using inheritance.
- Parameters:
hf_tokenizer (Optional[tokenizers.Tokenizer]) –
- name: str
- __init__(hf_tokenizer=None)
- Parameters:
hf_tokenizer (Optional[tokenizers.Tokenizer]) –
- Return type:
None
- __call__(text: str) Dict
- __call__(text: List[str]) List[Dict]
- abstract save(dir_path)
- Parameters:
dir_path (str) –
- Return type:
None
- abstract classmethod load(dir_path, model_variant='', **kwargs)
- Parameters:
dir_path (str) –
model_variant (Optional[str]) –
- Return type:
tokenizers.Tokenizer
- abstract get_size()
- Return type:
int
- abstract token_to_id(token)
- Parameters:
token (str) –
- Return type:
Union[int, List[int]]
- abstract get_pad_id()
- Return type:
Union[Optional[int], List[int]]
- ensure_tokenizer()
- Return type:
tokenizers.Tokenizer
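The abstract methods listed for TokenizerWrapperBase (save, load, get_size, token_to_id, get_pad_id) can be illustrated with a small stand-in. This is a minimal sketch using only the standard library: `WrapperInterface`, `ToyVocabWrapper`, the JSON file layout and the `<pad>`/`<unk>` entries are all hypothetical names chosen for illustration, and none of it is the MedCAT implementation.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, Optional


class WrapperInterface(ABC):
    """Hypothetical stand-in that mirrors the abstract methods documented above."""

    name: str

    @abstractmethod
    def save(self, dir_path: str) -> None: ...

    @classmethod
    @abstractmethod
    def load(cls, dir_path: str, model_variant: Optional[str] = '', **kwargs): ...

    @abstractmethod
    def get_size(self) -> int: ...

    @abstractmethod
    def token_to_id(self, token: str) -> int: ...

    @abstractmethod
    def get_pad_id(self) -> Optional[int]: ...


class ToyVocabWrapper(WrapperInterface):
    """Toy subclass backed by a plain dict instead of a huggingface tokenizer."""

    name = 'toy-vocab'

    def __init__(self, vocab: Optional[Dict[str, int]] = None) -> None:
        # Assumed vocabulary layout: token string -> integer id.
        self.vocab = vocab if vocab is not None else {'<pad>': 0, '<unk>': 1}

    def save(self, dir_path: str) -> None:
        os.makedirs(dir_path, exist_ok=True)
        with open(os.path.join(dir_path, self.name + '.json'), 'w') as f:
            json.dump(self.vocab, f)

    @classmethod
    def load(cls, dir_path: str, model_variant: Optional[str] = '', **kwargs):
        # model_variant is accepted for signature parity and ignored here.
        with open(os.path.join(dir_path, cls.name + '.json')) as f:
            return cls(json.load(f))

    def get_size(self) -> int:
        return len(self.vocab)

    def token_to_id(self, token: str) -> int:
        # Unknown tokens fall back to the '<unk>' id in this toy version.
        return self.vocab.get(token, self.vocab['<unk>'])

    def get_pad_id(self) -> Optional[int]:
        return self.vocab.get('<pad>')


# Round trip: save to a temporary directory, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    wrapper = ToyVocabWrapper({'<pad>': 0, '<unk>': 1, 'fever': 2})
    wrapper.save(tmp)
    restored = ToyVocabWrapper.load(tmp)
```

Because every abstract method is overridden, `ToyVocabWrapper` can be instantiated, while instantiating `WrapperInterface` directly raises `TypeError`.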
- class medcat.tokenizers.meta_cat_tokenizers.TokenizerWrapperBPE(hf_tokenizers=None)
Bases:
TokenizerWrapperBase
Wrapper around a huggingface tokenizer so that it works with the MetaCAT models.
- Parameters:
hf_tokenizers (Optional[tokenizers.ByteLevelBPETokenizer]) – A huggingface BBPE tokenizer.
- name = 'bbpe'
- __init__(hf_tokenizers=None)
- Parameters:
hf_tokenizers (Optional[tokenizers.ByteLevelBPETokenizer]) –
- Return type:
None
- __call__(text: str) Dict
- __call__(text: List[str]) List[Dict]
Tokenize some text
- Parameters:
text (Union[str, List[str]]) – Text/texts to be tokenized.
- Returns:
Union[Dict, List[Dict]] – A dictionary (for a single text) or a list of dictionaries (for a list of texts), each containing offset_mapping, input_ids and tokens for the corresponding input text.
- Raises:
Exception – If the input is something other than text or a list of text.
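The call contract described above (a str yields one dictionary, a list of str yields a list of dictionaries, anything else raises) can be sketched as follows. `ToyCallWrapper` is a hypothetical whitespace tokenizer that builds its vocabulary on the fly; it only imitates the documented keys offset_mapping, input_ids and tokens, and is not how the BBPE wrapper tokenizes.

```python
import re
from typing import Dict, List, Union, overload


class ToyCallWrapper:
    """Hypothetical whitespace tokenizer; not the MedCAT implementation."""

    def __init__(self) -> None:
        # Ids are assigned in order of first appearance (an assumption of this toy).
        self.vocab: Dict[str, int] = {}

    def _encode(self, text: str) -> Dict:
        tokens, offsets, ids = [], [], []
        for match in re.finditer(r'\S+', text):
            token = match.group()
            tokens.append(token)
            offsets.append((match.start(), match.end()))
            ids.append(self.vocab.setdefault(token, len(self.vocab)))
        return {'offset_mapping': offsets, 'input_ids': ids, 'tokens': tokens}

    @overload
    def __call__(self, text: str) -> Dict: ...

    @overload
    def __call__(self, text: List[str]) -> List[Dict]: ...

    def __call__(self, text: Union[str, List[str]]) -> Union[Dict, List[Dict]]:
        if isinstance(text, str):
            return self._encode(text)
        if isinstance(text, list):
            return [self._encode(t) for t in text]
        # Mirrors the documented behaviour of raising on any other input type.
        raise Exception('Unsupported input type: {}'.format(type(text)))


tok = ToyCallWrapper()
single = tok('chest pain')
batch = tok(['no fever', 'chest pain'])
```

Here `single` is one dictionary and `batch` is a list with one dictionary per input text; each offset pair gives the start and end character positions of a token in its source string.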
- save(dir_path)
- Parameters:
dir_path (str) –
- Return type:
None
- classmethod load(dir_path, model_variant='', **kwargs)
- Parameters:
dir_path (str) –
model_variant (Optional[str]) –
- Return type:
- get_size()
- Return type:
int
- token_to_id(token)
- Parameters:
token (str) –
- Return type:
Union[int, List[int]]
- get_pad_id()
- Return type:
Union[int, List[int]]
- class medcat.tokenizers.meta_cat_tokenizers.TokenizerWrapperBERT(hf_tokenizers=None)
Bases:
TokenizerWrapperBase
Wrapper around a huggingface BERT tokenizer so that it works with the MetaCAT models.
- Parameters:
hf_tokenizers (Optional[transformers.models.bert.tokenization_bert_fast.BertTokenizerFast]) – A huggingface Fast BERT tokenizer.
- name = 'bert-tokenizer'
- __init__(hf_tokenizers=None)
- Parameters:
hf_tokenizers (Optional[transformers.models.bert.tokenization_bert_fast.BertTokenizerFast]) –
- Return type:
None
- __call__(text: str) Dict
- __call__(text: List[str]) List[Dict]
- save(dir_path)
- Parameters:
dir_path (str) –
- Return type:
None
- classmethod load(dir_path, model_variant='', **kwargs)
- Parameters:
dir_path (str) –
model_variant (Optional[str]) –
- Return type:
- get_size()
- Return type:
int
- token_to_id(token)
- Parameters:
token (str) –
- Return type:
Union[int, List[int]]
- get_pad_id()
- Return type:
Optional[int]