:py:mod:`medcat.tokenizers.transformers_ner`
============================================

.. py:module:: medcat.tokenizers.transformers_ner


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.tokenizers.transformers_ner.TransformersTokenizerNER


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.tokenizers.transformers_ner.logger


.. py:data:: logger

   
.. py:class:: TransformersTokenizerNER(hf_tokenizer = None, max_len = 512, id2type = None, cui2name = None)


   Bases: :py:obj:`object`

   :param hf_tokenizer: Hugging Face tokenizer. It must be able to return token offsets.
   :param max_len: Maximum sequence length. Longer inputs are split into multiple examples.
   :param id2type: Can be ignored in most cases. A map from token to ``'start'`` or ``'sub'``, indicating whether the token starts a word (or is a full word) or is a subword. For BERT, ``'start'`` is every token that does not begin with ``##``.
   :param cui2name: Map from CUI to the full name used for labels.

   .. py:method:: __init__(hf_tokenizer = None, max_len = 512, id2type = None, cui2name = None)


   .. py:method:: calculate_label_map(dataset)


   .. py:method:: encode(examples, ignore_subwords = False)

      Used with the Hugging Face ``datasets`` ``map`` function to convert a medcat_ner dataset into the
      form required for NER with BERT. Long text segments are split into sequences of length ``max_len`` (chunking).

      :param examples: Stream of examples.
      :type examples: Dict
      :param ignore_subwords: If set to ``True``, subwords of any token will get the special label ``X``.
      :type ignore_subwords: bool

      :Returns: **Dict** -- The same dict, modified.


   .. py:method:: save(path)


   .. py:method:: ensure_tokenizer()


   .. py:method:: load(path)
      :classmethod:
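
The subword labelling and chunking described for ``encode`` can be illustrated with a small, library-free sketch. This is not the MedCAT implementation: the helper names ``build_token_types``, ``mask_subwords`` and ``chunk``, the placeholder labels ``C1`` and ``C2``, and the choice to key the map by token string are all assumptions made for illustration. The real method may treat chunk boundaries, special tokens and labels differently.

```python
from typing import Dict, List


def build_token_types(vocab: List[str]) -> Dict[str, str]:
    """Hypothetical helper: mark WordPiece continuations ('##...') as 'sub'."""
    return {tok: ("sub" if tok.startswith("##") else "start") for tok in vocab}


def mask_subwords(tokens: List[str], labels: List[str],
                  token_types: Dict[str, str], sub_label: str = "X") -> List[str]:
    """Hypothetical helper: give every subword token the special label."""
    return [
        sub_label if token_types.get(tok) == "sub" else lab
        for tok, lab in zip(tokens, labels)
    ]


def chunk(seq: List[str], max_len: int) -> List[List[str]]:
    """Hypothetical helper: split a sequence into pieces of at most max_len items."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [seq[i:i + max_len] for i in range(0, len(seq), max_len)]


# Toy BERT-style tokens with placeholder concept labels (assumed values).
tokens = ["para", "##ce", "##tam", "##ol", "for", "head", "##ache"]
labels = ["C1", "C1", "C1", "C1", "O", "C2", "C2"]

token_types = build_token_types(tokens)
masked = mask_subwords(tokens, labels, token_types)

# A tiny max_len makes the chunking visible.
token_chunks = chunk(tokens, 3)
label_chunks = chunk(masked, 3)
```

With ``max_len`` set to 3, the seven tokens end up in three chunks, and each chunk of labels stays aligned with its chunk of tokens.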