:py:mod:`medcat.datasets.transformers_ner`
==========================================

.. py:module:: medcat.datasets.transformers_ner


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.datasets.transformers_ner.MedCATAnnotationsConfig
   medcat.datasets.transformers_ner.TransformersDatasetNER


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.datasets.transformers_ner._CITATION
   medcat.datasets.transformers_ner._DESCRIPTION


.. py:data:: _CITATION
   :value: Multiline-String

    .. raw:: html

        <details><summary>Show Value</summary>

    .. code-block:: python

        """@misc{kraljevic2020multidomain,
              title={Multi-domain Clinical Natural Language Processing with MedCAT: the Medical Concept Annotation Toolkit},
              author={Zeljko Kraljevic and Thomas Searle and Anthony Shek and Lukasz Roguski and Kawsar Noor and Daniel Bean and Aurelie Mascio and Leilei Zhu and Amos A Folarin and Angus Roberts and Rebecca Bendayan and Mark P Richardson and Robert Stewart and Anoop D Shah and Wai Keong Wong and Zina Ibrahim and James T Teo and Richard JB Dobson},
              year={2020},
              eprint={2010.01165},
              archivePrefix={arXiv},
              primaryClass={cs.CL}
        }
        """

    .. raw:: html

        </details>

   
.. py:data:: _DESCRIPTION
   :value: 'Takes as input a json export from medcattrainer.'
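
   The layout of that JSON export is not described here. The sketch below is for illustration only. It assumes a MedCATtrainer export nests ``projects`` -> ``documents`` -> ``annotations``, with character offsets in ``start``/``end`` and a concept id in ``cui``. The document contents and the helper ``iter_annotations`` are hypothetical.

   .. code-block:: python

      import json

      # Hypothetical minimal MedCATtrainer export. The layout (projects ->
      # documents -> annotations, character offsets in "start"/"end", concept
      # id in "cui") is an assumption made for illustration.
      EXPORT_JSON = """
      {"projects": [{"name": "demo", "documents": [
        {"id": 1, "name": "doc_1", "text": "Patient has diabetes.",
         "annotations": [{"start": 12, "end": 20, "cui": "C0011849"}]}
      ]}]}
      """


      def iter_annotations(export):
          """Yield (doc_id, cui, annotated_text) for every annotation."""
          for project in export["projects"]:
              for doc in project["documents"]:
                  for ann in doc["annotations"]:
                      span = doc["text"][ann["start"]:ann["end"]]
                      yield doc["id"], ann["cui"], span


      export = json.loads(EXPORT_JSON)
      print(list(iter_annotations(export)))  # [(1, 'C0011849', 'diabetes')]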

   
.. py:class:: MedCATAnnotationsConfig


   Bases: :py:obj:`datasets.BuilderConfig`

   BuilderConfig for MedCATNER.

   :param \*\*kwargs: Keyword arguments forwarded to the parent ``datasets.BuilderConfig`` constructor.


.. py:class:: TransformersDatasetNER(cache_dir = None, dataset_name = None, config_name = None, hash = None, base_path = None, info = None, features = None, token = None, use_auth_token='deprecated', repo_id = None, data_files = None, data_dir = None, storage_options = None, writer_batch_size = None, name='deprecated', **config_kwargs)


   Bases: :py:obj:`datasets.GeneratorBasedBuilder`

   MedCATNER: dataset builder for the JSON export produced by MedCATtrainer.

   .. py:attribute:: BUILDER_CONFIGS

      
   .. py:method:: _info()

      Construct the ``DatasetInfo`` object. See ``datasets.DatasetInfo`` for details.

      Warning: this method is called only once; its result is cached and reused
      for all subsequent ``.info()`` calls.

      :Returns: **info** -- (``DatasetInfo``) the dataset information.
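
      The caching contract in the warning above can be illustrated with a plain-Python stand-in. This is an assumption-laden sketch, not how ``datasets`` implements it: the class ``SketchBuilder`` and its feature names are hypothetical, and ``functools.cached_property`` is used so that ``_info`` runs only on the first access.

      .. code-block:: python

         from functools import cached_property


         class SketchBuilder:
             """Hypothetical stand-in for a dataset builder (not datasets' code)."""

             def __init__(self):
                 self.info_calls = 0  # counts how often _info() actually runs

             def _info(self) -> dict:
                 self.info_calls += 1
                 # Hypothetical description/feature names, for illustration only.
                 return {"description": "json export from medcattrainer",
                         "features": ["id", "name", "text"]}

             @cached_property
             def info(self) -> dict:
                 # Computed on first access, then served from the instance cache.
                 return self._info()


         builder = SketchBuilder()
         _ = builder.info
         _ = builder.info
         print(builder.info_calls)  # 1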


   .. py:method:: _split_generators(dl_manager)

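      Return the list of ``datasets.SplitGenerator`` objects that define the
      dataset's splits.

      :param dl_manager: The download manager supplied by ``datasets``.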
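
      A hypothetical stand-in for the hand-off between split generators and example generation: each split generator carries ``gen_kwargs``, which are passed as keyword arguments to ``_generate_examples``. The sketch assumes a single ``train`` split whose ``gen_kwargs`` key is ``filepaths`` (matching the ``_generate_examples(filepaths)`` signature), omits ``dl_manager``, and uses a hypothetical ``NamedTuple`` called ``SplitGen`` in place of ``datasets.SplitGenerator``.

      .. code-block:: python

         from typing import Any, Dict, Iterator, List, NamedTuple, Tuple


         class SplitGen(NamedTuple):
             """Hypothetical stand-in for datasets.SplitGenerator."""
             name: str
             gen_kwargs: Dict[str, Any]


         class SketchNERBuilder:
             """Hypothetical builder; not this module's implementation."""

             def __init__(self, data_files: Dict[str, List[str]]):
                 self.data_files = data_files

             def _split_generators(self) -> List[SplitGen]:
                 # Assumed: one "train" split; its gen_kwargs become the
                 # keyword arguments of _generate_examples.
                 return [SplitGen("train", {"filepaths": self.data_files["train"]})]

             def _generate_examples(
                 self, filepaths: List[str]
             ) -> Iterator[Tuple[str, Dict[str, Any]]]:
                 for i, path in enumerate(filepaths):
                     yield f"{i}|{path}", {"path": path}

             def build(self) -> Dict[str, List[Tuple[str, Dict[str, Any]]]]:
                 # _generate_examples runs once per split generator.
                 return {sg.name: list(self._generate_examples(**sg.gen_kwargs))
                         for sg in self._split_generators()}


         builder = SketchNERBuilder({"train": ["export_a.json", "export_b.json"]})
         print(builder.build())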


   .. py:method:: _generate_examples(filepaths)

      Generate the examples for each ``SplitGenerator``.

      This method preprocesses the examples from the raw data into the prepared
      dataset files. It is called once for each ``SplitGenerator`` defined in
      ``_split_generators``, and the examples it yields are written to disk.

      :param filepaths: Paths to the MedCATtrainer JSON export file(s), forwarded
                        from ``SplitGenerator.gen_kwargs``.

      :Yields: ``(key, example)`` tuples, where:

               * **key** -- ``str`` or ``int``, a unique and deterministic example
                 identification key.

                 * *Unique*: an error is raised if two examples are yielded with
                   the same key.
                 * *Deterministic*: when the dataset is generated twice, the same
                   example should get the same key.

                 Good keys include an image id, or the line number if examples are
                 extracted from a text file. The key is hashed and sorted to shuffle
                 examples deterministically, so that generating the dataset multiple
                 times keeps the examples in the same order.

               * **example** -- ``dict`` mapping feature names to feature values,
                 ready to be encoded and written to disk. The example is encoded
                 with ``self.info.features.encode_example({...})``.
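
      A minimal, self-contained sketch of a generator in the style of ``_generate_examples``, assuming the export layout ``projects`` -> ``documents`` -> ``annotations``. The output feature names (``ent_starts``, ``ent_ends``, ``ent_cuis``), the ``"<id>|<name>"`` key format and the helper name ``generate_examples`` are assumptions for illustration, not a description of this module's actual implementation. The sketch also enforces the key-uniqueness rule described above.

      .. code-block:: python

         import json
         import os
         import tempfile
         from typing import Any, Dict, Iterator, List, Tuple


         def generate_examples(
             filepaths: List[str],
         ) -> Iterator[Tuple[str, Dict[str, Any]]]:
             """Hypothetical: yield one (key, example) pair per document."""
             seen = set()
             for filepath in filepaths:
                 with open(filepath, encoding="utf-8") as f:
                     export = json.load(f)
                 # Assumed layout: projects -> documents -> annotations.
                 for project in export["projects"]:
                     for doc in project["documents"]:
                         # Deterministic key from stable document fields (assumed).
                         key = f"{doc['id']}|{doc['name']}"
                         if key in seen:
                             raise ValueError(f"duplicate example key: {key}")
                         seen.add(key)
                         anns = doc["annotations"]
                         # Feature names below are assumptions for illustration.
                         yield key, {
                             "id": int(doc["id"]),
                             "name": str(doc["name"]),
                             "text": str(doc["text"]),
                             "ent_starts": [a["start"] for a in anns],
                             "ent_ends": [a["end"] for a in anns],
                             "ent_cuis": [a["cui"] for a in anns],
                         }


         export = {"projects": [{"name": "demo", "documents": [
             {"id": 1, "name": "doc_1", "text": "Patient has diabetes.",
              "annotations": [{"start": 12, "end": 20, "cui": "C0011849"}]}]}]}
         with tempfile.TemporaryDirectory() as tmp:
             path = os.path.join(tmp, "export.json")
             with open(path, "w", encoding="utf-8") as f:
                 json.dump(export, f)
             for key, example in generate_examples([path]):
                 print(key, example["ent_cuis"])  # 1|doc_1 ['C0011849']

      Passing the same file twice would reuse every key, so the sketch raises ``ValueError`` rather than silently yielding duplicates.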