:py:mod:`medcat.datasets.transformers_ner`
==========================================
.. py:module:: medcat.datasets.transformers_ner
Module Contents
---------------
Classes
~~~~~~~
.. autoapisummary::
medcat.datasets.transformers_ner.MedCATAnnotationsConfig
medcat.datasets.transformers_ner.TransformersDatasetNER
Attributes
~~~~~~~~~~
.. autoapisummary::
medcat.datasets.transformers_ner._CITATION
medcat.datasets.transformers_ner._DESCRIPTION
.. py:data:: _CITATION
:value: Multiline-String
.. code-block:: python
"""@misc{kraljevic2020multidomain,
title={Multi-domain Clinical Natural Language Processing with MedCAT: the Medical Concept Annotation Toolkit},
author={Zeljko Kraljevic and Thomas Searle and Anthony Shek and Lukasz Roguski and Kawsar Noor and Daniel Bean and Aurelie Mascio and Leilei Zhu and Amos A Folarin and Angus Roberts and Rebecca Bendayan and Mark P Richardson and Robert Stewart and Anoop D Shah and Wai Keong Wong and Zina Ibrahim and James T Teo and Richard JB Dobson},
year={2020},
eprint={2010.01165},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
"""
.. py:data:: _DESCRIPTION
:value: 'Takes as input a json export from medcattrainer.'
.. py:class:: MedCATAnnotationsConfig
Bases: :py:obj:`datasets.BuilderConfig`
BuilderConfig for MedCATNER.
:param \*\*kwargs: keyword arguments forwarded to super.
.. py:class:: TransformersDatasetNER(cache_dir = None, dataset_name = None, config_name = None, hash = None, base_path = None, info = None, features = None, token = None, use_auth_token='deprecated', repo_id = None, data_files = None, data_dir = None, storage_options = None, writer_batch_size = None, name='deprecated', **config_kwargs)
Bases: :py:obj:`datasets.GeneratorBasedBuilder`
MedCATNER: Output of MedCATtrainer
.. py:attribute:: BUILDER_CONFIGS
.. py:method:: _info()
Construct the DatasetInfo object. See `DatasetInfo` for details.
Warning: This function is only called once and the result is cached for all
following .info() calls.
:Returns: **info** -- (DatasetInfo) The dataset information
.. py:method:: _split_generators(dl_manager)
Returns the list of `SplitGenerator` objects, one per dataset split.
.. py:method:: _generate_examples(filepaths)
Default function generating examples for each `SplitGenerator`.
This function preprocesses the examples from the raw data into the
preprocessed dataset files.
This function is called once for each `SplitGenerator` defined in
`_split_generators`. The examples yielded here will be written to
disk.
:param \*\*kwargs: Arguments forwarded from `SplitGenerator.gen_kwargs`; for this builder, the single argument `filepaths`.
:type \*\*kwargs: additional keyword arguments
:Yields: *key* --
`str` or `int`, a unique deterministic example identification key.
* Unique: An error will be raised if two examples are yielded with the
same key.
* Deterministic: When generating the dataset twice, the same example
should have the same key.
Good keys can be the image id, or line number if examples are extracted
from a text file.
The key will be hashed and sorted to shuffle examples deterministically,
so that generating the dataset multiple times keeps examples in the
same order.
example: `dict`, a feature dictionary
ready to be encoded and written to disk. The example will be
encoded with `self.info.features.encode_example({...})`.
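The key and example contract described above can be illustrated with a minimal standalone sketch. It is not the MedCAT implementation: the function name `generate_examples`, the assumed export layout (a top-level `projects` list whose entries hold `documents` with `annotations`), and the output field names `ent_starts`, `ent_ends` and `ent_cuis` are assumptions made for illustration only.

```python
import json


def generate_examples(filepaths):
    """Yield (key, example) pairs from MedCATtrainer-style JSON exports.

    Assumed layout (hypothetical, for illustration only):
    {"projects": [{"documents": [{"id": ..., "text": ..., "annotations": [
        {"cui": ..., "start": ..., "end": ...}]}]}]}
    """
    counter = 0  # running index across all files, used as the example key
    for filepath in filepaths:
        with open(filepath, "r", encoding="utf-8") as f:
            projects = json.load(f)["projects"]
        for project in projects:
            for doc in project["documents"]:
                annotations = doc.get("annotations", [])
                example = {
                    "id": int(doc.get("id", counter)),
                    "text": str(doc["text"]),
                    "ent_starts": [int(a["start"]) for a in annotations],
                    "ent_ends": [int(a["end"]) for a in annotations],
                    "ent_cuis": [str(a["cui"]) for a in annotations],
                }
                # Unique: each document gets its own counter value.
                # Deterministic: the same files in the same order always
                # produce the same keys.
                yield str(counter), example
                counter += 1
```

A running counter is one simple way to satisfy both key requirements. It stays deterministic only as long as `filepaths` is passed in the same order on every run.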