:py:mod:`medcat.datasets.transformers_ner`
==========================================

.. py:module:: medcat.datasets.transformers_ner


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.datasets.transformers_ner.MedCATAnnotationsConfig
   medcat.datasets.transformers_ner.TransformersDatasetNER


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.datasets.transformers_ner._CITATION
   medcat.datasets.transformers_ner._DESCRIPTION


.. py:data:: _CITATION
   :value: Multiline-String

   .. code-block:: python

      """@misc{kraljevic2020multidomain,
            title={Multi-domain Clinical Natural Language Processing with MedCAT: the Medical Concept Annotation Toolkit},
            author={Zeljko Kraljevic and Thomas Searle and Anthony Shek and Lukasz Roguski and Kawsar Noor and Daniel Bean and Aurelie Mascio and Leilei Zhu and Amos A Folarin and Angus Roberts and Rebecca Bendayan and Mark P Richardson and Robert Stewart and Anoop D Shah and Wai Keong Wong and Zina Ibrahim and James T Teo and Richard JB Dobson},
            year={2020},
            eprint={2010.01165},
            archivePrefix={arXiv},
            primaryClass={cs.CL}
      }
      """

.. py:data:: _DESCRIPTION
   :value: 'Takes as input a json export from medcattrainer.'


.. py:class:: MedCATAnnotationsConfig

   Bases: :py:obj:`datasets.BuilderConfig`

   BuilderConfig for MedCATNER.

   :param \*\*kwargs: keyword arguments forwarded to super.


.. py:class:: TransformersDatasetNER(cache_dir = None, dataset_name = None, config_name = None, hash = None, base_path = None, info = None, features = None, token = None, use_auth_token='deprecated', repo_id = None, data_files = None, data_dir = None, storage_options = None, writer_batch_size = None, name='deprecated', **config_kwargs)

   Bases: :py:obj:`datasets.GeneratorBasedBuilder`

   MedCATNER: Output of MedCATtrainer

   .. py:attribute:: BUILDER_CONFIGS

   .. py:method:: _info()

      Construct the DatasetInfo object. See `DatasetInfo` for details.

      Warning: This function is only called once and the result is cached for all
      following .info() calls.

      :Returns: **info** -- (DatasetInfo) The dataset information

   .. py:method:: _split_generators(dl_manager)

      Returns SplitGenerators.

   .. py:method:: _generate_examples(filepaths)

      Default function generating examples for each `SplitGenerator`.

      This function preprocesses the examples from the raw data into the
      preprocessed dataset files. It is called once for each `SplitGenerator`
      defined in `_split_generators`. The examples yielded here will be written
      to disk.

      :param \*\*kwargs: Arguments forwarded from the SplitGenerator.gen_kwargs
      :type \*\*kwargs: additional keyword arguments

      :Yields: *key* -- `str` or `int`, a unique deterministic example identification key.

               * Unique: An error will be raised if two examples are yielded with the same key.
               * Deterministic: When generating the dataset twice, the same example should have the same key.

               Good keys can be the image id, or the line number if examples are
               extracted from a text file. The key will be hashed and sorted to
               shuffle examples deterministically, so that generating the dataset
               multiple times keeps examples in the same order.

               *example* -- `dict`, a feature dictionary ready to be encoded and
               written to disk. The example will be encoded with
               `self.info.features.encode_example({...})`.
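
The key contract described for ``_generate_examples`` (unique and deterministic
keys) can be illustrated with a minimal standalone sketch. It is not the
implementation used by ``TransformersDatasetNER``. The function name
``generate_examples``, the sample data, the placeholder CUIs, the output field
names and the assumed ``projects`` / ``documents`` / ``annotations`` layout of
the export are all hypothetical, and the sketch reads JSON strings from memory
rather than from ``filepaths``.

.. code-block:: python

   import json
   from typing import Any, Dict, Iterator, Tuple

   # Hypothetical, simplified stand-in for a MedCATtrainer JSON export.
   # The layout and the CUI values are assumptions made for illustration.
   SAMPLE_EXPORT = json.dumps({
       "projects": [
           {
               "documents": [
                   {"id": 101, "text": "Patient has diabetes.",
                    "annotations": [{"start": 12, "end": 20, "cui": "C0000001"}]},
                   {"id": 102, "text": "No fever reported.",
                    "annotations": [{"start": 3, "end": 8, "cui": "C0000002"}]},
               ]
           }
       ]
   })


   def generate_examples(exports: Dict[str, str]) -> Iterator[Tuple[str, Dict[str, Any]]]:
       """Yield (key, example) pairs from named JSON export strings."""
       # Sorting the names fixes the iteration order, so keys are deterministic.
       for name in sorted(exports):
           projects = json.loads(exports[name])["projects"]
           for project_index, project in enumerate(projects):
               for doc_index, doc in enumerate(project["documents"]):
                   annotations = doc["annotations"]
                   example = {
                       "id": doc.get("id", doc_index),
                       "text": doc["text"],
                       "ent_starts": [a["start"] for a in annotations],
                       "ent_ends": [a["end"] for a in annotations],
                       "ent_cuis": [a["cui"] for a in annotations],
                   }
                   # Source name plus positional indices: unique within a run
                   # and identical across runs over the same input.
                   key = f"{name}|{project_index}|{doc_index}"
                   yield key, example


   for key, example in generate_examples({"export.json": SAMPLE_EXPORT}):
       print(key, example["ent_cuis"])
   # export.json|0|0 ['C0000001']
   # export.json|0|1 ['C0000002']

Building the key from the source name and positional indices, rather than from
a running counter shared with other state, is one simple way to keep keys
stable when the same input is processed again.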