medcat.datasets.patient_concept_stream

Module Contents

Classes

PatientConceptStreamConfig

BuilderConfig for PatientConceptStream.

PatientConceptStream

Dataset builder that takes a mapping from each patient to a stream of concepts as input.

Attributes

_CITATION

_DESCRIPTION

medcat.datasets.patient_concept_stream._CITATION = Multiline-String
"""@misc{kraljevic2020multidomain,
      title={Multi-domain Clinical Natural Language Processing with MedCAT: the Medical Concept Annotation Toolkit},
      author={Zeljko Kraljevic and Thomas Searle and Anthony Shek and Lukasz Roguski and Kawsar Noor and Daniel Bean and Aurelie Mascio and Leilei Zhu and Amos A Folarin and Angus Roberts and Rebecca Bendayan and Mark P Richardson and Robert Stewart and Anoop D Shah and Wai Keong Wong and Zina Ibrahim and James T Teo and Richard JB Dobson},
      year={2020},
      eprint={2010.01165},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
"""
medcat.datasets.patient_concept_stream._DESCRIPTION = Multiline-String
"""Takes as input a pickled dict of pt2stream. The format should be:
    {'patient_id': (concept_cui, concept_count_for_patient, timestamp_of_first_occurrence_for_patient), ...}
"""
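The input format described above can be illustrated with a short sketch. It assumes that each patient ID maps to a list of (concept_cui, concept_count_for_patient, timestamp_of_first_occurrence_for_patient) tuples, since a stream normally holds more than one concept; that list layout, the variable names, the file name, and all values below are illustrative assumptions and are not taken from the MedCAT source.

```python
import os
import pickle
import tempfile

# Hypothetical sample data. Assumption: each patient ID maps to a list of
# (concept_cui, concept_count_for_patient, timestamp_of_first_occurrence_for_patient)
# tuples. The CUIs, counts, and Unix timestamps are made up for illustration.
pt2stream = {
    "patient_1": [("C0011849", 3, 1577836800), ("C0020538", 1, 1580515200)],
    "patient_2": [("C0004096", 2, 1583020800)],
}

# Write the dict to a pickle file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "pt2stream.pickle")
with open(path, "wb") as f:
    pickle.dump(pt2stream, f)

# Read it back to confirm the round trip preserves the structure.
with open(path, "rb") as f:
    loaded = pickle.load(f)
```

A file written this way is the kind of input the description refers to; check the exact per-patient layout against the preparation scripts before relying on it.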
class medcat.datasets.patient_concept_stream.PatientConceptStreamConfig

Bases: datasets.BuilderConfig

BuilderConfig for PatientConceptStream.

Parameters:

**kwargs – keyword arguments forwarded to super.

class medcat.datasets.patient_concept_stream.PatientConceptStream(cache_dir=None, dataset_name=None, config_name=None, hash=None, base_path=None, info=None, features=None, token=None, use_auth_token='deprecated', repo_id=None, data_files=None, data_dir=None, storage_options=None, writer_batch_size=None, name='deprecated', **config_kwargs)

Bases: datasets.GeneratorBasedBuilder

Dataset builder that takes a mapping from each patient to a stream of concepts as input.

TODO: Move the preparation scripts out of notebooks.

Parameters:
  • cache_dir (Optional[str])
  • dataset_name (Optional[str])
  • config_name (Optional[str])
  • hash (Optional[str])
  • base_path (Optional[str])
  • info (Optional[datasets.info.DatasetInfo])
  • features (Optional[datasets.features.Features])
  • token (Optional[Union[bool, str]])
  • repo_id (Optional[str])
  • data_files (Optional[Union[str, list, dict, datasets.data_files.DataFilesDict]])
  • data_dir (Optional[str])
  • storage_options (Optional[dict])
  • writer_batch_size (Optional[int])

BUILDER_CONFIGS

_info()

Construct the DatasetInfo object. See DatasetInfo for details.

Warning: This function is only called once and the result is cached for all following .info() calls.

Returns:

info – (DatasetInfo) The dataset information

_split_generators(dl_manager)

Returns SplitGenerators.

_generate_examples(filepath)

This function returns the examples in the raw (text) form.
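
As a rough illustration of what an example generator over this kind of input might look like, here is a minimal standalone sketch. It is not MedCAT's implementation: the function name iter_patient_examples, the output field names, the list-of-tuples layout per patient, and the choice to order concepts by first-occurrence timestamp are all assumptions made for the example. The only convention taken from the Hugging Face datasets library is that a builder's _generate_examples yields (key, example) pairs.

```python
import os
import pickle
import tempfile
from typing import Any, Dict, Iterator, Tuple


def iter_patient_examples(filepath: str) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Hypothetical stand-in for an example generator; not MedCAT's own code."""
    with open(filepath, "rb") as f:
        pt2stream = pickle.load(f)
    for patient_id, concepts in pt2stream.items():
        # Assumption: order concepts by the timestamp of first occurrence,
        # which is the third element of each tuple.
        ordered = sorted(concepts, key=lambda c: c[2])
        yield str(patient_id), {
            "patient_id": str(patient_id),
            "stream": [cui for cui, _count, _ts in ordered],
        }


# Usage with made-up data: two concepts stored out of chronological order.
sample = {"patient_1": [("C0020538", 1, 200), ("C0011849", 3, 100)]}
path = os.path.join(tempfile.mkdtemp(), "pt2stream.pickle")
with open(path, "wb") as f:
    pickle.dump(sample, f)

examples = list(iter_patient_examples(path))
```

Each yielded pair consists of a unique key (here the patient ID) and a dict holding the raw values for one patient, with the concept stream sorted by timestamp.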