:py:mod:`medcat.rel_cat`
========================

.. py:module:: medcat.rel_cat


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.rel_cat.BalancedBatchSampler
   medcat.rel_cat.RelCAT


.. py:class:: BalancedBatchSampler(dataset, classes, batch_size, max_samples, max_minority)

   Bases: :py:obj:`torch.utils.data.Sampler`

   Base class for all Samplers.

   Every Sampler subclass has to provide an :meth:`__iter__` method, providing a
   way to iterate over indices or lists of indices (batches) of dataset elements,
   and may provide a :meth:`__len__` method that returns the length of the
   returned iterators.

   :param data_source: This argument is not used and will be removed in 2.2.0.
      You may still have a custom implementation that utilizes it.
   :type data_source: Dataset

   .. rubric:: Example

   >>> # xdoctest: +SKIP
   >>> class AccedingSequenceLengthSampler(Sampler[int]):
   >>>     def __init__(self, data: List[str]) -> None:
   >>>         self.data = data
   >>>
   >>>     def __len__(self) -> int:
   >>>         return len(self.data)
   >>>
   >>>     def __iter__(self) -> Iterator[int]:
   >>>         sizes = torch.tensor([len(x) for x in self.data])
   >>>         yield from torch.argsort(sizes).tolist()
   >>>
   >>> class AccedingSequenceLengthBatchSampler(Sampler[List[int]]):
   >>>     def __init__(self, data: List[str], batch_size: int) -> None:
   >>>         self.data = data
   >>>         self.batch_size = batch_size
   >>>
   >>>     def __len__(self) -> int:
   >>>         return (len(self.data) + self.batch_size - 1) // self.batch_size
   >>>
   >>>     def __iter__(self) -> Iterator[List[int]]:
   >>>         sizes = torch.tensor([len(x) for x in self.data])
   >>>         for batch in torch.chunk(torch.argsort(sizes), len(self)):
   >>>             yield batch.tolist()

   .. note:: The :meth:`__len__` method isn't strictly required by
             :class:`~torch.utils.data.DataLoader`, but is expected in any
             calculation involving the length of a :class:`~torch.utils.data.DataLoader`.

   .. py:method:: __init__(dataset, classes, batch_size, max_samples, max_minority)


   .. py:method:: __len__()


   .. py:method:: __iter__()



.. py:class:: RelCAT(cdb, config = ConfigRelCAT(), task='train', init_model=False)

   Bases: :py:obj:`medcat.pipeline.pipe_runner.PipeRunner`

   The RelCAT class is used for training 'Relation-Annotation' models, i.e.,
   models that annotate relations between clinical concepts.

   :param cdb: The CDB instance, used when creating relation datasets.
   :type cdb: CDB
   :param tokenizer: The Huggingface tokenizer instance. This can be a pre-trained
      tokenizer instance from a BERT-style model. For now, only BERT models are supported.
   :type tokenizer: TokenizerWrapperBERT
   :param config: The configuration for RelCAT. Parameter descriptions are available
      in the ConfigRelCAT class docs.
   :type config: ConfigRelCAT
   :param task: The task this model is supposed to handle. Defaults to "train".
   :type task: str, optional
   :param init_model: Whether to load the default model. Defaults to False.
   :type init_model: bool, optional

   .. py:attribute:: name
      :value: 'rel_cat'


   .. py:attribute:: log


   .. py:method:: __init__(cdb, config = ConfigRelCAT(), task='train', init_model=False)


   .. py:method:: save(save_path = './')


   .. py:method:: load(load_path = './')
      :classmethod:


   .. py:method:: __call__(doc)


   .. py:method:: _create_test_train_datasets(data, split_sets = False)


   .. py:method:: train(export_data_path = '', train_csv_path = '', test_csv_path = '', checkpoint_path = './')


   .. py:method:: evaluate_(output_logits, labels, ignore_idx)


   .. py:method:: evaluate_results(data_loader, pad_id)


   .. py:method:: pipe(stream, *args, **kwargs)


   .. py:method:: predict_text_with_anns(text, annotations)

      Creates a spaCy doc from the text and annotation input, then predicts
      using ``self.__call__``.

      :param text: The input text.
      :type text: str
      :param annotations: The entities from NER (of your choosing), given as a list
         of dicts in the following format, where the ``"cui"`` key is optional::

            [
                {
                    "cui": "202099003",
                    "value": "discoid lateral meniscus",
                    "start": 294,
                    "end": 318
                },
                {
                    "cui": "202099003",
                    "value": "Discoid lateral meniscus",
                    "start": 1905,
                    "end": 1929,
                }
            ]

      :type annotations: Dict

      :Returns: **Doc** -- spacy doc with the relations.
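The annotation format expected by ``predict_text_with_anns`` can be illustrated with a minimal sketch. The helper ``make_annotation`` and the sample text are hypothetical and not part of MedCAT. The sketch only builds and checks the input list, and it assumes that ``start`` and ``end`` are character offsets into ``text`` with ``end`` exclusive, which matches the example offsets above (318 - 294 equals the length of the value). The final call is left as a comment because it needs a trained ``RelCAT`` instance.

```python
from typing import Dict, List, Optional


def make_annotation(text: str, value: str, cui: Optional[str] = None) -> Dict:
    """Hypothetical helper: find the first occurrence of `value` in `text`
    and build one entity dict in the documented format."""
    start = text.find(value)  # case-sensitive search
    if start == -1:
        raise ValueError(f"{value!r} not found in text")
    ann: Dict = {"value": value, "start": start, "end": start + len(value)}
    if cui is not None:
        ann["cui"] = cui  # "cui" is optional in the documented format
    return ann


# Made-up sample text, for illustration only.
text = "MRI shows a discoid lateral meniscus. Discoid lateral meniscus was confirmed."

annotations: List[Dict] = [
    make_annotation(text, "discoid lateral meniscus", cui="202099003"),
    make_annotation(text, "Discoid lateral meniscus", cui="202099003"),
]

# Sanity check (assumed convention): each span reproduces its "value" exactly.
for ann in annotations:
    assert text[ann["start"]:ann["end"]] == ann["value"]

# With a trained RelCAT instance (not constructed here), the call would be:
# doc = rel_cat.predict_text_with_anns(text=text, annotations=annotations)
```

Checking offsets against the text before prediction catches misaligned spans early, for example when the NER output was computed on a differently normalized copy of the text.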