medcat.rel_cat

Module Contents

Classes

BalancedBatchSampler

Batch sampler used by RelCAT to build class-balanced batches of dataset indices.

RelCAT

The RelCAT class used for training 'Relation-Annotation' models, i.e., annotation of relations between clinical concepts.

class medcat.rel_cat.BalancedBatchSampler(dataset, classes, batch_size, max_samples, max_minority)

Bases: torch.utils.data.Sampler

Inherited from torch.utils.data.Sampler: Base class for all Samplers.

Every Sampler subclass has to provide an __iter__() method, providing a way to iterate over indices or lists of indices (batches) of dataset elements, and may provide a __len__() method that returns the length of the returned iterators.

Parameters:

data_source (Dataset) – This argument is not used and will be removed in 2.2.0. You may still have a custom implementation that utilizes it.

Example

>>> # xdoctest: +SKIP
>>> from typing import Iterator, List
>>> import torch
>>> from torch.utils.data import Sampler
>>>
>>> class AscendingSequenceLengthSampler(Sampler[int]):
...     def __init__(self, data: List[str]) -> None:
...         self.data = data
...
...     def __len__(self) -> int:
...         return len(self.data)
...
...     def __iter__(self) -> Iterator[int]:
...         # Yield indices ordered from the shortest to the longest sequence
...         sizes = torch.tensor([len(x) for x in self.data])
...         yield from torch.argsort(sizes).tolist()
...
>>> class AscendingSequenceLengthBatchSampler(Sampler[List[int]]):
...     def __init__(self, data: List[str], batch_size: int) -> None:
...         self.data = data
...         self.batch_size = batch_size
...
...     def __len__(self) -> int:
...         # Number of batches, rounding up so a final partial batch is kept
...         return (len(self.data) + self.batch_size - 1) // self.batch_size
...
...     def __iter__(self) -> Iterator[List[int]]:
...         sizes = torch.tensor([len(x) for x in self.data])
...         # Split the length-sorted indices into consecutive batches of batch_size
...         for batch in torch.split(torch.argsort(sizes), self.batch_size):
...             yield batch.tolist()

Note

The __len__() method isn’t strictly required by DataLoader, but is expected in any calculation involving the length of a DataLoader.

__init__(dataset, classes, batch_size, max_samples, max_minority)
__len__()
__iter__()
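The entries above list only signatures, and the docstring describes the parent Sampler rather than this class. As a hedged illustration of the general idea behind a class-balanced batch sampler, here is a minimal pure-Python sketch. It is not medcat's implementation: the class name RoundRobinBatchSampler and its labels and seed parameters are hypothetical, and it does not model max_samples or max_minority, whose exact semantics are not documented here. It interleaves classes round-robin, so each batch mixes classes as evenly as the remaining samples allow.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence


class RoundRobinBatchSampler:
    """Hypothetical sketch, not medcat's BalancedBatchSampler.

    Assumes ``labels`` holds one class label per dataset item, in dataset order.
    """

    def __init__(self, labels: Sequence[int], batch_size: int, seed: int = 0) -> None:
        self.batch_size = batch_size
        self.seed = seed
        self.num_items = len(labels)
        # Group dataset indices by their class label
        self.by_class: Dict[int, List[int]] = defaultdict(list)
        for idx, label in enumerate(labels):
            self.by_class[label].append(idx)

    def __len__(self) -> int:
        # Every index is used exactly once; round up to count a final partial batch
        return (self.num_items + self.batch_size - 1) // self.batch_size

    def __iter__(self) -> Iterator[List[int]]:
        # A fixed seed gives the same batches on every pass
        rng = random.Random(self.seed)
        pools = []
        for indices in self.by_class.values():
            pool = indices[:]  # shuffle a copy so the grouping is left untouched
            rng.shuffle(pool)
            pools.append(pool)
        batch: List[int] = []
        # Take one index from each non-empty class in turn until all pools are empty
        while any(pools):
            for pool in pools:
                if pool:
                    batch.append(pool.pop())
                    if len(batch) == self.batch_size:
                        yield batch
                        batch = []
        if batch:
            yield batch
```

Because torch's DataLoader accepts any iterable of index lists as its batch_sampler argument, an instance like this could be passed as DataLoader(dataset, batch_sampler=sampler). Once the minority classes run out, the remaining batches contain only majority-class indices; a real balanced sampler might instead cap or oversample classes.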
class medcat.rel_cat.RelCAT(cdb, config=ConfigRelCAT(), task='train', init_model=False)

Bases: medcat.pipeline.pipe_runner.PipeRunner

The RelCAT class used for training ‘Relation-Annotation’ models, i.e., annotation of relations between clinical concepts.

Parameters:
  • cdb (CDB) – The concept database, used when creating relation datasets.

  • tokenizer (TokenizerWrapperBERT) – The Hugging Face tokenizer instance. This can be a pre-trained tokenizer from a BERT-style model; currently, only BERT models are supported.

  • config (ConfigRelCAT) – The configuration for RelCAT. Parameter descriptions are available in the ConfigRelCAT class documentation.

  • task (str, optional) – The task this model is meant to handle. Defaults to “train”.

  • init_model (bool, optional) – Whether to load the default model. Defaults to False.

name = 'rel_cat'
log
__init__(cdb, config=ConfigRelCAT(), task='train', init_model=False)
Parameters:
  • cdb (CDB) –

  • config (ConfigRelCAT) –

  • task (str) –

  • init_model (bool) –
save(save_path='./')
Parameters:

save_path (str) –

Return type:

None

classmethod load(load_path='./')
Parameters:

load_path (str) –

Return type:

RelCAT

__call__(doc)
Parameters:

doc (spacy.tokens.Doc) –

Return type:

spacy.tokens.Doc

_create_test_train_datasets(data, split_sets=False)
Parameters:
  • data (Dict) –

  • split_sets (bool) –

train(export_data_path='', train_csv_path='', test_csv_path='', checkpoint_path='./')
Parameters:
  • export_data_path (str) –

  • train_csv_path (str) –

  • test_csv_path (str) –

  • checkpoint_path (str) –

evaluate_(output_logits, labels, ignore_idx)
evaluate_results(data_loader, pad_id)
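These evaluation entries give only signatures. As a hedged illustration of what an ignore_idx argument conventionally means in PyTorch-style evaluation (samples whose label equals it are excluded from the metric), here is a pure-Python sketch computing accuracy from logits. It is not medcat's evaluate_, whose exact metrics are not documented here; the function name masked_accuracy is hypothetical.

```python
from typing import Sequence, Tuple


def masked_accuracy(
    output_logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    ignore_idx: int,
) -> Tuple[float, int]:
    """Hypothetical helper: accuracy over samples whose label is not ignore_idx.

    Returns (accuracy, number_of_counted_samples).
    """
    correct = 0
    counted = 0
    for row, label in zip(output_logits, labels):
        if label == ignore_idx:
            continue  # skip padded or otherwise ignored samples
        # The argmax over class scores is the predicted class index
        pred = max(range(len(row)), key=row.__getitem__)
        counted += 1
        correct += int(pred == label)
    accuracy = correct / counted if counted else 0.0
    return accuracy, counted
```

Returning the count alongside the accuracy lets a caller aggregate across batches by weighting each batch by its number of non-ignored samples.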
pipe(stream, *args, **kwargs)
Parameters:

stream (Iterable[spacy.tokens.Doc]) –

Return type:

Iterator[spacy.tokens.Doc]

predict_text_with_anns(text, annotations)

Creates a spaCy Doc from the text and annotation input, then predicts relations using self.__call__.

Parameters:
  • text (str) – the input text.

  • annotations (Dict) –

    a list of dicts containing the entities from NER (of your choosing), in the following format (the "cui" key is optional):

    [
        {
            "cui": "202099003",
            "value": "discoid lateral meniscus",
            "start": 294,
            "end": 318
        },
        {
            "cui": "202099003",
            "value": "Discoid lateral meniscus",
            "start": 1905,
            "end": 1929
        }
    ]

Returns:

Doc – spaCy Doc with the predicted relations.

Return type:

spacy.tokens.Doc
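The annotations argument uses character offsets into text, and the "value" should match the text between them. A minimal sketch of a helper that builds this list from (start, end, cui) spans and checks the offsets; the function name build_annotations is hypothetical and not part of medcat.

```python
from typing import Dict, List, Optional, Tuple, Union

Annotation = Dict[str, Union[str, int]]


def build_annotations(
    text: str,
    spans: List[Tuple[int, int, Optional[str]]],
) -> List[Annotation]:
    """Hypothetical helper: turn (start, end, cui) spans into the documented dict format."""
    annotations: List[Annotation] = []
    for start, end, cui in spans:
        # Offsets must describe a non-empty slice that lies inside the text
        if not 0 <= start < end <= len(text):
            raise ValueError(f"span ({start}, {end}) is outside the text")
        ann: Annotation = {"value": text[start:end], "start": start, "end": end}
        if cui is not None:
            ann["cui"] = cui  # "cui" is optional in the documented format
        annotations.append(ann)
    return annotations


text = "History of discoid lateral meniscus."
annotations = build_annotations(text, [(11, 35, "202099003")])
```

Assuming a loaded RelCAT instance named rel_cat, the result could then be passed as rel_cat.predict_text_with_anns(text, annotations).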