`medcat.rel_cat`

Module Contents

Classes

`BalancedBatchSampler`	Base class for all Samplers.
`RelCAT`	The RelCAT class used for training 'Relation-Annotation' models, i.e., annotation of relations between clinical concepts.

class medcat.rel_cat.BalancedBatchSampler(dataset, classes, batch_size, max_samples, max_minority)

Bases: torch.utils.data.Sampler

Base class for all Samplers.

Every Sampler subclass has to provide an __iter__() method, providing a way to iterate over indices or lists of indices (batches) of dataset elements, and may provide a __len__() method that returns the length of the returned iterators.

Parameters: data_source (Dataset) – This argument is not used and will be removed in 2.2.0. You may still have a custom implementation that utilizes it.

Example

>>> # xdoctest: +SKIP
>>> from typing import Iterator, List
>>> import torch
>>> from torch.utils.data import Sampler
>>> class AccedingSequenceLengthSampler(Sampler[int]):
>>>     def __init__(self, data: List[str]) -> None:
>>>         self.data = data
>>>
>>>     def __len__(self) -> int:
>>>         return len(self.data)
>>>
>>>     def __iter__(self) -> Iterator[int]:
>>>         sizes = torch.tensor([len(x) for x in self.data])
>>>         yield from torch.argsort(sizes).tolist()
>>>
>>> class AccedingSequenceLengthBatchSampler(Sampler[List[int]]):
>>>     def __init__(self, data: List[str], batch_size: int) -> None:
>>>         self.data = data
>>>         self.batch_size = batch_size
>>>
>>>     def __len__(self) -> int:
>>>         return (len(self.data) + self.batch_size - 1) // self.batch_size
>>>
>>>     def __iter__(self) -> Iterator[List[int]]:
>>>         sizes = torch.tensor([len(x) for x in self.data])
>>>         for batch in torch.chunk(torch.argsort(sizes), len(self)):
>>>             yield batch.tolist()

Note

The __len__() method isn’t strictly required by DataLoader, but is expected in any calculation involving the length of a DataLoader.

__init__(dataset, classes, batch_size, max_samples, max_minority)

__len__()

__iter__()
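
The constructor arguments above (dataset, classes, batch_size, max_samples, max_minority) are not described on this page, so the following is only a minimal sketch of the general idea behind class-balanced batch sampling. It is not MedCAT's implementation. The class name ToyBalancedBatchSampler, its labels and seed arguments, and the round-robin sampling strategy are all assumptions made for illustration. The sketch is written in plain Python so it runs without torch; it follows the same contract described above, with __iter__() yielding lists of indices and __len__() returning the number of batches.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence


class ToyBalancedBatchSampler:
    """Hypothetical sketch; NOT the MedCAT BalancedBatchSampler."""

    def __init__(self, labels: Sequence[int], batch_size: int, seed: int = 0) -> None:
        self.batch_size = batch_size
        self.n_samples = len(labels)
        self.rng = random.Random(seed)
        # Group dataset indices by their class label.
        self.by_class: Dict[int, List[int]] = defaultdict(list)
        for idx, label in enumerate(labels):
            self.by_class[label].append(idx)

    def __len__(self) -> int:
        # Number of batches per pass, rounded up.
        return (self.n_samples + self.batch_size - 1) // self.batch_size

    def __iter__(self) -> Iterator[List[int]]:
        classes = sorted(self.by_class)
        for _ in range(len(self)):
            batch = []
            for slot in range(self.batch_size):
                # Cycle through the classes and sample with replacement,
                # so minority classes are oversampled.
                cls = classes[slot % len(classes)]
                batch.append(self.rng.choice(self.by_class[cls]))
            yield batch


# Imbalanced toy labels: eight samples of class 0, two of class 1.
labels = [0] * 8 + [1] * 2
sampler = ToyBalancedBatchSampler(labels, batch_size=4)
batches = list(sampler)
```

Each batch here holds two indices from each class, even though class 1 makes up only a fifth of the data. An object like this could be passed to a DataLoader through its batch_sampler argument, which accepts any iterable of index lists.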

class medcat.rel_cat.RelCAT(cdb, config=ConfigRelCAT(), task='train', init_model=False)

Bases: medcat.pipeline.pipe_runner.PipeRunner

The RelCAT class used for training ‘Relation-Annotation’ models, i.e., annotation of relations between clinical concepts.

Parameters:

cdb (CDB) – The concept database (CDB), used when creating relation datasets.
tokenizer (TokenizerWrapperBERT) – The Hugging Face tokenizer instance. This can be a pre-trained tokenizer instance from a BERT-style model. For now, only BERT models are supported.
config (ConfigRelCAT) – The configuration for RelCAT. Parameter descriptions are available in the ConfigRelCAT class docs.
task (str, optional) – The task this model is supposed to handle. Defaults to “train”.
init_model (bool, optional) – Whether to load the default model. Defaults to False.

name = 'rel_cat'

log

__init__(cdb, config=ConfigRelCAT(), task='train', init_model=False)

Parameters:

cdb (medcat.cdb.CDB)
config (medcat.config_rel_cat.ConfigRelCAT)

save(save_path='./')

Parameters: save_path (str)
Return type: None

classmethod load(load_path='./')

Parameters: load_path (str)
Return type: RelCAT

__call__(doc)

Parameters: doc (spacy.tokens.Doc)
Return type: spacy.tokens.Doc

_create_test_train_datasets(data, split_sets=False)

Parameters:

data (Dict)
split_sets (bool)

train(export_data_path='', train_csv_path='', test_csv_path='', checkpoint_path='./')

Parameters:

export_data_path (str)
train_csv_path (str)
test_csv_path (str)
checkpoint_path (str)

evaluate_(output_logits, labels, ignore_idx)

evaluate_results(data_loader, pad_id)

pipe(stream, *args, **kwargs)

Parameters: stream (Iterable[spacy.tokens.Doc])
Return type: Iterator[spacy.tokens.Doc]

predict_text_with_anns(text, annotations)

Creates a spaCy doc from the text and annotation input, then predicts using self.__call__.

Parameters:

text (str) – The input text.
annotations (Dict) – The entities from an NER step of your choosing. They must be in the following format, where “cui” is optional:

[
    {"cui": "202099003", "value": "discoid lateral meniscus", "start": 294, "end": 318},
    {"cui": "202099003", "value": "Discoid lateral meniscus", "start": 1905, "end": 1929}
]

Returns:

Doc – spacy doc with the relations.

Return type:

spacy.tokens.Doc
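
As a rough illustration of the annotation format described above, the sketch below builds such a list from raw text. The helper build_annotations and its mentions argument are hypothetical and not part of MedCAT. It assumes that “start” and “end” are character offsets with an exclusive end, which is consistent with the sample entries above (318 − 294 = 24, the length of “discoid lateral meniscus”) but is not stated on this page.

```python
import re
from typing import Dict, List, Optional, Union

Annotation = Dict[str, Union[str, int]]


def build_annotations(text: str, mentions: Dict[str, Optional[str]]) -> List[Annotation]:
    """Hypothetical helper: find each surface form in `text` and emit
    dicts with "value", "start", "end" and, when known, "cui"."""
    annotations: List[Annotation] = []
    for surface, cui in mentions.items():
        # Case-insensitive literal search for every occurrence.
        for match in re.finditer(re.escape(surface), text, flags=re.IGNORECASE):
            ann: Annotation = {
                "value": match.group(0),
                "start": match.start(),  # character offset, inclusive
                "end": match.end(),      # character offset, exclusive (assumed)
            }
            if cui is not None:
                ann["cui"] = cui  # "cui" is optional
            annotations.append(ann)
    annotations.sort(key=lambda a: a["start"])
    return annotations


text = "MRI shows a discoid lateral meniscus. Discoid lateral meniscus was confirmed."
annotations = build_annotations(text, {"discoid lateral meniscus": "202099003"})

# With a loaded RelCAT instance (not constructed here), the call would be:
# doc = rel_cat.predict_text_with_anns(text=text, annotations=annotations)
```

In practice the entities would come from an NER component rather than a string search; the search is used here only to keep the example self-contained.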
