`medcat.utils.meta_cat.ml_utils`

Module Contents

Functions

`set_all_seeds`(seed)
`create_batch_piped_data`(data, start_ind, end_ind, ...)	Creates a batch given data and start/end indices that delimit the batch; also adds padding and moves the batch to the right device
`predict`(model, data, config)	Runs predictions on data; used by meta_cat.pipe
`split_list_train_test`(data, test_size[, shuffle])	Shuffles and randomly splits data into train and test sets
`print_report`(epoch, running_loss, all_logits, y[, name])	Prints some basic stats during training
`train_model`(model, data, config[, save_dir_path])	Trains an LSTM model (for now) with automatic checkpointing
`eval_model`(model, data, config, tokenizer)	Evaluate a trained model on the provided data

Attributes

logger

medcat.utils.meta_cat.ml_utils.logger

medcat.utils.meta_cat.ml_utils.set_all_seeds(seed)

Parameters:

seed (int) – The seed value to set.

Return type:

None
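The signature does not say which random number generators are seeded. The sketch below is a hypothetical stand-in (`set_all_seeds_sketch` is not part of MedCAT). It assumes the relevant generators are Python's `random`, NumPy, and PyTorch, and it seeds NumPy and PyTorch only when they can be imported.

```python
import random


def set_all_seeds_sketch(seed: int) -> None:
    """Hypothetical stand-in for set_all_seeds: seed every RNG that is importable."""
    # Python's built-in RNG is always available.
    random.seed(seed)
    # Assumption: NumPy and PyTorch are the other generators a meta_cat run uses.
    # Each is seeded only if installed, so the sketch runs without them.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


# Reseeding with the same value reproduces the same draws.
set_all_seeds_sketch(42)
first = [random.random() for _ in range(3)]
set_all_seeds_sketch(42)
second = [random.random() for _ in range(3)]
```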

medcat.utils.meta_cat.ml_utils.create_batch_piped_data(data, start_ind, end_ind, device, pad_id)

Creates a batch given data and start/end indices that delimit the batch. It also adds padding and moves the batch to the right device.

Parameters:

data (List[Tuple[List[int], int, Optional[int]]]) – Data in the format [[<[input_ids]>, <cpos>, Optional[int]], …]; the third column is optional and holds the output label
start_ind (int) – Start index of this batch
end_ind (int) – End index of this batch
device (torch.device) – Where to move the data
pad_id (int) – Padding index

Returns:

x – Same as data, but restricted to this batch and converted to a tensor
cpos – Center positions for the data

Return type:

Tuple
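The following hypothetical `pad_batch` illustrates the slicing and padding step using plain Python lists. It makes two assumptions. First, `end_ind` is exclusive, following ordinary slice semantics. Second, each sequence is right-padded to the longest sequence in the slice; the documentation does not say whether the library pads to the batch maximum or to a global maximum. The real function also converts the result to tensors and moves them to `device`. The sketch omits that step so it runs without PyTorch.

```python
from typing import List, Optional, Tuple

# One row: (input_ids, center position, optional label), as documented above.
Row = Tuple[List[int], int, Optional[int]]


def pad_batch(data: List[Row], start_ind: int, end_ind: int,
              pad_id: int) -> Tuple[List[List[int]], List[int]]:
    """Hypothetical list-based version of batching: slice, then right-pad."""
    batch = data[start_ind:end_ind]
    # Pad every sequence in the slice to the longest one in the slice.
    max_len = max(len(row[0]) for row in batch)
    x = [list(row[0]) + [pad_id] * (max_len - len(row[0])) for row in batch]
    # Center positions are copied through unchanged.
    cpos = [row[1] for row in batch]
    return x, cpos


data = [([5, 6, 7], 1, 0), ([8, 9], 0, 1), ([4], 0, None)]
x, cpos = pad_batch(data, 0, 2, pad_id=0)
# x == [[5, 6, 7], [8, 9, 0]], cpos == [1, 0]
```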

medcat.utils.meta_cat.ml_utils.predict(model, data, config)

Runs predictions on data; used by meta_cat.pipe.

Parameters:

model (nn.Module) – The model.
data (List[Tuple[List[int], int, Optional[int]]]) – Data in the format: [[<input_ids>, <cpos>], …]
config (ConfigMetaCAT) – Configuration for this meta_cat instance.

Returns:

predictions (List[int]) – One prediction for each row of input data
confidence (List[float]) – One confidence value for each prediction

Return type:

Tuple
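The page does not say how the confidence value is derived. One common approach is to use the argmax over each row of class logits as the prediction and that class's softmax probability as the confidence. This is shown here as an assumption, not as the library's method. `logits_to_predictions` is a hypothetical helper.

```python
import math
from typing import List, Tuple


def logits_to_predictions(all_logits: List[List[float]]) -> Tuple[List[int], List[float]]:
    """Hypothetical post-processing: argmax class plus its softmax probability."""
    predictions: List[int] = []
    confidences: List[float] = []
    for row in all_logits:
        # Subtract the max logit for numerical stability before exponentiating.
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        predictions.append(best)
        confidences.append(probs[best])
    return predictions, confidences


preds, conf = logits_to_predictions([[2.0, 0.5], [0.1, 0.1, 3.0]])
```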

medcat.utils.meta_cat.ml_utils.split_list_train_test(data, test_size, shuffle=True)

Shuffles and randomly splits data into train and test sets.

Parameters:

data (List) – The data.
test_size (float) – The fraction of the data to put in the test set.
shuffle (bool) – Whether to shuffle the data. Defaults to True.

Returns:

Tuple – The train data and the test data.

Return type:

Tuple
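Here is a minimal sketch of the split. It assumes `test_size` is the fraction of items placed in the test set and that test items are taken from the front of the (optionally shuffled) list. `split_train_test_sketch` is hypothetical. It copies the input before shuffling; the documentation does not say whether the library shuffles `data` in place instead.

```python
import random
from typing import List, Tuple


def split_train_test_sketch(data: List, test_size: float,
                            shuffle: bool = True) -> Tuple[List, List]:
    """Hypothetical split: the first test_size fraction becomes the test set."""
    items = list(data)  # copy so the caller's list is left untouched
    if shuffle:
        random.shuffle(items)
    test_ind = int(len(items) * test_size)
    test_data = items[:test_ind]
    train_data = items[test_ind:]
    return train_data, test_data


random.seed(0)
train, test = split_train_test_sketch(list(range(10)), test_size=0.2)
# len(train) == 8, len(test) == 2
```

Seeding the random generator before the call, as in the example, makes a shuffled split reproducible.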

medcat.utils.meta_cat.ml_utils.print_report(epoch, running_loss, all_logits, y, name='Train')

Prints some basic stats during training

Parameters:

epoch (int) – The current epoch number.
running_loss (List) – The running loss values.
all_logits (List) – List of logits
y (Any) – The true labels.
name (str) – The name of the report. Defaults to 'Train'.

Return type:

None
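The page does not define "basic stats". The sketch below assumes two plausible ones: the mean of `running_loss` and the accuracy of the argmax predictions against `y`. It logs them through the standard `logging` module. Both helpers are hypothetical, and `all_logits` is simplified to one list of class scores per example.

```python
import logging
from typing import List, Tuple

logger = logging.getLogger(__name__)


def epoch_stats(running_loss: List[float], all_logits: List[List[float]],
                y: List[int]) -> Tuple[float, float]:
    """Hypothetical stats: mean loss and argmax accuracy for one epoch."""
    mean_loss = sum(running_loss) / len(running_loss) if running_loss else float("nan")
    # Predicted class = index of the largest logit in each row.
    preds = [max(range(len(row)), key=row.__getitem__) for row in all_logits]
    accuracy = sum(p == t for p, t in zip(preds, y)) / len(y) if y else float("nan")
    return mean_loss, accuracy


def report_sketch(epoch: int, running_loss: List[float], all_logits: List[List[float]],
                  y: List[int], name: str = "Train") -> None:
    """Hypothetical simplified counterpart of print_report."""
    if not all_logits:
        return  # nothing to report for an empty epoch
    mean_loss, accuracy = epoch_stats(running_loss, all_logits, y)
    logger.info("Epoch: %d %s loss: %.4f accuracy: %.4f", epoch, name, mean_loss, accuracy)


mean_loss, accuracy = epoch_stats([0.5, 0.3], [[2.0, 1.0], [0.0, 1.0], [3.0, 0.0]], [0, 1, 1])
```

INFO-level messages only appear once logging is configured, for example with `logging.basicConfig(level=logging.INFO)`.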

medcat.utils.meta_cat.ml_utils.train_model(model, data, config, save_dir_path=None)

Trains an LSTM model (for now) with automatic checkpointing.

Parameters:

model (nn.Module) – The model
data (List) – The data.
config (ConfigMetaCAT) – MetaCAT config.
save_dir_path (Optional[str]) – The save dir path if required. Defaults to None.

Returns:

Dict – The classification report for the best-performing (winning) model.

Raises:

Exception – If auto-save is enabled but no save dir path is provided.

Return type:

Dict
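Automatic checkpointing can be pictured as "keep the best epoch so far, and save it whenever it improves". The sketch below is hypothetical and makes several simplifications:

- It takes precomputed per-epoch reports instead of training a model.
- It uses an assumed `f1` key as the selection metric. The metric the library actually uses is configured through `ConfigMetaCAT` and is not documented on this page.
- It writes the winning report to JSON where a real trainer would save model weights.

Like the documented function, it raises an exception when auto-save is enabled but no directory is given.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, Optional


def select_winner_sketch(epoch_reports: Iterable[Dict], auto_save: bool,
                         save_dir_path: Optional[str] = None,
                         metric: str = "f1") -> Dict:
    """Hypothetical auto-checkpoint loop: keep (and optionally save) the best epoch."""
    # Mirrors the documented failure mode: auto-save without a target directory.
    if auto_save and save_dir_path is None:
        raise Exception("auto-save is enabled but no save_dir_path was provided")
    winner: Dict = {}
    best_score = float("-inf")
    for epoch, report in enumerate(epoch_reports):
        if report[metric] > best_score:
            best_score = report[metric]
            winner = dict(report, epoch=epoch)
            if auto_save:
                # A real trainer would save model weights here; the sketch saves the report.
                path = os.path.join(save_dir_path, "best_report.json")
                with open(path, "w") as f:
                    json.dump(winner, f)
    return winner


with tempfile.TemporaryDirectory() as tmp:
    winner = select_winner_sketch([{"f1": 0.6}, {"f1": 0.8}, {"f1": 0.7}],
                                  auto_save=True, save_dir_path=tmp)
    with open(os.path.join(tmp, "best_report.json")) as f:
        saved = json.load(f)
```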

medcat.utils.meta_cat.ml_utils.eval_model(model, data, config, tokenizer)

Evaluate a trained model on the provided data

Parameters:

model (nn.Module) – The model.
data (List) – The data.
config (ConfigMetaCAT) – The MetaCAT config.
tokenizer (TokenizerWrapperBase) – The tokenizer.

Returns:

Dict – Results (precision, recall, f1, examples, confusion matrix)

Return type:

Dict
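The hypothetical `eval_sketch` below makes the listed results concrete. Using only the standard library, it computes precision, recall, F1, and a confusion matrix from true and predicted labels. Weighting each class by its frequency in `y_true` is an assumption, because this page does not say which averaging `eval_model` uses. The "examples" entry is omitted, since producing it would require decoding inputs with the tokenizer.

```python
from collections import Counter
from typing import Dict, List


def eval_sketch(y_true: List[int], y_pred: List[int]) -> Dict:
    """Hypothetical evaluation: weighted precision/recall/F1 plus a confusion matrix."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    # confusion[i][j] counts examples with true label i predicted as label j.
    confusion = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        confusion[index[t]][index[p]] += 1

    support = Counter(y_true)
    precision = recall = f1 = 0.0
    for label, i in index.items():
        tp = confusion[i][i]
        predicted = sum(row[i] for row in confusion)
        actual = sum(confusion[i])
        prec = tp / predicted if predicted else 0.0
        rec = tp / actual if actual else 0.0
        f_score = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        # Weight each class by how often it occurs in y_true.
        w = support[label] / len(y_true)
        precision += w * prec
        recall += w * rec
        f1 += w * f_score
    return {"precision": precision, "recall": recall, "f1": f1,
            "confusion matrix": confusion}


results = eval_sketch([0, 0, 1, 1], [0, 1, 1, 1])
```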
