medcat.utils.meta_cat.ml_utils

Module Contents

Classes

FocalLoss

Focal loss module, a subclass of torch.nn.Module.

Functions

set_all_seeds(seed)

create_batch_piped_data(data, start_ind, end_ind, ...)

Creates a batch given data and start/end that denote batch size, will also add padding and move to the right device.

predict(model, data, config)

Predict on data used in the meta_cat.pipe

split_list_train_test(data, test_size[, shuffle])

Shuffle and randomly split data

print_report(epoch, running_loss, all_logits, y[, name])

Prints some basic stats during training

train_model(model, data, config[, save_dir_path])

Trains an LSTM or BERT model with automatic checkpoints

eval_model(model, data, config, tokenizer)

Evaluate a trained model on the provided data

Attributes

logger

medcat.utils.meta_cat.ml_utils.logger

medcat.utils.meta_cat.ml_utils.set_all_seeds(seed)
Parameters:

seed (int) – The seed value.

Return type:

None
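The entry above gives only the signature of set_all_seeds. As a rough illustration of what a seeding helper of this kind usually does, the sketch below seeds Python's random module and, when they are installed, NumPy and PyTorch. The name seed_everything_sketch is hypothetical, and which generators MedCAT actually seeds is an assumption, not something this page states.

```python
import random


def seed_everything_sketch(seed: int) -> None:
    """Hypothetical stand-in for a seeding helper; not MedCAT's implementation."""
    random.seed(seed)  # Python's built-in generator
    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's legacy global generator
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # PyTorch's default generator
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


# Re-seeding with the same value reproduces the same random draws.
seed_everything_sketch(13)
first = [random.random() for _ in range(3)]
seed_everything_sketch(13)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

Calling such a helper once before training and data splitting is what makes runs repeatable.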

medcat.utils.meta_cat.ml_utils.create_batch_piped_data(data, start_ind, end_ind, device, pad_id)

Creates a batch given data and start/end that denote batch size, will also add padding and move to the right device.

Parameters:
  • data (List[Tuple[List[int], int, Optional[int]]]) – Data in the format: [[<[input_ids]>, <cpos>, Optional[int]], …]. The third column is optional and represents the output label.

  • start_ind (int) – Start index of this batch

  • end_ind (int) – End index of this batch

  • device (torch.device) – Where to move the data

  • pad_id (int) – Padding index

Returns:
  • x – Same as data, but subsetted and as a tensor

  • cpos – Center positions for the data

  • attention_mask – Padding mask for the data

  • y – Class labels of the data

Return type:

Tuple
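To make the batching step concrete, here is a minimal pure-Python sketch of slicing, padding, and masking. The name pad_batch_sketch is hypothetical. It returns plain lists rather than tensors and omits the device argument. Padding to the longest sequence in the batch and deriving the mask from sequence lengths are assumptions, not confirmed details of the library.

```python
from typing import List, Optional, Sequence, Tuple


def pad_batch_sketch(
    data: Sequence[tuple], start_ind: int, end_ind: int, pad_id: int
) -> Tuple[List[List[int]], List[int], List[List[int]], Optional[List[int]]]:
    """Hypothetical list-based analogue of the batching described above."""
    batch = data[start_ind:end_ind]
    # Assumption: pad every row to the longest sequence in this batch.
    max_len = max(len(row[0]) for row in batch)

    x, attention_mask = [], []
    for row in batch:
        input_ids = list(row[0])
        n_pad = max_len - len(input_ids)
        x.append(input_ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding.
        attention_mask.append([1] * len(input_ids) + [0] * n_pad)

    cpos = [row[1] for row in batch]
    # The label column is optional; return None when rows carry no label.
    y = [row[2] for row in batch] if len(batch[0]) == 3 else None
    return x, cpos, attention_mask, y


rows = [([1, 2, 3], 1, 0), ([4, 5], 0, 1), ([6], 0, 1)]
x, cpos, mask, y = pad_batch_sketch(rows, 0, 2, pad_id=0)
print(x)     # [[1, 2, 3], [4, 5, 0]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```

In the real function the four results would be tensors moved to device.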

medcat.utils.meta_cat.ml_utils.predict(model, data, config)

Predict on data used in the meta_cat.pipe

Parameters:
  • model (nn.Module) – The model.

  • data (List[Tuple[List[int], int, Optional[int]]]) – Data in the format: [[<input_ids>, <cpos>], …]

  • config (ConfigMetaCAT) – Configuration for this meta_cat instance.

Returns:
  • predictions (List[int]) – For each row of input data a prediction

  • confidence (List[float]) – For each prediction a confidence value

Return type:

Tuple
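The two return values can be illustrated with a short sketch that turns per-row logits into a predicted class and a confidence. The helper logits_to_predictions is hypothetical. Treating confidence as the softmax probability of the predicted class is an assumption, since this page does not say how confidence is computed.

```python
import math
from typing import List, Sequence, Tuple


def logits_to_predictions(
    all_logits: Sequence[Sequence[float]],
) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: argmax class and its softmax probability per row."""
    predictions, confidence = [], []
    for logits in all_logits:
        peak = max(logits)
        # Subtract the maximum before exponentiating for numerical stability.
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        predictions.append(best)
        confidence.append(probs[best])
    return predictions, confidence


preds, conf = logits_to_predictions([[0.0, 0.0], [0.0, 2.0]])
print(preds)  # [0, 1]
```

Ties resolve to the lowest class index here, which is a property of this sketch only.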

medcat.utils.meta_cat.ml_utils.split_list_train_test(data, test_size, shuffle=True)

Shuffle and randomly split data

Parameters:
  • data (List) – The data.

  • test_size (float) – The test size.

  • shuffle (bool) – Whether to shuffle the data. Defaults to True.

Returns:

Tuple – The train data, and the test data.

Return type:

Tuple
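A minimal sketch of a shuffle-then-split helper follows. The name split_sketch is hypothetical. Copying the input before shuffling, taking the first slice as the test set, and rounding the test count down are choices of this sketch; the library's exact splitting strategy is not described on this page.

```python
import random
from typing import List, Sequence, Tuple


def split_sketch(
    data: Sequence, test_size: float, shuffle: bool = True
) -> Tuple[List, List]:
    """Hypothetical helper returning (train_data, test_data)."""
    items = list(data)  # copy so the caller's list is left untouched
    if shuffle:
        random.shuffle(items)
    n_test = int(len(items) * test_size)  # fraction of rows held out, rounded down
    return items[n_test:], items[:n_test]


train, test = split_sketch(list(range(10)), test_size=0.2)
print(len(train), len(test))  # 8 2
```

Seeding the random module beforehand makes the split reproducible.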

medcat.utils.meta_cat.ml_utils.print_report(epoch, running_loss, all_logits, y, name='Train')

Prints some basic stats during training

Parameters:
  • epoch (int) – Number of epochs.

  • running_loss (List) – The loss

  • all_logits (List) – List of logits

  • y (Any) – The y array.

  • name (str) – The name of the report. Defaults to Train.

Return type:

None
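The page does not say which statistics are printed. As one plausible reading of the parameters, the sketch below averages the running loss and computes accuracy from the logits. The helper format_epoch_report and its output layout are hypothetical and are not the library's report format.

```python
from typing import List, Sequence


def format_epoch_report(
    epoch: int,
    running_loss: Sequence[float],
    all_logits: Sequence[Sequence[float]],
    y: Sequence[int],
    name: str = "Train",
) -> str:
    """Hypothetical summary line: mean loss and accuracy for one epoch."""
    mean_loss = sum(running_loss) / len(running_loss)
    # Predicted class is the index of the largest logit in each row.
    preds: List[int] = [
        max(range(len(row)), key=row.__getitem__) for row in all_logits
    ]
    accuracy = sum(int(p == t) for p, t in zip(preds, y)) / len(y)
    return f"Epoch: {epoch} | {name} | loss: {mean_loss:.4f} | accuracy: {accuracy:.4f}"


print(format_epoch_report(1, [0.5, 0.3], [[2.0, 1.0], [0.1, 0.9]], [0, 0]))
# Epoch: 1 | Train | loss: 0.4000 | accuracy: 0.5000
```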

class medcat.utils.meta_cat.ml_utils.FocalLoss(alpha=None, gamma=2)

Bases: torch.nn.Module

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes:

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.

Note

As per the example above, an __init__() call to the parent class must be made before assignment on the child.

Variables:

training (bool) – Boolean representing whether this module is in training or evaluation mode.
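The description above is inherited from torch.nn.Module and does not explain the loss itself. Focal loss is commonly defined as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The pure-Python sketch below follows that common definition. The function focal_loss_sketch is hypothetical, and the mean reduction and per-class alpha lookup are assumptions rather than confirmed details of this class.

```python
import math
from typing import Optional, Sequence


def focal_loss_sketch(
    inputs: Sequence[Sequence[float]],
    targets: Sequence[int],
    alpha: Optional[Sequence[float]] = None,
    gamma: float = 2.0,
) -> float:
    """Hypothetical mean focal loss over rows of logits and integer targets."""
    losses = []
    for logits, target in zip(inputs, targets):
        peak = max(logits)
        # Cross-entropy via a numerically stable log-sum-exp.
        log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
        ce = log_sum - logits[target]
        pt = math.exp(-ce)  # probability assigned to the true class
        weight = 1.0 if alpha is None else alpha[target]
        # (1 - pt) ** gamma down-weights examples the model already gets right.
        losses.append(weight * (1.0 - pt) ** gamma * ce)
    return sum(losses) / len(losses)


print(round(focal_loss_sketch([[0.0, 0.0]], [0]), 4))  # 0.1733
```

With gamma set to 0 and no alpha, the sketch reduces to ordinary mean cross-entropy.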

__init__(alpha=None, gamma=2)

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(inputs, targets)

medcat.utils.meta_cat.ml_utils.train_model(model, data, config, save_dir_path=None)

Trains an LSTM or BERT model with automatic checkpoints

Parameters:
  • model (nn.Module) – The model

  • data (List) – The data.

  • config (ConfigMetaCAT) – MetaCAT config.

  • save_dir_path (Optional[str]) – The save dir path if required. Defaults to None.

Returns:

Dict – The classification report for the winner.

Raises:

Exception – If auto-save is enabled but no save dir path is provided.

Return type:

Dict
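The checkpointing behaviour and the documented exception can be sketched without any training code. The helper pick_winner_sketch below is hypothetical. It takes one score per epoch, records which epochs would have been checkpointed, and returns the winning epoch. Which metric the library compares, and how it writes checkpoints, are not stated on this page and are not modelled here.

```python
from typing import List, Optional, Sequence, Tuple


def pick_winner_sketch(
    epoch_scores: Sequence[float],
    auto_save: bool,
    save_dir_path: Optional[str] = None,
) -> Tuple[int, List[int]]:
    """Hypothetical best-epoch tracking with an auto-save guard."""
    if auto_save and save_dir_path is None:
        # Mirrors the documented failure mode: auto-save needs a target directory.
        raise Exception("auto-save is enabled but no save_dir_path was provided")

    best_score = float("-inf")
    winner = -1
    checkpointed: List[int] = []
    for epoch, score in enumerate(epoch_scores):
        if score > best_score:
            best_score, winner = score, epoch
            if auto_save:
                # A real trainer would write model weights to save_dir_path here.
                checkpointed.append(epoch)
    return winner, checkpointed


print(pick_winner_sketch([0.5, 0.7, 0.6], auto_save=True, save_dir_path="checkpoints"))
# (1, [0, 1])
```

Only epochs that improve on the best score so far trigger a checkpoint, so the last saved epoch is always the winner.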

medcat.utils.meta_cat.ml_utils.eval_model(model, data, config, tokenizer)

Evaluate a trained model on the provided data

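Parameters:
  • model (nn.Module) – The model.

  • data (List) – The data.

  • config (ConfigMetaCAT) – MetaCAT config.

  • tokenizer – The tokenizer.

Returns: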

Dict – Results (precision, recall, f1, examples, confusion matrix)

Return type:

Dict
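The metrics named in the return value can be computed from true and predicted labels as in the sketch below. The helper classification_summary and its dictionary keys are hypothetical. It covers per-class precision, recall, F1, and the confusion matrix, and leaves out the examples entry, whose structure this page does not describe.

```python
from typing import Dict, List, Sequence


def classification_summary(
    y_true: Sequence[int], y_pred: Sequence[int], n_classes: int
) -> Dict[str, list]:
    """Hypothetical per-class metrics; rows of the matrix are true classes."""
    confusion: List[List[int]] = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    precision, recall, f1 = [], [], []
    for c in range(n_classes):
        tp = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(n_classes))  # column sum
        actual = sum(confusion[c])                                  # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if (p + r) else 0.0)

    return {"precision": precision, "recall": recall, "f1": f1, "confusion": confusion}


result = classification_summary([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
print(result["confusion"])  # [[1, 1], [0, 2]]
```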