medcat.utils.meta_cat.ml_utils
Module Contents
Classes
FocalLoss | Base class for all neural network modules.
Functions
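set_all_seeds(seed)
create_batch_piped_data(data, start_ind, end_ind, device, pad_id) | Creates a batch given data and start/end that denote batch size, will also add padding and move to the right device.
predict(model, data, config) | Predict on data used in the meta_cat.pipe
split_list_train_test(data, test_size, shuffle=True) | Shuffle and randomly split data
print_report(epoch, running_loss, all_logits, y, name='Train') | Prints some basic stats during training
train_model(model, data, config, save_dir_path=None) | Trains an LSTM or BERT model with automatic checkpointing
eval_model(model, data, config, tokenizer) | Evaluate a trained model on the provided data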
Attributes
- medcat.utils.meta_cat.ml_utils.logger
- medcat.utils.meta_cat.ml_utils.set_all_seeds(seed)
- Parameters:
seed (int) –
- Return type:
None
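The entry above gives only the signature, so the following is a minimal sketch of what seeding "all" random number generators usually involves, not the module's actual code. The name `seed_everything` is hypothetical, and the choice of generators (Python's `random`, NumPy, PyTorch) is an assumption; the optional imports are skipped when a library is not installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Hypothetical stand-in for set_all_seeds: seed every RNG that is available."""
    random.seed(seed)
    try:
        import numpy as np  # optional dependency in this sketch
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional dependency in this sketch
        torch.manual_seed(seed)
    except ImportError:
        pass


# Seeding twice with the same value reproduces the same random draw.
seed_everything(13)
first = random.random()
seed_everything(13)
second = random.random()
print(first == second)  # → True
```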
- medcat.utils.meta_cat.ml_utils.create_batch_piped_data(data, start_ind, end_ind, device, pad_id)
Creates a batch given data and start/end that denote batch size, will also add padding and move to the right device.
- Parameters:
data (List[Tuple[List[int], int, Optional[int]]]) – Data in the format: [[<[input_ids]>, <cpos>, Optional[int]], …]. The third column is optional and represents the output label.
start_ind (int) – Start index of this batch
end_ind (int) – End index of this batch
device (torch.device) – Where to move the data
pad_id (int) – Padding index
- Returns:
x – Same as data, but subsetted and as a tensor
cpos – Center positions for the data
attention_mask – Padding mask for the data
y – Class labels of the data
- Return type:
Tuple
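The batching described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's implementation: `sketch_batch` is a hypothetical name, it returns nested lists instead of tensors, it pads to the longest row in the batch, and it omits the `device` argument because no tensors are created.

```python
from typing import List, Optional, Tuple


def sketch_batch(data: List[tuple], start_ind: int, end_ind: int,
                 pad_id: int) -> Tuple[List[List[int]], List[int],
                                       List[List[int]], Optional[List[int]]]:
    """Hypothetical helper: slice rows, pad input_ids, and build a padding mask."""
    batch = data[start_ind:end_ind]
    max_len = max(len(row[0]) for row in batch)  # assumes a non-empty batch
    # Right-pad every sequence of input ids to the same length.
    x = [row[0] + [pad_id] * (max_len - len(row[0])) for row in batch]
    cpos = [row[1] for row in batch]
    # 1 marks a real token, 0 marks padding (an assumed convention).
    attention_mask = [[1] * len(row[0]) + [0] * (max_len - len(row[0]))
                      for row in batch]
    # The label column is optional; return None when rows carry no label.
    y = [row[2] for row in batch] if len(batch[0]) == 3 else None
    return x, cpos, attention_mask, y


rows = [([5, 6, 7], 1, 0), ([8, 9], 0, 1), ([4], 0, 1)]
x, cpos, attention_mask, y = sketch_batch(rows, 0, 2, pad_id=0)
print(x)               # → [[5, 6, 7], [8, 9, 0]]
print(attention_mask)  # → [[1, 1, 1], [1, 1, 0]]
```

In a tensor-based version, each returned list would be wrapped in a tensor and moved to `device`.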
- medcat.utils.meta_cat.ml_utils.predict(model, data, config)
Predict on data used in the meta_cat.pipe
- Parameters:
model (nn.Module) – The model.
data (List[Tuple[List[int], int, Optional[int]]]) – Data in the format: [[<input_ids>, <cpos>], …]
config (ConfigMetaCAT) – Configuration for this meta_cat instance.
- Returns:
predictions (List[int]) – For each row of input data a prediction
confidence (List[float]) – For each prediction a confidence value
- Return type:
Tuple
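The two return values can be illustrated with a small sketch of how per-row logits are commonly turned into a prediction and a confidence. It assumes the confidence is the softmax probability of the predicted class, which the text above does not state, and `logits_to_predictions` is a hypothetical helper, not part of this module.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_predictions(all_logits: List[List[float]]) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: argmax class and its probability for each row."""
    predictions: List[int] = []
    confidence: List[float] = []
    for logits in all_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        predictions.append(best)
        confidence.append(probs[best])
    return predictions, confidence


predictions, confidence = logits_to_predictions([[2.0, 0.5], [0.1, 3.0, 0.2]])
print(predictions)  # → [0, 1]
```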
- medcat.utils.meta_cat.ml_utils.split_list_train_test(data, test_size, shuffle=True)
Shuffle and randomly split data
- Parameters:
data (List) – The data.
test_size (float) – The test size.
shuffle (bool) – Whether to shuffle the data. Defaults to True.
- Returns:
Tuple – The train data, and the test data.
- Return type:
Tuple
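A shuffle-and-split of this kind can be sketched as follows. This is an assumption-laden illustration, not the module's code: `sketch_split` is a hypothetical name, it copies the input rather than shuffling in place, and it takes the first `test_size` fraction of the (possibly shuffled) list as the test set.

```python
import random
from typing import List, Tuple


def sketch_split(data: List, test_size: float, shuffle: bool = True) -> Tuple[List, List]:
    """Hypothetical helper: return (train, test) after an optional shuffle."""
    items = list(data)  # copy so the caller's list is left untouched
    if shuffle:
        random.shuffle(items)
    n_test = int(len(items) * test_size)
    return items[n_test:], items[:n_test]


train, test = sketch_split(list(range(10)), test_size=0.2, shuffle=False)
print(test)   # → [0, 1]
print(train)  # → [2, 3, 4, 5, 6, 7, 8, 9]
```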
- medcat.utils.meta_cat.ml_utils.print_report(epoch, running_loss, all_logits, y, name='Train')
Prints some basic stats during training
- Parameters:
epoch (int) – Number of epochs.
running_loss (List) – The loss
all_logits (List) – List of logits
y (Any) – The y array.
name (str) – The name of the report. Defaults to Train.
- Return type:
None
- class medcat.utils.meta_cat.ml_utils.FocalLoss(alpha=None, gamma=2)
Bases: torch.nn.Module
Base class for all neural network modules.
Your models should also subclass this class.
Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes:
    import torch.nn as nn
    import torch.nn.functional as F

    class Model(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 20, 5)
            self.conv2 = nn.Conv2d(20, 20, 5)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            return F.relu(self.conv2(x))
Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.
Note
As per the example above, an __init__() call to the parent class must be made before assignment on the child.
- Variables:
training (bool) – Boolean represents whether this module is in training or evaluation mode.
- __init__(alpha=None, gamma=2)
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(inputs, targets)
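The class entry above carries only the inherited nn.Module text, so here is a minimal sketch of the standard focal loss, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), in plain Python. It assumes `inputs` are per-class logits, `targets` are class indices, `alpha` is an optional list of per-class weights, and the result is averaged over the batch; none of these details are confirmed by the text, and `focal_loss` is a hypothetical function, not the class's `forward`.

```python
import math
from typing import List, Optional


def focal_loss(logits: List[List[float]], targets: List[int],
               alpha: Optional[List[float]] = None, gamma: float = 2.0) -> float:
    """Hypothetical focal loss over a batch, averaged across rows."""
    losses = []
    for row, target in zip(logits, targets):
        # log-softmax of the target class, computed stably
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        log_pt = row[target] - log_z
        pt = math.exp(log_pt)
        weight = 1.0 if alpha is None else alpha[target]
        # (1 - pt) ** gamma down-weights examples the model already gets right
        losses.append(-weight * (1.0 - pt) ** gamma * log_pt)
    return sum(losses) / len(losses)


# With gamma=0 and no alpha, the focal loss reduces to plain cross-entropy.
ce = focal_loss([[0.0, 0.0]], [0], gamma=0.0)
fl = focal_loss([[0.0, 0.0]], [0], gamma=2.0)
print(abs(ce - math.log(2)) < 1e-12)  # → True
print(fl < ce)                        # → True
```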
- medcat.utils.meta_cat.ml_utils.train_model(model, data, config, save_dir_path=None)
Trains an LSTM or BERT model with automatic checkpointing
- Parameters:
model (nn.Module) – The model
data (List) – The data.
config (ConfigMetaCAT) – MetaCAT config.
save_dir_path (Optional[str]) – The save dir path if required. Defaults to None.
- Returns:
Dict – The classification report for the winner.
- Raises:
Exception – If auto-save is enabled but no save dir path is provided.
- Return type:
Dict
- medcat.utils.meta_cat.ml_utils.eval_model(model, data, config, tokenizer)
Evaluate a trained model on the provided data
- Parameters:
model (nn.Module) – The model.
data (List) – The data.
config (ConfigMetaCAT) – The MetaCAT config.
tokenizer (TokenizerWrapperBase) – The tokenizer.
- Returns:
Dict – Results (precision, recall, f1, examples, confusion matrix)
- Return type:
Dict